Wine Quality Dataset (doi:10.5072/FK2/YKJQY8)
(Wine Quality Prediction - Classification Prediction)

Document Description

Citation

Title:

Wine Quality Dataset

Identification Number:

doi:10.5072/FK2/YKJQY8

Distributor:

Root

Date of Distribution:

2024-05-30

Version:

1

Bibliographic Citation:

H, M Yasser, 2024, "Wine Quality Dataset", https://doi.org/10.5072/FK2/YKJQY8, Root, V1, UNF:6:E3SROxRmTcGbJ1ofdnmuQQ== [fileUNF]

Study Description

Citation

Title:

Wine Quality Dataset

Alternative Title:

Wine Quality Prediction - Classification Prediction

Identification Number:

doi:10.5072/FK2/YKJQY8

Authoring Entity:

H, M Yasser

Distributor:

Root

Depositor:

Durbin, Philip

Date of Deposit:

2024-05-30

Holdings Information:

https://doi.org/10.5072/FK2/YKJQY8

Study Scope

Keywords:

Other, audience > beginner, data type > tabular, subject > earth and nature, subject > health and fitness > food > alcohol, subject > health and fitness > food, task > classification

Abstract:

This dataset covers red variants of the Portuguese "Vinho Verde" wine. It records the amounts of various chemicals present in each wine and their effect on its quality. The dataset can be treated as a classification or a regression task. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). The task is to predict the quality of a wine from the given data.

A simple yet challenging project: predicting the quality of wine. The difficulty comes from the dataset having few samples and being highly imbalanced. Can you overcome these obstacles and build a good predictive model to classify the wines?

This data frame contains the following columns:

Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)

Acknowledgements:

This dataset is also available from Kaggle and the UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/wine+quality.

Objective:

  • Understand the dataset and clean it up (if required).
  • Build classification models to predict the wine quality.
  • Fine-tune the hyperparameters and compare the evaluation metrics of various classification algorithms.

This dataset was originally published on Kaggle at https://www.kaggle.com/datasets/yasserh/wine-quality-dataset
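The abstract notes that the quality classes are ordered and imbalanced. As a rough sketch of how that imbalance might be inspected before modelling, the snippet below counts each class and derives inverse-frequency class weights, n_samples / (n_classes * class_count). The function names and the short list of scores are hypothetical and used only for illustration; the scores are not values from this dataset.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, share of all samples)}, ordered by label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (n, n / total) for label, n in sorted(counts.items())}


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * n) for label, n in sorted(counts.items())}


# Hypothetical quality scores for illustration only; NOT drawn from the dataset.
quality = [5, 5, 5, 5, 6, 6, 6, 7, 4, 8]

distribution = class_distribution(quality)
weights = balanced_class_weights(quality)
print(distribution)
print(weights)
```

Rare classes receive weights above 1 and common classes weights below 1, which is one common way to keep a classifier from ignoring the poor and excellent wines.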

Methodology and Processing

Sources Statement

Data Access

Notes:

CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)

Other Study Description Materials

File Description--f26207

File: WineQT.tab

  • Number of cases: 1143

  • No. of variables per record: 13

  • Type of File: text/tab-separated-values

Notes:

UNF:6:E3SROxRmTcGbJ1ofdnmuQQ==

This data frame contains the following columns:

Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
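As a minimal sketch of reading a tab-separated file with this layout, the snippet below uses Python's standard csv module. The function name read_wine_tab, the assumption that the first row of the file holds the column names, and the two inline sample rows are all illustrative; the sample rows are placeholders and not values from WineQT.tab.

```python
import csv
import io

# Variable names as listed in this codebook (12 listed variables plus Id).
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality", "Id",
]


def read_wine_tab(handle):
    """Parse a tab-separated file-like object into a list of dicts of floats.

    Assumes the first row is a header. Raises ValueError if the header does
    not contain exactly the 13 variables named in the codebook.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    names = reader.fieldnames
    if names is None or sorted(names) != sorted(EXPECTED_COLUMNS):
        raise ValueError(f"unexpected columns: {names}")
    return [{name: float(value) for name, value in row.items()} for row in reader]


# Placeholder rows for illustration only; NOT values from WineQT.tab.
SAMPLE_ROWS = [
    ["7.0", "0.50", "0.10", "2.0", "0.08", "10", "30", "0.997", "3.30", "0.60", "10.0", "5", "0"],
    ["8.0", "0.40", "0.30", "2.5", "0.07", "15", "45", "0.996", "3.20", "0.70", "11.0", "6", "1"],
]
sample_text = "\n".join("\t".join(r) for r in [EXPECTED_COLUMNS] + SAMPLE_ROWS) + "\n"

rows = read_wine_tab(io.StringIO(sample_text))
print(len(rows), len(rows[0]))
```

With the real file, the same function could be called as `read_wine_tab(open("WineQT.tab", newline=""))`; the path is an assumption about where the file is saved. The result would then be expected to hold 1143 rows of 13 values each, matching the counts above.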

Variable Description

List of Variables:

Variables

fixed acidity

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 4.6; Max. 15.9; Mean 8.31111111111111; StDev 1.7475950171695362

Variable Format: numeric

Notes: UNF:6:lmMhTQrKIaAlMvaIIN67yA==

volatile acidity

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 0.12; Max. 1.58; Mean 0.5313385826771654; StDev 0.17963319302252442

Variable Format: numeric

Notes: UNF:6:tFD1QilllC/7mIpl1duieA==

citric acid

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 0.0; Max. 1.0; Mean 0.26836395450568684; StDev 0.19668585234821898

Variable Format: numeric

Notes: UNF:6:bmB481nw8P3EZS1v0sWHGQ==

residual sugar

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 0.9; Max. 15.5; Mean 2.532152230971129; StDev 1.3559174666826788

Variable Format: numeric

Notes: UNF:6:XfN3EyjHSzBgjARggN3LIQ==

chlorides

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 0.012; Max. 0.611; Mean 0.0869326334208224; StDev 0.047267337952380556

Variable Format: numeric

Notes: UNF:6://KRUC5g29pLiQKMdPRXfw==

free sulfur dioxide

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 1.0; Max. 68.0; Mean 15.61548556430448; StDev 10.250486123430814

Variable Format: numeric

Notes: UNF:6:TzbpgwnFkwci7bml1euhcw==

total sulfur dioxide

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 6.0; Max. 289.0; Mean 45.91469816272971; StDev 32.78213030734315

Variable Format: numeric

Notes: UNF:6:EZwG2bzNpeI6HHEgx1W3bg==

density

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 0.99007; Max. 1.00369; Mean 0.9967304111986002; StDev 0.001925067130254572

Variable Format: numeric

Notes: UNF:6:ZpBPWZC2GNVvcopuZBDn0g==

pH

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 2.74; Max. 4.01; Mean 3.3110148731408575; StDev 0.15666405977275222

Variable Format: numeric

Notes: UNF:6:SMoM9ehelKiZnHhAHHs3aQ==

sulphates

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 0.33; Max. 2.0; Mean 0.6577077865266842; StDev 0.1703987144670742

Variable Format: numeric

Notes: UNF:6:uPq+1TTs1rsv0T03Jg5pOw==

alcohol

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 8.4; Max. 14.9; Mean 10.442111402741324; StDev 1.0821956098764436

Variable Format: numeric

Notes: UNF:6:pyJpMIG+Oi7ZtKbJ+2bi0A==

quality

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 3.0; Max. 8.0; Mean 5.657042869641295; StDev 0.8058242481000936

Variable Format: numeric

Notes: UNF:6:tNDm3LzlLAEHdxGM6WN0Mw==

Id

f26207 Location:

Summary Statistics: Valid 1143.0; Min. 0.0; Max. 1597.0; Mean 804.9693788276538; StDev 463.997116295106

Variable Format: numeric

Notes: UNF:6:4SVNmbKV+Vmg7cF/3FxZ5Q==
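The per-variable summary statistics above (Valid, Min., Max., Mean, StDev) could be recomputed with a sketch along the following lines, using only Python's standard statistics module. The function name summarize is hypothetical, the codebook does not state whether StDev is the sample or the population standard deviation (the sample form is assumed here), and the four pH readings are placeholders, not values from WineQT.tab.

```python
import statistics


def summarize(values):
    """Return the five summary fields used in this codebook for one variable."""
    return {
        "Valid": float(len(values)),
        "Min.": min(values),
        "Max.": max(values),
        "Mean": statistics.fmean(values),
        # Assumption: StDev is the sample standard deviation (n - 1 denominator).
        "StDev": statistics.stdev(values),
    }


# Hypothetical pH readings for illustration only; NOT drawn from WineQT.tab.
ph = [3.2, 3.3, 3.4, 3.3]

summary = summarize(ph)
print("; ".join(f"{name} {value}" for name, value in summary.items()))
```

If the recomputed StDev for a real column differs slightly from the codebook value, swapping statistics.stdev for statistics.pstdev would test the population-form alternative.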