Part 1: Document Description
|
Citation |
|
---|---|
Title: |
Wine Quality Dataset |
Identification Number: |
doi:10.5072/FK2/YKJQY8 |
Distributor: |
Root |
Date of Distribution: |
2024-05-30 |
Version: |
1 |
Bibliographic Citation: |
H, M Yasser, 2024, "Wine Quality Dataset", https://doi.org/10.5072/FK2/YKJQY8, Root, V1, UNF:6:E3SROxRmTcGbJ1ofdnmuQQ== [fileUNF] |
Citation |
|
Title: |
Wine Quality Dataset |
Alternative Title: |
Wine Quality Prediction - Classification Prediction |
Identification Number: |
doi:10.5072/FK2/YKJQY8 |
Authoring Entity: |
H, M Yasser |
Distributor: |
Root |
Depositor: |
Durbin, Philip |
Date of Deposit: |
2024-05-30 |
Holdings Information: |
https://doi.org/10.5072/FK2/YKJQY8 |
Study Scope |
|
Keywords: |
Other, audience > beginner, data type > tabular, subject > earth and nature, subject > health and fitness > food > alcohol, subject > health and fitness > food, task > classification |
Abstract: |
<p>This dataset covers red variants of the Portuguese "Vinho Verde" wine. It describes the amounts of various chemicals present in the wine and their effect on its quality. The dataset can be treated as a classification or a regression task. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Your task is to predict the quality of wine using the given data.</p> <p>A simple yet challenging project: anticipating the quality of wine. The complexity arises because the dataset has few samples and is highly imbalanced. Can you overcome these obstacles and build a good predictive model to classify the wines?</p> <p><strong>This data frame contains the following columns:</strong></p> <p>Input variables (based on physicochemical tests):<br /> 1 - fixed acidity<br /> 2 - volatile acidity<br /> 3 - citric acid<br /> 4 - residual sugar<br /> 5 - chlorides<br /> 6 - free sulfur dioxide<br /> 7 - total sulfur dioxide<br /> 8 - density<br /> 9 - pH<br /> 10 - sulphates<br /> 11 - alcohol<br /> Output variable (based on sensory data):<br /> 12 - quality (score between 0 and 10)</p> <h3>Acknowledgements:</h3> <p>This dataset is also available from Kaggle and the UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/wine+quality.</p> <h3>Objective:</h3> <ul> <li>Understand the dataset and clean it up (if required).</li> <li>Build classification models to predict the wine quality.</li> <li>Fine-tune the hyperparameters and compare the evaluation metrics of various classification algorithms.</li> </ul> <p>This dataset was originally published on Kaggle at <a href="https://www.kaggle.com/datasets/yasserh/wine-quality-dataset">https://www.kaggle.com/datasets/yasserh/wine-quality-dataset</a></p> |
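Because the abstract describes the quality classes as ordered and imbalanced, a train/test split that ignores class membership can leave the rare scores out of one side entirely. The following is a minimal sketch of a stratified split using only the Python standard library. The function name `stratified_split`, its parameters, and the label values are illustrative assumptions and are not part of the dataset or its documentation.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split row indices so each class keeps roughly its share in both parts."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one row of every class on the training side.
        n_test = min(len(indices) - 1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Made-up, imbalanced quality scores for illustration (not the real data).
labels = [5] * 50 + [6] * 40 + [7] * 8 + [3] * 2
train, test = stratified_split(labels)
print(Counter(labels[i] for i in test))
```

With the real file, `labels` would be the values of the quality column. Very rare classes may still end up with no test rows, which is worth checking before comparing evaluation metrics.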
Methodology and Processing |
|
Sources Statement |
|
Data Access |
|
Notes: |
<a href="http://creativecommons.org/publicdomain/zero/1.0">CC0 1.0</a> |
Other Study Description Materials |
|
File Description--f26207 |
|
File: WineQT.tab |
|
|
|
Notes: |
UNF:6:E3SROxRmTcGbJ1ofdnmuQQ== |
**This data frame contains the following columns:**
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
|
List of Variables: |
|
Variables |
|
Location: all variables are in file f26207 (WineQT.tab). Variable Format: numeric for every variable. Variable names are matched by position to the column list in the file notes; the 13th variable is not named there.

No. | Variable | Valid | Min. | Max. | Mean | StDev | Notes |
---|---|---|---|---|---|---|---|
1 | fixed acidity | 1143.0 | 4.6 | 15.9 | 8.31111111111111 | 1.7475950171695362 | UNF:6:lmMhTQrKIaAlMvaIIN67yA== |
2 | volatile acidity | 1143.0 | 0.12 | 1.58 | 0.5313385826771654 | 0.17963319302252442 | UNF:6:tFD1QilllC/7mIpl1duieA== |
3 | citric acid | 1143.0 | 0.0 | 1.0 | 0.26836395450568684 | 0.19668585234821898 | UNF:6:bmB481nw8P3EZS1v0sWHGQ== |
4 | residual sugar | 1143.0 | 0.9 | 15.5 | 2.532152230971129 | 1.3559174666826788 | UNF:6:XfN3EyjHSzBgjARggN3LIQ== |
5 | chlorides | 1143.0 | 0.012 | 0.611 | 0.0869326334208224 | 0.047267337952380556 | UNF:6://KRUC5g29pLiQKMdPRXfw== |
6 | free sulfur dioxide | 1143.0 | 1.0 | 68.0 | 15.61548556430448 | 10.250486123430814 | UNF:6:TzbpgwnFkwci7bml1euhcw== |
7 | total sulfur dioxide | 1143.0 | 6.0 | 289.0 | 45.91469816272971 | 32.78213030734315 | UNF:6:EZwG2bzNpeI6HHEgx1W3bg== |
8 | density | 1143.0 | 0.9900700000000001 | 1.00369 | 0.9967304111986002 | 0.001925067130254572 | UNF:6:ZpBPWZC2GNVvcopuZBDn0g== |
9 | pH | 1143.0 | 2.74 | 4.01 | 3.3110148731408575 | 0.15666405977275222 | UNF:6:SMoM9ehelKiZnHhAHHs3aQ== |
10 | sulphates | 1143.0 | 0.33 | 2.0 | 0.6577077865266842 | 0.1703987144670742 | UNF:6:uPq+1TTs1rsv0T03Jg5pOw== |
11 | alcohol | 1143.0 | 8.4 | 14.9 | 10.442111402741324 | 1.0821956098764436 | UNF:6:pyJpMIG+Oi7ZtKbJ+2bi0A== |
12 | quality | 1143.0 | 3.0 | 8.0 | 5.657042869641295 | 0.8058242481000936 | UNF:6:tNDm3LzlLAEHdxGM6WN0Mw== |
13 | (not named in the file notes) | 1143.0 | 0.0 | 1597.0 | 804.9693788276538 | 463.997116295106 | UNF:6:4SVNmbKV+Vmg7cF/3FxZ5Q== |
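
The summary statistics above can be recomputed from the tabular file as a sanity check after download. The following is a minimal sketch, assuming that WineQT.tab is tab-delimited with a header row and that StDev is the sample standard deviation; neither assumption is stated in this record. The function name `summarize` is hypothetical, and the in-memory sample uses made-up values rather than rows from the dataset.

```python
import csv
import io
import statistics


def summarize(handle):
    """Return {column: stats} for a tab-delimited file with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            # Skip missing cells and any surplus fields without a header.
            if name in columns and value not in ("", None):
                columns[name].append(float(value))
    return {
        name: {
            "Valid": len(values),
            "Min": min(values),
            "Max": max(values),
            "Mean": statistics.fmean(values),
            "StDev": statistics.stdev(values),  # sample standard deviation
        }
        for name, values in columns.items()
        if len(values) >= 2  # stdev needs at least two values
    }


# Tiny in-memory sample with made-up values, standing in for WineQT.tab.
sample = io.StringIO("alcohol\tquality\n9.4\t5\n9.8\t5\n10.0\t6\n11.2\t7\n")
stats = summarize(sample)
print(stats["quality"])
```

For the real file, the same function could be called as `summarize(open("WineQT.tab", newline=""))`, and the result compared against the Valid, Min., Max., Mean, and StDev values listed for each variable.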