Wine Quality Dataset (doi:10.5072/FK2/YKJQY8)
(Wine Quality Prediction - Classification Prediction)

Document Description
Citation
Title:	Wine Quality Dataset
Identification Number:	doi:10.5072/FK2/YKJQY8
Distributor:	Root
Date of Distribution:	2024-05-30
Version:	1
Bibliographic Citation:	H, M Yasser, 2024, "Wine Quality Dataset", https://doi.org/10.5072/FK2/YKJQY8, Root, V1, UNF:6:E3SROxRmTcGbJ1ofdnmuQQ== [fileUNF]
Study Description
Citation
Title:	Wine Quality Dataset
Alternative Title:	Wine Quality Prediction - Classification Prediction
Identification Number:	doi:10.5072/FK2/YKJQY8
Authoring Entity:	H, M Yasser
Distributor:	Root
Depositor:	Durbin, Philip
Date of Deposit:	2024-05-30
Holdings Information:	https://doi.org/10.5072/FK2/YKJQY8
Study Scope
Keywords:	Other, audience > beginner, data type > tabular, subject > earth and nature, subject > health and fitness > food > alcohol, subject > health and fitness > food, task > classification
Abstract:	This dataset covers red variants of the Portuguese "Vinho Verde" wine. It describes the amounts of various chemicals present in the wine and their effect on its quality. The data can be treated as a classification or a regression task. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). The task is to predict the quality of wine using the given data.
	A simple yet challenging project: predicting the quality of wine. The difficulty arises because the dataset has few samples and is highly imbalanced. Can you overcome these obstacles and build a good predictive model to classify the wines?
	This data frame contains the following columns:
	Input variables (based on physicochemical tests):
	1 - fixed acidity
	2 - volatile acidity
	3 - citric acid
	4 - residual sugar
	5 - chlorides
	6 - free sulfur dioxide
	7 - total sulfur dioxide
	8 - density
	9 - pH
	10 - sulphates
	11 - alcohol
	Output variable (based on sensory data):
	12 - quality (score between 0 and 10)
	Acknowledgements:
	This dataset is also available from Kaggle and the UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/wine+quality.
	Objective:
	- Understand the dataset and clean it up (if required).
	- Build classification models to predict the wine quality.
	- Fine-tune the hyperparameters and compare the evaluation metrics of various classification algorithms.
	This dataset was originally published on Kaggle at https://www.kaggle.com/datasets/yasserh/wine-quality-dataset
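The abstract notes that the quality classes are imbalanced. One common way to account for this when training a classifier is to weight each class inversely to its frequency. The sketch below is only an illustration of that idea, not a method prescribed by the dataset authors; the function name `inverse_frequency_weights` and the example labels are hypothetical and not taken from the data.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, common classes smaller ones.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Made-up quality labels for illustration only (NOT drawn from the dataset).
example_labels = [5, 5, 5, 5, 6, 6, 6, 7]
weights = inverse_frequency_weights(example_labels)

for label in sorted(weights):
    print(label, round(weights[label], 3))
```

With the real file, the `labels` argument would be the values of the `quality` column. The resulting dictionary can be passed to any training routine that accepts per-class weights.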
Methodology and Processing
Sources Statement
Data Access
Notes:	CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)
Other Study Description Materials
File Description--f26207
File: WineQT.tab
	Number of cases: 1143; No. of variables per record: 13; Type of File: text/tab-separated-values
Notes:	UNF:6:E3SROxRmTcGbJ1ofdnmuQQ==
	This data frame contains the following columns:
	Input variables (based on physicochemical tests):
	1 - fixed acidity
	2 - volatile acidity
	3 - citric acid
	4 - residual sugar
	5 - chlorides
	6 - free sulfur dioxide
	7 - total sulfur dioxide
	8 - density
	9 - pH
	10 - sulphates
	11 - alcohol
	Output variable (based on sensory data):
	12 - quality (score between 0 and 10)
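The file description above (tab-separated values, 13 variables per record) can be checked when loading the data. The following is a minimal sketch using only the Python standard library. It assumes the file starts with a header row whose names match the variable names in this codebook; the function name `read_wine_table` is hypothetical, and the single sample row is made up purely to exercise the parser.

```python
import csv
import io

# Variable names as listed in this codebook (12 documented columns plus Id).
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality", "Id",
]


def read_wine_table(handle):
    """Parse tab-separated text into a list of dicts with float values.

    Assumes a header row is present; raises ValueError if any expected
    column is missing.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [
        {name: float(row[name]) for name in EXPECTED_COLUMNS}
        for row in reader
    ]


# Made-up single row (NOT a record from the dataset), used only as a smoke test.
sample_text = "\t".join(EXPECTED_COLUMNS) + "\n" + "\t".join(["1.0"] * 13) + "\n"
rows = read_wine_table(io.StringIO(sample_text))
print(len(rows), len(rows[0]))
```

For the actual file, the handle would come from `open("WineQT.tab", newline="")`. If the header assumption holds, `len(rows)` should equal the 1143 cases stated above.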
Variable Description
List of Variables:
	fixed acidity - fixed acidity
	volatile acidity - volatile acidity
	citric acid - citric acid
	residual sugar - residual sugar
	chlorides - chlorides
	free sulfur dioxide - free sulfur dioxide
	total sulfur dioxide - total sulfur dioxide
	density - density
	pH - pH
	sulphates - sulphates
	alcohol - alcohol
	quality - quality
	Id - Id
Variables
fixed acidity
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 4.6; Max. 15.9; Mean 8.31111111111111; StDev 1.7475950171695362 Variable Format: numeric Notes: UNF:6:lmMhTQrKIaAlMvaIIN67yA==
volatile acidity
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 0.12; Max. 1.58; Mean 0.5313385826771654; StDev 0.17963319302252442 Variable Format: numeric Notes: UNF:6:tFD1QilllC/7mIpl1duieA==
citric acid
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 0.0; Max. 1.0; Mean 0.26836395450568684; StDev 0.19668585234821898 Variable Format: numeric Notes: UNF:6:bmB481nw8P3EZS1v0sWHGQ==
residual sugar
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 0.9; Max. 15.5; Mean 2.532152230971129; StDev 1.3559174666826788 Variable Format: numeric Notes: UNF:6:XfN3EyjHSzBgjARggN3LIQ==
chlorides
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 0.012; Max. 0.611; Mean 0.0869326334208224; StDev 0.047267337952380556 Variable Format: numeric Notes: UNF:6://KRUC5g29pLiQKMdPRXfw==
free sulfur dioxide
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 1.0; Max. 68.0; Mean 15.61548556430448; StDev 10.250486123430814 Variable Format: numeric Notes: UNF:6:TzbpgwnFkwci7bml1euhcw==
total sulfur dioxide
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 6.0; Max. 289.0; Mean 45.91469816272971; StDev 32.78213030734315 Variable Format: numeric Notes: UNF:6:EZwG2bzNpeI6HHEgx1W3bg==
density
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 0.9900700000000001; Max. 1.00369; Mean 0.9967304111986002; StDev 0.001925067130254572 Variable Format: numeric Notes: UNF:6:ZpBPWZC2GNVvcopuZBDn0g==
pH
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 2.74; Max. 4.01; Mean 3.3110148731408575; StDev 0.15666405977275222 Variable Format: numeric Notes: UNF:6:SMoM9ehelKiZnHhAHHs3aQ==
sulphates
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 0.33; Max. 2.0; Mean 0.6577077865266842; StDev 0.1703987144670742 Variable Format: numeric Notes: UNF:6:uPq+1TTs1rsv0T03Jg5pOw==
alcohol
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 8.4; Max. 14.9; Mean 10.442111402741324; StDev 1.0821956098764436 Variable Format: numeric Notes: UNF:6:pyJpMIG+Oi7ZtKbJ+2bi0A==
quality
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 3.0; Max. 8.0; Mean 5.657042869641295; StDev 0.8058242481000936 Variable Format: numeric Notes: UNF:6:tNDm3LzlLAEHdxGM6WN0Mw==
Id
f26207 Location:	Summary Statistics: Valid 1143.0; Min. 0.0; Max. 1597.0; Mean 804.9693788276538; StDev 463.997116295106 Variable Format: numeric Notes: UNF:6:4SVNmbKV+Vmg7cF/3FxZ5Q==
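The minimum and maximum values in the summary statistics above can serve as a sanity check on a loaded copy of the file. The sketch below copies those ranges from this codebook and flags any value that falls outside them. The function name `out_of_range` and the small tolerance are assumptions added for illustration; the tolerance is there because a value such as the density minimum may differ in its last floating-point digit depending on how the file is parsed.

```python
# (min, max) per variable, copied from the summary statistics in this codebook.
CODEBOOK_RANGES = {
    "fixed acidity": (4.6, 15.9),
    "volatile acidity": (0.12, 1.58),
    "citric acid": (0.0, 1.0),
    "residual sugar": (0.9, 15.5),
    "chlorides": (0.012, 0.611),
    "free sulfur dioxide": (1.0, 68.0),
    "total sulfur dioxide": (6.0, 289.0),
    "density": (0.9900700000000001, 1.00369),
    "pH": (2.74, 4.01),
    "sulphates": (0.33, 2.0),
    "alcohol": (8.4, 14.9),
    "quality": (3.0, 8.0),
    "Id": (0.0, 1597.0),
}


def out_of_range(row, tol=1e-9):
    """Return names of variables that are missing or outside the codebook range."""
    bad = []
    for name, (low, high) in CODEBOOK_RANGES.items():
        value = row.get(name)
        if value is None or not (low - tol <= value <= high + tol):
            bad.append(name)
    return bad


# Hypothetical row built from the codebook minimums; not an actual record.
row_at_minimums = {name: low for name, (low, _) in CODEBOOK_RANGES.items()}
print(out_of_range(row_at_minimums))  # []
```

Applied to every parsed row of `WineQT.tab`, an empty result for each row would indicate that the local copy is consistent with the ranges documented here.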