

python

The dataset contains 1197 instances, each of which has 15 columns: the first 14 columns correspond to the attributes, and the 15th column, "actual_productivity", is the variable that we will predict. The details of the dataset can be found in the original UCI Repository, from which it can also be downloaded.

https://archive.ics.uci.edu/ml/datasets/Productivity+Prediction+of+Garment+Employees

garments_worker_productivity.csv

1

Load and explore the dataset, do any necessary pre-processing, and split the dataset into a training set and a test set with an appropriate ratio. Explain the steps that you have taken (e.g. showing the dataset size, dealing with missing values, feature exploration and representation, label distribution, splitting the dataset, etc.).
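
As a starting point for this step, the loading, missing-value count and split could be sketched as below. This is a minimal sketch using only the standard library, not a full solution: the toy rows are made up to stand in for garments_worker_productivity.csv, the column names `team` and `wip` are assumptions for illustration (only `actual_productivity` is named in the question), and the 80/20 ratio is one common choice rather than a required one.

```python
import csv
import io
import random


def load_rows(handle):
    """Read CSV rows into a list of dicts keyed by the header names."""
    return list(csv.DictReader(handle))


def count_missing(rows):
    """Count empty cells per column."""
    counts = {name: 0 for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts


def split_rows(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up toy rows; in practice pass open("garments_worker_productivity.csv").
sample = io.StringIO(
    "team,wip,actual_productivity\n"
    "1,1108,0.94\n"
    "2,,0.89\n"
    "3,968,0.80\n"
    "4,,0.75\n"
    "5,1170,0.80\n"
)
rows = load_rows(sample)
missing = count_missing(rows)
train, test = split_rows(rows, test_ratio=0.2, seed=0)
print(len(rows), missing, len(train), len(test))
```

Fixing the seed keeps the split reproducible, so that later model comparisons are made on the same test rows.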

2

Based on the training data, create three supervised machine learning (ML) models for predicting actual_productivity.

Report a performance score using a suitable metric on the test data. Could the reported result be underfitted or overfitted? Justify your answer.

Justify the design decisions made for each ML model used to answer this question.

Have you optimised any hyper-parameters for each ML model? What are they? Why have you done that? Explain.

Finally, make a recommendation based on the reported results and justify it.
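
Because actual_productivity is a continuous target, regression metrics such as RMSE and R² are natural candidates for the performance score. The sketch below is an illustration under assumptions, not the expected answer: the target values are made up, and the "model" is only a baseline that always predicts the training mean. Comparing each real model's train and test scores against such a baseline is one way to argue about fit: a test score no better than the baseline suggests underfitting, while a train score far better than the test score suggests overfitting.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up target values, for illustration only.
y_train = [0.8, 0.9, 0.7, 0.6]
y_test = [0.75, 0.85]

# Baseline "model": always predict the mean of the training targets.
baseline = sum(y_train) / len(y_train)
train_pred = [baseline] * len(y_train)
test_pred = [baseline] * len(y_test)

train_rmse = rmse(y_train, train_pred)
test_rmse = rmse(y_test, test_pred)
train_r2 = r2(y_train, train_pred)
test_r2 = r2(y_test, test_pred)
print(train_rmse, test_rmse, train_r2, test_r2)
```

A mean predictor scores an R² of about zero on the data it was fitted to, and R² can go negative on unseen data, which is why both the train and test figures are worth reporting.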

3

Analyse the importance of the features for predicting actual_productivity using two different approaches. Give statistical reasons for your findings.
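
Two approaches that could be contrasted here are a filter-style measure (the Pearson correlation of each feature with the target) and a model-based measure (permutation importance: how much the error grows when one feature column is shuffled). The sketch below is a hedged illustration only: the data is made up, `predict` is a hypothetical stand-in for a fitted model, and `permutation_importance_sketch` is a hand-written helper, not a library function.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance_sketch(predict, X, y, column, n_repeats=20, seed=0):
    """Mean increase in MSE after shuffling one column of X."""
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(y, [predict(row) for row in X_perm]) - base)
    return sum(increases) / n_repeats


# Made-up data: the target depends only on the first feature.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 1.0], [5.0, 4.0], [6.0, 2.0]]
y = [2.0 * row[0] for row in X]


def predict(row):
    """Hypothetical stand-in for a fitted model's prediction."""
    return 2.0 * row[0]


corr0 = pearson([row[0] for row in X], y)
imp0 = permutation_importance_sketch(predict, X, y, column=0)
imp1 = permutation_importance_sketch(predict, X, y, column=1)
print(corr0, imp0, imp1)
```

The two measures answer different questions: correlation captures only linear, one-feature-at-a-time association, while permutation importance depends on the fitted model and is best computed on held-out data.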
