Term Project
Prepared by Darren Lacusta (200485421) and Jarvis Vachon (200452100)
Prepared for Dong Ye
CMIS2250 Section XXX
Date of Submission: Sunday, March 31, 2024
Northern Alberta Institute of Technology
Explanation of the Process 20%
(Please document EVERYTHING you have done: data sourcing, data prep (exploration, transformation, etc.), model building, model evaluation, and scoring new data.)
The process started with downloading the source data and removing all win-like statistics: losses (L), Pythagorean wins and losses (PW, PL), margin of victory (MOV), simple rating system (SRS), and strength of schedule (SOS). We then looked at what each data field represented. Once the necessary fields were identified, we repeated the process for four earlier seasons. The next step was to prepare the data for a classification model in Orange. In the source data, we labelled whether each team made the playoffs, with 1 indicating that the team reached the playoffs and 0 indicating that it did not. Next, we prepared the subject data. This was similar to collecting the source data, except that we used only the 2019-2020 season. Because we wanted to predict each team's number of wins and whether it would make the playoffs, we left those columns blank in the subject data. Finally, we filled out the data dictionary using the definitions given on basketball-reference.com.
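The preparation steps above were done by hand. As a rough illustration, they can also be sketched in Python. This is a minimal sketch under several assumptions:

- Each season is a list of row dictionaries keyed by basketball-reference column abbreviations.
- The wins column is named `W` and is kept as the predictive model's target.
- The target column name `Playoffs` and the helper names `prepare_source` and `prepare_subject` are hypothetical.
- The set of playoff teams is supplied by the analyst.

```python
# Minimal sketch of the data-preparation steps (hypothetical helper names).
# Assumes each season is a list of dicts keyed by basketball-reference
# column abbreviations, e.g. {"Team": ..., "W": ..., "L": ..., "ORtg": ...}.

# Win-like statistics removed from the source data, as listed in the report.
WIN_LIKE = {"L", "PW", "PL", "MOV", "SRS", "SOS"}


def prepare_source(rows, playoff_teams):
    """Drop win-like columns and add a 1/0 'Playoffs' target column."""
    prepared = []
    for row in rows:
        # "W" (wins) is assumed to stay as the predictive model's target.
        clean = {k: v for k, v in row.items() if k not in WIN_LIKE}
        # 1 = the team reached the playoffs, 0 = it did not.
        clean["Playoffs"] = 1 if row["Team"] in playoff_teams else 0
        prepared.append(clean)
    return prepared


def prepare_subject(rows):
    """Drop win-like columns and blank out the targets to be predicted."""
    prepared = []
    for row in rows:
        clean = {k: v for k, v in row.items() if k not in WIN_LIKE}
        clean["W"] = None         # number of wins: left blank for the model
        clean["Playoffs"] = None  # playoff outcome: left blank for the model
        prepared.append(clean)
    return prepared


# Usage with two made-up rows (values are illustrative only).
season = [
    {"Team": "Team A", "W": 50, "L": 32, "MOV": 4.1, "SRS": 3.9, "ORtg": 112.0},
    {"Team": "Team B", "W": 30, "L": 52, "MOV": -5.0, "SRS": -4.8, "ORtg": 106.5},
]
source = prepare_source(season, playoff_teams={"Team A"})
subject = prepare_subject(season)
print(source[0])  # {'Team': 'Team A', 'W': 50, 'ORtg': 112.0, 'Playoffs': 1}
```

Keeping the dropped-column list in one constant means the source and subject data are guaranteed to end up with the same feature columns.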
After preparing both the source and subject data, we began building models in Orange. We trained each candidate model on the source data and then compared the Test and Score results to decide which models would predict most accurately. After choosing the best model, we fed the subject data into Orange's Predictions widget and connected a Data Table widget to view the results.
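Orange's Test and Score widget evaluates each learner by resampling, and cross-validation is one of its sampling options. The sketch below illustrates k-fold cross-validation in plain Python, not through Orange's own API:

- Each candidate model is trained on k-1 folds and scored on the held-out fold.
- The fold scores are then averaged.

The two stand-in learners (a mean baseline and a one-feature least-squares line), the fold count, and the data are all assumptions for illustration.

```python
import math


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size.

    No shuffling is done here, for simplicity.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(xs, ys):
    """Baseline learner: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """One-feature ordinary least squares: y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cross_val_rmse(fit, xs, ys, k=5):
    """Average RMSE over k held-out folds."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq = [(model(xs[i]) - ys[i]) ** 2 for i in test_idx]
        scores.append(math.sqrt(sum(sq) / len(sq)))
    return sum(scores) / len(scores)


# Made-up data: one predictor (e.g. a net rating) against wins.
xs = [-8, -6, -4, -2, 0, 1, 2, 4, 6, 8]
ys = [20, 25, 30, 36, 41, 43, 46, 51, 56, 62]
results = {
    "Mean baseline": cross_val_rmse(fit_mean, xs, ys),
    "Linear regression": cross_val_rmse(fit_line, xs, ys),
}
# Rank candidates from lowest to highest cross-validated RMSE.
for name, rmse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {rmse:.2f}")
```

Scoring only on held-out folds keeps the comparison honest. A model is never judged on the same rows it was trained on.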
Justification of Model Choice 25%
(Please present the summary report of each model you have built; evaluate the models and recommend one model for prediction.)
Predictive Model, Random Forest
For the predictive model, we tested five models: Tree, SVM, Random Forest, kNN, and Linear Regression. In Orange's Test and Score widget, Random Forest showed a high R², although Linear Regression had a larger R². The MSE, RMSE, and MAE for Random Forest were also larger than those for Linear Regression. After reviewing these results, we chose to use the predictions made by the Random Forest model.
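For reference, the metrics compared above have standard definitions. In the block below, y_i is the actual number of wins for team i, ŷ_i is the predicted number, ȳ is the mean of the actual values, and n is the number of teams:

```latex
\begin{aligned}
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{aligned}
```

Lower MSE, RMSE, and MAE mean smaller prediction errors. An R² closer to 1 means the model explains more of the variation in wins.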
Classification Model, Logistic Regression
For the classification model, we tested five models: Naive Bayes, Logistic Regression, kNN, Random Forest, and Neural Network. We chose Logistic Regression because it had the highest precision and recall in the evaluation results, slightly ahead of Random Forest. Precision and recall are important evaluation metrics. Precision is the percentage of the model's positive predictions that are correct. Recall is the percentage of all actual positives that the model correctly identifies. It is therefore important to choose the model that ranks highest on these metrics.
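Orange reports precision and recall in Test and Score. The short sketch below computes them directly from labels to make the two definitions concrete. The labels are made up, and "made the playoffs" (1) is treated as the positive class.

```python
def precision_recall(actual, predicted, positive=1):
    """Precision and recall for a binary classifier.

    precision = TP / (TP + FP): share of predicted positives that are correct
    recall    = TP / (TP + FN): share of actual positives the model found
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)
    fp = sum(1 for a, p in pairs if p == positive and a != positive)
    fn = sum(1 for a, p in pairs if p != positive and a == positive)
    # Guard against division by zero when there are no positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: 1 = made the playoffs, 0 = missed them.
actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0]
p, r = precision_recall(actual, predicted)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.60, recall = 0.75
```

Here the model finds three of the four playoff teams, so recall is 0.75. It also wrongly flags two non-playoff teams, so only three of its five "playoff" predictions are correct and precision is 0.60.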
Presentation of Results & Accuracy of the Models 10% +10%
(Please present the prediction results in a table format, and compare the results with the factual data.)
Predictive Model

Team                    Predicted Wins  Actual Wins  Error (Predicted - Actual)
Milwaukee Bucks         59              56           3
Los Angeles Clippers    51              49           2
Los Angeles Lakers      52              52           0
Toronto Raptors         53              53           0
Boston Celtics          51              48           3
Dallas Mavericks        47              43           4
Houston Rockets         46              44           2
Miami Heat              47              44           3
Utah Jazz               46              44           2
Denver Nuggets          45              46           -1
Oklahoma City Thunder   45              44           1
Philadelphia 76ers      46              43           3
Indiana Pacers          45              45           0
Phoenix Suns            41              34           7
New Orleans Pelicans    38              30           8
Portland Trail Blazers  41              35           6
San Antonio Spurs       41              32           9
Memphis Grizzlies       41              34           7
Orlando Magic           41              33           8
Brooklyn Nets           41              35           6
Sacramento Kings        37              31           6
Chicago Bulls           35              22           13
Minnesota Timberwolves  34              19           15
Detroit Pistons         33              20           13
Washington Wizards      34              25           9
New York Knicks         32              21           11
Charlotte Hornets       32              23           9
Atlanta Hawks           29              20           9
Cleveland Cavaliers     30              19           11
Golden State Warriors   29              15           14
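As a check on the table, the per-team errors and the aggregate error metrics can be recomputed from the predicted and actual win totals. This is a small sketch, and the dictionary simply transcribes the table.

```python
import math

# (predicted wins, actual wins) per team, transcribed from the table.
results = {
    "Milwaukee Bucks": (59, 56), "Los Angeles Clippers": (51, 49),
    "Los Angeles Lakers": (52, 52), "Toronto Raptors": (53, 53),
    "Boston Celtics": (51, 48), "Dallas Mavericks": (47, 43),
    "Houston Rockets": (46, 44), "Miami Heat": (47, 44),
    "Utah Jazz": (46, 44), "Denver Nuggets": (45, 46),
    "Oklahoma City Thunder": (45, 44), "Philadelphia 76ers": (46, 43),
    "Indiana Pacers": (45, 45), "Phoenix Suns": (41, 34),
    "New Orleans Pelicans": (38, 30), "Portland Trail Blazers": (41, 35),
    "San Antonio Spurs": (41, 32), "Memphis Grizzlies": (41, 34),
    "Orlando Magic": (41, 33), "Brooklyn Nets": (41, 35),
    "Sacramento Kings": (37, 31), "Chicago Bulls": (35, 22),
    "Minnesota Timberwolves": (34, 19), "Detroit Pistons": (33, 20),
    "Washington Wizards": (34, 25), "New York Knicks": (32, 21),
    "Charlotte Hornets": (32, 23), "Atlanta Hawks": (29, 20),
    "Cleveland Cavaliers": (30, 19), "Golden State Warriors": (29, 15),
}

# Error = predicted - actual, matching the table's last column.
errors = {team: pred - actual for team, (pred, actual) in results.items()}
n = len(errors)
mae = sum(abs(e) for e in errors.values()) / n
mse = sum(e * e for e in errors.values()) / n
rmse = math.sqrt(mse)

print(f"teams = {n}, MAE = {mae:.2f}, MSE = {mse:.2f}, RMSE = {rmse:.2f}")
# teams = 30, MAE = 6.17, MSE = 57.50, RMSE = 7.58
```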
Classification Model (1 = made the playoffs, 0 = did not)

Team                    Predicted  Actual  Misclassified
Milwaukee Bucks         1          1       No
Los Angeles Clippers    1          1       No
Los Angeles Lakers      1          1       No
Toronto Raptors         1          1       No
Boston Celtics          1          1       No
Dallas Mavericks        0          1       Yes
Houston Rockets         1          1       No
Miami Heat              1          1       No
Utah Jazz               1          1       No
Denver Nuggets          1          1       No
Oklahoma City Thunder   1          1       No
Philadelphia 76ers      1          1       No
Indiana Pacers          1          1       No
Phoenix Suns            1          1       No
New Orleans Pelicans    1          1       No
Portland Trail Blazers  0          1       Yes
San Antonio Spurs       1          1       No
Memphis Grizzlies       1          1       No
Orlando Magic           1          1       No
Brooklyn Nets           1          1       No
Sacramento Kings        1          0       Yes
Chicago Bulls           1          0       Yes
Minnesota Timberwolves  1          0       Yes
Detroit Pistons         1          0       Yes
Washington Wizards      1          0       Yes
New York Knicks         1          0       Yes
Charlotte Hornets       1          0       Yes
Atlanta Hawks           1          0       Yes
Cleveland Cavaliers     1          0       Yes
Golden State Warriors   1          0       Yes
Possible Improvements 5% +Quality of Presentation 15%
(Please recommend several improvements, not just one, and provide a rationale.)
While most predictions for the top-ranked teams were accurate, the predictions became less accurate further down the rankings. One possible improvement would be to add more data, such as additional previous seasons. Another would be to test more models: with more candidates, we might have found one that scored higher in the Test and Score evaluation results.