Term Project Complete

Term Project
Prepared by: Dong Ye, Darren Lacusta (200485421), Jarvis Vachon (200452100)
Prepared for: Dong Ye
CMIS2250 Section XXX
Date of Submission: Sunday, March 31, 2024
Northern Alberta Institute of Technology

Explanation of the Process 20% (Please document EVERYTHING you have done from data sourcing, data prep (exploration, transformation, etc.), model building, and model evaluation to scoring new data.)

The process started with downloading the source data and removing all win-like statistics, such as L, PL, PW, MOV, SRS, and SOS. Then we looked at what each piece of data represented. Once the necessary data points were identified, we repeated the process for 4 more previous seasons.

The next step was to prepare a classification model for the Orange model builder. In the source data we categorized whether each team had made it to the playoffs or not: a (1) indicates the team reached the playoffs and a (0) indicates it did not.

The next step was to prepare the subject data. This was similar to collecting the source data, except we only used the 2019-2020 season. Because we wanted to predict the number of wins and whether a team would make it to the playoffs, we left those columns blank in the subject data. We also filled out the data dictionary using the definitions given on basketball-reference.com.

After both the subject and source data had been prepared, the next step was to start building a model in Orange. We trained the different models on the source data, and then looked at the Test and Score results to help decide which models would most accurately predict the necessary data. After choosing the best model, we connected the subject data to the Predictions widget in Orange, making sure to add a Data Table to visualize the results.

Justification of Model Choice 25% (Please present the summary report of each model you have built; evaluate the models and recommend one model for prediction.)

Predictive Model, Random Forest

With the predictive model we tested 5 different models: Tree, SVM, Random Forest, kNN, and Linear Regression. Comparing the results in the Test and Score widget in Orange, Random Forest showed a high R2, although Linear Regression had a larger one. After comparing the MSE, RMSE, and MAE, which were larger for Random Forest than for Linear Regression, we chose to go with the predictions made by the Random Forest model.
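
For reference, the four regression scores named above can be computed by hand. The sketch below is a minimal illustration using the standard textbook definitions; the function name `regression_metrics` and the four sample values are hypothetical and are not taken from the project data or from Orange itself.

```python
import math

def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R2 for two equal-length lists of numbers."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    sum_squared_error = sum(e * e for e in errors)
    mse = sum_squared_error / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    total_sum_of_squares = sum((a - mean_actual) ** 2 for a in actual)
    # R2 compares the model against always predicting the mean of the actual values
    r2 = 1 - sum_squared_error / total_sum_of_squares
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}

# Hypothetical win totals, for illustration only
sample_actual = [50, 40, 30, 20]
sample_predicted = [48, 42, 33, 19]
scores = regression_metrics(sample_actual, sample_predicted)
print(scores)
```

Lower MSE, RMSE, and MAE mean smaller prediction errors, while a higher R2 means the model explains more of the variation in wins.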

Classification Model, Logistic Regression

With the classification model we tested 5 different models: Naive Bayes, Logistic Regression, kNN, Random Forest, and Neural Network. We chose Logistic Regression because, in the evaluation results, it had the highest precision and recall, slightly topping Random Forest. Precision and recall are extremely important evaluation metrics: precision refers to the percentage of the results that are relevant, and recall refers to the percentage of total relevant results correctly classified by the algorithm. It is therefore important to choose the model that ranked highest in those metrics.
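
As a rough illustration of these two definitions, the sketch below computes precision and recall for the playoff label (1 = made the playoffs). The `forecast` and `actual` lists are copied from the classification table in the results section of this report, in the same team order; the function name `precision_recall` is an assumption made for this example, not part of Orange.

```python
def precision_recall(forecast, actual, positive=1):
    """Precision and recall for one class, from paired forecast/actual labels."""
    pairs = list(zip(forecast, actual))
    true_pos = sum(1 for f, a in pairs if f == positive and a == positive)
    false_pos = sum(1 for f, a in pairs if f == positive and a != positive)
    false_neg = sum(1 for f, a in pairs if f != positive and a == positive)
    # Guard against division by zero when a class is never forecast or never occurs
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Forecast column: every team forecast as 1 except Dallas (index 5) and Portland (index 15)
forecast = [1] * 30
forecast[5] = 0
forecast[15] = 0
# Results column: first 20 teams listed are 1, last 10 are 0
actual = [1] * 20 + [0] * 10

precision, recall = precision_recall(forecast, actual)
print(f"precision={precision:.3f} recall={recall:.3f}")  # precision=0.643 recall=0.900
```

These figures describe the scored 2019-2020 subject data, so they are not directly comparable to the Test and Score values used to pick the model.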

Presentation of Results & Accuracy of the Models 10% + 10% (Please present the prediction result in a table format, and compare the results with the factual data.)

Predictive Model

Team                      Forecasting Results   Actual Results   Error
Milwaukee Bucks           59                    56               59-56=3
Los Angeles Clippers      51                    49               51-49=2
Los Angeles Lakers        52                    52               52-52=0
Toronto Raptors           53                    53               53-53=0
Boston Celtics            51                    48               51-48=3
Dallas Mavericks          47                    43               47-43=4
Houston Rockets           46                    44               46-44=2
Miami Heat                47                    44               47-44=3
Utah Jazz                 46                    44               46-44=2
Denver Nuggets            45                    46               45-46=-1
Oklahoma City Thunder     45                    44               45-44=1
Philadelphia 76ers        46                    43               46-43=3
Indiana Pacers            45                    45               45-45=0
Phoenix Suns              41                    34               41-34=7
New Orleans Pelicans      38                    30               38-30=8
Portland Trail Blazers    41                    35               41-35=6
San Antonio Spurs         41                    32               41-32=9
Memphis Grizzlies         41                    34               41-34=7
Orlando Magic             41                    33               41-33=8
Brooklyn Nets             41                    35               41-35=6
Sacramento Kings          37                    31               37-31=6
Chicago Bulls             35                    22               35-22=13
Minnesota Timberwolves    34                    19               34-19=15
Detroit Pistons           33                    20               33-20=13
Washington Wizards        34                    25               34-25=9
New York Knicks           32                    21               32-21=11
Charlotte Hornets         32                    23               32-23=9
Atlanta Hawks             29                    20               29-20=9
Cleveland Cavaliers       30                    19               30-19=11
Golden State Warriors     29                    15               29-15=14
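
The per-team errors above can be summarized into a single accuracy figure. The sketch below is one possible way to do that, using the forecast and actual win totals copied from the table; the helper names and the split into the first 15 and last 15 rows are choices made for this illustration, not part of the original analysis.

```python
import math

# (team, forecast wins, actual wins), in the same order as the table
results = [
    ("Milwaukee Bucks", 59, 56), ("Los Angeles Clippers", 51, 49),
    ("Los Angeles Lakers", 52, 52), ("Toronto Raptors", 53, 53),
    ("Boston Celtics", 51, 48), ("Dallas Mavericks", 47, 43),
    ("Houston Rockets", 46, 44), ("Miami Heat", 47, 44),
    ("Utah Jazz", 46, 44), ("Denver Nuggets", 45, 46),
    ("Oklahoma City Thunder", 45, 44), ("Philadelphia 76ers", 46, 43),
    ("Indiana Pacers", 45, 45), ("Phoenix Suns", 41, 34),
    ("New Orleans Pelicans", 38, 30), ("Portland Trail Blazers", 41, 35),
    ("San Antonio Spurs", 41, 32), ("Memphis Grizzlies", 41, 34),
    ("Orlando Magic", 41, 33), ("Brooklyn Nets", 41, 35),
    ("Sacramento Kings", 37, 31), ("Chicago Bulls", 35, 22),
    ("Minnesota Timberwolves", 34, 19), ("Detroit Pistons", 33, 20),
    ("Washington Wizards", 34, 25), ("New York Knicks", 32, 21),
    ("Charlotte Hornets", 32, 23), ("Atlanta Hawks", 29, 20),
    ("Cleveland Cavaliers", 30, 19), ("Golden State Warriors", 29, 15),
]

def mean_absolute_error(rows):
    """Average size of the forecast error, ignoring its sign."""
    return sum(abs(f - a) for _, f, a in rows) / len(rows)

def root_mean_squared_error(rows):
    """Square root of the average squared error; penalizes large misses more."""
    return math.sqrt(sum((f - a) ** 2 for _, f, a in rows) / len(rows))

top_half, bottom_half = results[:15], results[15:]
print(f"MAE, all 30 teams: {mean_absolute_error(results):.2f}")
print(f"RMSE, all 30 teams: {root_mean_squared_error(results):.2f}")
print(f"MAE, first 15 rows: {mean_absolute_error(top_half):.2f}")
print(f"MAE, last 15 rows: {mean_absolute_error(bottom_half):.2f}")
```

Comparing the two halves of the table makes it easy to see whether the forecast error grows for teams lower in the list.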

Classification Model

Team                      Forecasting Result   Actual Result   Error
Milwaukee Bucks           1                    1               NO
Los Angeles Clippers      1                    1               NO
Los Angeles Lakers        1                    1               NO
Toronto Raptors           1                    1               NO
Boston Celtics            1                    1               NO
Dallas Mavericks          0                    1               YES
Houston Rockets           1                    1               NO
Miami Heat                1                    1               NO
Utah Jazz                 1                    1               NO
Denver Nuggets            1                    1               NO
Oklahoma City Thunder     1                    1               NO
Philadelphia 76ers        1                    1               NO
Indiana Pacers            1                    1               NO
Phoenix Suns              1                    1               NO
New Orleans Pelicans      1                    1               NO
Portland Trail Blazers    0                    1               YES
San Antonio Spurs         1                    1               NO
Memphis Grizzlies         1                    1               NO
Orlando Magic             1                    1               NO
Brooklyn Nets             1                    1               NO
Sacramento Kings          1                    0               YES
Chicago Bulls             1                    0               YES
Minnesota Timberwolves    1                    0               YES
Detroit Pistons           1                    0               YES
Washington Wizards        1                    0               YES
New York Knicks           1                    0               YES
Charlotte Hornets         1                    0               YES
Atlanta Hawks             1                    0               YES
Cleveland Cavaliers       1                    0               YES
Golden State Warriors     1                    0               YES

MSE

Possible Improvements 5% + Quality of Presentation 15% (Please recommend some, not one, improvements, and provide rationale.)

While the predictions for most of the top-ranking teams were accurate, they became less accurate moving down the rankings. One possible improvement would be to add more data, such as more previous seasons. Another would be to increase the number of models we tested; with more models we may have found one that scored better in the Test and Score evaluation.