MAT 243 Project Three Summary Report

School: Metropolitan Community College, Omaha
Course: 243
Subject: Mathematics
Date: Feb 20, 2024

[name]
[email]
Southern New Hampshire University
1. Introduction

The purpose of this report is to predict the number of wins for a team in a season by analyzing its historical performance metrics. Management and the team coach have requested regression models that predict a team's total number of wins in a regular season from key performance metrics, so that they can make informed decisions about improving the team's performance. This report uses the FiveThirtyEight NBA Elo dataset, acquired from Kaggle and aggregated to study the total number of wins in a regular season alongside each team's performance metrics. These metrics are used to build multiple regression models that predict wins and to carry out hypothesis tests. The hypothesis tests determine whether a variable is related to total wins and, if so, how strong that relationship is and in which direction it runs.

2. Data Preparation

This report uses several variables: total wins, average points, average relative skill, average point differential, and average relative skill differential. Each predictor is a team's average over a given regular season, and three of them are especially important to the analysis. The avg_pts_differential variable is the average difference between the points a team scores and the points its opponents score in a regular season. It is negative when the team scores fewer points on average than its opponents and positive when it scores more, and its magnitude shows how large that scoring gap is. The avg_elo_n variable is the team's average relative skill in a regular season. It serves as a baseline for measuring the team's skill relative to the entire league.
A relative skill equal to the league average indicates an average team; a lower value indicates a below-average team, and a higher value indicates an above-average team. The avg_elo_differential variable refines this measure by comparing the team's relative skill with that of its opponents rather than with the league as a whole. It shows how the team and its opponents compare in skill and, through its magnitude, how much better or worse the team is.

3. Simple Linear Regression: Scatterplot and Correlation for the Total Number of Wins and Average Relative Skill

Data visualization techniques are commonly used to study the relationship between two variables, showing both the direction (positive or negative) and the strength of any trend. Visualizations reveal patterns, including linear relationships, and can be used to examine normality and distribution, variance, and the relationships between variables, as well as to show trends through a fitted regression line. The Pearson correlation coefficient measures the strength and direction of the linear association between two variables. It is a number between -1 and 1: its sign gives the direction of the correlation, and the closer it is to -1 or 1, the stronger the correlation in that direction. A negative correlation means that as one variable increases the other tends to decrease, while a positive correlation means that as one variable increases the other tends to increase as well.

The scatterplot and Pearson correlation coefficient indicate that as a team's average relative skill increases, its total number of wins in a regular season increases as well. The scatterplot displays a strong positive correlation between these two variables, with a Pearson correlation coefficient of .9072. Under the rule of thumb that a coefficient between .8 and 1.0 indicates a strong correlation, .9072 falls well within that range, confirming a strong positive correlation. The p-value is reported as 0.0, which is below the 1% level of significance (.01), so the correlation is statistically significant.

4. Simple Linear Regression: Predicting the Total Number of Wins using Average Relative Skill

A simple linear regression model describes the linear relationship between a response variable and a single predictor variable using a regression line. This line indicates
whether the relationship is positive or negative and how strong it is. The model equation is:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

where $Y$ is the total number of wins and $X$ is the average relative skill (avg_elo_n). The fitted regression equation is:

$$\hat{Y} = -128.2475 + 0.1121X$$

The hypotheses tested are:

$$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$$

Level of significance: .05 (5 percent)

The null hypothesis states that there is no association between average relative skill and total number of wins; that is, skill level does not indicate how many wins a team will have. The alternative hypothesis states that there is an association between average relative skill and total number of wins; that is, average skill level does indicate how many wins a team will have. Because no level of significance was specified, an alpha of .05 (5 percent level of significance) is used, corresponding to a 95 percent confidence level.

Table 1: Hypothesis Test for the Overall F-Test
Statistic         Value
Test Statistic    2865
P-value           0.00

Because the p-value of 0.00 is less than the level of significance of .05, the null hypothesis is rejected in favor of the alternative hypothesis. The data support the conclusion that average relative skill is associated with total number of wins, so a team's average skill level does help predict how many wins it will have in a regular season. In conclusion, average relative skill is a statistically significant predictor of the total number of wins in the regular season.

Using the fitted equation, a team with a relative skill of 1550 is predicted to win approximately 45 games in a regular season (45.5 with the rounded coefficients shown):
Total wins = -128.2475 + 0.1121(1550) ≈ 45.5

By comparison, a team with a relative skill of 1450 is predicted to win about 34 games in a regular season:
Total wins = -128.2475 + 0.1121(1450) ≈ 34.3
5. Multiple Regression: Scatterplot and Correlation for the Total Number of Wins and Average Points Scored

The scatterplot and Pearson correlation coefficient indicate that average points scored does not share a strong relationship with the number of wins; the relationship is moderate and positive. The Pearson correlation coefficient of .47 falls within the moderate range of .4 to .8, near its lower end. The scatterplot reflects this with a wide spread of win totals at similar levels of average points scored. The p-value is reported as 0.0, which is below the 1% level of significance (.01), so the correlation is statistically significant.

6. Multiple Regression: Predicting the Total Number of Wins using Average Points Scored and Average Relative Skill

A multiple linear regression model predicts a response variable from two or more predictor variables by describing the linear relationship between the response and each predictor. The model relies on four assumptions about the error terms: a mean of zero, independence, normality, and constant variance. The model equation is:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

where $Y$ is the total number of wins, $X_1$ is the average points scored, and $X_2$ is the average relative skill. The fitted regression equation is:

$$\hat{Y} = -152.5736 + 0.3497X_1 + 0.1055X_2$$

The hypotheses tested are:

$$H_0: \beta_1 = \beta_2 = 0 \qquad H_a: \beta_i \neq 0 \text{ for at least one } i \in \{1, 2\}$$

Level of significance: .05 (5 percent)

The null hypothesis states that there is no relationship between the predictor variables (average relative skill and average points) and the response variable (total number of wins).
The alternative hypothesis states that there is a relationship between the total number of wins and at least one of the predictor variables (average relative skill and/or average points). Because no level of significance was specified, an alpha of .05 (5 percent level of significance) is used, corresponding to a 95 percent confidence level.

Table 2: Hypothesis Test for the Overall F-Test
Statistic         Value
Test Statistic    1580
P-value           0.00

The p-value is reported as 0.00, which is below both the .05 level used for this test and the stricter .01 level, so the overall F-test is statistically significant. The null hypothesis is therefore rejected in favor of the alternative hypothesis. The individual t-tests show that each predictor is statistically significant at the 1 percent level: average points and average relative skill both have p-values of 0.00, less than an alpha of .01, so both variables have a significant relationship with the total number of wins. The coefficient of determination (R²) is 0.837, which is relatively close to 1.0 (within the range of .8 to 1.0); this indicates that the two predictors together explain about 83.7 percent of the variation in total wins.

For a team with a relative skill of 1350 that averages 75 points per game, the predicted total wins in a regular season is 16:
Total wins = -152.5736 + 0.3497(75) + 0.1055(1350) ≈ 16.1

For a team with a relative skill of 1600 that averages 100 points per game, the predicted total wins in a regular season is 51:
Total wins = -152.5736 + 0.3497(100) + 0.1055(1600) ≈ 51.2

7. Multiple Regression: Predicting the Total Number of Wins using Average Points Scored, Average Relative Skill, Average Points Differential, and Average Relative Skill Differential

A multiple linear regression model predicts the response variable from several predictor variables by estimating a slope coefficient for each predictor. Each slope describes that predictor's contribution to the predicted response: the expected change in the response for a one-unit increase in that predictor, holding the other predictors constant.
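The slope estimation described above can be sketched as ordinary least squares solved through the normal equations, (XᵀX)b = Xᵀy, in plain Python. This is an illustrative sketch under stated assumptions, not the statistical software used for the report: the function names `solve`, `fit_ols`, and `r_squared` are hypothetical, and the toy data are made up so that the true coefficients (2, 3, and -1) are known in advance. The same R² formula, 1 - SSE/SST, defines the coefficient of determination reported in Section 6.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [b0, b1, ..., bk] that minimize the sum of squared residuals."""
    X = [[1.0, *map(float, row)] for row in rows]  # leading 1.0 column = intercept
    p = len(X[0])
    xtx = [[sum(xr[i] * xr[j] for xr in X) for j in range(p)] for i in range(p)]
    xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(rows: Sequence[Sequence[float]], y: Sequence[float],
              coefs: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SSE / SST."""
    preds = [coefs[0] + sum(b * v for b, v in zip(coefs[1:], row)) for row in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


# Hypothetical toy data with two predictors; y_exact = 2 + 3*x1 - 1*x2 exactly.
toy_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 7)]
y_exact = [3.0, 7.0, 7.0, 11.0, 10.0]
y_noisy = [3.5, 6.5, 7.0, 11.0, 10.0]  # same data with two values nudged

if __name__ == "__main__":
    coefs = fit_ols(toy_rows, y_exact)
    print([round(c, 6) for c in coefs], r_squared(toy_rows, y_exact, coefs))
    noisy = fit_ols(toy_rows, y_noisy)
    print([round(c, 4) for c in noisy], round(r_squared(toy_rows, y_noisy, noisy), 4))
```

The normal equations are used here only for transparency; statistical packages typically rely on more numerically stable factorizations, such as QR decomposition or a pseudoinverse.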
The fitted regression equation is:

$$\hat{Y} = 34.5753 + 0.2597X_1 - 0.0134X_2 + 1.6206X_3 + 0.0525X_4$$

where $\hat{Y}$ is the predicted total wins, $X_1$ is the average points scored, $X_2$ is the average relative skill, $X_3$ is the average point differential, and $X_4$ is the average relative skill differential. The hypotheses tested are:

$$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0 \qquad H_a: \beta_i \neq 0 \text{ for at least one } i \in \{1, \dots, 4\}$$

Level of significance: .05 (5 percent)

The null hypothesis states that there is no relationship between the predictor variables (average relative skill, average points, average point differential, and average relative skill differential) and the response variable (total number of wins). The alternative hypothesis states that there is a relationship between the total number of wins and at least one of these predictor variables. Because no level of significance was specified, an alpha of .05 (5 percent level of significance) is used, corresponding to a 95 percent confidence level.

Table 3: Hypothesis Test for the Overall F-Test
Statistic         Value
Test Statistic    1102
P-value           0.00

The p-value is reported as 0.00, which is below the .05 level of significance, so the overall F-test is statistically significant and the null hypothesis is rejected in favor of the alternative hypothesis. The data show that at least one of the predictor variables (average relative skill, average points, average point differential, or average relative skill differential) is related to the total number of wins. In the individual t-tests for each predictor, several, but not all, of the predictor variables are statistically significant at the 1 percent level of significance.
The average relative skill (avg_elo_n), however, has a p-value of 0.442, which is greater than the .01 level of significance. We therefore fail to reject the null hypothesis that its coefficient is zero: once the other predictors are included in the model, average relative skill shows no significant linear relationship with the total number of wins in a season. The other predictor variables (average points, average point differential, and average relative skill differential) have p-values below .01 and show a significant linear relationship with the total number of wins. The coefficient of determination (R²) is 0.878, which is relatively close to 1.0 (within the range of .8 to 1.0); this indicates that the model explains about 87.8 percent of the variation in total wins.
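The worked predictions that follow, and those in Section 6, are direct substitutions into the fitted equations. A minimal sketch of that arithmetic, using only the coefficients reported in this document, is shown below; `predict`, `MODEL_2`, and `MODEL_3` are hypothetical names, and small differences from software output are expected because the coefficients are rounded to four decimal places.

```python
from typing import Sequence

# Coefficients as reported in this document, intercept first.
# MODEL_2 predictors: average points, average relative skill.
MODEL_2 = (-152.5736, 0.3497, 0.1055)
# MODEL_3 predictors: average points, average relative skill,
# average point differential, average relative skill differential.
MODEL_3 = (34.5753, 0.2597, -0.0134, 1.6206, 0.0525)


def predict(coefs: Sequence[float], *xs: float) -> float:
    """Evaluate b0 + b1*x1 + ... + bk*xk for a single team."""
    if len(xs) != len(coefs) - 1:
        raise ValueError(f"expected {len(coefs) - 1} predictor values, got {len(xs)}")
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], xs))


if __name__ == "__main__":
    print(round(predict(MODEL_2, 75, 1350), 2))
    print(round(predict(MODEL_2, 100, 1600), 2))
    print(round(predict(MODEL_3, 75, 1350, -5, -30), 2))
    print(round(predict(MODEL_3, 100, 1600, 5, 95), 2))
```

The length check guards against passing predictor values for the wrong model, since the two models expect two and four inputs respectively.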
For a team with a relative skill of 1350 that averages 75 points per game, with an average point differential of -5 and an average relative skill differential of -30, the predicted total wins in a regular season is 26:
Total wins = 34.5753 + 0.2597(75) - 0.0134(1350) + 1.6206(-5) + 0.0525(-30) ≈ 26.3

For a team with a relative skill of 1600 that averages 100 points per game, with an average point differential of 5 and an average relative skill differential of 95, the predicted total wins in a regular season is 52:
Total wins = 34.5753 + 0.2597(100) - 0.0134(1600) + 1.6206(5) + 0.0525(95) ≈ 52.2

8. Conclusion

The purpose of this analysis was to determine which of these variables, if any, would be most worthwhile to focus on improving in order to increase a team's total wins in a season. Based on the data analyzed and the multiple linear regression models established, several variables have a significant relationship with total wins in a regular season. The final model, which explains about 87.8 percent of the variation in total wins, identifies average points, average point differential, and average relative skill differential as significant predictors of a team's total wins in a regular season. An improvement in any of these variables is associated with a higher predicted number of games won, which allows management to assess these specific variables when deciding where to focus efforts to improve the team. The larger the improvement in one or more of these variables, the larger the predicted increase in total games won in a season.