Stat 113 Beiyi (Summer) Liu
Professor Ihsan Shahwan
Final Project Part C
To figure out how the variables relate to each other and whether one can predict another, I will choose, for each group, either three quantitative variables or two quantitative variables and one categorical variable. I will also use scatter plots, regression, and correlation to understand how one variable affects the other two. There are six groups below:
Group One: High School Percentile (HSP), Cumulative GPA (GPA), and ACT Composite Score (COMP)
a) HSP vs GPA
b) HSP vs COMP
c) COMP vs GPA
From graph a, we can see that there is a moderate
From graph b, there is a weak positive linear relationship between CREDITS and GPA; the correlation is 0.106; the regression equation is GPA=0.00141886*CREDITS+2.94831; the slope is 0.00141886, which is positive; when the predictor variable CREDITS increases, the response variable GPA also increases slightly; for example, when CREDITS increases by 1, the predicted GPA increases by 0.00141886.
From graph c, there is a moderately strong positive linear association between AGE and CREDITS; the correlation is 0.668; the regression equation is CREDITS=11.7475*AGE-174.356; the slope is 11.7475, which is positive; when the predictor variable AGE increases, the response variable CREDITS also increases substantially; for instance, when AGE increases by 1, the predicted CREDITS increase by 11.7475. Some outliers may affect the correlation. Based on the graphs and data above, an older student tends to have a slightly lower GPA but many more credits, while a student with more credits also tends to have a higher GPA.
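The correlations and regression equations above were produced by statistical software from the raw data, which is not shown here. As a minimal sketch of what that software computes, the Python below fits a least-squares line and Pearson's r from paired data, then uses the reported AGE/CREDITS equation to show that the slope is the change in the predicted response per one-unit change in the predictor. The helper names `fit_line` and `predict` are hypothetical, chosen only for illustration.

```python
import math

def fit_line(x, y):
    """Least-squares line y = slope*x + intercept, plus Pearson's r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx                    # change in y per 1-unit change in x
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    r = sxy / math.sqrt(sxx * syy)       # Pearson correlation, -1 <= r <= 1
    return slope, intercept, r

def predict(slope, intercept, x):
    """Value of the fitted line at x."""
    return slope * x + intercept

# Slope interpretation with the reported line CREDITS = 11.7475*AGE - 174.356:
# raising AGE by 1 raises the predicted CREDITS by exactly the slope.
change = predict(11.7475, -174.356, 21) - predict(11.7475, -174.356, 20)
print(round(change, 4))  # 11.7475
```

On Python 3.10 or later, `statistics.linear_regression` and `statistics.correlation` return the same slope, intercept, and r for the same data.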
Group Four: ACT English Score (ENGLISH), ACT Composite Score (COMP), and Age (AGE)
a) AGE vs ENGLISH
b) AGE vs COMP
c) ENGLISH vs COMP
From graph a, we can see that there is a very weak negative linear relationship between AGE and ENGLISH; the correlation is -0.042; the regression equation is ENGLISH=-0.0814809*AGE+24.469; the slope is -0.0814809, which is negative; when the predictor
There is no clear relationship between the two variables in the scatter plot. The points follow no specific pattern, suggesting that there is no significant correlation between years and credit balance.
A researcher found a significant relationship between a person's age, a, the number of hours a person works per week, b, and the number of accidents, y, the person has per year. The relationship can be represented by the multiple regression equation y = -3.2 + 0.012a + 0.23b. Predict the number of accidents per year (to the nearest whole number) for a person whose age is 42 and who works 46 hours per week.
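The prediction is a matter of substituting a = 42 and b = 46 into the multiple regression equation. A short sketch of that arithmetic follows; the function name `predict_accidents` is hypothetical.

```python
def predict_accidents(age, hours):
    """Evaluate y = -3.2 + 0.012a + 0.23b from the problem statement."""
    return -3.2 + 0.012 * age + 0.23 * hours

y_hat = predict_accidents(42, 46)
print(round(y_hat, 3))  # 7.884  (= -3.2 + 0.504 + 10.58)
print(round(y_hat))     # 8 accidents per year, to the nearest whole number
```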
A scatter plot shows graphically how two variables are related to one another. Plotting the credit balance of customers against their income makes it easy to see that the two variables are correlated. The positive slope of the linear trend indicates that higher credit balances are associated with higher incomes.
This scatter plot shows income against credit balance. Income increases as credit balance increases, so it can be inferred that there is a positive relationship between the two variables. The best-fit (linear regression) line fits the data quite well, which is consistent with a strong linear relationship. The speculation can be strongly made that the
The scatter plot of Credit balance ($) versus Size shows that the slope of the "best fit" line is upward (positive); this indicates that Credit balance increases with Size. As Size increases, Credit balance also increases, and vice versa.
(TCO 3) Before performing linear regression, it is important to ensure that a linear relationship exists between the dependent and independent variables by plotting observed
As the magnitude of the slope increases, the line becomes steeper when graphed; slopes closer to zero produce a line closer to horizontal. A positive slope gives a line whose y values increase as x increases, while a negative slope gives a line whose y values decrease as x increases. Linear functions model situations with a constant rate of change, such as miles driven per hour or the cost of a service over a period of time.
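The "constant rate of change" idea can be checked directly: for a linear function, equal steps in x always change y by the slope. The sketch below uses a hypothetical service that costs 25 per hour plus a flat fee of 40; these numbers and the function name `linear` are illustrative assumptions, not data from the text.

```python
def linear(slope, intercept, x):
    """y = slope * x + intercept."""
    return slope * x + intercept

# Hypothetical service: 25 per hour plus a flat fee of 40
hours = [0, 1, 2, 3, 4]
costs = [linear(25, 40, h) for h in hours]
# Successive differences of a linear function all equal the slope
diffs = [b - a for a, b in zip(costs, costs[1:])]
print(costs)  # [40, 65, 90, 115, 140]
print(diffs)  # [25, 25, 25, 25]
```

With a negative slope the same differences would all be negative, which is the "decreasing" line described above.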
The R² value measures how closely the data fit the linear regression line: it is the proportion of the variation in the response variable that is explained by the linear relationship between the two variables. In graph 1, R² is 0.60497, which indicates a moderately strong positive correlation (r ≈ 0.78; R² itself has no sign, so the direction comes from the slope of the line). This suggests that about 60% of the variation in mean age among football teams can be explained by the linear relationship between the number of wins and the mean age of the team. Conversely, about 40% of the variation in mean team age cannot be
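As a minimal sketch, R² can be computed as 1 - SSE/SST, the share of the total variation that the line does not leave in the residuals. For a simple (one-predictor) linear regression, R² equals r², so |r| = √R² and the sign of r matches the sign of the slope; this shortcut does not hold for multiple regression. The function names below are hypothetical.

```python
import math

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST for observed y and fitted values y_hat."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total
    return 1 - sse / sst

def r_from_r_squared(r2, slope):
    """Simple regression only: |r| = sqrt(R^2), sign taken from the slope."""
    return math.copysign(math.sqrt(r2), slope)

print(round(r_from_r_squared(0.60497, 1.0), 3))  # 0.778 (positive slope assumed)
print(round(1 - 0.60497, 3))                     # 0.395 share left unexplained
```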
Model: In the model, we have 14 variables: “Education Year” (ed); “Parent’s Income” (incomehi); “Tuition of the school” (tuition); “Distance to the college” (dist); “State Hourly Wage in Manufacturing in 1980” (stwmfg80); “County Unemployment rate in 1980” (cue80); does the student live in an “Urban Area” (urban); if the student’s family owns home (ownhome); does the student’s mother or father, or both, possess a college degree (dadcoll, momcoll); if the student is Hispanic (hispanic), African American (black); and, if the student is female (female). After testing the significance of each variable, we exclude three variables that are not statistically significant, meaning the data give no strong evidence that they affect the test score.
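A common way to test whether a single regression coefficient is significant is to divide it by its standard error and compare the resulting t statistic with a reference distribution. The sketch below is an assumption about the general procedure, not the author's exact test: it uses a normal approximation to the t distribution (reasonable only for large samples), whereas regression software uses the exact t distribution with n - k - 1 degrees of freedom. The function name and the example coefficient and standard error are hypothetical.

```python
import math

def coefficient_test(coef, std_err, alpha=0.05):
    """Return (t, two-sided p, significant?) for H0: coefficient = 0.

    Normal approximation: p = P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    """
    t = coef / std_err
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p, p < alpha

# Hypothetical coefficient 1.5 with standard error 1.0:
# t = 1.5 and p is about 0.13, so it would not be significant at 0.05.
t, p, significant = coefficient_test(1.5, 1.0)
print(t, round(p, 2), significant)
```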
The graph also shows a negative correlation with a curved line of best fit. This shows that the relationship is non-ohmic, because the line of best fit is not straight.
The scatter plot above shows a correlation between the quiz results and the exam results. R² = 0.536, which indicates that about 54% of the variation in the quiz average is accounted for by its linear relationship with the exam results. In other words, about 46% of the variation is not explained by the least-squares regression line.
The regression equation for age was y = 19,839 - 1,070.25x, while the equation for mileage was y = 17,627.5 - 0.174646x. The intercepts were fairly similar, but the slopes were not, because each x measures something different. Age ranged only from 1 to 10 years, so its slope can be large: each additional year lowers the predicted price by about 1,070. Mileage, by contrast, stretched from 836 mi to 58,530 mi, so its slope has to be small to match that scale: each additional mile lowers the predicted price by only about 0.17. The standard deviation of the residuals favored age as the better predictor. This indicates that age is a better predictor of a car’s price than mileage by a slim
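Multiplying each slope by the width of its observed x-range shows why slopes of such different size are not contradictory. The sketch below assumes, as the paragraph states, that age runs from 1 to 10 and mileage from 836 to 58,530; the result is in the same price units as the intercepts. The residual standard deviation, s = √(SSE / (n - 2)), is the measure used to compare the two predictors; the raw car data is not available here, so `residual_sd` is a generic, hypothetical helper rather than a reproduction of the original comparison.

```python
import math

def predicted_change_over_range(slope, x_min, x_max):
    """Total change in the fitted y as x moves across [x_min, x_max]."""
    return slope * (x_max - x_min)

def residual_sd(x, y, slope, intercept):
    """s = sqrt(SSE / (n - 2)) for a simple linear regression."""
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return math.sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))

age_span = predicted_change_over_range(-1070.25, 1, 10)
mileage_span = predicted_change_over_range(-0.174646, 836, 58530)
print(round(age_span, 2), round(mileage_span, 2))  # -9632.25 -10076.03
```

Across their observed ranges, both lines predict a price drop of roughly 10,000, so the very different slopes mostly reflect the different units of x. The predictor whose line has the smaller residual standard deviation gives tighter predictions.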
From the scatterplots above, we can see a moderate to weak linear relationship between the response variable and the independent variables. Scatterplots alone cannot show that the data are unbiased and consistent, but this result suggests that a linear model is a reasonable starting point. The strongest relationships appear to be personal income vs. unemployment rate and personal income vs. college, but even these linear relationships are fairly weak. Because the relationships are roughly linear, I went on to check the correlations.
The points and the line appear on different graphs because of the constraints of the available software. However, the scatter in the data clearly makes any line of best fit a poor predictor of trends and values. The calculated line, y = 1094.04x 0.11, is shown in blue on the relevant graph and does not appear to follow any trend in the data.
3. The slope of the linear regression line is 0.0647, as shown in the equation of the line on the right-hand side of the chart.