Module05_R_Practice


School: Northeastern University
Course: 6010
Subject: Statistics
Date: Feb 20, 2024
Pages: 6

Name: Labhini Patel
Course: ALY6010 Prob Theory and Intro Stats
Faculty: Prof. Mykhaylo Trubskyy
Date: 25th October 2023
INTRODUCTION
This report presents an analytical investigation of the dataset. The study focused on understanding the relationships between variables using both correlation and regression techniques.

ANALYSIS
1. Correlation Analysis
The correlation chart is limited to five variables for two reasons: clarity and relevance. Clarity lets the audience discern relationships quickly without being overwhelmed, and relevance means that only the most important variables are presented, so the key relationships are not diluted by less informative ones.
Observations:
1. tree_dbh and stump_diam: The correlation coefficient is -0.17, indicating a weak negative relationship between tree diameter at breast height and stump diameter. In practical terms, trees with larger diameters at breast height tend to have slightly smaller stump diameters, although the relationship is weak and may be of little practical importance.
2. tree_dbh and borocode: The coefficient of 0.09 indicates a very weak positive correlation. There is a slight tendency for trees with larger diameters at breast height to be associated with higher borough codes, but the relationship is minimal.
3. tree_dbh and st_assem: A coefficient of -0.14 indicates a weak negative relationship: as the assembly district code increases, tree diameter at breast height decreases marginally.
4. stump_diam and borocode: The coefficient of 0.02 suggests almost no linear relationship between stump diameter and borough code.
5. stump_diam and st_assem: A coefficient of -0.05 indicates a very weak negative correlation. Stump diameter and assembly district code are nearly unrelated linearly, with only a slight tendency for stump diameter to decrease as the assembly district code increases.
6. borocode and st_assem: A coefficient of -0.53 indicates a moderate negative relationship between borough codes and assembly district codes. This is the strongest relationship in the heatmap: as borough codes increase, assembly district codes tend to decrease. Because both variables are numeric labels for locations rather than measured quantities, this pattern most likely reflects how the codes are numbered across the city rather than a substantive trend.

2. Regression Analysis
For this analysis, tree_dbh was chosen as the outcome variable, with stump_diam, borocode and st_assem as predictor variables.
Differences Between Correlation and Regression: Correlation analysis measures the strength and direction of the linear relationship between two variables, while regression predicts one variable from one or more others. Correlation is symmetric: the coefficient is the same whichever variable is listed first, and neither variable is treated as the outcome. In contrast, regression identifies how the dependent variable changes when the independent variable(s) change.
Observations: From the regression summary table:
Dependent Variable: tree_dbh (tree diameter at breast height).
Independent Variables: stump_diam (stump diameter), borocode (borough code) and st_assem (assembly district code).
1. Intercept: The intercept is approximately 14.153. It represents the estimated mean tree_dbh when all predictors equal zero; it has no direct practical interpretation here, because no borough or assembly district is coded as zero.
2. stump_diam: For every one-unit increase in stump diameter, tree diameter at breast height is estimated to decrease by about 0.469 units, holding the other predictors constant. This relationship is statistically significant, with a very low p-value.
3. borocode: A one-unit increase in borough code corresponds to an estimated increase of 0.179 in tree diameter at breast height, holding the other predictors constant. This relationship is also statistically significant, although, because borocode is a numeric label rather than a measured quantity, the size of this "per-unit" effect depends on how the boroughs happen to be numbered.
4. st_assem: For every one-unit increase in the assembly district code, tree_dbh is estimated to decrease by 0.064 units, holding the other predictors constant. This predictor is also statistically significant.
Model Fit:
Multiple R-squared: This value (0.05258) is the proportion of variance in the dependent variable (tree_dbh) explained by the independent variables: approximately 5.3% of the variability in tree diameter at breast height is explained by the predictors in this model.
Adjusted R-squared: This value (0.05258) is R-squared adjusted for the number of predictors in the model. The adjustment does penalize additional predictors, but the penalty shrinks as the sample size grows relative to the number of predictors, so with a large dataset the adjusted and multiple R-squared values are nearly identical, as they are here.
F-statistic: The F-statistic tests the null hypothesis that all slope coefficients (every coefficient except the intercept) equal zero, against the alternative that at least one does not. The very low p-value (less than 2.2e-16) indicates that the model with predictors fits significantly better than an intercept-only model.

Conclusion from Regression Analysis: The regression model shows statistically significant relationships between tree diameter at breast height and each predictor. However, the low R-squared value indicates that these predictors explain only a small proportion of the variability in tree diameter at breast height. The negative coefficients for stump_diam and st_assem also warrant further investigation, especially since one might expect a tree with a larger stump diameter to have a larger diameter at breast height. This counterintuitive result could be due to confounding factors not included in the model or to specific characteristics of the dataset.

CONCLUSION
This study applied correlation and regression analyses to a dataset to identify relationships between selected variables. The correlation matrix revealed the relationships among the variables, most notably the moderate inverse correlation between st_assem and borocode; however, correlation does not establish causation. The regression analysis, with tree diameter at breast height (tree_dbh) as the dependent variable, found statistically significant relationships with predictors such as stump_diam. Surprisingly, the data indicated a negative relationship between stump_diam and tree_dbh, which warrants further investigation. Although statistically significant, the predictors explained only a limited share of the variance in tree_dbh. This study underscores that while statistical tools provide valuable insights, a comprehensive understanding requires combining these methods with domain expertise.
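The regression quantities discussed in the Model Fit observations (coefficients, multiple R-squared, adjusted R-squared and the overall F-statistic) follow from standard ordinary least squares formulas: adjusted R-squared = 1 - (1 - R^2)(n - 1)/(n - p - 1) and F = (R^2/p) / ((1 - R^2)/(n - p - 1)), where n is the number of observations and p the number of predictors. The summary output quoted in the report looks like R's lm() summary, but the R script itself is not reproduced here. As a stand-in, this is a minimal pure-Python sketch under stated assumptions: the data values are invented toy numbers (not the tree census), and the helpers `solve` and `fit_ols` are hypothetical names introduced for illustration.

```python
# Minimal OLS sketch (pure standard-library Python). The values below are
# invented toy numbers standing in for tree_dbh, stump_diam, borocode and
# st_assem; they are NOT taken from the tree census.
tree_dbh = [10.0, 14.0, 7.0, 22.0, 5.0, 16.0, 12.0, 9.0, 18.0, 11.0]
stump_diam = [0.0, 2.0, 0.0, 1.0, 3.0, 0.0, 0.0, 4.0, 1.0, 0.0]
borocode = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0]
st_assem = [65.0, 78.0, 50.0, 30.0, 62.0, 70.0, 81.0, 44.0, 25.0, 63.0]


def solve(a, b):
    """Solve a @ beta = b (hypothetical helper) by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def fit_ols(y, columns):
    """Fit y on the predictor columns plus an intercept (hypothetical helper).

    Returns (coefficients, R^2, adjusted R^2, overall F-statistic);
    coefficients[0] is the intercept.
    """
    n, p = len(y), len(columns)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)

    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)

    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 penalizes each extra predictor via (n - 1) / (n - p - 1).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    # F tests H0: all slope coefficients are zero (intercept excluded).
    f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))
    return beta, r2, adj_r2, f_stat


beta, r2, adj_r2, f_stat = fit_ols(tree_dbh, [stump_diam, borocode, st_assem])
print("intercept and slopes:", [round(b, 3) for b in beta])
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}, F = {f_stat:.3f}")
```

Because the factor (n - 1)/(n - p - 1) approaches 1 when n is large relative to p, adjusted and multiple R-squared can agree to several decimal places on a large dataset even though the penalty is always present.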
R Script