Exploring U.S. Census Data: Insights from the Current Population Survey

FINAL REPORT
ALY 6015 Intermediate Analytics
Hardh Patel
Date: December 4th, 2023
Instructor: Sergiy Shevchenko

INTRODUCTION

The U.S. Census Bureau conducts a monthly survey known as the Current Population Survey (CPS), which gathers data on a wide range of demographic and economic characteristics of the American population. CPS data shed light on the nation's social and economic dynamics and are an indispensable resource for both policymakers and researchers.

This preliminary assessment lays out our early observations and offers a detailed examination of the CPS dataset. Our objective is to better understand the conditions affecting individuals in the U.S. by exploring demographic and economic variables, including age, sex, educational background, earnings, and employment circumstances. We also examine population subgroups to identify disparities and track changes over time.

The analysis is based on CPS data for November 2022, covering more than 123,000 individuals. The dataset spans demographic and economic attributes such as age, sex, ethnicity, educational attainment, financial status, occupation, and industry. To examine the data, we used exploratory methods, inspecting the probability distributions and summary statistics of the variables. To illustrate our findings, we produced graphical representations such as histograms and scatter plots, which highlight recurring patterns and trends in the dataset.

EXPLORATORY DATA ANALYSIS DESCRIPTION

Descriptive statistics underscored the importance of cleaning the data and deepening our understanding of the variables involved. The dataset contains 123,009 entries across 388 distinct variables. Our first step toward drawing meaningful conclusions and insights was to clean the dataset.
This process entailed removing incomplete entries and recoding the variables at hand. For instance, employment status was subdivided into several groups, including those who are retired, employed, or unable to work. We also separated out the variables relating to geographic region and cultural background to allow a more granular examination, and we used summary tables in our exploratory data analysis. Family income, which is recorded as a range, was converted to a numeric value by assigning each entry a random number within its specified range. The educational attainment variable was also examined, and a new category was created to distinguish the educational qualifications observed.

With data cleaning complete, we proceed to the analytical part of this report, in which we analyze the survey data in detail. Our goal is to present a thorough account of the data collection, highlighting significant patterns and trends in the dataset. We apply statistical measures such as the mean, median, and mode to characterize the distribution and central tendency of the numerical variables. This analysis is a crucial element of our work, as it allows meaningful conclusions and insights to be drawn from the data.

To better understand the differences and relationships among variables, we segmented the data into several subsets. This strategy was instrumental in generating significant insights.

SUBSET 1: Region, Gender, and Metropolitan Status

Table 1: Descriptive summary of the distribution of gender and metropolitan status by region.

                      Midwest          Northeast        South            West
                      n = 19,621       n = 15,867       n = 36,936       n = 27,313
Gender
  Female              9,901 (50.5%)    8,173 (51.5%)    19,260 (52.1%)   13,765 (50.4%)
  Male                9,720 (49.5%)    7,694 (48.5%)    17,676 (47.9%)   13,548 (49.6%)
Metropolitan Status
  Metropolitan        14,713 (75.0%)   13,547 (85.4%)   30,034 (81.3%)   22,119 (81.0%)
  Non-Metropolitan    4,908 (25.0%)    2,264 (14.3%)    6,355 (17.2%)    4,766 (17.4%)
  Not Identified      0 (0.0%)         56 (0.4%)        547 (1.5%)       428 (1.6%)

Table 1 shows the distribution of participants by sex and metropolitan status across four major regions: the Midwest, Northeast, South, and West. The data indicate a higher count of
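
The percentages in Table 1 are column shares: each count divided by the total for its region. As a minimal sketch of that arithmetic, the Python snippet below recomputes the gender rows from the counts printed in the table. The names `gender_counts` and `column_shares` are illustrative assumptions and are not taken from the original analysis, which may have used different tooling.

```python
# Gender counts per region, copied from Table 1 of this report.
gender_counts = {
    "Midwest":   {"Female": 9901,  "Male": 9720},
    "Northeast": {"Female": 8173,  "Male": 7694},
    "South":     {"Female": 19260, "Male": 17676},
    "West":      {"Female": 13765, "Male": 13548},
}


def column_shares(counts):
    """Return each category's percentage of its region total, rounded to one decimal.

    `counts` maps region -> {category: count}. This helper is hypothetical and
    only illustrates how the table's percentages relate to its counts.
    """
    shares = {}
    for region, by_category in counts.items():
        total = sum(by_category.values())  # region sample size (the "n" row)
        shares[region] = {
            category: round(100 * n / total, 1)
            for category, n in by_category.items()
        }
    return shares


# Print the recomputed shares, one region per line.
for region, pct in column_shares(gender_counts).items():
    print(region, pct)
```

Passing the metropolitan-status counts in the same nested-dictionary shape would recompute the remaining rows of the table in the same way.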
