HWK1_SPSS_EDA.pdf

School

DePaul University

Course

403

Subject

Statistics

Date

Apr 3, 2024

Type

pdf

Pages

5

Uploaded by AmbassadorHorsePerson1079

IT 223 HOMEWORK 1
EXPLORATORY DATA ANALYSIS IN SPSS
Total: 30 Points

The goal of this assignment is to guide you through the exploratory analysis of a dataset using SPSS. Do the following exercise in SPSS and submit the results of Part 5 as explained at the end of the exercise. Complete this problem after doing the reading assignments and the practice exercises and viewing the SPSS video tutorials on exploratory data analysis.

Data description

The data set gss2004.xls contains 575 observations on 5 variables:

SEX = Respondent's sex {1 for Male, 2 for Female}
AGE = Age of respondent
WWWHR = Hours on the WWW per week for Internet users
NEWS30 = Respondent has used a news site in the past 30 days (1 = “never”, 2 = “1-2 times”, 3 = “3-5 times”, 4 = “more than 5 times”)
EMAILHR = Hours of e-mail per week for Internet users

The data were collected from the 2004 General Social Survey (GSS) for adult respondents (18 years of age or older) living in the United States. The GSS is one of the largest and longest-running projects conducted to monitor social change and the growing complexity of American society (see http://www.norc.org for more information).

The analysis described below studies the number of hours that Internet users spend on email. It also explores whether men and women use email differently.

PART 0: Download the data to your hard drive
1) Log in to the course website at http://d2l.depaul.edu
2) Go to Segment 1 (select Content on the top navigation bar).
3) Click on the Datasets link on the left navigation bar of Segment 1 and download the Excel file gss2004.xls to your computer.
4) Open the SPSS program. (If you are running SPSS on the CDM terminals, apply the steps above from the terminal server you are logged on to.)

PART 1: Import an Excel file in SPSS
1) Click on File > Open > Data… under the top menu in SPSS. A dialog box to select files will pop up.
2) Go to the folder where you saved the data file gss2004.xls and select it. You need to choose the “xls” file type in the “Files of type” box for the file to be listed.
3) Click OK.
4) The data should now appear in an SPSS data worksheet. Save it as a .sav file using the SAVE AS… option.
5) If the data were imported successfully, you should have 5 columns of data, which are the variables described above.

PART 2: Define variable properties in the Variable View
1) Click on the Variable View of the SPSS data editor. This view lets you specify properties of the variables in the dataset, such as changing the type, adding a label, etc.
2) Type in a meaningful label for each variable under the Label column (you can use the labels specified above). This step helps you remember what the variables are about.
3) Add value labels under the Values column for SEX and NEWS30. This will help you remember what the codes {1, 2, …} denote. The labels will be used in the SPSS output.
4) Select the correct Measure for each variable. Remember that an ordinal variable has values that can be ranked (e.g. preferences), a nominal variable has values with no ranking (e.g. gender), and a scale variable is a measurement variable that takes numeric values (e.g. salary).

PART 3: Creating a Histogram
Create a histogram of WWWHR: the hours per week spent on the WWW for Internet users.
1) Select Graphs > Legacy Dialogs > Histogram… under the top bar menu.
2) The Histogram dialog box will appear. Select the variable to be analyzed (WWWHR) and click on the “>” arrow button to move the variable into the “Variable” box.
3) Click on the “Titles…” button to add a title to the graph. Use your intuition to navigate the other screens.
4) Click OK.
5) A histogram should appear in the Output window.
6) Double-click on the histogram chart in the Output window and the Chart Editor should appear.
7) To display only non-negative values on the X-axis (WWWHR cannot be negative), click on the X-axis (or go to “Edit > Select X axis” on the chart editor menu). Select the “Scale” tab and change the minimum value to 0. Then click OK and close the chart editor.
8) To change the histogram intervals, double-click on the histogram bars and select the Binning tab. Check “Custom” and change the interval width to a small number. How does the histogram change?
9) Now try a large interval width. What happens?
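
The effect of the interval width explored in steps 8 and 9 can also be seen outside SPSS. The following is a minimal Python sketch, not part of the assignment: the helper `bin_counts` and the `hours` values are invented for illustration and are not taken from gss2004.xls.

```python
from collections import Counter


def bin_counts(values, width, start=0):
    """Count the values falling in each interval [left, left + width)."""
    counts = Counter(int((v - start) // width) for v in values)
    # Map each occupied interval to its left edge
    return {start + k * width: counts[k] for k in sorted(counts)}


# Hypothetical weekly hours, invented for illustration only
hours = [0, 1, 1, 2, 2, 3, 5, 8, 10, 14, 20, 40]

narrow = bin_counts(hours, width=1)   # many bars, each holding few cases
wide = bin_counts(hours, width=20)    # few bars, detail is smoothed away

for left_edge, n in wide.items():
    print(f"[{left_edge}, {left_edge + 20}): {'#' * n}")
```

With a small width, almost every distinct value gets its own bar and the chart looks ragged. With a large width, most cases collapse into the first bar and the shape of the distribution is harder to read.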

PART 4: Compute descriptive statistics for number of hours for email (EMAILHR)

METHOD 1: Simple procedure to compute a few descriptive statistics
1) Select Analyze > Descriptive Statistics > Descriptives…
2) Choose the variables to analyze.
3) Click on the Options button to select the statistics.
4) Click OK.

METHOD 2: BETTER! More statistics, lots of information!
1) Select Analyze > Descriptive Statistics > Explore…
2) Choose the variables to analyze and move them to the “Dependent List” box.
3) Click on the Statistics button and check Percentiles.
4) Click on the Plots button, check Histogram to create a histogram, and uncheck the Stem-and-leaf plot.

Use both methods and compare the results.

PART 5: TO BE SUBMITTED - Compute the descriptive statistics for email time (EMAILHR) by sex of respondents
1) Select Analyze > Descriptive Statistics > Explore…
2) Move the variable EMAILHR into the Dependent List box.
3) Move the SEX variable into the Factor List box.
4) Click on Statistics to select the statistics to compute.
5) Click on Plots…, select “Factor levels together” under Boxplots, and check Histogram and Normality plots with tests.
6) Click Continue.
7) Click OK in the “Explore” box.
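
As a rough cross-check on the per-group statistics that Explore reports, the same quantities can be sketched with Python's standard library. This is an illustration under stated assumptions, not the SPSS procedure: the helper `describe` and the sample `rows` are hypothetical (the values are invented, not from gss2004.xls), and quartile conventions differ between packages, so quartiles computed this way may not match the SPSS output exactly.

```python
import statistics


def describe(values):
    """Return the summary statistics requested in Part 5 for one group."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "st_dev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical (SEX, EMAILHR) rows, invented for illustration; 1 = Male, 2 = Female
rows = [(1, 0), (1, 1), (1, 2), (1, 8), (1, 50),
        (2, 0), (2, 1), (2, 2), (2, 7), (2, 30)]

# Split EMAILHR by SEX, as the Factor List does in Explore
by_sex = {}
for sex, hours in rows:
    by_sex.setdefault(sex, []).append(hours)

for sex, hours in sorted(by_sex.items()):
    print(sex, describe(hours))
```

Grouping first and summarizing second mirrors the roles of the Dependent List (the variable being summarized) and the Factor List (the grouping variable).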

ANSWER THE FOLLOWING QUESTION: Do men spend more time writing/reading emails than women?

Compute the following summary statistics (mean, standard deviation, first quartile, median, third quartile, max and min from SPSS) for the number of weekly hours that men and women spend on emails. Write the statistics in the table below.

Gender   Variable   Mean   St.Dev.   Max   Min
Male     EMAILHR    6.33   9.506     50    0
Female   EMAILHR    5.93   8.884     50    0

Gender   Variable   Median   First quartile   Third quartile
Male     EMAILHR    2        1                8
Female   EMAILHR    2        1                7

Analyze the descriptive statistics and graphs for men and women. Do you see any difference in the amount of time that men and women spend on email?

1) Describe the shape, center and spread of the distribution and explain in plain English.

The distribution of email time is right-skewed for both men and women. This means that most people spend fewer than about 10 hours on email per week, while a few spend up to 50 hours per week. The median (Q2) shows that 50% of people spend 2 hours or less per week on email. The interquartile range (IQR = Q3 - Q1) measures how spread out the data are. The IQR is 7 for men, which is large compared with the median of 2, so email hours vary considerably from person to person. The minimum of 0 and the maximum of 50 hours also make it evident that email time varies widely.

2) Based on the statistics computed above and on the shape of the distribution, which statistic would you use to summarize the center of the data and why?

Since the distribution is right-skewed, I would use the median to summarize the center, because the few very large values pull the mean upward (the means are about 6 hours while the medians are 2 hours).

3) What does the five-number summary tell us about the time spent on email? (Hint: interpret the five-number summary in plain English.)

The five-number summary shows that 25% of people spend 1 hour or less on email per week, 50% spend 2 hours or less, and 75% spend 8 hours or less (7 hours or less for women). The minimum time for both men and women is 0 hours (less than 60 minutes) per week and the maximum is 50 hours per week.

4) What do the boxplot and the normality test show? Explain.

The normality plot shows that email time for both men and women departs from a normal distribution: the points do not fall along the diagonal line and there are many outliers. The boxplot supports this finding. There appear to be outliers above about 20 hours of email per week for both men and women. In the boxplot, "o" marks outliers and "*" marks far outliers.

5) Use the 1.5 x IQR rule to identify possible outliers. List the cutoff points for outliers and show your work. Explain what you found. (Hint: is there any excessive time spent on email for men, women or both?)

Male:
Upper cutoff = Q3 + (1.5 * IQR) = 8 + (1.5 * 7) = 18.5 (values above 18.5 hours are outliers)
Lower cutoff = Q1 - (1.5 * IQR) = 1 - (1.5 * 7) = -9.5

There are no low outliers because the lower cutoff is negative and time cannot be negative. It is, however, unusual to spend more than 18 hours and 30 minutes per week reading or writing email, so a man who spends more than this is not typical.

Female:
Upper cutoff = Q3 + (1.5 * IQR) = 7 + (1.5 * 6) = 16 (values above 16 hours are outliers)
Lower cutoff = Q1 - (1.5 * IQR) = 1 - (1.5 * 6) = -8

Again there are no low outliers because the lower cutoff is negative and time cannot be negative. It is, however, unusual to spend more than 16 hours per week reading or writing email, so a woman who spends more than this is not typical.

In general, the medians show that men and women spend the same amount of time on email, i.e., 2 hours per week, and the maximum for both is 50 hours.
However, the results also show that the upper outlier cutoff for men (18.5 hours) is 2.5 hours higher than for women (16 hours), which is not much of a difference.

SUBMISSION INSTRUCTIONS: Copy the relevant SPSS tables, graphs and your answers into a document. Bring a printed copy to class on the due date and also submit it on the course webpage at http://d2l.depaul.edu. Write your name on the document you submit. Keep a copy of all your submissions! If you have questions about the homework, email me BEFORE the deadline. Please pay attention to the due date. No late homework will be accepted.
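
The 1.5 x IQR workings in question 5 can be double-checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `iqr_fences` is hypothetical, and the quartiles are the ones reported in the summary table for EMAILHR.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs of the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles from the summary table: Male Q1 = 1, Q3 = 8; Female Q1 = 1, Q3 = 7
male_lower, male_upper = iqr_fences(q1=1, q3=8)
female_lower, female_upper = iqr_fences(q1=1, q3=7)

print("Male cutoffs:  ", male_lower, male_upper)      # -9.5 18.5
print("Female cutoffs:", female_lower, female_upper)  # -8.0 16.0
print("Difference in upper cutoffs:", male_upper - female_upper)  # 2.5
```

Because hours cannot be negative, only the upper cutoffs matter here: any value above them is flagged as a possible outlier.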