Finding Data

School

Indiana University, Purdue University, Indianapolis

Course

211

Subject

Industrial Engineering

Date

Apr 3, 2024

Uploaded by Samichy on coursehero.com

Abdus Sami Chowdhury
04/21/23
LIS-S201
Finding Data

The first dataset, “Crash Reporting – Drivers Data”, can be found on data.gov and is available in several file formats, including CSV and JSON. Its purpose is to present information about drivers involved in traffic collisions on county and local roads within Montgomery County, Maryland, collected via the Automated Crash Reporting System (ACRS). It includes fields that help investigate the factors contributing to crashes, such as the report number, crash date and time, road name, weather and surface conditions, driver and vehicle IDs, and location.

After reviewing it in Excel, the data looks well structured: the columns reflect the factors mentioned above and the rows represent individual crash records, and this tabular layout makes the data easier to interpret. The data is fairly consistent, but some columns have missing values that may not have been recorded, which can affect the outcome and accuracy of an analysis. Otherwise, no data cleaning is required, as there is little to no variation in data formats such as spellings, abbreviations, or letter case.

Yes, I consider this dataset a good candidate. Most importantly, it requires no data cleaning and is clear to understand. Having fewer inconsistencies also makes it easier to create pivot tables, visualizations, and summary statistics.

The second dataset, “Electric Vehicle – Population Data”, can be found on data.gov and is available in file formats including CSV and XML. Its purpose is to show that Battery Electric Vehicles and Plug-in Hybrid Electric Vehicles are being adopted and used in the US, based on data collected by the Washington State Department of Licensing. It includes fields such as the Vehicle Identification Number (VIN), location, make and model, vehicle type and electric range, price, and electric utility.
The data looks clean and easy to understand, as there is little to no variation in spellings, abbreviations, or letter case. It is arranged in a tabular format in which the columns are the different factors and the rows hold the information recorded. No cleaning of formats is required, as the data is consistent. However, some values may not have been recorded, or were replaced by zeros because they were not researched, which can lead to inaccurate analysis.

It would not be a good candidate, because additional data could have been provided, for example about users and other factors. In addition, a good deal of data is missing, which would make it difficult to visualize or to build pivot tables.

Links
1. https://catalog.data.gov/dataset/crash-reporting-drivers-data
2. https://catalog.data.gov/dataset/electric-vehicle-population-data
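
The checks described for both datasets (missing values, zeros that may stand in for unrecorded values, and spelling or case variants) were done by eye in Excel. As a rough illustration only, they could also be sketched in Python with the standard library. The column names, the sample rows, and the list of tokens treated as "missing" below are all made up for the example; they are assumptions and are not taken from either dataset.

```python
import csv
import io
from collections import defaultdict

# Assumption: strings treated as "missing". Adjust for a real file.
MISSING_TOKENS = {"", "n/a", "na", "null", "none", "unknown"}

# Hypothetical sample rows, invented for this sketch (not real data).
SAMPLE = """make,electric_range,base_msrp
TESLA,215,0
Tesla,0,0
NISSAN,,31950
nissan,84,N/A
"""


def is_zero(text):
    """True if the cell parses as a number equal to zero."""
    try:
        return float(text) == 0.0
    except ValueError:
        return False


def profile_columns(csv_text):
    """Count missing cells, zero cells, and case variants per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames or []
    missing = {name: 0 for name in names}
    zeros = {name: 0 for name in names}
    spellings = {name: defaultdict(set) for name in names}

    for row in reader:
        for name in names:
            raw = (row.get(name) or "").strip()
            if raw.lower() in MISSING_TOKENS:
                missing[name] += 1
                continue
            if is_zero(raw):
                zeros[name] += 1
            # Group values that differ only by letter case.
            spellings[name][raw.casefold()].add(raw)

    return {
        name: {
            "missing": missing[name],
            "zeros": zeros[name],
            "inconsistent": [
                sorted(group)
                for group in spellings[name].values()
                if len(group) > 1
            ],
        }
        for name in names
    }


if __name__ == "__main__":
    for column, stats in profile_columns(SAMPLE).items():
        print(column, stats)
```

A column with many missing or zero cells is one to treat with caution in pivot tables and summary statistics, while a non-empty "inconsistent" list points to values that would need to be standardized before grouping.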