# Homework 3 – MapReduce

## Problem Statement:

We are greatly inspired by the [Consumer Complaints](https://github.com/InsightDataScience/consumer_complaints) challenge from the popular InsightDataScience. In fact, we are going to tackle the same challenge, but using MapReduce. Please read through the challenge (the most important sections for us are "Input dataset" and "Expected output").

## Requirements:

1. You must perform your computations using only Python and the MRJob package that we use in class. No external packages, e.g. pandas, are allowed.
2. Your code must be able to run as a stand-alone MRJob application.

## INPUT:

Your code will be evaluated against a sample of the original data set (in CSV format) downloaded from: [https://www.consumerfinance.gov/data-research/consumer-complaints/#download-the-data](https://www.consumerfinance.gov/data-research/consumer-complaints/#download-the-data)

The original data set is roughly 1GB, but the sample file is only 4MB and is available on our class resources under Data Sets > complaints_sample.csv. You can use this file for testing your code within a notebook if you prefer.

**NOTE:** This CSV file contains multiple-line records. Please pay attention to this when reading the data.

## OUTPUT:

You are required to write to the standard output in CSV format. Basically, you have to organize each of your records as a CSV row when you output from your MRJob job. The output does not have to contain the header line.

## SUBMISSION:

The final hand-in should be a single file, named `BDM_HW3_LastName.py`, that takes exactly 1 argument for the input path. Output will be handled through redirection.

## SAMPLE RUN:

```
python BDM_HW3_LastName.py complaints_sample.csv > output.csv
```
Please help me solve this in Python using only MRJob.
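As a starting point, here is a minimal sketch of the map and reduce logic using only the Python standard library, so it can be tried in a notebook before it is moved into an MRJob class. It rests on several assumptions that are not stated in the assignment text: that the input has the columns `Date received`, `Product`, and `Company`; that dates look like `YYYY-MM-DD`; and that the challenge's expected output is one row per product and year with the total number of complaints, the number of distinct companies, and the highest percentage of complaints filed against a single company. The names `map_records`, `reduce_group`, `run`, and `SAMPLE` are hypothetical. The `csv` module is used for reading because it keeps a quoted field that spans several lines inside one record, which splitting the file on newlines would not.

```python
import csv
import io
from collections import Counter, defaultdict

# Tiny made-up sample; the first data record spans two physical lines.
SAMPLE = (
    "Date received,Product,Consumer complaint narrative,Company\n"
    '2019-09-24,Debt collection,"line one\nline two",Acme Bank\n'
    "2019-10-01,Debt collection,ok,Acme Bank\n"
    "2019-10-02,Debt collection,ok,Other Corp\n"
    '2020-01-15,"Credit reporting, credit repair services",ok,Acme Bank\n'
)


def map_records(handle):
    """Map step: yield ((product, year), company) for each CSV record."""
    # csv.DictReader parses quoted multi-line fields as part of one record.
    for row in csv.DictReader(handle):
        date = (row.get("Date received") or "").strip()
        product = (row.get("Product") or "").strip().lower()
        company = (row.get("Company") or "").strip().lower()
        # Assumption: the year is the first four characters of the date.
        if date[:4].isdigit() and len(date) >= 4 and product and company:
            yield (product, date[:4]), company


def reduce_group(key, companies):
    """Reduce step: summarize all companies seen for one (product, year)."""
    counts = Counter(companies)
    total = sum(counts.values())
    top_share = round(100 * max(counts.values()) / total)
    return [key[0], key[1], total, len(counts), top_share]


def run(handle):
    """Simulate map, shuffle, and reduce locally; return CSV text."""
    groups = defaultdict(list)
    for key, company in map_records(handle):
        groups[key].append(company)  # stand-in for the shuffle phase
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # quotes fields with commas
    for key in sorted(groups):
        writer.writerow(reduce_group(key, groups[key]))
    return out.getvalue()


if __name__ == "__main__":
    print(run(io.StringIO(SAMPLE)), end="")
```

To turn this into the required stand-alone job, the body of `map_records` would go into the mapper of an `MRJob` subclass and `reduce_group` into its reducer. Because a default MRJob mapper receives one physical line at a time, the multi-line records probably need the mapper to read the whole file itself (recent mrjob versions provide `mapper_raw` for this); check the version used in class before relying on it. When reading from a real file, open it with `newline=''` as the `csv` documentation recommends.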
Python Programming:
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability through its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented, and functional programming. Python is often described as a "batteries included" language because of its comprehensive standard library.
Python was conceived in the late 1980s as a successor to the ABC language. Python 2.0, released in 2000, introduced features such as list comprehensions and a cycle-detecting garbage collector that works alongside reference counting.
Python 3.0, released in 2008, was a major revision of the language that is not completely backward-compatible, and much Python 2 code does not run unmodified on Python 3.
The Python 2 language was officially discontinued in 2020 (first planned for 2015), and "Python 2.7.18 is the last Python 2.7 release and therefore the last Python 2 release." No more security patches or other improvements will be released for it. With Python 2's end-of-life, only Python 3.6.x and later are supported.