MapReduce

MapReduce works by breaking the processing into two phases:

Map phase and Reduce phase

Each phase has key-value pairs as its input and output, the types of which are chosen by the programmer. The programmer also specifies two functions:
Map function and Reduce function

The input to the map phase is the raw data. We choose a text input format that gives us each line in the data set as a text value.

The map function is just a data preparation phase: it sets up the data in such a way that the reduce function can do its work on it once the mappers have finished. The map function is also a good place to drop bad records.


Example: Finding the maximum temperature for each year

Here we filter out temperature readings that are missing, suspect or erroneous.

Consider the following sample lines of data:

0001232012500.........9999+00001+999999999999999999999...
0001232012500.........9999+00221+999999999999999999999...
0001232012600.........9999-00111+999999999999999999999...
0001232013600.........9999+01111+999999999999999999999...
0001232013800.........9999+00781+999999999999999999999...


These lines are presented to the map function as key-value pairs:

(0, 0001232012500.........9999+00001+999999999999999999999... )
(106, 0001232012500.........9999+00221+999999999999999999999... )
(212, 0001232012600.........9999-00111+999999999999999999999... )
(318, 0001232013600.........9999+01111+999999999999999999999... )
(424, 0001232013800.........9999+00781+999999999999999999999... )

The keys are the line offsets within the file, which we ignore in our map function. The map function extracts the year and the air temperature from each line and emits them as its output:

(2012, 0)
(2012, 22)
(2012, -11)
(2013, 111)
(2013, 78)
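
The extraction step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name map_temperature, the regular expression, the 9999 "missing" marker and the set of acceptable quality codes are all hypothetical choices made to fit the abbreviated sample lines shown in this post, not the layout of the full records.

```python
import re

# Assumed layout of the abbreviated sample lines in this post:
# 6 digits, a 4-digit year, 3 digits, the elided dots, 4 digits,
# then a signed 4-digit temperature followed by a 1-digit quality code.
RECORD = re.compile(r"^\d{6}(\d{4})\d{3}\.+\d{4}([+-]\d{4})(\d)")

MISSING = 9999          # assumed marker for a missing reading
GOOD_QUALITY = "01459"  # assumed set of acceptable quality codes


def map_temperature(offset, line):
    """Yield (year, temperature) for one input line; bad records yield nothing."""
    match = RECORD.match(line)
    if match is None:
        return  # malformed line: drop it
    year, temperature, quality = match.groups()
    reading = int(temperature)  # int() accepts a leading '+' or '-'
    if reading != MISSING and quality in GOOD_QUALITY:
        yield int(year), reading


# The key (line offset) is passed in but ignored by the map function.
records = [
    (0, "0001232012500.........9999+00001+999999999999999999999..."),
    (106, "0001232012500.........9999+00221+999999999999999999999..."),
    (212, "0001232012600.........9999-00111+999999999999999999999..."),
    (318, "0001232013600.........9999+01111+999999999999999999999..."),
    (424, "0001232013800.........9999+00781+999999999999999999999..."),
]

map_output = [pair
              for offset, line in records
              for pair in map_temperature(offset, line)]
print(map_output)
```

Because the function is a generator, a bad record simply produces no output pair, which is how the filtering happens inside the map step.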


The output of the map function is processed by the MapReduce framework before being sent to the reduce function. This processing sorts the key-value pairs and groups them by key:

(2012, [ 0, 22, -11 ] )
(2013, [ 111, 78 ] )
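
A minimal sketch of this sort-and-group step, assuming the map output fits in memory as a Python list. The name shuffle_and_sort is a hypothetical helper for illustration; a real framework does this work across many machines and generally makes no promise about the order of the values within a key.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_sort(pairs):
    """Sort (key, value) pairs by key and collect each key's values into a list."""
    # sorted() is stable, so values keep their original order within a key.
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


map_output = [(2012, 0), (2012, 22), (2012, -11), (2013, 111), (2013, 78)]
grouped = shuffle_and_sort(map_output)
print(grouped)  # [(2012, [0, 22, -11]), (2013, [111, 78])]
```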


All the reduce function has to do now is iterate through the list and pick out the maximum reading:

(2012, 22)
(2013, 111)

This is the maximum global temperature recorded in each of the years 2012 and 2013.
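
The reduce step is small enough to sketch in a few lines of Python. The function name reduce_max_temperature is hypothetical, and the grouped input is copied from the example above.

```python
def reduce_max_temperature(year, temperatures):
    """Yield (year, highest temperature) for one group of readings."""
    yield year, max(temperatures)


grouped = [(2012, [0, 22, -11]), (2013, [111, 78])]
reduce_output = [pair
                 for year, temperatures in grouped
                 for pair in reduce_max_temperature(year, temperatures)]
print(reduce_output)  # [(2012, 22), (2013, 111)]
```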

Now, the question is: how does MapReduce work?

To implement MapReduce, we need three things: a map function, a reduce function, and some code to run the job (that is, a main class).

The map function is represented by the Map class, which declares the abstract map() method.

The reduce function is represented by the Reduce class.
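
To show how the three pieces fit together, here is a single-process sketch in Python. It is not the API of any real framework: the Map and Reduce base classes mirror the names used above, while MaxTemperatureMap, MaxTemperatureReduce, run_job and main are hypothetical names. To keep the focus on the wiring, the input is assumed to be simplified "year temperature" lines keyed by line number rather than the raw records.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Map(ABC):
    @abstractmethod
    def map(self, key, value):
        """Return an iterable of (key, value) pairs for one input record."""


class Reduce(ABC):
    @abstractmethod
    def reduce(self, key, values):
        """Return an iterable of (key, value) pairs for one group."""


class MaxTemperatureMap(Map):
    def map(self, key, value):
        # Assumed simplified input: "<year> <temperature>"; the key is ignored.
        year, temperature = value.split()
        yield int(year), int(temperature)


class MaxTemperatureReduce(Reduce):
    def reduce(self, key, values):
        yield key, max(values)


def run_job(mapper, reducer, records):
    """Run map, group by key, then reduce, all in one process."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper.map(key, value):
            groups[out_key].append(out_value)
    results = []
    for out_key in sorted(groups):  # keys reach the reducer in sorted order
        results.extend(reducer.reduce(out_key, groups[out_key]))
    return results


def main():
    lines = ["2012 0", "2012 22", "2012 -11", "2013 111", "2013 78"]
    records = list(enumerate(lines))  # line numbers stand in for offsets
    results = run_job(MaxTemperatureMap(), MaxTemperatureReduce(), records)
    print(results)
    return results


if __name__ == "__main__":
    main()
```

Declaring map() and reduce() as abstract means the base classes cannot be instantiated directly; each job supplies its own subclasses, and the driver code stays the same.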




