hadoop-mapreduce-user mailing list archives

From Michel Segel <michael_se...@hotmail.com>
Subject Re: Transpose
Date Wed, 06 Mar 2013 10:11:17 GMT
Remember KISS.

Don't try to read it in as anything but a plain text line.
It's really a 3x3 matrix, and it looks to be grouped by columns.

Your output will drop the initial key: you just parse each line and then write out the result.
Absent further explanation, it looks like each tuple is unique.
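Here is a minimal sketch of the KISS approach. It assumes each input line holds one complete matrix and that the values in every column are comma-separated, so the sample's third column "yny" is taken to mean "y,n,y". It is plain Java with no Hadoop dependencies, and the class and method names are hypothetical. In a real job, the body of transposeLine would sit inside a Mapper's map() that ignores the LongWritable key, with the number of reduce tasks set to zero so the job is map-only.

```java
import java.util.ArrayList;
import java.util.List;

public class LineTranspose {

    // Transposes one input line such as "11,22,33;144,244,344;y,n,y;".
    // Assumption: columns are separated by ';' and values inside a column by ','.
    // Returns one space-separated output row per value position.
    static List<String> transposeLine(String line) {
        List<String[]> columns = new ArrayList<>();
        for (String col : line.trim().split(";")) {
            if (!col.isEmpty()) {             // ignore empty pieces such as ";;"
                columns.add(col.split(","));
            }
        }
        List<String> rows = new ArrayList<>();
        if (columns.isEmpty()) {
            return rows;                      // blank line: nothing to emit
        }
        // Every column must have the same number of values, or the rows would be misaligned.
        int height = columns.get(0).length;
        for (String[] col : columns) {
            if (col.length != height) {
                throw new IllegalArgumentException("columns have different lengths: " + line);
            }
        }
        // Row r is the r-th value of every column, in column order.
        for (int r = 0; r < height; r++) {
            StringBuilder row = new StringBuilder();
            for (int c = 0; c < columns.size(); c++) {
                if (c > 0) row.append(' ');
                row.append(columns.get(c)[r]);
            }
            rows.add(row.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : transposeLine("11,22,33;144,244,344;y,n,y;")) {
            System.out.println(row);
        }
    }
}
```

The column-length check is deliberate: with the sample exactly as posted, "yny" splits into a single value, and the method throws rather than writing out misaligned rows.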

Sent from a remote device. Please excuse any typos...

Mike Segel

On Mar 5, 2013, at 11:27 AM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:

> Hi,
> Essentially, what you want to do is group your data points by their position within
> each column, and have each reduce call assemble the points for one row into that row.
> To make each record the mapper processes one of the columns, you can use TextInputFormat
> with conf.set("textinputformat.record.delimiter", ";"). Your mapper will then receive
> LongWritable keys giving the byte offset of the record in the input file, and Text values
> holding the column. The mapper tokenizes each value on commas.
> By emitting one map output for each data point in each column, you can then use a
> secondary sort to send the data to the right place in the right order (see
> http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/).
> Your composite key would be (index of the data point within its column, which is the
> row index; the LongWritable passed in as the map input key). Each reduce call would then
> get all the points in a single row. You would sort and group by row index, and within a
> reduce's values, sort by byte offset so that entries from earlier columns come before
> later ones.
> Does that make sense?
> Sandy
> On Tue, Mar 5, 2013 at 7:11 AM, Mix Nin <pig.mixed@gmail.com> wrote:
>> Hi
>> I have data in a file as follows. There are 3 columns separated by semicolons (;).
>> Each column has multiple values separated by commas (,).
>> 11,22,33;144,244,344;yny;
>> I need the output in the format below. It is like transposing the values of each column.
>> 11 144 y
>> 22 244 n
>> 33 344 y
>> Can we write a MapReduce program to achieve this? Could you help with the code?
>> Thanks
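The following plain-Java simulation walks through Sandy's secondary-sort approach without a cluster. It assumes the whole file holds a single matrix (as in the example), ASCII input (so character positions equal byte offsets), and comma-separated values in every column ("y,n,y" rather than "yny"). The class, method and field names are hypothetical. The in-memory sort stands in for Hadoop's shuffle: sorting by (row, offset) and then grouping by row mimics a sort comparator on the full composite key plus a grouping comparator on the row index alone.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortTranspose {

    // One map output: composite key (row index, byte offset of the column record) plus the value.
    static final class Emit {
        final int row;
        final long offset;
        final String value;

        Emit(int row, long offset, String value) {
            this.row = row;
            this.offset = offset;
            this.value = value;
        }
    }

    // "Map" phase: split the input on ';' (mimicking textinputformat.record.delimiter=";"),
    // remember where each record starts, and emit one Emit per comma-separated value.
    static List<Emit> map(String input) {
        List<Emit> out = new ArrayList<>();
        int start = 0;
        while (start < input.length()) {
            int end = input.indexOf(';', start);
            if (end < 0) end = input.length();
            String record = input.substring(start, end).trim();
            if (!record.isEmpty()) {          // skip blank records, e.g. the trailing newline
                String[] values = record.split(",");
                for (int i = 0; i < values.length; i++) {
                    out.add(new Emit(i, start, values[i]));  // i is the row index
                }
            }
            start = end + 1;
        }
        return out;
    }

    // "Shuffle" + "reduce": sort by (row, offset), group by row, join each group into a line.
    static List<String> shuffleAndReduce(List<Emit> emits) {
        List<Emit> sorted = new ArrayList<>(emits);
        sorted.sort(Comparator.comparingInt((Emit e) -> e.row).thenComparingLong(e -> e.offset));
        Map<Integer, StringBuilder> rows = new LinkedHashMap<>();  // keeps row order from the sort
        for (Emit e : sorted) {
            StringBuilder sb = rows.computeIfAbsent(e.row, k -> new StringBuilder());
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.value);
        }
        List<String> result = new ArrayList<>();
        for (StringBuilder sb : rows.values()) result.add(sb.toString());
        return result;
    }

    public static void main(String[] args) {
        String input = "11,22,33;144,244,344;y,n,y;\n";
        for (String row : shuffleAndReduce(map(input))) {
            System.out.println(row);
        }
    }
}
```

In an actual job, those two roles would be wired up with Job.setSortComparatorClass and Job.setGroupingComparatorClass, together with a partitioner on the row index (Job.setPartitionerClass). Because the row index is the only grouping key, several matrices in one file would be merged into the same rows; a per-matrix identifier would need to be added to the key in that case.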
