hadoop-user mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: Transpose
Date Tue, 05 Mar 2013 17:27:35 GMT

Essentially what you want to do is group your data points by their position
within the column, and have each reduce call assemble the values that share a
position into one output row.  To make each record that the mapper processes
one of the columns, you can use TextInputFormat with
conf.set("textinputformat.record.delimiter", ";").  Your mapper will
receive LongWritable keys giving the byte offset of each record in the input
file, and Text values.  The mapper then tokenizes the input string.
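
To make the record splitting concrete, here is a small plain-Java sketch (no
Hadoop dependency) of roughly what the mapper would see once the record
delimiter is set to ";": the keys are the byte offsets TextInputFormat
reports, and the values are the column strings.  splitRecords is a
hypothetical helper for illustration, not a Hadoop API, and single-byte
characters are assumed so one char equals one byte.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;

class RecordSplitDemo {
    // Stand-in for what TextInputFormat does with the record delimiter set
    // to ";": each record is the text between delimiters, keyed by the byte
    // offset at which it begins (ASCII assumed, so 1 char == 1 byte).
    static LinkedHashMap<Long, String> splitRecords(String input) {
        LinkedHashMap<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String record : input.split(";")) {
            records.put(offset, record);
            offset += record.length() + 1; // +1 for the consumed ";"
        }
        return records;
    }

    public static void main(String[] args) {
        splitRecords("11,22,33;144,244,344;yny;").forEach((off, rec) ->
            // The mapper would then tokenize each record on ","
            System.out.println(off + " -> " + Arrays.toString(rec.split(","))));
    }
}
```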

Emitting a map output for each data point in each column, you can then use
secondary sort to send the data to the right place in the right order.  Your
composite key would look like (index of the data point within its column,
which is the row index; the LongWritable passed in as the map input key).
Each reduce call would then get all the points in a single row: you sort/group
by row index, and within a reduce call's values, sort by byte offset so that
entries from earlier columns come before entries from later ones.
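
The whole pipeline can be sketched in plain Java (again, no Hadoop
dependency), simulating the map, shuffle-with-secondary-sort, and reduce
phases in one method.  The Emit record and the transpose helper are
illustrative names rather than Hadoop APIs, and the third column is written
with commas ("y,n,y") as an assumption, since the original post shows it as
"yny".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TransposeDemo {
    // Composite key plus value: rowIndex is the position of the token within
    // its column; byteOffset is the map input key, used as the secondary sort
    // field so earlier columns come before later ones within a row.
    record Emit(int rowIndex, long byteOffset, String token) {}

    static List<String> transpose(String input) {
        // "Map" phase: emit (rowIndex, byteOffset) -> token for every value.
        List<Emit> emits = new ArrayList<>();
        long offset = 0;
        for (String record : input.split(";")) {
            String[] tokens = record.split(",");
            for (int row = 0; row < tokens.length; row++) {
                emits.add(new Emit(row, offset, tokens[row]));
            }
            offset += record.length() + 1; // +1 for the consumed ";"
        }

        // "Shuffle" phase: primary sort/group by rowIndex, secondary sort by
        // byteOffset -- this is what the secondary sort achieves in Hadoop.
        emits.sort(Comparator.comparingInt(Emit::rowIndex)
                             .thenComparingLong(Emit::byteOffset));

        // "Reduce" phase: each run of equal rowIndex becomes one output row.
        List<String> rows = new ArrayList<>();
        StringBuilder row = new StringBuilder();
        int currentRow = -1;
        for (Emit e : emits) {
            if (e.rowIndex() != currentRow) {
                if (currentRow >= 0) rows.add(row.toString());
                row.setLength(0);
                currentRow = e.rowIndex();
            }
            if (row.length() > 0) row.append(' ');
            row.append(e.token());
        }
        if (currentRow >= 0) rows.add(row.toString());
        return rows;
    }

    public static void main(String[] args) {
        transpose("11,22,33;144,244,344;y,n,y;").forEach(System.out::println);
    }
}
```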

Does that make sense?


On Tue, Mar 5, 2013 at 7:11 AM, Mix Nin <pig.mixed@gmail.com> wrote:

> Hi,
> I have data in a file as follows. There are 3 columns separated by
> semicolon (;). Each column has multiple values separated by comma (,).
> 11,22,33;144,244,344;yny;
> I need the output in the format below. It is like transposing the values of
> each column.
> 11 144 y
> 22 244 n
> 33 344 y
> Can we write a MapReduce program to achieve this? Could you help with the
> code?
> Thanks
