hbase-user mailing list archives

From Xine Jar <xineja...@googlemail.com>
Subject skipping the map input value and key?!!
Date Fri, 04 Sep 2009 12:46:35 GMT
I have a mapreduce application reading from an existing hbase table. The map
function searches for some values in the table and the reduce function
averages them.

My question is simple:

I initially wrote the program passing the map function the input key type ImmutableBytesWritable and the input value type RowResult. Of course I set setInputFormat(TableInputFormat.class) as well.

I added a debug user counter to check how often my table is read and discovered (with your help as well) that the table is read N times, where N is the number of rows in the table, which of course is not acceptable. This was because I was passing the RowResult as an input to the map function.
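For concreteness, the first approach probably looked roughly like the sketch below. This is only an illustration: it uses the pre-0.20 `org.apache.hadoop.hbase.mapred` API named in the post (TableInputFormat, RowResult, ImmutableBytesWritable), and the table/column names and output types are made up. The key point it shows is that the framework invokes map() once per row, so any whole-table scan performed inside map() runs once per row.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch of "method 1": TableInputFormat feeds each table row to map().
// "family:qualifier" and the output key "avg" are illustrative placeholders.
public class ValueMapper extends MapReduceBase
    implements Mapper<ImmutableBytesWritable, RowResult, Text, DoubleWritable> {

  public void map(ImmutableBytesWritable rowKey, RowResult row,
      OutputCollector<Text, DoubleWritable> out, Reporter reporter)
      throws IOException {
    // map() is called once per row of the input table. If this method
    // opens its own Scanner over the whole table (as the post describes),
    // the table ends up being scanned N times for N rows.
    byte[] value = row.get("family:qualifier".getBytes()).getValue();
    out.collect(new Text("avg"),
        new DoubleWritable(Double.parseDouble(new String(value))));
  }
}
```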

I then decided not to pass the RowResult as the map input. Instead I passed a Text, which I do not actually use in the map function at all; it is there only so that Hadoop does not give me an error :) . Then, as in the first method, I created a scanner on the hbase table inside the map function and started reading the rows.

With this solution, once I no longer passed the RowResult as a parameter to the mapper, the job was much faster and the table was read only once!!!
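Method 2, as described above, might be sketched like this. Again the class and method names follow the old pre-0.20 client API (HTable, Scanner, RowResult), and the table name, column name, and dummy input types are assumptions for illustration, not the poster's actual code:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch of "method 2": the Text input value exists only to satisfy Hadoop's
// Mapper signature and is ignored; the mapper opens its own scanner, so the
// table is scanned once per map() invocation rather than once per table row.
public class ScanningMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, DoubleWritable> {

  public void map(LongWritable key, Text ignored,
      OutputCollector<Text, DoubleWritable> out, Reporter reporter)
      throws IOException {
    // "mytable" and "family:qualifier" are placeholder names.
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    Scanner scanner =
        table.getScanner(new byte[][] { "family:qualifier".getBytes() });
    try {
      for (RowResult row : scanner) {
        byte[] value = row.get("family:qualifier".getBytes()).getValue();
        out.collect(new Text("avg"),
            new DoubleWritable(Double.parseDouble(new String(value))));
      }
    } finally {
      scanner.close();
    }
  }
}
```

Note that with a single dummy input record this gives exactly one map() call, hence one scan of the table, which matches the speedup the post reports.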


- Are there any hidden performance issues or complications behind my method 2?

- It is true that I reached a solution with what I have done, but I am wondering whether I can do it in a cleaner way. Could I somehow skip passing an input key and input value to the map altogether? If yes, how?

