hadoop-hdfs-user mailing list archives

From Pablo Musa <pa...@psafe.com>
Subject Two map inputs (file & HBase). "Join" file data and Hbase data into a map reduce.
Date Thu, 27 Sep 2012 17:44:48 GMT
Hey guys,
I am not sure if this is the correct list (it could also be the HBase one), but I think
my question is more related to MapReduce than to HBase itself.

I am trying to update some columns of a family in my HBase table using a MapReduce job.
In each column I have a byte array with different pieces of information concatenated.

So far, so easy: a MapReduce job with the table as input and a Scan restricted to
the family (Scan.addFamily). My problem is that the row keys I want to update are
listed in a file. In other words, I have a big file containing all the rows that
should be updated.

So I have to read the row:column content in order to update it and write it back.
But I also have to read the file to know which rows I should update.

I could read the file and do a Get followed by a Put for each row, but that would
not be a MapReduce job and would be very slow if the file has a lot of entries.
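Roughly what I mean by that sequential approach, as a minimal sketch (Python here just to show the logic; a plain dict stands in for the HBase table, and update_value is a hypothetical stand-in for rewriting the concatenated byte array):

```python
# Sketch of the sequential Get/Put approach (no MapReduce).
# A plain dict stands in for the HBase table; row keys come from a
# list instead of the big file; update_value is hypothetical.

def update_value(old):
    # stand-in for rewriting the concatenated byte array
    return old + b"-updated"

def sequential_update(table, row_keys):
    for key in row_keys:
        if key in table:                           # Get
            table[key] = update_value(table[key])  # Put
    return table

table = {b"r1": b"a", b"r2": b"b", b"r3": b"c"}
sequential_update(table, [b"r1", b"r3"])
print(table[b"r1"])  # b'a-updated'
```

Every row listed in the file costs one round trip to the table, which is why I expect this to be slow for a large file.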

Another possibility I thought of, but don't know how to implement, is to use the
table as input and keep an in-memory map of the rows that should be updated. The
problem is that I don't know how to distribute this map, or the file itself, so
that every mapper can read it.
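To make the idea concrete, here is a sketch of the per-mapper logic I have in mind (Python, to show the shape; in Hadoop I imagine the key file could be shipped to every task somehow, e.g. via the distributed cache, and loaded once in setup):

```python
# Sketch of the map-side idea: the row-key file is loaded once into a
# set, and the mapper only emits an update for rows whose key is in it.

def load_keys(lines):
    # one row key per line of the file
    return {line.strip() for line in lines if line.strip()}

def map_row(row_key, value, keys_to_update):
    # emit an updated value only for rows listed in the file
    if row_key in keys_to_update:
        return row_key, value + b"-updated"
    return None  # row not in the file: nothing to write

keys = load_keys([b"r1\n", b"r3\n"])
print(map_row(b"r1", b"a", keys))  # (b'r1', b'a-updated')
print(map_row(b"r2", b"b", keys))  # None
```

This way the full-table scan does the heavy lifting and the file is only a lookup set, but I don't know the right way to get that set to every mapper.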

Any thoughts?

