hadoop-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: Two map inputs (file & HBase). "Join" file data and Hbase data into a map reduce.
Date Thu, 27 Sep 2012 17:55:04 GMT
Hi Pablo

>I could read the file and do a get followed by a put, but this would not
>be a MR job and would be very slow if there are a lot of entries in the file.

If you have a large file, you can parallelize the HBase gets and puts by
running them inside a MapReduce job that takes the file as its input.
Configure the split size so that the job gets enough mappers to ensure
sufficient parallelism and good performance.
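
To make the split-size point concrete, here is a rough sketch of the arithmetic, not a definitive recipe. The class and method names (SplitSizing, maxSplitSizeFor, splitCount) are made up for illustration, and the 1 GiB file size and 64 mappers are assumed numbers. It only estimates a maximum split size for a target mapper count. The real split count can differ slightly, because FileInputFormat also takes the HDFS block size and the minimum split size into account and may fold a small tail into the last split.

```java
// Hypothetical helper: estimates a max split size for a target mapper count.
final class SplitSizing {
    private SplitSizing() {}

    // Split size in bytes (ceiling division), so the file is cut into
    // at most desiredMappers pieces.
    static long maxSplitSizeFor(long fileSizeBytes, int desiredMappers) {
        if (fileSizeBytes <= 0 || desiredMappers <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (fileSizeBytes + desiredMappers - 1) / desiredMappers;
    }

    // Number of pieces a file of the given size is cut into (ceiling division).
    static long splitCount(long fileSizeBytes, long splitSizeBytes) {
        if (fileSizeBytes <= 0 || splitSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long fileSize = 1L << 30; // assumed: a 1 GiB file of row keys
        int mappers = 64;         // assumed: desired number of map tasks
        long split = maxSplitSizeFor(fileSize, mappers);
        System.out.println("max split size (bytes): " + split);
        System.out.println("estimated splits: " + splitCount(fileSize, split));
        // In the job driver the value would then be applied with something like
        // FileInputFormat.setMaxInputSplitSize(job, split) from the
        // org.apache.hadoop.mapreduce.lib.input package (new MapReduce API).
    }
}
```

A maximum split size only raises the mapper count when it is smaller than the HDFS block size of the file. Each mapper would then issue the get and put for the row keys in its own split.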

On Thu, Sep 27, 2012 at 11:14 PM, Pablo Musa <pablo@psafe.com> wrote:

> Hey guys,
>
> I am not sure if this is the correct list (it could also be HBase), but I
> think my question is more related to MR than to HBase itself.
>
> I am trying to update some columns of a family in my HBase db using a MR
> job. In each column I have a byte array with different information
> concatenated.
>
> So far, so easy: a MR job with the table as input and the scan.setfamily.
>
> My problem is that the rows that I want to update are inside a file. In
> other words, I have a big file containing all the rows that should be
> updated.
>
> So, I have to read the row:column content so I can update it and then
> write it again. But I also have to read the file in order to know which
> rows I should update.
>
> I could read the file and do a get followed by a put, but this would not
> be a MR job and would be very slow if there are a lot of entries in the
> file.
>
> Another possibility I thought of, but don't know how to implement, is to
> use the table as input and have a Map with the rows that should be
> updated. The problem is that I don't know how to distribute this Map or
> how to distribute the file so every Map can read it.
>
> Any thoughts?
>
> Thanks,
> Pablo
