hbase-user mailing list archives

From Ioakim Perros <imper...@gmail.com>
Subject Re: Bulk load - #Reducers different from #Regions
Date Tue, 07 Aug 2012 17:02:27 GMT
Excuse me for not defining it well.

I am bulk updating my HBase table through code, using the
configureIncrementalLoad function of HFileOutputFormat. In the
respective documentation, I read that this function "Sets the number of
reduce tasks to match the current number of regions", but I was
wondering whether I could explicitly avoid that, perhaps through another
way of bulk importing data.
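
For context, here is roughly what my driver looks like (the class name,
paths, and table name are simplified placeholders). As far as I can
tell, configureIncrementalLoad sets the reducer count itself from the
table's region count, so the setNumReduceTasks call before it gets
overwritten:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "prepare-hfiles");
        job.setJarByClass(BulkLoadDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // This setting appears to have no effect ...
        job.setNumReduceTasks(32);

        HTable table = new HTable(conf, "mytable");
        // ... because this call sets the number of reduce tasks to the
        // table's current region count, overwriting the value above
        // (it also installs a total-order partitioner so each reducer
        // writes the HFiles for exactly one region).
        HFileOutputFormat.configureIncrementalLoad(job, table);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}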

PS: I am trying to insist on bulk importing because I have understood (I
hope correctly) that it is much more efficient than going through the
traditional HBase API. And as I require my job to be iterative in
nature, this approach would hopefully give a good speed-up compared to
the HBase API.

Thank you for responding.

On 08/07/2012 07:53 PM, Subir S wrote:
> Bulk load using
> ImportTsv with pre-split regions for the target table?
> Do you mean to set the number of reducers that ImportTsv must use?
> On 8/7/12, Ioakim Perros <imperros@gmail.com> wrote:
>> Hi,
>> I am bulk importing (updating) data iteratively, and I would like to be
>> able to set the number of reducers of an M/R task to be different from
>> the number of regions of the table into which I am updating data.
>> I tried it through job.setNumReduceTasks(#reducers), but the job ignored
>> it.
>> Is there a way to avoid an intermediary job and to set the number of
>> reducers explicitly?
>> I would be grateful if anyone could shed some light on this.
>> Thanks and regards,
>> Ioakim
