hadoop-hdfs-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Best number of mappers and reducers when processing data to and from HBase?
Date Mon, 20 Oct 2014 14:22:51 GMT
For the number of mappers, take a look at the following method in TableInputFormatBase:

  public List<InputSplit> getSplits(JobContext context) throws IOException {

Is a reducer required in your model?

Can you write to the second HBase table directly from the mappers?
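
To illustrate why getSplits is the place to look: as far as I know, the default behaviour of TableInputFormatBase is to create one input split (and therefore one mapper) per region that overlaps the scan's row range, so the mapper count follows the table's region layout rather than DFS blocks. Below is a minimal sketch of that counting in plain Java with no HBase dependency. The Region class, countSplits, and the string row keys are hypothetical stand-ins for illustration (HBase compares raw byte arrays), not HBase API.

```java
import java.util.Arrays;
import java.util.List;

public class RegionSplitEstimate {

    /** Hypothetical stand-in for a region's key range [startKey, endKey). */
    public static final class Region {
        final String startKey; // "" means "from the start of the table"
        final String endKey;   // "" means "to the end of the table"

        public Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /**
     * True if the region overlaps the scan range [scanStart, scanStop).
     * An empty scanStart or scanStop means the scan is unbounded on that side.
     * Assumption: String.compareTo stands in for HBase's byte-wise comparison.
     */
    static boolean overlaps(Region r, String scanStart, String scanStop) {
        boolean startsBeforeScanEnds =
                scanStop.isEmpty() || r.startKey.compareTo(scanStop) < 0;
        boolean endsAfterScanStarts =
                r.endKey.isEmpty() || scanStart.isEmpty()
                        || r.endKey.compareTo(scanStart) > 0;
        return startsBeforeScanEnds && endsAfterScanStarts;
    }

    /** Estimated split (mapper) count: one per region overlapping the scan. */
    public static int countSplits(List<Region> regions, String scanStart, String scanStop) {
        int count = 0;
        for (Region r : regions) {
            if (overlaps(r, scanStart, scanStop)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // A made-up table with four regions.
        List<Region> regions = Arrays.asList(
                new Region("", "g"), new Region("g", "n"),
                new Region("n", "t"), new Region("t", ""));
        System.out.println(countSplits(regions, "", ""));   // full table scan
        System.out.println(countSplits(regions, "h", "p")); // bounded scan
    }
}
```

With the made-up four-region table above, a full scan touches all four regions and a scan from "h" to "p" touches two. Under this assumption, narrowing the Scan's start and stop rows, or pre-splitting the table, is what changes the number of mappers.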


On Mon, Oct 20, 2014 at 7:08 AM, peterm_second <regestrer@gmail.com> wrote:

> Hi Guys,
> I have a somewhat abstract question to ask. I am reading data from HBase
> and I was wondering how I can determine the best mapper and reducer
> count, I mean what criteria need to be taken into
> consideration when determining the mapper and reducer counts. My MR job is
> reading data from an HBase table, said data is processed in the mapper, and
> the reducer takes the data and outputs some stuff to another HBase table. I
> want to be able to dynamically deduce the correct number of mappers to
> initially process the data (actually map it to a specific criterion) and
> the reducers to later do some other magic on it and output a new dataset
> which is then saved to a new HBase table. I've read that when reading data
> from files I should have something like 10 mappers per DFS block, but I
> have no clue how to translate that to my case where the input is an HBase
> table. Any ideas would be appreciated, even if it's a book or an article I
> should read.
> Regards,
> Peter
