hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Processing rows in parallel with MapReduce jobs.
Date Mon, 11 Apr 2016 13:10:29 GMT
bq. if they are located in the same split?

Probably you meant same region.

Can you show us the getSplits() implementation of the InputFormat used by your MapReduce job?
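For context: TableInputFormat's getSplits() normally returns one InputSplit per region, so all rows in a region go to a single mapper. To get more mappers without pre-splitting the table down to one region per row, getSplits() can be overridden to cut each region's key range into sub-ranges. Below is a minimal self-contained sketch of that key-range math; it uses longs for simplicity, whereas real row keys are byte[] and would be divided with Bytes.split(). The class and method names are made up for illustration, not HBase API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: divide each "region" key range into n contiguous
// sub-ranges, the way a custom getSplits() could return several
// InputSplits per region instead of one.
public class SubSplitSketch {

    // Divide one region's [start, end) key range into n contiguous sub-ranges.
    static List<long[]> subSplit(long start, long end, int n) {
        List<long[]> splits = new ArrayList<>();
        long width = (end - start) / n;
        for (int i = 0; i < n; i++) {
            long s = start + i * width;
            // Last sub-range absorbs any remainder so the region is covered.
            long e = (i == n - 1) ? end : s + width;
            splits.add(new long[] {s, e});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Two regions (as in the 2-way split of the 8-row table),
        // each cut into 4 sub-splits -> 8 mappers total.
        List<long[]> all = new ArrayList<>();
        all.addAll(subSplit(0, 4, 4));
        all.addAll(subSplit(4, 8, 4));
        System.out.println(all.size());   // prints 8
        for (long[] s : all) {
            System.out.println(s[0] + "-" + s[1]);
        }
    }
}
```

Whether those sub-splits actually run concurrently then depends on the scheduler and on how many containers/slots are available, not on HBase itself.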


On Mon, Apr 11, 2016 at 5:48 AM, Ivan Cores gonzalez <ivan.cores@inria.fr> wrote:

> Hi all,
> I have a small question regarding MapReduce job behaviour with HBase.
> I have an HBase test table with only 8 rows. I split the table with the
> hbase shell split command into 2 splits, so now there are 4 rows in
> every split.
> I created a MapReduce job that only prints the row key in the log files.
> When I run the MapReduce job, every row is processed by 1 mapper, but
> the mappers in the same split are executed sequentially (inside the
> same container). That means the first four rows are processed
> sequentially by 4 mappers. The system has free cores, so is it possible
> to process rows in parallel if they are located in the same split?
> The only way I found to have 8 mappers executed in parallel is to split
> the table into 8 splits (1 split per row), but obviously that is not
> the best solution for big tables ...
> Thanks,
> Ivan.
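For reference, the 2-way pre-split described above can be done from the hbase shell with the split command; the table name and split key below are assumptions, not taken from the thread:

```
hbase> split 'test_table', 'row4'
```

After the split, rows sorting before 'row4' land in the first region and the rest in the second, so the default one-split-per-region behaviour yields 2 mappers.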
