hbase-user mailing list archives

From Ivan Cores gonzalez <ivan.co...@inria.fr>
Subject Processing rows in parallel with MapReduce jobs.
Date Mon, 11 Apr 2016 12:48:51 GMT
Hi all, 

I have a small question regarding the behaviour of MapReduce jobs with HBase.

I have an HBase test table with only 8 rows. I split the table into 2 splits with the
hbase shell split command, so there are now 4 rows in every split.

I created a MapReduce job that only prints the row key to the log files.
When I run the job, every row is processed by 1 mapper, but the mappers
in the same split are executed sequentially (inside the same container). That is,
the first four rows are processed sequentially by 4 mappers. The system has free
cores, so is it possible to process rows in parallel if they are located
in the same split?
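
To make the observed behaviour concrete, here is a minimal sketch that models it with plain Java threads. It does not use the HBase or Hadoop APIs; the class name SplitModel, the method processSplits, and the row keys are hypothetical and exist only for illustration. Each split gets its own task, and the rows inside a split are handled one after another, which mirrors what the logs show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of the behaviour described above; not HBase code.
class SplitModel {

    // Runs one task per split. Splits run concurrently with each other,
    // but the rows inside a single split are visited sequentially.
    static List<List<String>> processSplits(List<List<String>> splits) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, splits.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> split : splits) {
                futures.add(pool.submit(() -> {
                    List<String> seen = new ArrayList<>();
                    for (String rowKey : split) {
                        // Stand-in for "print the row key in the log files".
                        System.out.println(Thread.currentThread().getName() + " row " + rowKey);
                        seen.add(rowKey);
                    }
                    return seen;
                }));
            }
            // Collect the per-split results in split order.
            List<List<String>> result = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                result.add(future.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // 8 rows in 2 splits of 4 rows each, as in the test table.
        List<List<String>> splits = List.of(
                List.of("row1", "row2", "row3", "row4"),
                List.of("row5", "row6", "row7", "row8"));
        processSplits(splits);
    }
}
```

In this model at most two threads are ever busy, no matter how many cores are free, because the unit of parallelism is the split and not the row.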

The only way I have found to get 8 mappers executing in parallel is to split the table
into 8 splits (1 split per row). But obviously this is not the best solution for
big tables ...

