hbase-user mailing list archives

From Roberto Alonso CIPF <ralo...@cipf.es>
Subject questions about splits in regions
Date Tue, 27 Mar 2012 09:06:54 GMT
Hello All,

I have some questions about HBase that I hope you can help me with.
My architecture is as follows:
I have 4 servers (server_{1,2,3,4}), each with 6 GB of RAM and 2 cores. I installed
Hadoop on all of them with this configuration:
- server_1: NameNode, DataNode, SecondaryNameNode, JobTracker
- server_2, server_3, server_4: DataNode, TaskTracker
- server_2: ZooKeeper
The storage is around 500 GB.

I have a file with around 22,000,000 records (and it will grow), and I want to
load it into a table.
1. I create the table from code. Should I pre-split the regions myself (and if
so, is there a strategy I should follow), or should I let HBase split the
regions by itself? Which is better?
2. After loading the table into HBase, I run a MapReduce job that reads all
the rows, selects the rows of interest, and writes them to a file on disk
(FileOutputFormat.setOutputPath(job, new Path(tmpPath)); the job has no
reduce phase). Watching htop on my servers, it looks like HBase reads the
table sequentially, even though the table is split across the servers.
Should I configure my MapReduce job to process the regions in parallel?
3. I was also wondering whether I could use ordinary Java threads to launch
more than one MapReduce job at a time, or is that unusual?
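Regarding question 1, a common approach when the row-key distribution is known in advance is to pre-split the table when it is created, so the initial load is spread over several regions instead of all landing in one. The sketch below only computes the split points. It assumes a hypothetical key design in which each row key is the record ID zero-padded to a fixed width (so byte order matches numeric order), with IDs roughly uniform up to 22,000,000 and 4 regions as an arbitrary choice. The class and method names are illustrative, not part of HBase. The resulting byte[][] is the form accepted by createTable(descriptor, splitKeys) on the HBase admin API (HBaseAdmin in 0.9x releases, Admin in later ones).

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: evenly spaced split keys for a hypothetical key design
// where each row key is a record ID zero-padded to a fixed width.
class SplitKeys {

    // Zero-padding keeps lexicographic (byte) order identical to numeric order,
    // which is the order HBase uses to assign rows to regions.
    static String rowKey(long id, int width) {
        return String.format("%0" + width + "d", id);
    }

    // Returns numRegions - 1 boundaries that divide the ID range [0, maxId)
    // into numRegions ranges of (nearly) equal size.
    static byte[][] evenSplits(long maxId, int numRegions, int width) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions to split");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            long boundary = maxId * i / numRegions;
            splits[i - 1] = rowKey(boundary, width).getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed: 22,000,000 IDs, 4 regions, 8-digit keys.
        byte[][] splits = evenSplits(22_000_000L, 4, 8);
        for (byte[] s : splits) {
            // prints 05500000, 11000000, 16500000 (one per line)
            System.out.println(new String(s, StandardCharsets.UTF_8));
        }
    }
}
```

Since the data will grow, HBase will still split any region that exceeds the configured maximum region size (hbase.hregion.max.filesize), so pre-splitting mainly avoids the initial hotspot on a single region. If the keys are not uniformly distributed, or are written in strictly increasing order, evenly spaced boundaries will not balance the load, and the split points should follow the real key distribution instead.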
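Regarding question 2, a table-scan MapReduce job set up with TableMapReduceUtil.initTableMapperJob uses TableInputFormat, which creates one input split, and therefore one map task, per region. A table that still consists of a single region is therefore read by a single mapper, no matter how many DataNodes store its files. The pure-Java model below, which uses no HBase or Hadoop classes, illustrates the idea: it turns region boundaries into key ranges and scans each range in its own task. All names, the split points, and the "rows of interest" filter are hypothetical; in a real job the framework does this partitioning itself, so the practical fix is to make sure the table has several regions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Model only (no HBase or Hadoop classes): each "region" is a [start, end) key
// range, where "" means unbounded, and each range is scanned by its own task,
// mirroring the one-map-task-per-region behavior of a table scan job.
class RegionScanModel {

    // Turns sorted split keys s0 < s1 < ... into ranges ["", s0), [s0, s1), ..., [sN, "").
    static List<String[]> rangesFromSplits(List<String> splits) {
        List<String[]> ranges = new ArrayList<>();
        String start = "";
        for (String split : splits) {
            ranges.add(new String[] {start, split});
            start = split;
        }
        ranges.add(new String[] {start, ""});
        return ranges;
    }

    // Rows in [start, end) of a sorted key set; an empty end means "up to the last row".
    static NavigableSet<String> rowsIn(NavigableSet<String> rows, String start, String end) {
        return end.isEmpty() ? rows.tailSet(start, true) : rows.subSet(start, true, end, false);
    }

    // Counts rows accepted by "interesting" in every range, one concurrent task per range.
    // The row set is only read, never modified, so sharing it between threads is safe here.
    static List<Long> countPerRange(NavigableSet<String> rows, List<String> splits,
                                    Predicate<String> interesting) {
        List<String[]> ranges = rangesFromSplits(splits);
        ExecutorService pool = Executors.newFixedThreadPool(ranges.size());
        List<Long> counts = new ArrayList<>();
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String[] range : ranges) {
                Callable<Long> scan =
                        () -> rowsIn(rows, range[0], range[1]).stream().filter(interesting).count();
                futures.add(pool.submit(scan));
            }
            for (Future<Long> f : futures) {
                counts.add(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scanning ranges", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("range scan failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return counts;
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(String.format("%08d", i));
        }
        // Hypothetical split points: 4 "regions" of 250 rows each.
        List<String> splits = List.of("00000250", "00000500", "00000750");
        // Hypothetical filter for the rows of interest: keys ending in "0".
        System.out.println(countPerRange(rows, splits, key -> key.endsWith("0"))); // prints [25, 25, 25, 25]
    }
}
```

In the real job, the related settings live on the Scan passed to initTableMapperJob: a larger setCaching(...) value reduces round trips per mapper, and setCacheBlocks(false) is commonly recommended for full-table MapReduce scans. Since there is no reduce phase, job.setNumReduceTasks(0) makes it a map-only job.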
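Regarding question 3, launching several jobs from ordinary threads does work. Hadoop's Job class also supports doing this without extra threads, because job.submit() returns immediately while job.waitForCompletion(true) blocks until the job ends. The pure-Java sketch below shows the thread-pool variant. Each hypothetical job is a Callable<Boolean> standing in for a blocking call such as job.waitForCompletion(true), and a bounded pool limits how many run at once. Concurrent jobs still compete for the same map slots on the three TaskTrackers, so running them in parallel does not by itself make the cluster faster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: run several blocking "jobs" concurrently with a bounded thread pool.
// Each Callable stands in for something like job.waitForCompletion(true),
// which returns true if the job succeeded.
class ParallelJobs {

    static List<Boolean> runAll(List<Callable<Boolean>> jobs, int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            // invokeAll blocks until every job has finished; futures keep the input order.
            List<Future<Boolean>> futures = pool.invokeAll(jobs);
            List<Boolean> results = new ArrayList<>();
            for (Future<Boolean> f : futures) {
                try {
                    results.add(f.get());
                } catch (ExecutionException e) {
                    // A job that threw an exception is reported as a failure.
                    results.add(false);
                }
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<Boolean>> jobs = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int id = i;
            // Hypothetical stand-in job: "fails" for id 1 to show result handling.
            jobs.add(() -> id != 1);
        }
        System.out.println(runAll(jobs, 2)); // prints [true, false, true]
    }
}
```

The thread-free alternative is to call submit() on each Job and then poll isComplete() and isSuccessful(). For groups of jobs that depend on each other, Hadoop also provides JobControl (in org.apache.hadoop.mapreduce.lib.jobcontrol).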

Thanks a lot, I am stuck...

Roberto Alonso
Bioinformatics and Genomics Department
Centro de Investigacion Principe Felipe (CIPF)
C/E.P. Avda. Autopista del Saler, 16-3 (junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralonso@cipf.es
