hbase-user mailing list archives

From sakin cali <sakinnnc...@gmail.com>
Subject Enhancing hbase bulk import performance
Date Fri, 25 May 2012 14:08:04 GMT
Hi all,

I have a few questions regarding bulk load;
some of them may be novice-level, sorry for that...

I am trying to improve my bulk-load performance into HBase.

 - I have one table with one column family and 10 columns.
 - 4-PC cluster (each: i5-2400 CPU, 1 TB hard disk, 4 GB RAM)
 - Ubuntu 12.04 64 bit
 - CDH3 installation
 - Hdfs: 1 namenode, 4 datanode
            1 jobtracker, 4 tasktracker
            replication = 1
 - Hbase:
            1 master, 4 slaves

My software architecture:

1- I have a server application listening on ports for incoming rows

2- I am creating the table with pre-splits (say split1, split2, split3, ...)

3- I have a worker for each split. When a row arrives, I decide which split
the arriving key belongs to,
then I pass the incoming row to the responsible worker.
Each worker writes its own HFile periodically (every 2-3 minutes).
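
A minimal sketch of the routing in step 3, assuming the split keys are kept sorted and that row keys are compared as unsigned bytes, the way HBase orders them. The class and method names (SplitRouter, workerFor) are hypothetical, not part of any HBase API. With k split keys there are k+1 workers; a key equal to a split key goes to the worker whose range starts at that split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps a row key to the index of the worker (split range) that owns it.
public class SplitRouter {
    private final List<byte[]> splitKeys; // assumed sorted ascending, no duplicates

    public SplitRouter(List<byte[]> sortedSplitKeys) {
        this.splitKeys = new ArrayList<>(sortedSplitKeys);
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Returns the number of split keys <= rowKey, found by binary search.
    // Worker 0 owns keys below the first split; worker i owns [split[i-1], split[i]).
    public int workerFor(byte[] rowKey) {
        int lo = 0;
        int hi = splitKeys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(splitKeys.get(mid), rowKey) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

The server thread would call workerFor(rowKey) once per incoming row and hand the row to that worker's queue.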

Writing HFiles requires disk I/O, and I need to increase HFile writing performance.
Is it possible to write an HFile in memory (like a memory-mapped file) and flush
it to disk when writing is finished?
I am also looking for HDFS tuning to improve disk I/O performance; do
you have any advice?

4- I have another worker which takes the written HFiles and loads them into HBase.
I have a question at this point: the doBulkLoad method takes a directory as a parameter;
do I have to clean this directory after each doBulkLoad invocation?
Because if I don't clean this directory, I think it will try to load the same
files again. Am I wrong?
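
One way to sidestep the question in step 4 is to give every batch its own fresh directory, so a directory is only ever passed to doBulkLoad once. The sketch below is an assumption-laden illustration: it uses java.nio.file on a local path as a stand-in for HDFS, it does not call any HBase API, and the names BatchDirs and newBatchDir are hypothetical. It lays out one subdirectory per column family under the batch directory, which is the layout the bulk-load tool expects as far as I know.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: hands out a new, uniquely named output directory per batch.
public class BatchDirs {
    private final Path root;      // staging root (local path here; HDFS in the real setup)
    private final String family;  // column family name, used as the subdirectory name
    private final AtomicLong seq = new AtomicLong();

    public BatchDirs(Path root, String family) {
        this.root = root;
        this.family = family;
    }

    // Creates root/batch-<n>/<family>/ and returns root/batch-<n>.
    // The HFile for this batch would be written into the family subdirectory.
    public Path newBatchDir() {
        Path batch = root.resolve("batch-" + seq.incrementAndGet());
        try {
            Files.createDirectories(batch.resolve(family));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return batch;
    }
}
```

The loader worker would then take each finished batch directory exactly once and could delete it afterwards.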

5- My application currently runs on the master machine.
I am planning to run it on each PC in my cluster.
That is, can bulk loads be done in parallel?

6- I am writing each row to the HFile in increasing key order. I remember
reading something about this key order.
Do I have to write to the HFile in key order?
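
As far as I know, HFile writers reject keys appended out of order, so rows that arrive unsorted need to be buffered and sorted first. A minimal sketch of such a buffer, assuming one value per row key for simplicity (a real HFile sorts on the full key: row, family, qualifier, timestamp). SortedRowBuffer and its methods are hypothetical names; only the JDK is used.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-worker buffer that returns rows in increasing key order.
public class SortedRowBuffer {
    // Unsigned lexicographic order on byte arrays (the order HBase uses for row keys).
    static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    };

    private final TreeMap<byte[], byte[]> rows = new TreeMap<>(UNSIGNED_LEXICOGRAPHIC);

    // Buffers a row; a later value for the same key replaces the earlier one.
    public void add(byte[] rowKey, byte[] value) {
        rows.put(rowKey, value);
    }

    // Returns all buffered rows in increasing key order and empties the buffer.
    public List<Map.Entry<byte[], byte[]>> drainInKeyOrder() {
        List<Map.Entry<byte[], byte[]>> out = new ArrayList<>(rows.size());
        for (Map.Entry<byte[], byte[]> e : rows.entrySet()) {
            out.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        rows.clear();
        return out;
    }
}
```

Each worker would call drainInKeyOrder() at the end of its 2-3 minute window and append the entries to its HFile in the returned order.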
