hbase-user mailing list archives

From ANKUR GOEL <ankur.g...@corp.aol.com>
Subject HBase performance tuning
Date Tue, 25 Mar 2008 13:34:44 GMT
Hi Folks,
I have a table with the following column families in the schema:

        {"referer_id:", "100"},  (the integer is the max length)
        {"url:","1500"},
        {"site:","500"},
        {"status:","100"}

The common attributes for all the above column families are
[max versions: 1,  compression: NONE, in memory: false,
block cache enabled: true, max length: 100, bloom filter: none]

[HBase Configuration]:
   - HDFS runs on 10 nodes, each with 8 GB RAM and 4 CPU cores.
   - HMaster runs on a different machine than the NameNode.
   - There are 9 regionservers configured.
   - Total DFS space available = 150 GB.
   - LAN speed is 100 Mbps.

I am trying to insert approx. 4.8 million rows, and the speed I get
is around 1500 row inserts per sec (roughly 100,000 row inserts per min).

It takes around 50 min to insert all the seeds. The Java program
that does the inserts uses buffered I/O to read the data from a local
file and runs on the same machine as the HMaster. To give you an idea
of the Java code that does the inserts, here is a snapshot of the loop:

 while ((url = seedReader.readLine()) != null) {
     try {
         // One BatchUpdate per row; the row key is the MD5 of the normalized URL.
         BatchUpdate update = new BatchUpdate(new Text(md5(normalizedUrl)));
         update.put(new Text("url:"), getBytes(url));
         update.put(new Text("site:"), getBytes(new URL(url).getHost()));
         update.put(new Text("status:"), getBytes(status));
         seedlist.commit(update); // seedlist is the HTable
     }
....
....

Is there a way to tune HBase to achieve better I/O speeds?
Ideally I would like to reduce the total insert time to less than 15 min,
i.e. achieve an insert speed of around 4500 rows/sec or more.
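
As a rough sanity check on these figures, the arithmetic can be sketched as below. The class and method names (InsertRate, rowsPerSecond, minutesAt) are illustrative only and are not part of the loader program. With 4.8 million rows, a 50 min load works out to 1600 rows/sec, a sustained 4500 rows/sec would take a little under 18 min, and finishing within 15 min would need roughly 5300 rows/sec or more.

```java
public class InsertRate {
    // Sustained rate (rows/sec) needed to load `rows` rows in `seconds` seconds.
    public static double rowsPerSecond(long rows, long seconds) {
        return (double) rows / seconds;
    }

    // Minutes needed to load `rows` rows at a sustained rate of `rowsPerSecond`.
    public static double minutesAt(long rows, double rowsPerSecond) {
        return rows / rowsPerSecond / 60.0;
    }

    public static void main(String[] args) {
        long rows = 4_800_000L; // approx. row count from the message

        // Rate implied by the observed 50 min load time.
        System.out.printf("observed rate: %.0f rows/sec%n", rowsPerSecond(rows, 50 * 60));
        // Rate needed to finish within the 15 min target.
        System.out.printf("rate for 15 min: %.0f rows/sec%n", rowsPerSecond(rows, 15 * 60));
        // Load time at the 4500 rows/sec figure.
        System.out.printf("time at 4500 rows/sec: %.1f min%n", minutesAt(rows, 4500.0));
    }
}
```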

Thanks
-Ankur


