hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: HBase performance tuning
Date Tue, 25 Mar 2008 13:54:17 GMT
Your insert is single-threaded? At a minimum, your program should be
multithreaded. Randomize the keys on your data so that the inserts are
spread across your 9 regionservers. Better still, spend a bit of time
and write a MapReduce job to do the insert (if you want a sample, write
to the list again and I'll put something together).
St.Ack
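
A minimal sketch of the multithreaded insert suggested above, in plain Java. `ParallelInsert`, `RowWriter` and `insertAll` are hypothetical names, not HBase API; in a real loader each `RowWriter` would wrap its own HBase table handle, on the assumption that one handle should not be shared across threads. The input lines are dealt out round-robin so every thread gets an equal share.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class ParallelInsert {

    /** Hypothetical stand-in for a per-thread table handle (e.g. one HTable per worker). */
    interface RowWriter {
        void write(String line) throws Exception;
    }

    /**
     * Splits the input into numThreads interleaved slices and writes each slice
     * from its own thread, with a separate RowWriter per thread.
     * Returns the total number of rows written.
     */
    static long insertAll(List<String> lines, int numThreads,
                          Supplier<RowWriter> writerFactory) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        AtomicLong written = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            final int slice = t;
            futures.add(pool.submit(() -> {
                RowWriter writer = writerFactory.get(); // one handle per thread
                for (int i = slice; i < lines.size(); i += numThreads) {
                    writer.write(lines.get(i));
                    written.incrementAndGet();
                }
                return null;
            }));
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // rethrows any worker failure
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return written.get();
    }
}
```

Holding all 4.8 million URLs in a `List` keeps the sketch short; a bounded queue fed by a single reader thread would avoid loading the whole file into memory.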

ANKUR GOEL wrote:
> Hi Folks,
>             I have a table with the following column families in the schema:
>        {"referer_id:", "100"},  (the number is the max length)
>        {"url:","1500"},
>        {"site:","500"},
>        {"status:","100"}
>
> The common attributes for all the above column families are
> [max versions: 1,  compression: NONE, in memory: false,
> block cache enabled: true, max length: 100, bloom filter: none]
>
> [HBase Configuration]:
>   - HDFS runs on 10 machine nodes with 8 GB RAM each and 4 CPU cores.
>   - HMaster runs on a different machine than NameNode.
>   - There are 9 regionservers configured.
>   - Total DFS available = 150 GB.
>   - LAN speed is 100 Mbps.
>
> I am trying to insert approx 4.8 million rows and the speed that
> I get is around 1500 row inserts per sec (100,000 row inserts per min.).
>
> It takes around 50 min to insert all the seeds. The Java program
> that does the inserts uses buffered I/O to read the data from a local
> file and runs on the same machine as the HMaster. To give you an idea
> of the Java code that does the insert, here is a snippet of the loop.
>
> while ((url = seedReader.readLine()) != null) {
>      try {
>        BatchUpdate update = new BatchUpdate(new Text(md5(normalizedUrl)));
>        update.put(new Text("url:"), getBytes(url));
>        update.put(new Text("site:"), getBytes(new URL(url).getHost()));
>        update.put(new Text("status:"), getBytes(status));
>        seedlist.commit(update); // seedlist is the HTable
>       }
> ....
> ....
>
> Is there a way to tune HBase to achieve better I/O speeds?
> Ideally I would like to reduce the total insert time to less than 15 min,
> i.e. achieve an insert speed of around 4500 rows/sec or more.
>
> Thanks
> -Ankur
>
>
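
The loop above calls an `md5(...)` helper whose body is not shown. One plausible version, assumed here and named `md5Hex` for illustration, returns the lowercase hex MD5 of the URL via `java.security.MessageDigest`. If the real helper hashes like this, consecutive URLs already get unrelated key prefixes, which is the key randomization the reply asks for.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowKeys {

    /** Hypothetical helper: lowercase hex MD5 of a UTF-8 string, for use as a row key. */
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Standard JDKs ship MD5; treat its absence as a configuration error.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Similar URLs map to unrelated key prefixes, spreading writes over the key space.
        for (String url : new String[] {"http://example.com/a", "http://example.com/b"}) {
            System.out.println(md5Hex(url) + "  " + url);
        }
    }
}
```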

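Working the numbers from the message: at 1,500 rows/sec the 4.8 million rows take about 53 minutes, finishing in 15 minutes needs about 5,333 rows/sec, and 4,500 rows/sec would land nearer 18 minutes. A small sketch of that arithmetic (`InsertBudget` and its methods are illustrative names):

```java
public class InsertBudget {

    /** Rows per second needed to load totalRows within the given number of minutes. */
    static double requiredRowsPerSec(long totalRows, double minutes) {
        return totalRows / (minutes * 60.0);
    }

    /** Minutes needed to load totalRows at a sustained rate of rowsPerSec. */
    static double minutesAt(long totalRows, double rowsPerSec) {
        return totalRows / rowsPerSec / 60.0;
    }

    public static void main(String[] args) {
        long rows = 4_800_000L;
        System.out.printf("at 1500 rows/s: %.1f min%n", minutesAt(rows, 1500));         // about 53.3
        System.out.printf("15-min target: %.0f rows/s%n", requiredRowsPerSec(rows, 15)); // about 5333
        System.out.printf("at 4500 rows/s: %.1f min%n", minutesAt(rows, 4500));         // about 17.8
    }
}
```

So the 15-minute goal is roughly a 3.5x speedup over the current rate.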
