hbase-user mailing list archives

From Vivek Krishna <vivekris...@gmail.com>
Subject Yet another bulk import question
Date Fri, 25 Mar 2011 00:22:21 GMT
Data size: 20 GB.  The import took about an hour with the default HBase
settings; after varying several parameters, we got it down to ~20
minutes.  This is still slow, and we are trying to improve it.

We wrote a Java client that essentially issues `put` calls to HBase tables in
batches.  The parameters we tuned include:
1.  Disabling compaction
2.  Varying the put batch size (tried 1000, 5000, 10000, 20000, and 40000)
3.  Setting AutoFlush on and off
4.  Varying the client-side write buffer (2 MB, 128 MB, 256 MB)
5.  Changing hbase.regionserver.handler.count to 100
6.  Varying regionserver size from 128 to 256/512/1024.
7.  Increasing number of regions.
8.  Creating regions with keys pre-specified (so that clients hit the
regions directly)
9.  Varying number of clients (from 30 clients to 100 clients)
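
To illustrate item 8, here is a minimal sketch of generating evenly spaced
split keys. It assumes row keys begin with a uniformly distributed
4-hex-digit prefix (for example, from hashing); our actual key layout may
differ. The class and method names are hypothetical. The resulting keys,
converted to bytes, would be the kind of input that
HBaseAdmin.createTable(HTableDescriptor, byte[][]) accepts as split keys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: computes region boundaries for a
// key space of 4-hex-digit prefixes, 0000 through ffff (an assumption).
final class SplitKeys {

    // Returns numRegions - 1 boundaries that divide the prefix space evenly.
    static List<String> hexSplitKeys(int numRegions) {
        if (numRegions < 2 || numRegions > 0x10000) {
            throw new IllegalArgumentException("numRegions must be in [2, 65536]");
        }
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            long boundary = (long) i * 0x10000 / numRegions;
            keys.add(String.format("%04x", boundary));
        }
        return keys;
    }

    public static void main(String[] args) {
        // 38 nodes with 2 regions each gives 76 regions, so 75 split keys.
        for (String key : hexSplitKeys(76)) {
            System.out.println(key);
        }
    }
}
```

With boundaries like these, each client write lands in an existing region
from the start, rather than waiting for a single initial region to split.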

The above was tested on a 38-node cluster with 2 regions each.
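
To put the timings in perspective, here is a rough throughput calculation.
It assumes 1 GB = 1024 MB and that the load is spread evenly over all 38
nodes, neither of which was measured; the class name is hypothetical.

```java
// Back-of-the-envelope throughput for the import described above.
final class ImportThroughput {

    // Average rate in MB/s for a given data size and elapsed time.
    static double mbPerSecond(double gigabytes, double minutes) {
        return gigabytes * 1024.0 / (minutes * 60.0);
    }

    public static void main(String[] args) {
        double before = mbPerSecond(20, 60); // default settings, about an hour
        double after = mbPerSecond(20, 20);  // tuned, about 20 minutes
        System.out.printf("default: %.1f MB/s aggregate%n", before);
        System.out.printf("tuned:   %.1f MB/s aggregate%n", after);
        System.out.printf("tuned:   %.2f MB/s per node (38 nodes)%n", after / 38);
    }
}
```

Under these assumptions the tuned run works out to roughly 17 MB/s across
the cluster, or under 0.5 MB/s per node.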

We did not try disabling the WAL, for fear of losing data.

Are there any other parameters that we missed during the process?

