hbase-user mailing list archives

From Peeyush Bishnoi <peeyu...@yahoo-inc.com>
Subject RE: HBase performance tuning
Date Fri, 28 Mar 2008 12:34:37 GMT
Since a lot of statistics are being shared for HBase in this thread, I
think I should publish my statistics here for HBase performance too.

I ran the HBase-based Map-Reduce code with 10, 50 and 100 mappers, and
the best result I achieved was with 100 mappers on 100 nodes.
Total records inserted: 16.3 million. Throughput achieved: 16497.975
record inserts/sec. Total time taken: 988 seconds (16.4 min).
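
For anyone who wants to try the same approach, the map task in such a
job boils down to something like the sketch below. This is only an
illustration, not the code I actually ran: the "seedlist" table and the
"url:" column are made-up example names, the HTable/BatchUpdate/Text
calls simply mirror the loop quoted in Ankur's mail further down, and
the exact Mapper/JobConf interfaces and package names depend on the
Hadoop and HBase versions you are running.

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Map-only insert: each mapper reads lines of seed URLs from its input
// split and commits them straight into the table, so throughput scales
// with the number of map tasks.
public class SeedInsertMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private HTable table;

  public void configure(JobConf job) {
    try {
      // One HTable instance per map task.
      table = new HTable(new HBaseConfiguration(), new Text("seedlist"));
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    String url = value.toString().trim();
    if (url.length() == 0) {
      return;
    }
    // The row key should really be hashed/randomized so inserts spread
    // across region servers (see St.Ack's note further down).
    BatchUpdate update = new BatchUpdate(new Text(url));
    update.put(new Text("url:"), url.getBytes());
    table.commit(update);
  }
}

The job setup (a plain text-input job over the seed file, with no reduce
phase needed for this sketch) is left out.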

Thanks

---
Peeyush



On Fri, 2008-03-28 at 17:39 +0530, Goel, Ankur wrote:

> OK, so I picked up the code, modified it for my use, and tried it with
> different configurations, varying the number of reducers in each run
> (10, 20, 40, 80, 200). The best throughput I could get (with 200
> reducers) was 4306 inserts/sec, with a total runtime of 17 min. for
> 4.38 million seeds.
> 
> Using my threaded client running 200 threads, I managed the same number
> of inserts in 12 min.
> 
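> A threaded insert client of that sort is, in spirit, along the lines of
> the sketch below. It is a stripped-down illustration rather than the
> actual code: the "seedlist" table and "url:" column are placeholders,
> key hashing, batching and error handling are left out, and the
> HTable/BatchUpdate calls just mirror the loop quoted at the bottom of
> this thread, so the exact signatures depend on the HBase version.
> 
> import java.io.BufferedReader;
> import java.io.FileReader;
> import java.util.concurrent.ArrayBlockingQueue;
> import java.util.concurrent.BlockingQueue;
> 
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HTable;
> import org.apache.hadoop.hbase.io.BatchUpdate;
> import org.apache.hadoop.io.Text;
> 
> // One reader thread feeds a bounded queue; N worker threads each own
> // an HTable and commit one BatchUpdate per URL.
> public class ThreadedSeedInserter {
> 
>   private static final String POISON = "";
> 
>   public static void main(String[] args) throws Exception {
>     final String seedFile = args[0];
>     final int numThreads = Integer.parseInt(args[1]); // e.g. 200
>     final BlockingQueue<String> queue =
>         new ArrayBlockingQueue<String>(10000);
> 
>     Thread[] workers = new Thread[numThreads];
>     for (int i = 0; i < numThreads; i++) {
>       workers[i] = new Thread(new Runnable() {
>         public void run() {
>           try {
>             HTable table =
>                 new HTable(new HBaseConfiguration(), new Text("seedlist"));
>             String url;
>             while (!(url = queue.take()).equals(POISON)) {
>               BatchUpdate update = new BatchUpdate(new Text(url));
>               update.put(new Text("url:"), url.getBytes());
>               table.commit(update);
>             }
>           } catch (Exception e) {
>             e.printStackTrace();
>           }
>         }
>       });
>       workers[i].start();
>     }
> 
>     // Feed the workers, then shut them down with one poison pill each.
>     BufferedReader reader = new BufferedReader(new FileReader(seedFile));
>     String line;
>     while ((line = reader.readLine()) != null) {
>       if (line.trim().length() > 0) {
>         queue.put(line.trim());
>       }
>     }
>     reader.close();
>     for (int i = 0; i < numThreads; i++) {
>       queue.put(POISON);
>     }
>     for (Thread t : workers) {
>       t.join();
>     }
>   }
> }
> 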
> Looks like the Map-Red insert is slower than our regular threaded
> insert. Can we gain performance via any other tweak?
> If not, is there any reasonable scope for performance improvement of
> HBase via code optimization?
> 
> (I wouldn't mind taking a deep dive into the code to optimize core HBase
> memory
>   structures and contribute to HBase)
> 
> Thanks
> -Ankur
> 
>     
> 
> -----Original Message-----
> From: stack [mailto:stack@duboce.net] 
> Sent: Thursday, March 27, 2008 12:05 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: HBase performance tuning
> 
> I just posted EXAMPLE code to the hbase MR wiki page: 
> http://wiki.apache.org/hadoop/Hbase/MapReduce
> St.Ack
> 
> 
> 
> 
> Naama Kraus wrote:
> > Hi,
> >
> > A sample MapReduce for an insert would be interesting to me also !
> >
> > Naama
> >
> > On Tue, Mar 25, 2008 at 3:54 PM, stack <stack@duboce.net> wrote:
> >
> >   
> >> Your insert is single-threaded?  At a minimum your program should be 
> >> multithreaded.  Randomize the keys on your data so that the inserts 
> >> are spread across your 9 regionservers.  Better if you spend a bit of
> >> time and write a mapreduce job to do the insert (If you want a
> >> sample, write the list again and I'll put something together).
> >> St.Ack
> >>
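> >> (By randomize I mean hash the natural key, e.g. an MD5 hex digest of
> >> the URL. A plain-JDK sketch of such a helper, just for illustration:
> >>
> >> import java.security.MessageDigest;
> >>
> >> public class Md5Key {
> >>   // Hex-encode an MD5 digest of the row key so inserts scatter across
> >>   // regions instead of arriving in sorted order at a single region.
> >>   public static String md5(String key) throws Exception {
> >>     MessageDigest md = MessageDigest.getInstance("MD5");
> >>     StringBuilder hex = new StringBuilder();
> >>     for (byte b : md.digest(key.getBytes("UTF-8"))) {
> >>       hex.append(String.format("%02x", b));
> >>     }
> >>     return hex.toString();
> >>   }
> >> }
> >>
> >> The md5() call in the snippet below is presumably something along
> >> these lines.)
> >>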
> >> ANKUR GOEL wrote:
> >>     
> >>> Hi Folks,
> >>>             I have a table with the following column families in the
> >>> schema:
> >>>        {"referer_id:", "100"},  (the integer here is the max length)
> >>>        {"url:","1500"},
> >>>        {"site:","500"},
> >>>        {"status:","100"}
> >>>
> >>> The common attributes for all the above column families are [max 
> >>> versions: 1,  compression: NONE, in memory: false, block cache 
> >>> enabled: true, max length: 100, bloom filter: none]
> >>>
> >>> [HBase Configuration]:
> >>>   - HDFS runs on 10 machine nodes with 8 GB RAM each and 4 CPU cores.
> >>>   - HMaster runs on a different machine than the NameNode.
> >>>   - There are 9 regionservers configured.
> >>>   - Total DFS available = 150 GB.
> >>>   - LAN speed is 100 Mbps.
> >>>
> >>> I am trying to insert approx 4.8 million rows and the speed that I
> >>> get is around 1500 row inserts per sec (100,000 row inserts per min.).
> >>>
> >>> It takes around 50 min to insert all the seeds. The Java program
> >>> that does the inserts uses buffered I/O to read the data from a
> >>> local file and runs on the same machine as the HMaster. To give you
> >>> an idea of the Java code that does the insert, here is a snapshot of
> >>> the loop.
> >>>
> >>> while ((url = seedReader.readLine()) != null) {
> >>>      try {
> >>>        BatchUpdate update = new BatchUpdate(new Text(md5(normalizedUrl)));
> >>>        update.put(new Text("url:"), getBytes(url));
> >>>        update.put(new Text("site:"), getBytes(new URL(url).getHost()));
> >>>        update.put(new Text("status:"), getBytes(status));
> >>>        seedlist.commit(update); // seedlist is the HTable
> >>>       }
> >>> ....
> >>> ....
> >>>
> >>> Is there a way to tune HBase to achieve better I/O speeds?
> >>> Ideally I would like to reduce the total insert time to less than 15
> >>> min, i.e. achieve an insert speed of around 4500 rows/sec or more.
> >>>
> >>> Thanks
> >>> -Ankur
> >>>
> >>>
> >>>       
> >>     
> >
> >
> >   
> 
