hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: HBase performance tuning
Date Wed, 26 Mar 2008 15:10:08 GMT
Goel, Ankur wrote:
> I use the 'trunk' to get and build the code locally.
> Does the latest code on it have all the fixes ?
>   
I would suggest using the branch or the release candidate instead,
because there are no guarantees against breakage in TRUNK; TRUNK is
undergoing lots of churn at the moment.
St.Ack

> Thanks
> -Ankur
>
> -----Original Message-----
> From: stack [mailto:stack@duboce.net] 
> Sent: Tuesday, March 25, 2008 7:25 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: HBase performance tuning
>
> Are you using the hbase 0.1 branch?  Try the release candidate.  It has a
> fix for the 'table does not exist' issue, among other fixes.
> St.Ack
>
> Goel, Ankur wrote:
>   
>> Hi Again,
>> A couple of issues that I faced are as follows:
>>
>> 1. If I terminate the local client (the Java program used for the
>>    inserts; please see the post before this), HBase goes into an
>>    inconsistent state. Though the tables are still shown as available,
>>    an attempt to drop the table gives the message "Table does not
>>    exist". If I try to truncate the table, I get an IOException on the
>>    client from the 'TableOperation' class.
>>    It looks like the abrupt closure of the socket connection from the
>>    client side corrupted the META information and also the data files.
>>    Is this a known issue? If not, I can reproduce it and file a JIRA
>>    issue with a stack trace and description.
>>
>> 2. Connecting to the region servers from a remote location and
>>    inserting data from a file local to the remote client gave an insert
>>    speed of 3 rows/sec = 180 rows/min! This is terribly slow when the
>>    available bandwidth is 2 Mbps. Any ideas on what could be the
>>    bottleneck here?
>>
>> Thanks
>> -Ankur
>>
>>
>> -----Original Message-----
>> From: ANKUR GOEL [mailto:ankur.goel@corp.aol.com]
>> Sent: Tuesday, March 25, 2008 7:05 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: HBase performance tuning
>>
>> Hi Folks,
>> I have a table with the following column families in the schema:
>>         {"referer_id:", "100"},  (Integer here is max length)
>>         {"url:","1500"},
>>         {"site:","500"},
>>         {"status:","100"}
>>
>> The common attributes for all the above column families are [max
>> versions: 1, compression: NONE, in memory: false, block cache enabled:
>> true, max length: 100, bloom filter: none]
>>
>> [HBase Configuration]:
>>    - HDFS runs on 10 machine nodes with 8 GB RAM each and 4 CPU cores.
>>    - HMaster runs on a different machine than NameNode.
>>    - There are 9 regionservers configured.
>>    - Total DFS available = 150 GB.
>>    - LAN speed is 100 Mbps.
>>
>> I am trying to insert approx 4.8 million rows and the speed that I get
>> is around 1500 row inserts per sec (100,000 row inserts per min.).
>>
>> It takes around 50 min to insert all the seeds. The Java program that
>> does the inserts uses buffered I/O to read the data from a local
>> file and runs on the same machine as the HMaster. To give you an idea
>> of the Java code that does the insert, here is a snapshot of the loop.
>>
>>  while ((url = seedReader.readLine()) != null) {
>>       try {
>>         BatchUpdate update = new BatchUpdate(new Text(md5(normalizedUrl)));
>>         update.put(new Text("url:"), getBytes(url));
>>         update.put(new Text("site:"), getBytes(new URL(url).getHost()));
>>         update.put(new Text("status:"), getBytes(status));
>>         seedlist.commit(update); // seedlist is the HTable
>>        }
>> ....
>> ....
>>
>> Is there a way to tune HBase to achieve better I/O speeds?
>> Ideally I would like to reduce the total insert time to less than 15
>> min, i.e., achieve an insert speed of around 4500 rows/sec or more.
>>
>> Thanks
>> -Ankur
>>
>>
>>   
>>     
>
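The loop quoted above calls md5() and getBytes(), whose bodies were not
posted. As a rough sketch only, assuming md5() returns the hex MD5 digest
of the normalized URL and getBytes() encodes a String as UTF-8, the helpers
might look something like this. The class name RowKeys and both method
bodies are hypothetical, not the poster's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-ins for the helpers used in the quoted insert loop.
final class RowKeys {

    // Assumption: the row key is the lowercase hex MD5 digest of the URL.
    public static String md5(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // %02x renders each byte as two lowercase hex digits.
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on standard JVMs.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumption: cell values are stored as UTF-8 bytes.
    public static byte[] getBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hashing the URL gives fixed-length, evenly spread row keys.
        System.out.println(md5("http://example.com/"));
    }
}
```

A hashed key keeps row keys short and uniform in length, at the cost of
losing any ordering by URL.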
>   
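For reference, the figures in the original post work out as follows: 4.8
million rows in about 50 minutes is roughly 1,600 rows/sec, and finishing
in 15 minutes would need roughly 5,333 rows/sec, so a rate of 4,500
rows/sec would take closer to 18 minutes. A small sketch of that
arithmetic; the class and method names are made up for illustration.

```java
// Back-of-the-envelope throughput arithmetic for the bulk insert.
final class InsertRate {

    // Rows per second needed to load totalRows within the given minutes.
    public static double requiredRowsPerSec(long totalRows, double minutes) {
        return totalRows / (minutes * 60.0);
    }

    // Minutes needed to load totalRows at a sustained rows-per-second rate.
    public static double minutesAtRate(long totalRows, double rowsPerSec) {
        return totalRows / rowsPerSec / 60.0;
    }

    public static void main(String[] args) {
        long rows = 4_800_000L; // "approx 4.8 million rows" from the post

        System.out.printf("observed (50 min): %.0f rows/sec%n",
                requiredRowsPerSec(rows, 50));
        System.out.printf("needed (15 min):   %.0f rows/sec%n",
                requiredRowsPerSec(rows, 15));
        System.out.printf("at 4500 rows/sec:  %.1f min%n",
                minutesAtRate(rows, 4500));
    }
}
```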
