hadoop-common-dev mailing list archives

From Jim Kellerman <...@powerset.com>
Subject Re: HBase put() improvement?
Date Wed, 06 Jun 2007 17:11:35 GMT
A batch update is a good idea, but not yet on the road map (we have to
get it working first).

I'll open a Jira for this so it doesn't get lost.

Thanks.

-Jim

On Wed, 2007-06-06 at 10:05 -0700, James Kennedy wrote:
> Hi,
> 
> I'm noticing that since the HClient/HRegionServer interface only allows 
> for a per-column put(), there is a lot of RPC and some lease management 
> overhead when writing large amounts of data. For example:
> 
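>         // Write 10000 rows, two columns per row.
>         for (int i = 0; i < 10000; i++) {
>             Text rowKey = new Text(Integer.toString(i));
>             long lock = client.startUpdate(rowKey);        // RPC 1: acquire row lock
>             client.put(lock, COL1, rowKey.getBytes());     // RPC 2: first column
>             client.put(lock, COL2, someValue.getBytes());  // RPC 3: second column
>             client.commit(lock);                           // RPC 4: commit the row
>         }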
> 
> This code takes my machine (using a single HMaster/HRegionServer on 
> local filesystem) approximately 13 seconds to execute. When I measure 
> the execution time within HRegionServer.put(), the total time spent in 
> put() is < 2 seconds. So it looks like there's definitely overhead in the 
> RPC communication and serialization/deserialization between client and 
> server.
> 
> Writing 10000 rows takes 10000 x (startUpdate=1 + #cols=2 + commit=1) = 
> 40000 RPC operations.
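
The arithmetic above can be checked with a small sketch. The class and method names here (RpcCount, perColumn, batched) are made up for illustration and are not part of HBase; the batched figure assumes one RPC per row, which is the proposal in the message, not an existing API.

```java
public class RpcCount {
    // RPCs under the per-column protocol described in the message:
    // one startUpdate, one put per column, one commit, for every row.
    static long perColumn(long rows, int cols) {
        return rows * (1 + cols + 1);
    }

    // RPCs if each row were written with a single (hypothetical) batched call.
    static long batched(long rows) {
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(perColumn(10000, 2)); // prints 40000
        System.out.println(batched(10000));      // prints 10000
    }
}
```

With two columns, the batched form would cut the call count by a factor of four; the ratio grows with the number of columns per row.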
> 
> What I'm thinking, and please tell me if I'm wrong or if this is already 
> in the works, is that if I create a row-level put() method that submits 
> a map of column values at once, I would reduce the 2 + (#cols) RPC 
> operations to one single atomic row-write RPC as well as eliminate the 
> small but noticeable overhead in lease creation, renewal, and cancellation.
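
A rough sketch of the row-level put() idea, using an in-memory stand-in instead of a real HRegionServer. Everything here is hypothetical: FakeRegionServer, putRow, and the String/byte[] types are assumptions chosen so the example runs without Hadoop, and each method call simply models one RPC. It only illustrates the call-count difference, not real network cost or lease handling.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class BatchPutSketch {
    /** In-memory stand-in for a region server; each method call models one RPC. */
    static class FakeRegionServer {
        int rpcCalls = 0;
        private long nextLock = 1;
        private final Map<Long, String> openRows = new HashMap<>();
        private final Map<Long, Map<String, byte[]>> pending = new HashMap<>();
        final Map<String, Map<String, byte[]>> table = new HashMap<>();

        long startUpdate(String row) {
            rpcCalls++;
            long lock = nextLock++;
            openRows.put(lock, row);
            pending.put(lock, new LinkedHashMap<>());
            return lock;
        }

        void put(long lock, String column, byte[] value) {
            rpcCalls++;
            pending.get(lock).put(column, value);
        }

        void commit(long lock) {
            rpcCalls++;
            String row = openRows.remove(lock);
            table.computeIfAbsent(row, k -> new HashMap<>()).putAll(pending.remove(lock));
        }

        // Hypothetical row-level write: all column values arrive in one call,
        // so no lock id is handed back to the client.
        void putRow(String row, Map<String, byte[]> columns) {
            rpcCalls++;
            table.computeIfAbsent(row, k -> new HashMap<>()).putAll(columns);
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Per-column protocol, mirroring the loop in the message.
    static void writePerColumn(FakeRegionServer server, int rows) {
        for (int i = 0; i < rows; i++) {
            String rowKey = Integer.toString(i);
            long lock = server.startUpdate(rowKey);
            server.put(lock, "col1", bytes(rowKey));
            server.put(lock, "col2", bytes("value"));
            server.commit(lock);
        }
    }

    // Same data written with one call per row.
    static void writeBatched(FakeRegionServer server, int rows) {
        for (int i = 0; i < rows; i++) {
            String rowKey = Integer.toString(i);
            Map<String, byte[]> columns = new LinkedHashMap<>();
            columns.put("col1", bytes(rowKey));
            columns.put("col2", bytes("value"));
            server.putRow(rowKey, columns);
        }
    }

    public static void main(String[] args) {
        FakeRegionServer perColumn = new FakeRegionServer();
        writePerColumn(perColumn, 10000);
        FakeRegionServer batched = new FakeRegionServer();
        writeBatched(batched, 10000);
        System.out.println(perColumn.rpcCalls); // prints 40000
        System.out.println(batched.rpcCalls);   // prints 10000
    }
}
```

Both paths leave the same rows in the table; only the number of round trips differs. How much wall-clock time that saves on a real cluster would still need to be measured.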
> 
> It's not clear exactly what the performance improvement would be. The 
> same amount of serialization/deserialization must occur, but YourKit 
> profiling tells me that the serialization overhead is negligible.
> 
> Any thoughts?
> 
-- 
Jim Kellerman, Senior Engineer; Powerset
jim@powerset.com
