hbase-user mailing list archives

From Sean Bigdatafun <sean.bigdata...@gmail.com>
Subject Re: HTable.put(List<Put> puts) perform batch insert?
Date Mon, 31 Jan 2011 18:48:02 GMT
On Fri, Jan 14, 2011 at 10:51 PM, tsuna <tsunanet@gmail.com> wrote:

> On Fri, Jan 14, 2011 at 4:06 PM, Sean Bigdatafun
> <sean.bigdatafun@gmail.com> wrote:
> > But how can the client understand which k-v belongs to an individual RS?
> > Does it need to scan the .META. table? (if so, it's an expensive op). On the
> > RegionServer side, is it like processing multiple requests in a batch per
> > RPC?
>
> The client has to figure out which region each edit has to go to.  The
> client maintains a local cache of the META table, so when you
> frequently use the same working set of regions (which is common for
> most applications), the lookups are essentially free.
>
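To make the lookup described above concrete, here is a minimal sketch of how a client could route edits with a local cache of region start keys. Everything in it is an assumption for illustration: the class `RegionCacheSketch`, its methods, and the server names are hypothetical and are not the HBase client API. It also compares keys as Java strings, which matches byte-wise ordering only for plain ASCII keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a client-side region location cache (not HBase code).
class RegionCacheSketch {
    // Region start key -> server hosting that region (made-up server names).
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    void cacheRegion(String startKey, String server) {
        startKeyToServer.put(startKey, server);
    }

    // The region holding a row is the one with the greatest start key <= row key.
    String locate(String rowKey) {
        Map.Entry<String, String> entry = startKeyToServer.floorEntry(rowKey);
        if (entry == null) {
            // A real client would go and read the META table at this point.
            throw new IllegalStateException("cache miss for row " + rowKey);
        }
        return entry.getValue();
    }

    // Split one client-side batch into per-server batches, one per RPC.
    Map<String, List<String>> groupByServer(List<String> rowKeys) {
        Map<String, List<String>> batches = new TreeMap<>();
        for (String row : rowKeys) {
            batches.computeIfAbsent(locate(row), s -> new ArrayList<>()).add(row);
        }
        return batches;
    }

    public static void main(String[] args) {
        RegionCacheSketch cache = new RegionCacheSketch();
        cache.cacheRegion("", "rs1");  // the first region starts at the empty key
        cache.cacheRegion("m", "rs2");
        System.out.println(cache.groupByServer(
                Arrays.asList("apple", "zebra", "mango", "banana")));
    }
}
```

Once the cache is warm, each lookup is an in-memory tree search, which is why a stable working set of regions makes routing essentially free.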
> The worst case is a client that does random-writes to all the regions
> in a huge table.  In this case, the client will end up discovering the
> location of all the regions of that table and keep this in its
> in-memory cache.  But regions move around, are split etc.  This does
> cause extra META lookups, but the latency for a META lookup is
> typically very small (even though the penalty incurred by the client
> compared to cache hits in its local META cache is huge, comparatively
> speaking).  Note that right now neither HTable nor asynchbase
> proactively evicts unused entries from the local META cache to save
> memory.  I don't think anyone is running HBase at a scale where this
> optimization would be useful.
>
> If you have a write-heavy application, you're always going to get
> significantly higher throughput when you send your edits in batch to
> the server.  The downside to this is that when your client application
> dies, you lose all the edits in the un-committed batch.  Unlike
> HTable, asynchbase puts an upper bound on the amount of time an edit
> is allowed to remain in the client's buffer, which helps limit
> data-loss when a client crashes (OpenTSDB sets this to 1s by default,
> so when it dies, you know you lost at most 1s worth of datapoints).
>
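The time bound described above can be sketched as a buffer that flushes either when it is full or when its oldest edit has waited too long. This is only an illustration under stated assumptions: `BoundedAgeBuffer`, its parameters, and the `sender` callback are hypothetical names, not the asynchbase or HTable implementation, and the caller passes in the clock value so the sketch stays deterministic. It is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical client-side write buffer with a size bound and an age bound.
class BoundedAgeBuffer<T> {
    private final int maxItems;
    private final long maxAgeMillis;
    private final Consumer<List<T>> sender;  // stands in for the batch RPC
    private final List<T> pending = new ArrayList<>();
    private long oldestMillis;               // arrival time of the oldest pending edit

    BoundedAgeBuffer(int maxItems, long maxAgeMillis, Consumer<List<T>> sender) {
        this.maxItems = maxItems;
        this.maxAgeMillis = maxAgeMillis;
        this.sender = sender;
    }

    void add(T edit, long nowMillis) {
        if (pending.isEmpty()) {
            oldestMillis = nowMillis;
        }
        pending.add(edit);
        flushIfDue(nowMillis);
    }

    // Meant to be called from a periodic timer as well as from add().
    void flushIfDue(long nowMillis) {
        boolean full = pending.size() >= maxItems;
        boolean stale = !pending.isEmpty() && nowMillis - oldestMillis >= maxAgeMillis;
        if (full || stale) {
            flush();
        }
    }

    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(pending));  // hand over a copy of the batch
        pending.clear();
    }

    public static void main(String[] args) {
        BoundedAgeBuffer<String> buffer =
                new BoundedAgeBuffer<>(3, 1000, batch -> System.out.println("sent " + batch));
        buffer.add("a", 0);
        buffer.add("b", 10);
        buffer.flushIfDue(1000);  // oldest edit is now 1000 ms old, so the batch goes out
    }
}
```

With a 1000 ms age bound, a crash loses at most the edits buffered during the last second, at the cost of sending smaller batches when traffic is light.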
htable.setWriteBufferSize(1024 * 1024 * 10); // 10 MB
htable.setAutoFlush(false);

List<Put> list = new ArrayList<Put>();
for (int i = 0; i < N; i++) {
  list.add(putitem[i]);
}

htable.put(list);


For the above pseudo code (using put(List) to commit updates to HBase), can I
get a "batch transaction" success notification?
       * i.e., how can I know that all the items have been successfully
committed? It seems that I can't get such information and everything is
best-effort. If I knew that some commits had failed, I could do an
application-level retry.
       * setAutoFlush(true); does not seem to give us any more reliable
operation either.
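
To illustrate the application-level retry I have in mind, here is a minimal sketch. Whether put(List) reports per-item failures is exactly what I am asking, so the sketch assumes a hypothetical `writeBatch` function that returns the items it could not commit; `RetrySketch` and `writeWithRetry` are made-up names, not HBase API. Retrying like this is only safe when re-applying an item is harmless.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class RetrySketch {
    // writeBatch (hypothetical) returns the items that failed; an empty list
    // means the whole batch was committed.
    static <T> List<T> writeWithRetry(List<T> items,
                                      Function<List<T>, List<T>> writeBatch,
                                      int maxAttempts) {
        List<T> remaining = new ArrayList<>(items);
        for (int attempt = 1; attempt <= maxAttempts && !remaining.isEmpty(); attempt++) {
            // Resend only what failed on the previous attempt.
            remaining = new ArrayList<>(writeBatch.apply(remaining));
        }
        return remaining;  // items still uncommitted after all attempts
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fake writer for the demo: rejects "b" on the first call only.
        Function<List<String>, List<String>> flaky = batch -> {
            calls[0]++;
            List<String> failed = new ArrayList<>();
            if (calls[0] == 1 && batch.contains("b")) {
                failed.add("b");
            }
            return failed;
        };
        List<String> left = writeWithRetry(Arrays.asList("a", "b", "c"), flaky, 3);
        System.out.println("uncommitted: " + left + ", calls: " + calls[0]);
    }
}
```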





>
> --
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com
>



-- 
--Sean
