Aaron Beppu
Subject [jira] [Commented] (HBASE-12728) buffered writes substantially less useful after removal of HTablePool
Date Wed, 31 Dec 2014

Aaron Beppu commented on HBASE-12728:

I like the API exposed to the user of having a BufferedConnection + BufferedTable that can
be swapped in so easily.

The part that I, as a user, would be cautious of is this: since HTableMultiplexer maintains
a different buffer for each region server, the timing at which flushes happen, and the age
of writes by the time they get flushed, will be more complicated to reason about than
with the buffer-per-table model.

Here are example differences that I would bear in mind while auditing our use of buffered
writes to predict the impacts of migrating to this idiom:

1. With the buffer-per-table model, the time-in-buffer for a given write was roughly just
HTablePoolSize * writeBufferSize / (writes per second). With the buffer-per-regionserver model
used by HTM, if writes aren't uniformly distributed over the region servers for whatever reason,
writes going to "cold" regionservers will live in buffer for longer than writes going to "hot"
region servers.

2. With the buffer-per-table model, time-in-buffer for writes to table A was independent of
stuff happening on table B (so long as we don't totally overwhelm the cluster or something).
With the HTM model, a decrease in write volume to table B can increase my time-in-buffer for
table A. We may choose to have separate BufferedConnections with separate HTM instances specifically
to avoid this.

3. Even if I just want to migrate my system onto this idiom without changing the number or
size of flushes, I'd want to pick HTableMultiplexer.perRegionServerBufferQueueSize such that
perRegionServerBufferQueueSize * (# of region servers) ~= HTable.writeBufferSize * (average
size of HTablePool).
The only thing that's weird about that is that (# of region servers) changes over time. I.e.
if today I pick reasonable buffer sizes for HTM, then in 6 months, if the incoming write rate
is unchanged but the cluster is larger due to data growth, my time-in-buffer will have increased.
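
To make the three points above concrete, here is a rough back-of-the-envelope sketch of the arithmetic. It is only a model, not anything HTM actually exposes: the class and method names are hypothetical, buffer and queue sizes are counted in writes rather than bytes, and each buffer is assumed to flush only when it fills (any periodic flushing would shorten these times).

```java
// Hypothetical helper, not part of HBase. Sizes are counted in writes and
// buffers are assumed to flush only when full; both are simplifying assumptions.
public final class BufferTimeEstimate {
    private BufferTimeEstimate() {}

    /** Buffer-per-table model (point 1): poolSize buffers share the total write rate. */
    public static double perTableSeconds(int poolSize, long writeBufferSize, double writesPerSecond) {
        return poolSize * (double) writeBufferSize / writesPerSecond;
    }

    /**
     * Buffer-per-region-server model: one queue per server, filling at that
     * server's share (0..1] of the total write rate. A small share means a
     * "cold" server and a long time-in-buffer.
     */
    public static double perServerSeconds(long queueSize, double writesPerSecond, double serverShare) {
        return queueSize / (writesPerSecond * serverShare);
    }

    /** Sizing rule from point 3: keep total buffered capacity the same as before. */
    public static long matchingQueueSize(int poolSize, long writeBufferSize, int regionServers) {
        return Math.round(poolSize * (double) writeBufferSize / regionServers);
    }

    public static void main(String[] args) {
        int poolSize = 10;            // made-up numbers, for illustration only
        long writeBufferSize = 2000;  // writes per HTable buffer
        double rate = 1000.0;         // total writes per second

        long queue = matchingQueueSize(poolSize, writeBufferSize, 20);
        System.out.printf("old model:            %.1f s%n", perTableSeconds(poolSize, writeBufferSize, rate));
        System.out.printf("HTM, uniform, 20 RS:  %.1f s%n", perServerSeconds(queue, rate, 1.0 / 20));
        System.out.printf("HTM, hot server:      %.1f s%n", perServerSeconds(queue, rate, 0.20));
        System.out.printf("HTM, cold server:     %.1f s%n", perServerSeconds(queue, rate, 0.01));
        // Same queue size and write rate, but the cluster has grown to 40 servers.
        System.out.printf("HTM, uniform, 40 RS:  %.1f s%n", perServerSeconds(queue, rate, 1.0 / 40));
    }
}
```

With a uniform distribution and the queue sized by the rule in point 3, the two models give the same estimate. Skewing the per-server share, or growing the cluster without resizing the queue, is what moves the HTM number.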

From just the API described, I think the proposal above looks really clean. From the perspective
of someone operating a system where using HTablePool + buffered writes was a calculated risk,
the HTM-driven buffering sounds workable, but it opens the door for a range of new variables
to influence our system's core write pathways, and for that reason I'd be cautious adopting

> buffered writes substantially less useful after removal of HTablePool
> ---------------------------------------------------------------------
>                 Key: HBASE-12728
>                 URL: https://issues.apache.org/jira/browse/HBASE-12728
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.0
>            Reporter: Aaron Beppu
> In previous versions of HBase, when use of HTablePool was encouraged, HTable instances
were long-lived in that pool, and for that reason, if autoFlush was set to false, the table
instance could accumulate a full buffer of writes before a flush was triggered. Writes from
the client to the cluster could then be substantially larger and less frequent than without
> However, when HTablePool was deprecated, the primary justification seems to have been
that creating HTable instances is cheap, so long as the connection and executor service being
passed to it are pre-provided. A use pattern was encouraged where users should create a new
HTable instance for every operation, using an existing connection and executor service, and
then close the table. In this pattern, buffered writes are substantially less useful; writes
are as small and as frequent as they would have been with autoflush=true, except the synchronous
write is moved from the operation itself to the table close call which immediately follows.
> More concretely :
> ```
> // Given these two helpers ...
> private HTableInterface getAutoFlushTable(String tableName) throws IOException {
>   // (autoflush is true by default)
>   return storedConnection.getTable(tableName, executorService);
> }
> private HTableInterface getBufferedTable(String tableName) throws IOException {
>   HTableInterface table = getAutoFlushTable(tableName);
>   table.setAutoFlush(false);
>   return table;
> }
> // it's my contention that these two methods would behave almost identically,
> // except the first will hit a synchronous flush during the put call,
> // and the second will flush during the (hidden) close call on table.
> private void writeAutoFlushed(Put somePut) throws IOException {
>   try (HTableInterface table = getAutoFlushTable(tableName)) {
>     table.put(somePut); // will do synchronous flush
>   }
> }
> private void writeBuffered(Put somePut) throws IOException {
>   try (HTableInterface table = getBufferedTable(tableName)) {
>     table.put(somePut);
>   } // auto-close will trigger synchronous flush
> }
> ```
> For buffered writes to actually provide a performance benefit to users, one of two things
must happen:
> - The writeBuffer itself shouldn't live, flush and die with the lifecycle of its HTable instance.
If the writeBuffer were managed elsewhere and had a long lifespan, this could cease to be
an issue. However, if the same writeBuffer is appended to by multiple tables, then some additional
concurrency control will be needed around it.
> - Alternatively, there should be some pattern for having long-lived HTable instances.
However, since HTable is not thread-safe, we'd need multiple instances, and a mechanism for
leasing them out safely -- which sure sounds a lot like the old HTablePool to me.
> See discussion on mailing list here : http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E

