Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (athena.apache.org: domain of sduskis@gmail.com designates
 209.85.217.171 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CA+RK=_CoxzVvf7UWsHybJ3AxQ7Rnv4f=AB3oP6UAe59-xMJXxQ@mail.gmail.com>
References: 
 <CAPdJLkEzmUQZ_kvD=8mrxi4V=hCmUp3g9MUZsddD+mon+AvNtg@mail.gmail.com>
	<CANZa=GvbC8rxJwWqN0h8gNFw45yKB_N_zCunk+g1z2-Pqfbakw@mail.gmail.com>
	<CA+RK=_Caf3_DAA4w1cqe4n5v5UQip9hKVsk6CDcEe38WzcTpCw@mail.gmail.com>
	<CAPdJLkHpDW5tcYVNnEKWLU1=mvjXN1ThQePFHxGwgEMyOKS3=g@mail.gmail.com>
	<CA+RK=_C2mHs=bur0=TOsD2SgwgO823OQifVkOUK6SR2An10LHg@mail.gmail.com>
	<CANZa=GtyCnFnL3cQXGq8hqV32TRScVmj81Smif1P0DWZN7Q29w@mail.gmail.com>
	<CA+RK=_CoxzVvf7UWsHybJ3AxQ7Rnv4f=AB3oP6UAe59-xMJXxQ@mail.gmail.com>
Date: Fri, 19 Dec 2014 15:02:54 -0500
Message-ID: 
 <CAN0bKpL-_7Wd8Lt-NxUMGG8ejQKAzCQRYRmmCo4fi5farDU05Q@mail.gmail.com>
Subject: Re: Efficient use of buffered writes in a post-HTablePool world?
From: Solomon Duskis <sduskis@gmail.com>
To: user@hbase.apache.org
Cc: lars hofhansl <larsh@apache.org>,
 =?UTF-8?Q?Enis_S=C3=B6ztutar?= <enis@apache.org>
Content-Type: multipart/alternative; boundary=089e01177631beefc5050a97313c

--089e01177631beefc5050a97313c
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Is this critical to sort out before 1.0, or is fixing this a post-1.0
enhancement?

-Solomon

On Fri, Dec 19, 2014 at 2:19 PM, Andrew Purtell <apurtell@apache.org>
wrote:
>
> I don't like the dropped writes either. Just pointing out what we have
> now. There is a gap no doubt.
>
> On Fri, Dec 19, 2014 at 11:16 AM, Nick Dimiduk <ndimiduk@apache.org>
> wrote:
> >
> > Thanks for the reminder about the Multiplexer, Andrew. It sort of
> > solves this problem, but I think its semantics of dropping writes are
> > not desirable in the general case. Further, my understanding was that
> > the new connection implementation is designed to handle this kind of
> > use case (hence cc'ing Lars).
> >
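
To make the concern about dropped writes concrete, here is a minimal
plain-Java model of the trade-off. BoundedWriteQueue and its methods are
hypothetical names chosen for illustration; this is not HBase code and
makes no claim about how HTableMultiplexer is implemented. It only
contrasts a non-blocking enqueue that reports failure when the queue is
full with a blocking enqueue that waits for space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustration only: BoundedWriteQueue is a hypothetical name,
// not an HBase class.
final class DropOnFullSketch {

  static final class BoundedWriteQueue {
    private final BlockingQueue<String> queue;

    BoundedWriteQueue(int capacity) {
      this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking: returns false when the queue is full, in which
    // case the write is lost unless the caller retries.
    boolean tryPut(String write) {
      return queue.offer(write);
    }

    // Blocking alternative: waits for free space instead of dropping.
    void putOrWait(String write) throws InterruptedException {
      queue.put(write);
    }

    // Stands in for a background flusher taking one batch.
    List<String> drain() {
      List<String> batch = new ArrayList<>();
      queue.drainTo(batch);
      return batch;
    }
  }

  public static void main(String[] args) {
    BoundedWriteQueue q = new BoundedWriteQueue(2);
    System.out.println(q.tryPut("row-1")); // accepted
    System.out.println(q.tryPut("row-2")); // accepted
    System.out.println(q.tryPut("row-3")); // queue full, rejected
    System.out.println(q.drain().size());  // only two writes remain
  }
}
```

A caller that ignores the boolean from tryPut silently loses the third
write, which is why this behavior is hard to accept as a general
replacement for a write buffer.
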
> > On Fri, Dec 19, 2014 at 11:02 AM, Andrew Purtell <apurtell@apache.org>
> > wrote:
> > >
> > > Aaron: Please post a copy of that feedback on the JIRA, pretty sure
> > > we will be having an improvement discussion there.
> > >
> > > On Fri, Dec 19, 2014 at 10:58 AM, Aaron Beppu
> > > <abeppu@siftscience.com> wrote:
> > > >
> > > > Nick : Thanks, I've created an issue [1].
> > > >
> > > > Pradeep : Yes, I have considered using that. However, for the
> > > > moment we've set it out of scope, since our migration from 0.94 ->
> > > > 0.98 is already a bit complicated, and we hoped to isolate these
> > > > changes by not moving to the async client until after the current
> > > > migration is complete.
> > > >
> > > > Andrew : HTableMultiplexer does seem like it would solve our
> > > > buffered write problem, albeit in an awkward way -- thanks! It
> > > > kind of seems like HTable should then (if autoFlush =3D=3D false)
> > > > send writes to the multiplexer, rather than putting them in its
> > > > own, short-lived writeBuffer. If nothing else, it's still super
> > > > confusing that HTableInterface exposes setAutoFlush() and
> > > > setWriteBufferSize(), given that the writeBuffer won't
> > > > meaningfully buffer anything if all tables are short-lived.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/HBASE-12728
> > > >
> > > > On Fri, Dec 19, 2014 at 10:31 AM, Andrew Purtell
> > > > <apurtell@apache.org> wrote:
> > > > >
> > > > > I believe HTableMultiplexer[1] is meant to stand in for
> > > > > HTablePool for buffered writing. FWIW, I've not used it.
> > > > >
> > > > > 1: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/=
client/HTableMultiplexer.html
> > > > >
> > > > >
> > > > > On Fri, Dec 19, 2014 at 9:00 AM, Nick Dimiduk
> > > > > <ndimiduk@apache.org> wrote:
> > > > > >
> > > > > > Hi Aaron,
> > > > > >
> > > > > > Your analysis is spot on and I do not believe this is by
> > > > > > design. I see the write buffer is owned by the table, while I
> > > > > > would have expected there to be a buffer per table, all
> > > > > > managed by the connection. I suggest you raise a blocker
> > > > > > ticket vs the 1.0.0 release that's just around the corner to
> > > > > > give this the attention it needs. Let me know if you're not
> > > > > > into JIRA, I can raise one on your behalf.
> > > > > >
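
The design described above (one buffer per table, all owned by the
connection) can be sketched in plain Java. This is a sketch under stated
assumptions, not HBase code: Connection, TableHandle, WriteBuffer and the
sink callback are hypothetical names, writes are modelled as strings, and
the sink stands in for one batched send to the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Illustration only: every name here is hypothetical, not an HBase API.
final class SharedBufferSketch {

  // One buffer per table; it outlives the handles that write into it.
  static final class WriteBuffer {
    private final int threshold;
    private List<String> pending = new ArrayList<>();

    WriteBuffer(int threshold) { this.threshold = threshold; }

    // Returns a full batch to send, or null while below the threshold.
    synchronized List<String> add(String write) {
      pending.add(write);
      return pending.size() >= threshold ? take() : null;
    }

    // Hands over whatever is pending and starts a fresh list.
    synchronized List<String> take() {
      List<String> batch = pending;
      pending = new ArrayList<>();
      return batch;
    }
  }

  // Long-lived and shared between threads; owns all the buffers.
  static final class Connection implements AutoCloseable {
    private final Map<String, WriteBuffer> buffers =
        new ConcurrentHashMap<>();
    private final int threshold;
    private final BiConsumer<String, List<String>> sink;

    Connection(int threshold, BiConsumer<String, List<String>> sink) {
      this.threshold = threshold;
      this.sink = sink;
    }

    TableHandle getTable(String name) {
      WriteBuffer buffer =
          buffers.computeIfAbsent(name, n -> new WriteBuffer(threshold));
      return new TableHandle(name, buffer, sink);
    }

    // Closing the connection is what flushes the leftovers.
    @Override public void close() {
      buffers.forEach((name, buffer) -> {
        List<String> rest = buffer.take();
        if (!rest.isEmpty()) sink.accept(name, rest);
      });
    }
  }

  // Short-lived, one per caller; closing it flushes nothing.
  static final class TableHandle implements AutoCloseable {
    private final String name;
    private final WriteBuffer buffer;
    private final BiConsumer<String, List<String>> sink;

    TableHandle(String name, WriteBuffer buffer,
                BiConsumer<String, List<String>> sink) {
      this.name = name;
      this.buffer = buffer;
      this.sink = sink;
    }

    void put(String write) {
      List<String> batch = buffer.add(write);
      if (batch != null) sink.accept(name, batch); // sent outside the lock
    }

    @Override public void close() { }
  }

  public static void main(String[] args) {
    try (Connection conn = new Connection(3,
        (table, batch) -> System.out.println(table + " <- " + batch))) {
      for (int i = 0; i < 7; i++) {
        try (TableHandle table = conn.getTable("events")) {
          table.put("row-" + i); // a fresh handle for every write
        }
      }
    } // closing the connection sends the final partial batch
  }
}
```

With a threshold of 3, the seven single-put handles in main produce three
sends (3, 3 and 1 writes) rather than seven, because the buffer belongs to
the connection and survives each handle's close().
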
> > > > > > cc Lars, Enis.
> > > > > >
> > > > > > Nice work Aaron.
> > > > > > -n
> > > > > >
> > > > > > On Wed, Dec 17, 2014 at 6:44 PM, Aaron Beppu
> > > > > > <abeppu@siftscience.com> wrote:
> > > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > TLDR; in the absence of HTablePool, if HTable instances are
> > > > > > > short-lived, how should clients use buffered writes?
> > > > > > >
> > > > > > > I=E2=80=99m working on migrating a codebase from using
> > > > > > > 0.94.6 (CDH4.4) to 0.98.6 (CDH5.2). One issue I=E2=80=99m
> > > > > > > confused by is how to effectively use buffered writes now
> > > > > > > that HTablePool has been deprecated[1].
> > > > > > >
> > > > > > > In our 0.94 code, a pathway could get a table from the pool,
> > > > > > > configure it with table.setAutoFlush(false), and write Puts
> > > > > > > to it. Those writes would then go to the table
> > > > > > > instance=E2=80=99s writeBuffer, and would only be flushed
> > > > > > > when the buffer was full, or when we were ready to close out
> > > > > > > the pool. We were intentionally choosing to have fewer,
> > > > > > > larger writes from the client to the cluster, and we knew we
> > > > > > > were giving up a degree of safety in exchange (i.e. if the
> > > > > > > client dies after it=E2=80=99s accepted a write but before
> > > > > > > the flush for that write occurs, the data is lost). This
> > > > > > > seems to be generally considered a reasonable choice (cf.
> > > > > > > the HBase Book [2], section 14.8.4).
> > > > > > >
> > > > > > > However, in the 0.98 world, without HTablePool, the endorsed
> > > > > > > pattern [3] seems to be to create a new HTable via table =3D
> > > > > > > stashedHConnection.getTable(tableName, myExecutorService).
> > > > > > > But even if we do table.setAutoFlush(false), because that
> > > > > > > table instance is short-lived, its buffer never gets full.
> > > > > > > We=E2=80=99ll create a table instance, write a put to it,
> > > > > > > try to close the table, and the close call will trigger a
> > > > > > > (synchronous) flush. Thus, not having HTablePool seems like
> > > > > > > it would cause us to have many more small writes from the
> > > > > > > client to the cluster, and basically wipe out the advantage
> > > > > > > of turning off autoflush.
> > > > > > >
> > > > > > > More concretely :
> > > > > > >
> > > > > > > // Given these two helpers ...
> > > > > > >
> > > > > > > private HTableInterface getAutoFlushTable(String tableName)
> > > > > > >     throws IOException {
> > > > > > >   // (autoflush is true by default)
> > > > > > >   return storedConnection.getTable(
> > > > > > >       tableName, executorService);
> > > > > > > }
> > > > > > >
> > > > > > > private HTableInterface getBufferedTable(String tableName)
> > > > > > >     throws IOException {
> > > > > > >   HTableInterface table =3D getAutoFlushTable(tableName);
> > > > > > >   table.setAutoFlush(false);
> > > > > > >   return table;
> > > > > > > }
> > > > > > >
> > > > > > > // it's my contention that these two methods would behave
> > > > > > > // almost identically, except the first will hit a
> > > > > > > // synchronous flush during the put call, and the second
> > > > > > > // will flush during the (hidden) close call on table.
> > > > > > >
> > > > > > > private void writeAutoFlushed(Put somePut)
> > > > > > >     throws IOException {
> > > > > > >   try (HTableInterface table =3D
> > > > > > >            getAutoFlushTable(tableName)) {
> > > > > > >     table.put(somePut); // will do synchronous flush
> > > > > > >   }
> > > > > > > }
> > > > > > >
> > > > > > > private void writeBuffered(Put somePut) throws IOException {
> > > > > > >   try (HTableInterface table =3D
> > > > > > >            getBufferedTable(tableName)) {
> > > > > > >     table.put(somePut);
> > > > > > >   } // auto-close will trigger synchronous flush
> > > > > > > }
> > > > > > >
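
The contention in the quoted code can be checked with a toy model that
only counts flushes. ToyTable below is a hypothetical stand-in for a
client table with a write buffer; it is not HTable, talks to no cluster,
and the buffer limit of 100 puts is an arbitrary assumption. Each flush
that actually sends something counts as one client-to-cluster round trip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: ToyTable is a hypothetical stand-in, not HTable.
final class FlushCountSketch {

  static final class ToyTable implements AutoCloseable {
    private final boolean autoFlush;
    private final int bufferLimit;
    private final AtomicInteger flushCount;
    private final List<String> writeBuffer = new ArrayList<>();

    ToyTable(boolean autoFlush, int bufferLimit,
             AtomicInteger flushCount) {
      this.autoFlush = autoFlush;
      this.bufferLimit = bufferLimit;
      this.flushCount = flushCount;
    }

    void put(String row) {
      writeBuffer.add(row);
      if (autoFlush || writeBuffer.size() >= bufferLimit) {
        flush();
      }
    }

    // A flush that sends something counts as one round trip.
    void flush() {
      if (!writeBuffer.isEmpty()) {
        flushCount.incrementAndGet();
        writeBuffer.clear();
      }
    }

    // Closing the table flushes whatever is still buffered.
    @Override public void close() { flush(); }
  }

  // A new table per put, as in the short-lived pattern.
  static int shortLived(boolean autoFlush, int puts) {
    AtomicInteger flushes = new AtomicInteger();
    for (int i = 0; i < puts; i++) {
      try (ToyTable table = new ToyTable(autoFlush, 100, flushes)) {
        table.put("row-" + i);
      }
    }
    return flushes.get();
  }

  // One table reused for every put, as a pooled table would be.
  static int longLived(int puts) {
    AtomicInteger flushes = new AtomicInteger();
    try (ToyTable table = new ToyTable(false, 100, flushes)) {
      for (int i = 0; i < puts; i++) {
        table.put("row-" + i);
      }
    }
    return flushes.get();
  }

  public static void main(String[] args) {
    System.out.println(shortLived(true, 1000));  // 1000 flushes
    System.out.println(shortLived(false, 1000)); // 1000 flushes
    System.out.println(longLived(1000));         // 10 flushes
  }
}
```

In this model, turning off autoflush changes nothing for short-lived
tables (1000 flushes either way), while a single reused table needs only
10 flushes for the same 1000 puts.
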
> > > > > > > It seems like the only way to avoid this is to have
> > > > > > > long-lived HTable instances, which get reused for multiple
> > > > > > > writes. However, since the actual writes are driven from
> > > > > > > highly concurrent code, and since HTable is not threadsafe,
> > > > > > > this would involve having a number of HTable instances, and
> > > > > > > a control mechanism for leasing them out to individual
> > > > > > > threads safely. Except at this point it seems like we will
> > > > > > > have recreated HTablePool, which suggests that we=E2=80=99re
> > > > > > > doing something deeply wrong.
> > > > > > >
> > > > > > > What am I missing here? Since the
> > > > > > > HTableInterface.setAutoFlush method still exists, it must be
> > > > > > > anticipated that users will still want to buffer writes.
> > > > > > > What=E2=80=99s the recommended way to actually buffer a
> > > > > > > meaningful number of writes, from a multithreaded context,
> > > > > > > that doesn=E2=80=99t just amount to creating a table pool?
> > > > > > >
> > > > > > > Thanks in advance,
> > > > > > > Aaron
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/HBASE-6580
> > > > > > > [2] http://hbase.apache.org/book/perf.writing.html
> > > > > > > [3] https://issues.apache.org/jira/browse/HBASE-6580?=
focusedCommentId=3D13501302&page=3Dcom.atlassian.jira.plugin.system.=
issuetabpanels:comment-tabpanel#comment-13501302
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >
> > > > >    - Andy
> > > > >
> > > > > Problems worthy of attack prove their worth by hitting back.
> > > > > - Piet Hein (via Tom White)
> > > > >
> > > >
> > >
> > >
> > >
> >
>
>
>

--089e01177631beefc5050a97313c--