hbase-user mailing list archives

From "Jeyendran Balakrishnan" <jbalakrish...@docomolabs-usa.com>
Subject RE: HBase Client Concerns
Date Wed, 16 Sep 2009 19:28:10 GMT
Many thanks again.

I think I'll initially go with a cached HBaseConfiguration and one new
HTable instance per request thread, and accept the resulting
per-request overhead.

When the HTablePool pause/retry param issue is resolved, I can switch
to that.  To work around the problem of restarting the client app when
the HBase servers are restarted, I could then wrap HTablePool in a
class that clears the pool cache [forcing instantiation of a new
HTable] whenever an HTablePool.getTable() client call times out, so
the client app need not be restarted...
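That wrapper could be sketched along these lines. RecoveringPool below is a hypothetical stand-in (the class name, the Supplier-based factory, and reset() are all illustrative, not HBase API): handles are cached for reuse, and when a caller sees a timeout it calls reset(), flushing the cache so the next checkout builds a fresh HTable-like handle instead of reusing a stale connection.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical sketch of "clear the pool cache on timeout".  T stands in
// for HTable; the factory would be something like () -> new HTable(conf, name).
public class RecoveringPool<T> {
    private final Deque<T> cache = new ArrayDeque<>();
    private final Supplier<T> factory;

    public RecoveringPool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Hand out a cached handle if one exists, else build a new one.
    public synchronized T getHandle() {
        T handle = cache.poll();
        return handle != null ? handle : factory.get();
    }

    public synchronized void returnHandle(T handle) {
        cache.push(handle);
    }

    // Called when a client request times out: drop every cached handle so
    // subsequent getHandle() calls create fresh ones against the
    // restarted servers.
    public synchronized void reset() {
        cache.clear();
    }
}
```

The client would catch the timeout, call reset(), and retry the request once against a freshly built handle.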

Cheers,
jp


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
stack
Sent: Wednesday, September 16, 2009 11:23 AM
To: hbase-user@hadoop.apache.org
Subject: Re: HBase Client Concerns

On Wed, Sep 16, 2009 at 11:16 AM, Jeyendran Balakrishnan <
jbalakrishnan@docomolabs-usa.com> wrote:

> Thanks a lot for the explanation!
>
> From your description, sharing the same HTable across all request
> threads [in an app server application] is a no-no, and instantiating a
> new HTable for each request is slow [depending upon application
> requirements, of course]
> ==> HTablePool is a logical solution. Sounds like the situation for
> database connections in an app server.
>
>
I think the 'slow' was the one-time startup cost, so don't rule out
HTable-per-thread.
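For what it's worth, the per-thread option can be kept cheap by lazily building one handle per thread, so the construction cost is paid once per thread rather than once per request. A minimal generic sketch (PerThreadHandle and its Supplier factory are my own illustration, not HBase API; the factory would be something like () -> new HTable(cachedConf, tableName)):

```java
import java.util.function.Supplier;

// Sketch of HTable-per-thread: each request thread lazily constructs its
// own handle once and then reuses it, so no two threads ever share one.
public class PerThreadHandle<T> {
    private final ThreadLocal<T> local;

    public PerThreadHandle(Supplier<T> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }

    // First call on a given thread invokes the factory; later calls on
    // that thread return the same instance.
    public T get() {
        return local.get();
    }
}
```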




> Assuming the issue you mentioned [i.e., slow HTablePool startup when
> there are a lot of client threads hitting a big table] is resolved, is
> it safe to say that the best practice for HTable access from an app
> server is to use HTablePool?
>


Others may have stronger opinions on this than I.  HTablePool will not
ride over a restart of the servers, according to a recent issue filed
by Barney Frank (whereas HTable-per-thread will).



>
> One other question:
> Is it safe to cache a single instance of HBaseConfiguration, and then
> pass it when instantiating a new HTable for each client/request
> thread?  Will this improve HTable instantiation time?
>


Yes, because you'll be using the same HCM, and thus the same cache of
region addresses, across all instances.

St.Ack
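A toy model of that instance-keyed lookup may make it concrete (ConnectionRegistry and its plain Object keys are my own illustration, not the real HBase classes): connections live in a static map keyed by the configuration object itself, so two HTables built from the same cached instance find the same entry, while a fresh configuration per request starts with an empty region cache.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Toy model of HTable's static map: the values (standing in for
// HConnectionManager, with its region-location cache) are keyed by the
// configuration *instance*, so reusing one instance means one shared
// connection.
public class ConnectionRegistry {
    private static final Map<Object, Object> CONNECTIONS = new IdentityHashMap<>();

    public static synchronized Object connectionFor(Object conf) {
        // A new entry is created only the first time a given instance is seen.
        return CONNECTIONS.computeIfAbsent(conf, c -> new Object());
    }
}
```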



>
> Thanks,
> jp
>
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> stack
> Sent: Wednesday, September 16, 2009 10:16 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: HBase Client Concerns
>
> HTable is not thread-safe (see the javadoc class comment).  Use an
> HTable per thread, or use HTablePool.  Internally, it carries a write
> buffer to which access is not synchronized (synchronized access
> wouldn't make sense, if you look at the code).
>
> Also, you might be interested: HTable in its guts has a static map
> that is keyed by HBaseConfiguration instances.  The values are
> instances of HConnectionManager.  HCM wraps our (Hadoop's) rpc client
> code.  The rpc code maintains a single Connection per remote server.
> Requests and responses are multiplexed over this single Connection.
> HCM adds caching of region locations, so it's good to share HCMs
> amongst HTables; i.e. by passing in the same HBaseConfiguration
> instance.
>
> We're seeing an issue where, if the table is big and hundreds of
> client threads are each carrying their own HTable instance or using a
> pool, startup can take a long time until the cache is filled with
> sufficient region locations.  It's being investigated....
>
> St.Ack
>
>
> On Wed, Sep 16, 2009 at 9:05 AM, Jeyendran Balakrishnan <
> jbalakrishnan@docomolabs-usa.com> wrote:
>
> > A follow-up question, related to Barney Frank's comment that:
> > "tests with 0.19 that instantiating HTable and HBaseConfiguration()
> > had significant overhead i.e. >25ms."
> > What are the implications of creating HTable just once for a given
> > table at the start of the application/app-server, and using the
> > reference to that instantiated HTable for the duration of the app?
> >
> > Thanks,
> > jp
> >
> >
> > -----Original Message-----
> > From: Barney Frank [mailto:barneyfranks1@gmail.com]
> > Sent: Tuesday, September 15, 2009 4:41 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: HBase Client Concerns
> >
> > My app will be "highly threaded" some day.  I was trying to avoid
> > creating another thread for HBase and use the pool instead.  About
> > 33% of the requests handled in the app server will need to retrieve
> > data from HBase.  I was hoping to leverage the HTablePool rather
> > than managing my own pool or creating another process that requires
> > a thread.  It seemed on my earlier tests with 0.19 that
> > instantiating HTable and HBaseConfiguration() had significant
> > overhead, i.e. >25ms.
> >
> >
> > I will file an issue.
> >
> > Thanks.
> >
> >
> > On Tue, Sep 15, 2009 at 5:52 PM, stack <stack@duboce.net> wrote:
> >
> > > On Tue, Sep 15, 2009 at 3:13 PM, Barney Frank
> > > <barneyfranks1@gmail.com> wrote:
> > > ....
> > >
> > >
> > > > **** This is despite the fact that I set hbase.pause to be 25 ms
> > > > and the retries.number = 2.  ****
> > > >
> > > >
> > > Yeah, this is down in the guts of the hadoop rpc we use.  Around
> > > connection setup it has its own config that is not well aligned
> > > with ours (ours being the retries and pause settings).
> > >
> > > The maxRetries down in ipc is
> > >
> > > this.maxRetries = conf.getInt("ipc.client.connect.max.retries", 10);
> > >
> > > That's for an IOE other than timeout.  For timeout, it does this:
> > >
> > >          } catch (SocketTimeoutException toe) {
> > >            /* The max number of retries is 45,
> > >             * which amounts to 20s*45 = 15 minutes retries.
> > >             */
> > >            handleConnectionFailure(timeoutFailures++, 45, toe);
> > >
> > > Let me file an issue to address the above.  The retries should be
> > > our retries... and in here it has a hardcoded 1000ms that instead
> > > should be our pause....  Not hard to fix.
> > >
> > >
> > >
> > > > I restart the Master and RegionServer and then send more client
> > > > requests through HTablePool.  It has the same "Retrying to
> > > > connect to server:" messages.  I noticed that the port number it
> > > > is using is the old port for the region server, and not the new
> > > > one assigned after the restart.  The HBaseClient does not seem
> > > > to recover unless I restart the client app.  When I do not use
> > > > HTablePool and only HTable, it works fine.
> > > >
> > >
> > >
> > > We've not done work to make the pool ride over a restart.
> > >
> > >
> > >
> > > > Two issues:
> > > > 1) Setting and using the hbase.client.pause and
> > > > hbase.client.retries.number parameters.  I have rarely gotten
> > > > them to work.  It seems to default to 2 sec and 10 retries no
> > > > matter if I overwrite the defaults on the client and the server.
> > > > Yes, I made sure my client doesn't have anything in the
> > > > classpath it might pick up.
> > > > <property>
> > > >   <name>hbase.client.pause</name>
> > > >   <value>20</value>
> > > > </property>
> > > > <property>
> > > >   <name>hbase.client.retries.number</name>
> > > >   <value>2</value>
> > > > </property>
> > > >
> > >
> > >
> > > Please make an issue for this and I'll investigate.  I've already
> > > added a note to an existing HBaseClient ipc issue and will fix the
> > > above items as part of it.
> > >
> > >
> > >
> > > > 2) Running HTablePool under pseudo-distributed mode, the client
> > > > doesn't seem to refresh with the new regionserver port after the
> > > > master/regions are back up.  It gets "stuck" with the info from
> > > > the settings prior to the master going down.
> > > >
> > > > I would appreciate any thoughts or help.
> > > >
> > >
> > >
> > > You need to use the pool?  Your app is highly threaded, and all
> > > threads are connecting to hbase (hundreds of them)?
> > >
> > > St.Ack
> > >
> >
>
