hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: HBase Client Concerns
Date Wed, 16 Sep 2009 17:16:05 GMT
HTable is not thread-safe (see the javadoc class comment).  Use an HTable per
thread or use HTablePool.  Internally, HTable carries a write buffer to which
access is not synchronized (sharing it doesn't make sense if you look at the code).
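
For example, something like this for the pool route (a sketch only; it assumes
the 0.20-era HTablePool API, and the table name and pool size are made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PoolExample {
      // One pool for the whole app; worker threads check HTables in and out.
      private static final HBaseConfiguration CONF = new HBaseConfiguration();
      private static final HTablePool POOL = new HTablePool(CONF, 10); // cap per table name

      public static Result get(String row) throws java.io.IOException {
        HTable table = POOL.getTable("mytable");   // borrow an HTable
        try {
          return table.get(new Get(Bytes.toBytes(row)));
        } finally {
          POOL.putTable(table);                    // always hand it back
        }
      }
    }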

Also, you might be interested: HTable in its guts has a static map that is
keyed by HBaseConfiguration instances.  The values are instances of
HConnectionManager.  HCM wraps our (Hadoop's) rpc client code.  The rpc
code maintains a single Connection per remote server.  Requests and
responses are multiplexed over this single Connection.  HCM adds caching of
region locations, so it's good to share HCMs amongst HTables; i.e., pass in
the same HBaseConfiguration instance.
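
Concretely (a sketch only; the helper and table name are hypothetical), that
sharing amounts to building every HTable from one configuration object:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class SharedConf {
      // One HBaseConfiguration for the whole process.  Every HTable built from
      // it hits the same HConnectionManager entry in HTable's static map, so
      // they all share one rpc Connection per region server and one cache of
      // region locations.
      public static final HBaseConfiguration SHARED = new HBaseConfiguration();

      // Each thread still needs its own HTable (it is not thread-safe), but
      // passing the same configuration keeps the underlying connection shared.
      public static HTable open(String tableName) throws IOException {
        return new HTable(SHARED, tableName);
      }
    }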

We're seeing an issue where, if the table is big and hundreds of client
threads each carry their own HTable instance or use a pool, startup can take
a long time until the cache is filled with sufficient region locations.  It's
being investigated....

St.Ack


On Wed, Sep 16, 2009 at 9:05 AM, Jeyendran Balakrishnan <
jbalakrishnan@docomolabs-usa.com> wrote:

> A follow-up question, related to Barney Frank's comment that:
> "tests with 0.19 that instantiating Htable and HBaseConfuguration() had
> significant overhead i.e. >25ms."
> What are the implications of creating HTable just once for a given table
> at the start of the application/app-server, and using the reference to
> that  instantiated HTable for the duration of the app?
>
> Thanks,
> jp
>
>
> -----Original Message-----
> From: Barney Frank [mailto:barneyfranks1@gmail.com]
> Sent: Tuesday, September 15, 2009 4:41 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: HBase Client Concerns
>
> My app will be "highly threaded" some day.  I was trying to avoid
> creating another thread for HBase and use the pool instead.  About 33% of
> the requests handled in the app server will need to retrieve data from
> HBase.  I was hoping to leverage the HTablePool rather than managing my
> own pool or creating another process that requires a thread.  It seemed on
> my earlier tests with 0.19 that instantiating HTable and
> HBaseConfiguration() had significant overhead, i.e. >25ms.
>
>
> I will file an issue.
>
> Thanks.
>
>
> On Tue, Sep 15, 2009 at 5:52 PM, stack <stack@duboce.net> wrote:
>
> > On Tue, Sep 15, 2009 at 3:13 PM, Barney Frank <barneyfranks1@gmail.com>
> > wrote:
> > ....
> >
> >
> > > **** This is despite the fact that I set hbase.pause to be 25 ms and
> > > the retries.number = 2.  ****
> > >
> > >
> > Yeah, this is down in the guts of the hadoop rpc we use.  Around
> > connection setup it has its own config that is not well aligned with
> > ours (ours being the retries and pause settings).
> >
> > The maxRetries down in ipc is
> >
> > this.maxRetries = conf.getInt("ipc.client.connect.max.retries", 10);
> >
> > That's for an IOE other than timeout.  For timeout, it does this:
> >
> >          } catch (SocketTimeoutException toe) {
> >            /* The max number of retries is 45,
> >             * which amounts to 20s*45 = 15 minutes retries.
> >             */
> >            handleConnectionFailure(timeoutFailures++, 45, toe);
> >
> > Let me file an issue to address the above.  The retries should be our
> > retries... and in here it has a hardcoded 1000ms that instead should be
> > our pause....  Not hard to fix.
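> >
> > In the meantime, a client can try lowering that ipc default itself; since
> > the value above is read with conf.getInt, setting it on the client's
> > HBaseConfiguration ought to take (an untested sketch, not a guaranteed
> > workaround; the hardcoded 45-retry timeout path is unaffected):
> >
> >   HBaseConfiguration conf = new HBaseConfiguration();
> >   conf.setInt("ipc.client.connect.max.retries", 2);  // fewer connect retries on IOEs
> >   HTable table = new HTable(conf, "mytable");        // hypothetical table name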
> >
> >
> >
> > > I restart the Master and RegionServer and then send more client
> > > requests through HTablePool.  It has the same "Retrying to connect to
> > > server:" messages.  I noticed that the port number it is using is the
> > > old port for the region server and not the new one assigned after the
> > > restart.  The HBaseClient does not seem to recover unless I restart the
> > > client app.  When I do not use HTablePool and only HTable it works
> > > fine.
> > >
> >
> >
> > We've not done work to make the pool ride over a restart.
> >
> >
> >
> > > Two issues:
> > > 1) Setting and using hbase.client.pause and hbase.client.retries.number
> > > parameters.  I have rarely gotten them to work.  It seems to default to
> > > 2 sec and 10 retries no matter if I overwrite the defaults on the
> > > client and the server.  Yes, I made sure my client doesn't have
> > > anything in the classpath it might pick up.
> > > <property>
> > > <name>hbase.client.pause</name>
> > > <value>20</value>
> > > </property>
> > > <property>
> > > <name>hbase.client.retries.number</name>
> > > <value>2</value>
> > > </property>
> > >
> >
> >
> > Please make an issue for this and I'll investigate.  I've already added
> > a note to an existing HBaseClient ipc issue and will fix the above items
> > as part of it.
> >
> >
> >
> > > 2) Running HTablePool in pseudo-distributed mode, the client doesn't
> > > seem to refresh with the new regionserver port after the master/regions
> > > are back up.  It gets "stuck" with the info from the settings prior to
> > > the master going down.
> > >
> > > I would appreciate any thoughts or help.
> > >
> >
> >
> > You need to use the pool?  Your app is highly threaded and all are
> > connecting to hbase (hundreds)?
> >
> > St.Ack
> >
>
