hbase-issues mailing list archives

From "M. C. Srivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration
Date Sat, 30 Apr 2011 03:48:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027281#comment-13027281 ]

M. C. Srivas commented on HBASE-3777:

bq. The thing is that a HConnection's behavior is determined not just by the server-side cluster
it goes against, but also its client-side properties, such as "hbase.client.retries.number",
"hbase.client.prefetch.limit", and so on. Ergo, we really need a different connection for
every unique set of connection-specific config properties, whether it be client- or server-specific.

I am beginning to understand the reasons behind taking this approach. Thanks for explaining.

bq. As per the ZK/HBase use cases wiki, in theory we can have multiple masters registered
with the ZK (to eliminate any SPOFs perhaps?). So, I'm not sure we can presuppose what hmaster
we'll be going to at any given point in time.

Even in the presence of multiple hmasters, does it really matter if we connect back to the
same hmaster? Which hmaster they connect to is probably important for the hmasters themselves
(and perhaps for region-servers as well), but it should not matter for clients. Agree?
(Of course, I am stating all this without knowing any details about HBase, so don't kill
me for it.)

bq. The whole purpose of this patch was to reduce the number of connections by reusing them
to the extent possible. At one point, the config's equals method was treated as the key to
the connection, which promoted reuse to some extent, but started breaking down if the config
was changed after the fact. Currently, the config's identity (object reference) is treated
as the key, but that suffers from connection overload. Hopefully, the HConnectionKey defined
in the HCM will serve as a happy medium between the two ends of the spectrum.
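
To make the three keying strategies above concrete, here is a minimal sketch of a cache keyed by
a snapshot of connection-relevant properties. It is not the HConnectionKey from the patch: the
class names, the plain Map standing in for a Configuration, and the choice of properties in
CONNECTION_PROPERTIES are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class ConnectionKeySketch {
    // Assumed list of properties that affect connection behavior;
    // the list used by the actual patch may differ.
    static final String[] CONNECTION_PROPERTIES = {
        "hbase.zookeeper.quorum",
        "hbase.client.retries.number",
        "hbase.client.prefetch.limit"
    };

    /** Immutable snapshot of the connection-relevant properties, usable as a map key. */
    static final class ConnectionKey {
        private final Map<String, String> snapshot = new TreeMap<>();

        ConnectionKey(Map<String, String> conf) {
            for (String name : CONNECTION_PROPERTIES) {
                String value = conf.get(name);
                if (value != null) {
                    snapshot.put(name, value);
                }
            }
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ConnectionKey
                && snapshot.equals(((ConnectionKey) o).snapshot);
        }

        @Override
        public int hashCode() {
            return snapshot.hashCode();
        }
    }

    /** Hypothetical stand-in for an HConnection. */
    static final class Connection {}

    private static final Map<ConnectionKey, Connection> CACHE = new HashMap<>();

    /** Returns the shared connection for this config's key, creating it on first use. */
    static synchronized Connection getConnection(Map<String, String> conf) {
        return CACHE.computeIfAbsent(new ConnectionKey(conf), k -> new Connection());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hbase.zookeeper.quorum", "zk1,zk2,zk3");
        Map<String, String> copy = new HashMap<>(conf);
        // A copy with the same connection-relevant properties shares the connection.
        System.out.println(getConnection(conf) == getConnection(copy));
    }
}
```

Because the key copies the property values when it is built, changing the config afterwards
cannot corrupt an existing cache entry (the problem with keying on the config's equals method),
and a cloned config still maps to the same connection (the problem with keying on object
identity).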

Ted Yu pointed out the work being done here, so I started reading the JIRA. I am not familiar
with where/how the HConnection instance gets used, and this JIRA was pretty long to understand
with the code changes and all.

I started to comment on this JIRA because of the problems we faced trying to scale up the YCSB
benchmark. We tried to run about 500 threads in the YCSB HBase client, and ran out of connections
to ZK. It was a complete surprise that the HBase client needed to maintain multiple
connections to ZK, and it seemed to be using one per thread (i.e., per HTable).

We share the same goal: with this patch, we hope to be able to scale YCSB to 50 client machines,
with 500 threads per client, and see how HBase holds up.
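
As a rough back-of-the-envelope for the target above, the sketch below compares the number of
ZK connections when every client thread holds its own connection against the number when all
threads in a process share one connection per distinct connection key. It assumes one client
process per machine; the class and method names are made up for illustration.

```java
class ZkConnectionEstimate {
    /** ZK connections if every client thread opens its own. */
    static long perThread(int machines, int threadsPerMachine) {
        return (long) machines * threadsPerMachine;
    }

    /** ZK connections if threads in a process share one connection per distinct key. */
    static long shared(int machines, int distinctKeysPerProcess) {
        return (long) machines * distinctKeysPerProcess;
    }

    public static void main(String[] args) {
        System.out.println(perThread(50, 500)); // prints 25000
        System.out.println(shared(50, 1));      // prints 50
    }
}
```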

Would you agree that, in the long run, the HBase client should use ZK only to find the hmaster
and region-servers, but not keep the connection to ZK open? Otherwise ZK may go under as we
try to scale the number of HBase clients.
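
What I have in mind is roughly the pattern sketched below: open a short-lived session, look up
the address, close the session, and cache the result until it stops working. This is only an
illustration of the idea. Directory is a hypothetical stand-in for a ZK session, not a real
ZooKeeper or HBase API, and the address used in main is made up.

```java
import java.util.function.Supplier;

class LookupThenClose {
    /** Hypothetical stand-in for a ZooKeeper session; not a real ZK or HBase API. */
    interface Directory extends AutoCloseable {
        String masterAddress();

        @Override
        void close();
    }

    private final Supplier<Directory> opener;
    private volatile String cachedMaster;

    LookupThenClose(Supplier<Directory> opener) {
        this.opener = opener;
    }

    /** Returns the cached master address, opening a short-lived session only on a miss. */
    String master() {
        String m = cachedMaster;
        if (m == null) {
            // try-with-resources closes the session as soon as the lookup is done.
            try (Directory d = opener.get()) {
                m = d.masterAddress();
            }
            cachedMaster = m;
        }
        return m;
    }

    /** Call when a request to the cached address fails, to force a fresh lookup. */
    void invalidate() {
        cachedMaster = null;
    }

    public static void main(String[] args) {
        LookupThenClose client = new LookupThenClose(() -> new Directory() {
            @Override
            public String masterAddress() {
                return "master-1.example:60000"; // made-up address
            }

            @Override
            public void close() {
                System.out.println("session closed");
            }
        });
        System.out.println(client.master());
        System.out.println(client.master()); // served from the cache, no new session
    }
}
```

The trade-off is that a client without an open session cannot be notified of changes, so it
would learn that the hmaster has moved only when a request to the cached address fails.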

> Redefine Identity Of HBase Configuration
> ----------------------------------------
>                 Key: HBASE-3777
>                 URL: https://issues.apache.org/jira/browse/HBASE-3777
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, ipc
>    Affects Versions: 0.90.2
>            Reporter: Karthick Sankarachary
>            Assignee: Karthick Sankarachary
>            Priority: Minor
>             Fix For: 0.92.0
>         Attachments: 3777-TOF.patch, HBASE-3777-V2.patch, HBASE-3777-V3.patch, HBASE-3777-V4.patch,
> HBASE-3777-V6.patch, HBASE-3777.patch
> Judging from the javadoc in {{HConnectionManager}}, sharing connections across multiple
> clients going to the same cluster is supposedly a good thing. However, the fact that there
> is a one-to-one mapping between a configuration and connection instance, kind of works against
> that goal. Specifically, when you create {{HTable}} instances using a given {{Configuration}}
> instance and a copy thereof, we end up with two distinct {{HConnection}} instances under the
> covers. Is this really expected behavior, especially given that the configuration instance
> gets cloned a lot?
> Here, I'd like to play devil's advocate and propose that we "deep-compare" {{HBaseConfiguration}}
> instances, so that multiple {{HBaseConfiguration}} instances that have the same properties
> map to the same {{HConnection}} instance. In case one is "concerned that a single {{HConnection}}
> is insufficient for sharing amongst clients",  to quote the javadoc, then one should be able
> to mark a given {{HBaseConfiguration}} instance as being "uniquely identifiable".
> Note that "sharing connections makes clean up of {{HConnection}} instances a little awkward",
> unless of course, you apply the change described in HBASE-3766.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
