hadoop-common-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1528) HClient for multiple tables
Date Thu, 28 Jun 2007 00:11:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508691 ]

Jim Kellerman commented on HADOOP-1528:
---------------------------------------

> If by "HBase instance" we mean a Configuration object pointing to a specific HMaster,
> then one HConnection per HBase instance is how i've implemented it.

Yes, this is what I meant.

> Imagine using a single HConnection for many transactions across many (sometimes the same)
> tables. You don't want to "close" an HTable because another transaction on the same table
> may execute moments afterwards... the caching has been centralized to avoid repeated meta
> lookups.

Yes. However, an application usually knows when it is done accessing a table. If the table's
region-to-server map were in the HTable object, that cache would be dumped when the client
garbage collected the HTable object. However, since we are talking about caching the
region-to-server information in the HConnection object, the client needs the ability to say
"ok, I am done with this table now" so that the HConnection can drop all of that cached
information. For very large tables, this cache could consume a great deal of memory, and I
would much rather go through the work of re-opening a table than risk an OutOfMemoryError.
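
To make the point above concrete, here is a minimal sketch of a connection-level cache that can
drop one table's entries on request. The names RegionLocationCache and TableHandle are
hypothetical and are not the classes in the patch; string row keys and "host:port" server
addresses are also assumptions made only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical connection-level cache: table name -> (region start key -> server address). */
class RegionLocationCache {
    private final Map<String, ConcurrentSkipListMap<String, String>> tables =
            new ConcurrentHashMap<>();

    /** Remember that the region starting at startKey is served by serverAddress. */
    void put(String table, String startKey, String serverAddress) {
        tables.computeIfAbsent(table, t -> new ConcurrentSkipListMap<>())
              .put(startKey, serverAddress);
    }

    /** Return the cached server for the region whose start key is the greatest one <= row. */
    String locate(String table, String row) {
        ConcurrentSkipListMap<String, String> regions = tables.get(table);
        if (regions == null) {
            return null; // nothing cached for this table
        }
        Map.Entry<String, String> entry = regions.floorEntry(row);
        return entry == null ? null : entry.getValue();
    }

    /** Drop everything cached for one table so the memory can be reclaimed. */
    void closeTable(String table) {
        tables.remove(table);
    }

    /** Number of region entries currently cached for a table (0 if none). */
    int cachedRegionCount(String table) {
        ConcurrentSkipListMap<String, String> regions = tables.get(table);
        return regions == null ? 0 : regions.size();
    }
}

/** Hypothetical per-table handle; close() tells the shared cache the client is done. */
class TableHandle implements AutoCloseable {
    private final RegionLocationCache cache;
    private final String table;

    TableHandle(RegionLocationCache cache, String table) {
        this.cache = cache;
        this.table = table;
    }

    String locate(String row) {
        return cache.locate(table, row);
    }

    @Override
    public void close() {
        cache.closeTable(table);
    }
}
```

This sketch drops the entries unconditionally. If several handles share one table, something
like a per-table reference count would be needed before dropping, which is not shown here.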

> Only when an HTable times out accessing an HRegion and needs to findRegion(), does it call
> a HConnection.closeTable() which essentially clears the cache for that table. This is
> followed by a synchronized HConnection.getTableServers() method which will update the cache
> with the new regions.

Yes, this is definitely needed. But see my explanation above for why we should also have an
HTable.close() method.
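
The timeout path quoted above (clear the table's cache, then repopulate it under a lock) could
look roughly like the following. This is only a sketch under assumptions: RefreshingRegionCache,
relocate() and the metaLookup function are hypothetical stand-ins, and the real
HConnection.closeTable() and getTableServers() signatures may differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical cache that reloads a table's region map after it has been cleared. */
class RefreshingRegionCache {
    private final Map<String, SortedMap<String, String>> tables = new HashMap<>();
    // Stand-in for a scan of the meta region: table name -> (start key -> server address).
    private final Function<String, SortedMap<String, String>> metaLookup;
    int metaLookups = 0; // counts how often the (expensive) lookup actually ran

    RefreshingRegionCache(Function<String, SortedMap<String, String>> metaLookup) {
        this.metaLookup = metaLookup;
    }

    /** Clear the cached region map for one table. */
    synchronized void closeTable(String table) {
        tables.remove(table);
    }

    /** Return the cached region map, performing a meta lookup only if nothing is cached. */
    synchronized SortedMap<String, String> getTableServers(String table) {
        SortedMap<String, String> regions = tables.get(table);
        if (regions == null) {
            regions = new TreeMap<>(metaLookup.apply(table));
            tables.put(table, regions);
            metaLookups++;
        }
        return Collections.unmodifiableSortedMap(regions);
    }

    /** What a table handle might call after timing out against a region server. */
    synchronized SortedMap<String, String> relocate(String table) {
        closeTable(table);
        return getTableServers(table);
    }
}
```

Because both steps run while holding the same lock in relocate(), another thread cannot observe
the table's cache in its cleared state between the two calls.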



> HClient for multiple tables
> ---------------------------
>
>                 Key: HADOOP-1528
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1528
>             Project: Hadoop
>          Issue Type: Task
>          Components: contrib/hbase
>    Affects Versions: 0.14.0
>            Reporter: James Kennedy
>
> I have an app that needs to access multiple HBase tables concurrently. The current HClient
> can only have one table open at a time even though it caches region servers of multiple
> tables as they are looked up.
> This means that my application layer must open multiple HClients, one per table, perhaps
> caching those HClients in a pool to reuse them (and their cached table data) as appropriate.
> or
> Shall I write an HClient patch that makes the HClient multi-table thread-safe?
> Jim's suggestion is to implement an HClient singleton (call it HClientManager?) that does
> the actual caching/resync of root/meta regions. Individual HClients will still be one table,
> one update row at a time but will rely on the singleton for the cached table info. We want
> HClients to be created and disposed as fast as possible with a minimum of meta lookups.
> Jim, what about non-root/meta regions, shouldn't they be cached and refreshed via the
> singleton also? It may still be possible that a region split/resync will occur during an
> HClient session, so does the HClientManager need to be able to notify the corresponding
> HClients in that event?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

