hadoop-common-dev mailing list archives

From "Bryan Duxbury (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2443) [hbase] Keep lazy cache of regions in client rather than an 'authoritative' list
Date Mon, 31 Dec 2007 22:58:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555151 ]

Bryan Duxbury commented on HADOOP-2443:

I think I have a pretty good strategy in mind, but I've sort of hit a snag. 

The problem is that when you're trying to determine which region a key lives in, the list
of regions is keyed by each region's start key, and you're looking for the region whose start
key is closest to (without exceeding) the search key. That is, if you have regions starting at
[0, 10, 20] and you're trying to figure out where key 15 lives, you want the answer to be 10.
The data is all there for this sort of thing, but the interface of MapFile, which is where the
.META. table is ultimately stored, doesn't have anything like this.

What we need, essentially, is a method just like getClosest, but one that returns the last key
that does not exceed the search key, i.e. the key found just before the first one that goes
over it. The alternative is doing a linear search of some sort manually, which would be pretty
inefficient.
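
To make the lookup concrete, here is a minimal in-memory sketch of the "closest without
exceeding" semantics. It is not MapFile code: it assumes the region start keys are held in a
java.util.TreeMap, and the class name RegionLookup, the method regionStartKeyFor and the
region names are hypothetical, made up for illustration only.

```java
import java.util.TreeMap;

public class RegionLookup {
    /**
     * Returns the greatest start key that is less than or equal to searchKey,
     * or null if searchKey sorts before every region start key.
     * (Hypothetical helper; illustrates the desired semantics only.)
     */
    public static <K extends Comparable<K>> K regionStartKeyFor(
            TreeMap<K, String> regionsByStartKey, K searchKey) {
        // floorKey is a logarithmic-time lookup on the sorted map,
        // so no linear scan over the regions is needed.
        return regionsByStartKey.floorKey(searchKey);
    }

    public static void main(String[] args) {
        // Regions keyed by start key, as in the [0, 10, 20] example.
        TreeMap<Integer, String> regions = new TreeMap<>();
        regions.put(0, "region-a");
        regions.put(10, "region-b");
        regions.put(20, "region-c");

        System.out.println(regionStartKeyFor(regions, 15)); // prints 10
        System.out.println(regionStartKeyFor(regions, 20)); // prints 20
        System.out.println(regionStartKeyFor(regions, -1)); // prints null
    }
}
```

The point of the sketch is only that a sorted structure can answer this query without a
linear scan; whether MapFile can expose something equivalent is the open question here.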

> [hbase] Keep lazy cache of regions in client rather than an 'authoritative' list
> --------------------------------------------------------------------------------
>                 Key: HADOOP-2443
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2443
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: Bryan Duxbury
> Currently, when the client gets a NotServingRegionException -- usually because the region is
> in the middle of being split, or there has been a regionserver crash and the region is being
> moved elsewhere -- the client does a complete refresh of its cache of region locations for a table.
> Chatting with Jim about a Paul Saab upload issue from Saturday night: when tables are
> big and comprised of regions that are splitting fast (because of bulk upload), it's unlikely a
> client will ever be able to obtain a stable list of all region locations.  Given that any
> update or scan requires that the list of all regions be in place before it proceeds, this
> can get in the way of the client succeeding when the cluster is under load.
> Chatting, we figure that it is better if the client holds a lazy region cache: on NSRE, figure
> out only where that region has gone and update only that entry in the client-side cache, rather
> than throwing out all we know of a table every time.
> Hopefully this will fix the issue PS was experiencing where, during intense upload, he
> was unable to get/scan/hql the same table.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
