hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-792) Rewrite getClosestAtOrJustBefore; doesn't scale as currently written
Date Fri, 01 Aug 2008 20:25:31 GMT

     [ https://issues.apache.org/jira/browse/HBASE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

stack updated HBASE-792:

    Attachment: 792.patch

Thinking on it, my fix for HBASE-751 introduced this issue.  With that fix in place, every
request for closestAtOrBefore could end up loading everything that is out on the filesystem,
that is, all of the flushes.

We still need to rewrite this stuff; the number of seeks done per closestAtOrBefore can be
astronomical but this patch takes off some of the heat.

This patch narrows the number of possible candidates that come back.  

It goes first to the memcache to find candidate rows.

While there, it puts any deletes found between the ultimate candidate and the desired row
into a new delete Set.  This delete set is then carried down through the walk of the store
files.  We add new deletes as we encounter them so that candidates in older store files don't
shine through if they were deleted earlier in the walk.
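
The walk described above might be sketched roughly as follows. This is a minimal illustration and not the code in 792.patch: the Layer class, the window helper and the string row keys are all hypothetical, and it assumes a delete in one layer masks puts in strictly older layers only (timestamps and versions are ignored).

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class ClosestBeforeSketch {
    /** Hypothetical stand-in for the memcache or one store file: put rows and deleted rows. */
    static final class Layer {
        final NavigableSet<String> puts = new TreeSet<>();
        final NavigableSet<String> deletes = new TreeSet<>();
        Layer put(String... rows) { puts.addAll(Arrays.asList(rows)); return this; }
        Layer delete(String... rows) { deletes.addAll(Arrays.asList(rows)); return this; }
    }

    /** Layers are ordered newest first: memcache, then store files from newest to oldest. */
    static String closestAtOrBefore(List<Layer> newestFirst, String target) {
        NavigableSet<String> deleted = new TreeSet<>(); // carried down through the walk
        String best = null;                             // current best candidate, if any
        for (Layer layer : newestFirst) {
            if (target.equals(best)) {
                return best; // an exact match cannot be improved on
            }
            // Only rows in (best, target] can beat the current candidate.
            for (String row : window(layer.puts, best, target).descendingSet()) {
                if (!deleted.contains(row)) {
                    best = row;
                    break;
                }
            }
            // Remember deletes between the candidate and the target so that
            // rows in older layers do not shine through.
            deleted.addAll(window(layer.deletes, best, target));
        }
        return best;
    }

    /** Rows greater than lowExclusive (when non-null) and at or before highInclusive. */
    static NavigableSet<String> window(NavigableSet<String> rows,
                                       String lowExclusive, String highInclusive) {
        return lowExclusive == null
            ? rows.headSet(highInclusive, true)
            : rows.subSet(lowExclusive, false, highInclusive, true);
    }
}
```

Keeping only the deletes that fall between the current candidate and the target keeps the carried set small, since a delete at or below the candidate cannot change the answer.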

> Rewrite getClosestAtOrJustBefore; doesn't scale as currently written
> --------------------------------------------------------------------
>                 Key: HBASE-792
>                 URL: https://issues.apache.org/jira/browse/HBASE-792
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>         Attachments: 792.patch
> As currently written, as a table gets bigger, the number of rows .META. needs to keep
count of grows.
> As written, our getClosestAtOrJustBefore goes through every storefile and in each picks
up any row that could be a possible candidate for closest before.  It doesn't just get the
closest from the storefile, but all keys at or before the wanted row.  It's not selective
because it cannot tell at the store file level which of the candidates will survive deletes
that are sitting in later store files or up in memcache.
> So, if a store file has keys 0-10 and we ask to get the row that is at or just before
7, it returns rows 0-7, and so on per store file.
> Can get big and slow weeding out the key wanted.
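
To make the 0-10 example in the description concrete, here is a small sketch that models a store file as a sorted set of integer keys. The names are hypothetical and this is not the HBase code; it only contrasts returning every key at or before the target with returning the single closest key.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

final class CandidateCountSketch {
    /** Hypothetical store file holding the keys first..last inclusive. */
    static NavigableSet<Integer> storeFile(int first, int last) {
        NavigableSet<Integer> keys = new TreeSet<>();
        for (int k = first; k <= last; k++) {
            keys.add(k);
        }
        return keys;
    }

    /** Behavior as described in the issue: every key at or before the target is a candidate. */
    static NavigableSet<Integer> allAtOrBefore(NavigableSet<Integer> file, int target) {
        return file.headSet(target, true);
    }

    /** The single closest key at or before the target, or null if there is none. */
    static Integer closestAtOrBefore(NavigableSet<Integer> file, int target) {
        return file.floor(target);
    }
}
```

With keys 0-10 and a target of 7, the first method yields the eight keys 0-7 while the second yields only 7. The single closest key is not enough on its own, because a delete in a later store file or in memcache may remove it, which is why the candidate set gets large when this is repeated per store file.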

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
