hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Advice for efficiently scanning for modified-since
Date Fri, 16 Jul 2010 16:34:53 GMT
This request has come up a few times now. We should develop a solution
(I've made an issue to track it: HBASE-2839).

You could try scanning the whole table, but my guess is this will prove
too slow: if not now, then later, as your table grows.
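
To make that cost concrete, here is a toy Java model, not HBase code. The
table is an in-memory TreeMap from row key to a single last-modified
timestamp (real HBase keeps a timestamp per cell). With the real client the
restriction would be set with Scan.setTimeRange(minStamp, maxStamp), a
half-open range; the model only shows that a time filter trims what comes
back while the scan still walks every row, and it ignores any server-side
optimizations. FullScanSketch and scanWithTimeRange are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (hypothetical, not the HBase API) of a time-restricted full-table scan. */
public class FullScanSketch {

    /** Matching row keys plus the number of rows the scan had to examine. */
    static final class ScanResult {
        final List<String> matches;
        final int rowsVisited;

        ScanResult(List<String> matches, int rowsVisited) {
            this.matches = matches;
            this.rowsVisited = rowsVisited;
        }
    }

    /** Returns rows whose timestamp lies in the half-open range [minStamp, maxStamp). */
    static ScanResult scanWithTimeRange(TreeMap<String, Long> table, long minStamp, long maxStamp) {
        List<String> matches = new ArrayList<>();
        int visited = 0;
        for (Map.Entry<String, Long> row : table.entrySet()) {
            visited++; // every row is touched, whether it matches or not
            long ts = row.getValue();
            if (ts >= minStamp && ts < maxStamp) {
                matches.add(row.getKey());
            }
        }
        return new ScanResult(matches, visited);
    }

    public static void main(String[] args) {
        TreeMap<String, Long> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            // Only every 100th object was modified recently (2000); the rest are old (1000).
            table.put(String.format("obj%04d", i), i % 100 == 0 ? 2000L : 1000L);
        }
        ScanResult r = scanWithTimeRange(table, 1500L, Long.MAX_VALUE);
        System.out.println(r.matches.size() + " matches, " + r.rowsVisited + " rows visited");
        // prints "10 matches, 1000 rows visited"
    }
}
```

The ratio is the point: ten useful rows cost a thousand row reads, and that
ratio gets worse as the table grows.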

If only a few results will be returned, a scan that runs over the table's
regions in parallel, rather than visiting them one after another, might
make sense. https://issues.apache.org/jira/browse/HBASE-1935 discusses
this and even has a patch, though I'm sure it is well stale at this point.
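
A minimal sketch of the parallel idea, assuming each region can be modeled
as an independent, row-key-sorted slice of the table. The in-memory maps,
ParallelScanSketch and parallelScan are hypothetical; this is not the
HBASE-1935 patch. One task per region runs on a thread pool, and results
are concatenated in region order so the output stays sorted by row key.
Against a live cluster each task would instead run its own Scan bounded by
that region's start and end keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: scan each region concurrently instead of in series. */
public class ParallelScanSketch {

    /** Scans one region and returns the keys of rows modified at or after sinceStamp. */
    static List<String> scanRegion(Map<String, Long> region, long sinceStamp) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> row : region.entrySet()) {
            if (row.getValue() >= sinceStamp) {
                out.add(row.getKey());
            }
        }
        return out;
    }

    /** Submits one scan task per region, then gathers results in region order. */
    static List<String> parallelScan(List<Map<String, Long>> regions, long sinceStamp, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Map<String, Long> region : regions) {
                futures.add(pool.submit(() -> scanRegion(region, sinceStamp)));
            }
            List<String> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.addAll(f.get()); // waits for that region; keeps row-key order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical table of 1,000 rows split into 4 regions of 250 contiguous keys.
        List<Map<String, Long>> regions = new ArrayList<>();
        for (int r = 0; r < 4; r++) {
            TreeMap<String, Long> region = new TreeMap<>();
            for (int i = r * 250; i < (r + 1) * 250; i++) {
                region.put(String.format("obj%04d", i), i % 100 == 0 ? 2000L : 1000L);
            }
            regions.add(region);
        }
        System.out.println(parallelScan(regions, 1500L, 4));
        // prints [obj0000, obj0100, obj0200, obj0300, obj0400, obj0500, obj0600, obj0700, obj0800, obj0900]
    }
}
```

Parallelism cuts wall-clock time, not total work: every row is still read
by some task.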


On Fri, Jul 16, 2010 at 8:16 AM, Mark Laffoon
<mlaffoon@semanticresearch.com> wrote:
> We have an existing product sitting on the hbase/hadoop ecosystem. We have
> laid our object model on HBase: we have one table with a row per object,
> and a separate table with composite index rows. Works great. We can
> efficiently find our objects based on their type, relationships, etc. by
> scanning the index table. We *never* scan the main table (except when
> rebuilding the index).
> A new requirement just came in: get a list of all objects that have been
> modified since <timestamp>. This has to happen "quickly" (user time).
> If we scan the main table with a timestamp restriction, will that be
> efficient? Or do we have to introduce a new composite index that has the
> last modified timestamp as part of it and scan that?
> Thanks,
> Mark
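
The alternative raised in the question, a composite index keyed on
last-modified time, could look roughly like this. The sketch assumes an
index row key made of an 8-byte big-endian timestamp followed by the object
id, and models the index table as a TreeMap ordered by unsigned byte
comparison, the order HBase uses for row keys. "Modified since T" then
becomes a range read starting at the 8-byte prefix for T; in HBase that
would be a Scan with the prefix as its start row. ModifiedIndexSketch,
indexKey and modifiedSince are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of a "last modified" index table with time-prefixed row keys. */
public class ModifiedIndexSketch {

    /** Unsigned lexicographic byte[] order; a proper prefix sorts before longer keys. */
    static final Comparator<byte[]> ROW_ORDER = Arrays::compareUnsigned;

    /** Index row key: 8-byte big-endian timestamp, then the object id in UTF-8. */
    static byte[] indexKey(long modifiedAt, String objectId) {
        // For non-negative timestamps, big-endian bytes sort in numeric order.
        byte[] id = objectId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + id.length)
                .putLong(modifiedAt)
                .put(id)
                .array();
    }

    /** Range read from the timestamp prefix to the end of the index, de-duplicating ids. */
    static Set<String> modifiedSince(TreeMap<byte[], String> index, long since) {
        byte[] start = ByteBuffer.allocate(Long.BYTES).putLong(since).array();
        Set<String> ids = new LinkedHashSet<>();
        for (Map.Entry<byte[], String> e : index.tailMap(start, true).entrySet()) {
            ids.add(e.getValue()); // stale entries for re-modified objects collapse here
        }
        return ids;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> index = new TreeMap<>(ROW_ORDER);
        index.put(indexKey(1000L, "objA"), "objA");
        index.put(indexKey(2500L, "objB"), "objB");
        index.put(indexKey(3000L, "objA"), "objA"); // objA modified again; old row left behind
        System.out.println(modifiedSince(index, 2000L)); // prints [objB, objA]
    }
}
```

Two costs come with this layout: an object modified repeatedly leaves older
index rows behind unless they are deleted on update (the sketch only
de-duplicates), and row keys that grow monotonically with time send all new
index writes to the same region.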
