hbase-user mailing list archives

From Boris Aleksandrovsky <balek...@gmail.com>
Subject Re: HBase is very slow on full table scan
Date Tue, 09 Feb 2010 00:18:54 GMT
Sure, Michael! I have a post table which contains a column "keyphrases"
which is sparsely populated across the rows in the table, meaning
most posts do not have keyphrases. I have a requirement that for any
query against our index which might return many results (usually O(1000)), I
need to quickly retrieve all keyphrases for all posts which match the query.
I do not want to issue thousands of calls to HBase, and I need the information
returned in a few seconds at most, so I am building a bloom filter which
tests whether a post has keyphrases; only if it answers in the
affirmative do I access HBase. Given that < 1% of posts have
keyphrases, this cuts the access time by 2 orders of magnitude.
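
A minimal sketch of the idea, not the actual implementation: the filter is
filled once from a full table scan, then consulted before every lookup. The
names BloomFilter, fetch_keyphrases, and lookup are hypothetical, and lookup
stands in for whatever single-row HBase read the client performs. The filter
size and hash count are illustrative assumptions as well.

```python
import hashlib


class BloomFilter:
    """Fixed-size bloom filter; may report false positives, never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never 0
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def fetch_keyphrases(post_ids, bloom, lookup):
    """Call lookup (a stand-in for an HBase read) only for ids the filter accepts."""
    results = {}
    for post_id in post_ids:
        if not bloom.might_contain(post_id):
            continue  # definitely no keyphrases: skip the round trip
        phrases = lookup(post_id)
        if phrases:  # a false positive comes back empty and is dropped
            results[post_id] = phrases
    return results


if __name__ == "__main__":
    # Hypothetical data: 1000 posts, 1% of which have keyphrases.
    table = {"post-%d" % i: ["phrase-%d" % i] for i in range(0, 1000, 100)}
    bloom = BloomFilter(num_bits=1024, num_hashes=7)
    for row_key in table:  # stands in for the one-time full table scan
        bloom.add(row_key)
    hits = fetch_keyphrases(["post-%d" % i for i in range(1000)], bloom, table.get)
    print(len(hits), "posts with keyphrases")
```

Because a bloom filter can answer "maybe" for a post that has no keyphrases,
the lookup result still has to be checked; a "no" answer is always safe to
trust as long as the filter was built from a complete scan.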

On Mon, Feb 8, 2010 at 4:07 PM, Stack <stack@duboce.net> wrote:

> On Mon, Feb 8, 2010 at 3:04 PM, Boris Aleksandrovsky <baleksan@gmail.com>
> wrote:
> > Thanks. This is a one-time scan (per server runtime) in order to build
> > bloomfilters to speed up access to that table; so definitely not in the
> > query runtime :-)
>
> Can you say more about the above project of yours Boris?  It sounds
> interesting.
> St.Ack
>



-- 
Thanks,

Boris
http://twitter.com/baleksan
http://www.linkedin.com/in/baleksan
