hbase-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: HBase and Lucene for realtime search
Date Fri, 11 Feb 2011 23:27:20 GMT
Jason,

I can't imagine that the speed achieved by using HBase would be even within
orders of magnitude of what you can do in Lucene 4 (or even 3).

For reference, I think that Michi Busch's search based on flexible indexing
is able to handle >10,000 inserts and >40,000 searches per second on a
laptop.  Each search involves a number of scans of posting vectors so this
is roughly equivalent to >100,000 scans per second (on a single host).
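
To make the arithmetic concrete: at 40,000 searches per second, 100,000 scans
per second works out to about 2.5 posting-list scans per search. The toy
sketch below shows why a multi-term search costs one scan per term. It is
only an illustration under my own assumptions, not Lucene's or anyone's
actual implementation; the TinyIndex class and its method names are
hypothetical.

```python
from collections import defaultdict


class TinyIndex:
    """Toy in-memory inverted index (hypothetical): term -> list of doc ids."""

    def __init__(self):
        self.postings = defaultdict(list)
        self.next_doc_id = 0
        self.scans = 0  # how many posting lists have been scanned so far

    def insert(self, text):
        """Index one document and return its doc id."""
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        for term in set(text.lower().split()):
            # Doc ids only grow, so each posting list stays sorted.
            self.postings[term].append(doc_id)
        return doc_id

    def scan(self, term):
        """Return the posting list for one term, counting the scan."""
        self.scans += 1
        return self.postings.get(term, [])

    def search(self, query):
        """AND query: intersect the posting lists of all query terms."""
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self.scan(terms[0]))
        for term in terms[1:]:
            result &= set(self.scan(term))
        return sorted(result)


index = TinyIndex()
index.insert("hbase realtime search")
index.insert("lucene realtime search")
index.insert("lucene flexible indexing")

hits = index.search("realtime search")  # two terms, so two posting scans
```

Each query term triggers one scan, so total scans grow with both the query
rate and the number of terms per query.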

The rumor is that the insert speed is so high that it is quicker to re-index
500 million documents than to load an index.

I don't think that HBase is intended to be anywhere near this kind of speed.


On Fri, Feb 11, 2011 at 3:10 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:

> Hello,
>
> I'm curious as to what a 'good' approach would be for implementing
> search in HBase (using Lucene) with the end goal being the integration
> of realtime search into HBase.  I think the use case makes sense as
> HBase is realtime and has a write-ahead log, performs automatic
> partitioning, splitting of data, failover, redundancy, etc.  These are
> all things Lucene does not have out of the box, that we'd essentially
> get for 'free'.
>
> For starters: Where would be the right place to store Lucene segments
> or postings?  Eg, we need to be able to efficiently perform a linear
> iteration of the per-term posting list(s).
>
> Thanks!
>
> Jason Rutherglen
>
