hbase-user mailing list archives

From Sean Bigdatafun <sean.bigdata...@gmail.com>
Subject Re: HBase and Lucene for realtime search
Date Sun, 13 Feb 2011 17:37:17 GMT
On Fri, Feb 11, 2011 at 4:13 PM, Ted Dunning <tdunning@maprtech.com> wrote:

> On Fri, Feb 11, 2011 at 3:50 PM, Jason Rutherglen <
> jason.rutherglen@gmail.com> wrote:
>
> > > I can't imagine that the speed achieved by using HBase would be even
> > > within orders of magnitude of what you can do in Lucene 4 (or even 3).
> >
> > The indexing speed in Lucene hasn't changed in quite a while.  Are you
> > saying HBase would somehow be overloaded?  That doesn't seem to jibe
> > with the sequential writes HBase performs.
> >
>
> Michi's stuff uses flexible indexing with a zero-lock architecture.  The
> speed *is* much higher.
>
> The real problem is that HBase repeats keys (the full key is stored with
> every cell).
>
> If you were to store entire posting vectors as values with terms as keys,
> you might be OK.  Very long posting vectors or add-ons could be added using
> a key+serial number trick.
>
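One way to read the "key+serial number trick" is sketched below. It is a minimal sketch under assumptions not in the thread: the row key is the UTF-8 term, a NUL separator (terms are assumed never to contain one), and a 4-byte big-endian serial number, and each value is a packed chunk of sorted 8-byte document ids. The names `row_key`, `encode_postings`, `decode_postings` and `posting_rows` are hypothetical.

```python
import struct
from typing import Iterable, List, Tuple

SEPARATOR = b"\x00"  # assumption: terms never contain a NUL byte


def row_key(term: str, serial: int) -> bytes:
    """Hypothetical row key: UTF-8 term + NUL + 4-byte big-endian serial.

    Big-endian encoding makes byte-wise key order match numeric order, so
    all chunks of one term sort contiguously and in serial order.
    """
    return term.encode("utf-8") + SEPARATOR + struct.pack(">I", serial)


def encode_postings(doc_ids: List[int]) -> bytes:
    """Pack a sorted list of document ids as 8-byte big-endian integers."""
    return struct.pack(f">{len(doc_ids)}Q", *doc_ids)


def decode_postings(value: bytes) -> List[int]:
    """Inverse of encode_postings."""
    return list(struct.unpack(f">{len(value) // 8}Q", value))


def posting_rows(term: str, doc_ids: Iterable[int],
                 max_len: int) -> List[Tuple[bytes, bytes]]:
    """Split one term's posting vector into rows of at most max_len ids."""
    ids = sorted(doc_ids)
    rows = []
    for serial, start in enumerate(range(0, len(ids), max_len)):
        chunk = ids[start:start + max_len]
        rows.append((row_key(term, serial), encode_postings(chunk)))
    return rows


rows = posting_rows("hbase", [42, 7, 19, 3, 88], max_len=2)
for key, value in rows:
    print(key, decode_postings(value))
```

Because HBase keeps rows sorted by row key, reading a term back would be a scan starting at `row_key(term, 0)`, and the chunks arrive already in serial order.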
> Short queries would involve reading and merging several posting vectors.
> In that mode, query speeds might be OK, but there isn't a lot of Lucene
> left at that point.  For updates, speed would only be acceptable if you
> batch up a lot of updates or possibly if you build in a value append
> function as a coprocessor.
>
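The query side can be sketched as merging sorted posting vectors once they have been read. This is a minimal sketch, assuming each vector is already a sorted list of integer document ids; `union`, `intersect_two` and `intersect` are illustrative names, not Lucene or HBase APIs. An OR query uses a k-way merge, and an AND query intersects starting from the shortest vector so the working result shrinks as early as possible.

```python
import heapq
from typing import List


def union(postings: List[List[int]]) -> List[int]:
    """OR query: k-way merge of sorted posting vectors, dropping duplicates."""
    out: List[int] = []
    for doc in heapq.merge(*postings):
        if not out or out[-1] != doc:
            out.append(doc)
    return out


def intersect_two(a: List[int], b: List[int]) -> List[int]:
    """Two-pointer intersection of two sorted lists."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect(postings: List[List[int]]) -> List[int]:
    """AND query: intersect shortest-first, stopping early if empty."""
    if not postings:
        return []
    ordered = sorted(postings, key=len)
    result = ordered[0]
    for p in ordered[1:]:
        result = intersect_two(result, p)
        if not result:
            break
    return result


print(union([[1, 4, 9], [2, 4, 10], [9]]))
print(intersect([[1, 4, 9, 12], [4, 9, 20], [0, 4, 9]]))
```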

"speed would only be acceptable if you batch up " -- I understand what you
are talking about here (without batching up, HBase simply becomes very
sluggish). Can you comment on whether Cassandra needs a batch-up mode? (I
recall Twitter saying they just keep writing results into Cassandra for
their analytics application.)
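Batching up updates can be sketched as buffering postings on the client and flushing them grouped by term, so a flush issues one write per term instead of one write per posting. A minimal sketch under stated assumptions: `PostingBatcher` and its `flush_fn` callback are hypothetical, and `flush_fn` stands in for whatever actually performs the batched write; it is not an HBase client API.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class PostingBatcher:
    """Buffers (term, doc_id) postings and flushes them grouped by term.

    flush_fn is a hypothetical stand-in for the real batched write; it
    receives {term: sorted new doc ids} once per flush.
    """

    def __init__(self, flush_fn: Callable[[Dict[str, List[int]]], None],
                 max_pending: int = 10_000):
        self.flush_fn = flush_fn
        self.max_pending = max_pending
        self.pending: Dict[str, List[int]] = defaultdict(list)
        self.count = 0

    def add(self, term: str, doc_id: int) -> None:
        """Buffer one posting; flush automatically when the buffer is full."""
        self.pending[term].append(doc_id)
        self.count += 1
        if self.count >= self.max_pending:
            self.flush()

    def flush(self) -> None:
        """Hand all buffered postings to flush_fn, grouped and sorted."""
        if not self.count:
            return
        batch = {term: sorted(ids) for term, ids in self.pending.items()}
        self.pending = defaultdict(list)
        self.count = 0
        self.flush_fn(batch)


batches: List[Dict[str, List[int]]] = []
b = PostingBatcher(batches.append, max_pending=3)
for term, doc in [("hbase", 5), ("lucene", 2), ("hbase", 1), ("search", 9)]:
    b.add(term, doc)
b.flush()
print(batches)
```

The trade-off is the one raised above: larger `max_pending` values mean fewer, larger writes, at the cost of new documents becoming searchable later.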


>
>
>
> > The speed of indexing is a function of creating segments.  With
> > flexible indexing, the underlying segment files (and postings) may be
> > significantly altered from the default file structures, e.g., placed
> > into HBase in various ways.  The posting lists could even be split
> > along with HBase regions?
> >
>
> Possibly.  But if you use term + counter keys and posting vectors of
> limited length, you might be OK.
>
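Why term + counter keys matter for region splitting: each HBase region serves a contiguous, sorted range of row keys, and a single row is never split across regions. A posting vector kept in one row therefore stays in one region, while one stored as several term + counter rows can be spread over neighbouring regions. A minimal sketch; the key layout (term, NUL, 4-byte big-endian counter) and the helper names `chunk_key`, `region_index` and `regions_for_term` are assumptions for illustration.

```python
import bisect
import struct
from typing import List


def chunk_key(term: str, counter: int) -> bytes:
    # Hypothetical layout: UTF-8 term, NUL separator, 4-byte big-endian counter.
    return term.encode("utf-8") + b"\x00" + struct.pack(">I", counter)


def region_index(row_key: bytes, split_keys: List[bytes]) -> int:
    """Index of the region holding row_key.

    split_keys are the sorted start keys of regions 1..n (region 0 starts
    at the empty key); each region covers [start_key, next_start_key).
    """
    return bisect.bisect_right(split_keys, row_key)


def regions_for_term(term: str, num_chunks: int,
                     split_keys: List[bytes]) -> List[int]:
    """Regions touched by the rows term+0 .. term+(num_chunks - 1)."""
    return sorted({region_index(chunk_key(term, c), split_keys)
                   for c in range(num_chunks)})


splits = [chunk_key("lucene", 2)]  # one split point inside "lucene"'s chunks
print(regions_for_term("lucene", 4, splits))
print(regions_for_term("hbase", 10, splits))
```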



-- 
--Sean
