lucene-dev mailing list archives

From "Jason Rutherglen" <>
Subject Re: Realtime Search for Social Networks Collaboration
Date Mon, 08 Sep 2008 12:43:45 GMT
Hi Joaquin,

Using HBase with realtime Lucene would be in line with what Google
does.  However, the question is whether this is really necessary or
the simplest approach.  That can probably only be answered by doing a
live comparison of the two!  Unfortunately, that would probably
require quite a bit of work and resources.  For now, Ocean stores the
data in the Lucene indexes because it works and is easy to implement.
I have looked at other options; however, they
need to be prioritized in terms of need vs cost.  I would put the
HBase solution possibly at the high end of the resource scale.  I
think usually it's best to keep things as simple as possible and as
cheap as possible.  More complexity in a scalable realtime search
solution would mean more people, more expertise, and more
possibilities for breakage.  It would need to be clear what HBase or
other solutions for storing the data bring to the table, which I
cannot answer because I don't have time to look at them.
Nonetheless it is somewhat interesting.

Jason Rutherglen

On Sun, Sep 7, 2008 at 11:16 AM, J. Delgado <> wrote:
> On Sun, Sep 7, 2008 at 2:41 AM, mark harwood <>
> wrote:
>> >>for example joins are not possible using SOLR).
>> It's largely *because* Lucene doesn't do joins that it can be made to
>> scale out. I've replaced two large-scale database systems this year with
>> distributed Lucene solutions because this scale-out architecture provided
>> significantly better performance. These were "semi-structured" systems too.
>> Lucene's comparatively simplistic data model/query model is both a weakness
>> and a strength in this regard.
> Hey, maybe the right way to go for a truly scalable and high performance
> semi-structured database is to marry HBase (Bigtable-like data storage)
> with SOLR/Lucene. I concur with you in the sense that simplistic data models
> coupled with high performance are the killer.
> Let me quote this from the original Bigtable paper from Google:
> "Bigtable does not support a full relational data model; instead, it
> provides clients with a simple data model that supports dynamic control over
> data layout and format, and allows clients to reason about the locality
> properties of the data represented in the underlying storage. Data is
> indexed using row and column names that can be arbitrary strings. Bigtable
> also treats data as uninterpreted strings, although clients often serialize
> various forms of structured and semi-structured data into these strings.
> Clients can control the locality of their data through careful choices in
> their schemas. Finally, Bigtable schema parameters let clients dynamically
> control whether to serve data out of memory or from disk."
