lucene-java-user mailing list archives

From Jake Mannix <>
Subject Re: Realtime & distributed
Date Fri, 09 Oct 2009 03:24:43 GMT
On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric <> wrote:

> Does anyone have any recommendations?  I've looked at Katta, but it doesn't
> seem to support realtime searching.  It also uses hdfs, which I've heard can
> be slow.  I'm looking to serve 40gb of indexes and support about 1 million
> updates per day.
Hi Eric,

  As I mentioned in my response to Jason, we at LinkedIn serve our profile
index of roughly 50 million documents on a real-time distributed setup (we
also serve facets in real time), handling tens of millions of queries a day
at 1-10 ms latency per node.  It is based on the open source zoie project
(built here at LinkedIn).

  Zoie doesn't handle the distributed part of the setup; it only covers the
real-time side.  Distribution is pretty straightforward in our case,
though: N shards, each holding a different contiguous slice of the user base
and each replicated K times, with all N*K nodes independently receiving
indexing events distributed by a message queue.
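
  To make the shard/replica layout concrete, here is a minimal sketch in
Python.  It is not zoie code or our production setup: the names (Node,
build_cluster, publish), the integer user IDs, and the idea that each node
filters the event stream down to its own slice are all assumptions made for
illustration.  A real deployment would put an actual message queue where
the publish loop is.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One of the N*K search nodes (hypothetical stand-in for a real indexer)."""
    shard: int
    replica: int
    lo: int  # inclusive lower bound of this shard's user-ID slice
    hi: int  # exclusive upper bound
    indexed: list = field(default_factory=list)

    def consume(self, event):
        # Assumption: every node reads the stream on its own and keeps
        # only the events whose user ID falls inside its slice.
        if self.lo <= event["user_id"] < self.hi:
            self.indexed.append(event)


def build_cluster(num_users, n_shards, k_replicas):
    """Split user IDs 0..num_users-1 into N contiguous slices, K replicas each."""
    slice_size = -(-num_users // n_shards)  # ceiling division
    nodes = []
    for shard in range(n_shards):
        lo = shard * slice_size
        hi = min(lo + slice_size, num_users)
        for replica in range(k_replicas):
            nodes.append(Node(shard, replica, lo, hi))
    return nodes


def publish(nodes, event):
    """Stand-in for the message queue: offer the event to every node."""
    for node in nodes:
        node.consume(event)
```

  With build_cluster(100, 4, 3) this gives 12 nodes, and an update for user
30 ends up on the three replicas of the shard that owns IDs 25-49.  Since
each replica applies events independently, any one of them can answer
queries for its slice without coordinating with the others.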

  If you have any questions about zoie, let me know.  The documentation
could be filled in a little further, and it doesn't touch on the distributed
side of things, so feel free to ping me.

