lucene-java-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Realtime & distributed
Date Sat, 10 Oct 2009 05:35:11 GMT
Hi Mike,

  Zoie itself doesn't do anything on the distributed
side of things - it just plays nicely with it.   Zoie, at its core,
exposes a couple of primary interfaces (well, this is a slightly
simplified form of them) :

  interface IndexReaderFactory {  List getIndexReaders(); }, and
  interface DataConsumer{ void consume(Collection events); }

To do distributed realtime search with zoie, you just need to
make sure you get your indexing events to each of your nodes
as fast as they show up, push them in through the DataConsumer
API, and IndexReaders exposed through getIndexReaders() are
then a fresh realtime read-only view on the index on each node.
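
A minimal sketch of the "get your events to each node" step, assuming the
simplified DataConsumer shape above. The type parameter and the
FanOutConsumer class are hypothetical, added only for illustration; they
are not part of zoie, and in a real deployment each node would more likely
receive events over the network than through an in-process call:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Simplified interface from this message, with a type parameter
// added for the sketch (an assumption, not zoie's exact signature).
interface DataConsumer<E> {
    void consume(Collection<E> events);
}

// Hypothetical helper: forwards every batch of indexing events
// to the consumer of each node, in order.
class FanOutConsumer<E> implements DataConsumer<E> {
    private final List<DataConsumer<E>> nodes;

    FanOutConsumer(List<DataConsumer<E>> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    @Override
    public void consume(Collection<E> events) {
        for (DataConsumer<E> node : nodes) {
            node.consume(events);
        }
    }
}
```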

Doing distributed search with a setup like this now means just
pushing your Query to all of the nodes, returning the top n hits
from each back to a broker, sorting all n * num_nodes results
by score and taking the top n of that combined list.
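
The broker-side merge can be sketched roughly as follows. Hit and Broker
are hypothetical names invented for this example (not zoie or Lucene
classes), and the sketch assumes scores from different nodes are directly
comparable:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hit type: which node it came from, the node-local
// document id, and the score the node assigned.
final class Hit {
    final String node;
    final int doc;
    final float score;

    Hit(String node, int doc, float score) {
        this.node = node;
        this.doc = doc;
        this.score = score;
    }
}

final class Broker {
    // Combine the top-n lists returned by each node, sort the
    // combined list by descending score, and keep the first n.
    static List<Hit> mergeTopN(List<List<Hit>> perNode, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perNode) {
            all.addAll(hits);
        }
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

Sorting all n * num_nodes hits is fine at small n; a bounded priority
queue would do the same job with less work if n or the node count grows.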

Depending on your system's setup, you either push events to
the nodes, or pull events from somewhere to them, but if you do
the latter the realtimeliness will be bounded by how often you
poll, of course.

  -jake

On Fri, Oct 9, 2009 at 9:09 PM, Michael Masters <mmasters@gmail.com> wrote:

> Hi Jake,
>
> Zoie looks like a really cool project. I'd like to learn more about
> the distributed part of the setup. Any way you could describe that
> here or on the wiki?
>
> -Mike
>
> On Thu, Oct 8, 2009 at 9:24 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> > On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric <eangel@business.com> wrote:
> >
> >>
> >> Does anyone have any recommendations?  I've looked at Katta, but it doesn't
> >> seem to support realtime searching.  It also uses hdfs, which I've heard can
> >> be slow.  I'm looking to serve 40gb of indexes and support about 1 million
> >> updates per day.
> >>
> >>
> > Hi Eric,
> >
> >  As I mentioned in my response to Jason, we at LinkedIn serve our roughly
> > 50 million document profile index on a real-time distributed setup (we're
> > serving facets in real-time also), serving tens of millions of queries a day
> > at 1-10ms latency per node, based on the open source zoie project (built
> > here at LinkedIn) : http://zoie.googlecode.com
> >
> >  Zoie doesn't handle the distributed part of the setup, it's just the
> > real-time side.  Distribution is done pretty straightforwardly in our case
> > though: N shards each getting a different contiguous slice of the user base,
> > each replicated K times, and all N*K nodes get indexing events distributed
> > by a message queue independently.
> >
> >  If you have any questions about zoie, let me know.  The documentation
> > could get filled in a little further, and it doesn't touch on the distributed
> > side of things, so feel free to ping me.
> >
> >  -jake
> >
>
