lucene-solr-dev mailing list archives

From jason rutherglen <>
Subject Re: GData, updateable IndexSearcher
Date Wed, 26 Apr 2006 17:19:55 GMT
Hi Doug,

Thanks for the info, makes sense.

> In particular, it supports scaling the number of *readers* well.

Yes, this is very true, and it is a good architecture. In fact, because Java comes in 64-bit
flavors, it allows for a smaller number of machines compared with 32-bit C systems that have
memory limitations, like the current Google architecture.

> Yes.  Folks have developed incrementally updateable IndexSearchers before, but none is
> yet part of Lucene.

Interesting. Does this mean there is a plan for incrementally updateable IndexSearchers to
become part of Lucene?  Are there any negatives to updateable IndexSearchers?



----- Original Message ----
From: Doug Cutting <>
Sent: Tuesday, April 25, 2006 9:04:47 PM
Subject: Re: GData

jason rutherglen wrote:
> Ah ok, think I found it: org.apache.nutch.indexer.FsDirectory no?
> Couldn't this be used in Solr and distribute all the data rather than master/slave it?

It's possible to search a Lucene index that lives in Hadoop's DFS, but 
not recommended.  It's very slow.  It's much faster to copy the index to 
a local drive.

The rsync approach, of only transmitting index diffs, is a very 
efficient way to distribute an index.  In particular, it supports 
scaling the number of *readers* well.
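
As a rough illustration of why transmitting only diffs is cheap, the sketch below compares a master and a replica file listing and reports which files would need to be sent or removed. It is a file-level comparison by name and size, not rsync's actual rolling-checksum algorithm, and the class name, method names, and file names (`_1.cfs`, `segments`) are hypothetical choices for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: decide which index files a replica is missing.
 * Compares file name and size only; this is NOT how rsync itself works.
 */
public class IndexDiff {

    /** Files present on the master that are absent or differ in size on the replica. */
    public static Set<String> filesToSend(Map<String, Long> master, Map<String, Long> replica) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, Long> e : master.entrySet()) {
            Long replicaSize = replica.get(e.getKey());
            if (replicaSize == null || !replicaSize.equals(e.getValue())) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** Files on the replica that the master no longer has. */
    public static Set<String> filesToDelete(Map<String, Long> master, Map<String, Long> replica) {
        Set<String> out = new TreeSet<>(replica.keySet());
        out.removeAll(master.keySet());
        return out;
    }

    public static void main(String[] args) {
        // Illustrative listings: file name -> size in bytes (made-up values).
        Map<String, Long> master = Map.of("_1.cfs", 100L, "_2.cfs", 50L, "segments", 20L);
        Map<String, Long> replica = Map.of("_1.cfs", 100L, "_0.cfs", 30L, "segments", 18L);
        System.out.println("send:   " + filesToSend(master, replica));
        System.out.println("delete: " + filesToDelete(master, replica));
    }
}
```

With these listings the large unchanged file `_1.cfs` is never retransmitted, which is the property that lets many readers stay in sync cheaply.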

For read/write stuff (e.g. a calendar) such scaling might not be 
paramount.  Rather, you might be happy to route all requests for a 
particular calendar to a particular server.  The index/database could 
still be somehow replicated/synced, in case that server dies, but a 
single server can probably handle all requests for a particular 
index/database.  And keeping things coherent is much simpler in this case.
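
One possible way to route all requests for a particular calendar to a particular server is to hash the calendar's identifier onto a fixed server list. This is only a minimal sketch under that assumption: `CalendarRouter`, the server names, and the calendar ID are hypothetical, and failover to a replica when a server dies is not handled.

```java
import java.util.List;

/** Hypothetical sketch: pin each calendar to exactly one server by hashing its ID. */
public class CalendarRouter {
    private final List<String> servers;

    public CalendarRouter(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = List.copyOf(servers);
    }

    /** Returns the same server for the same calendar ID, as long as the server list is unchanged. */
    public String route(String calendarId) {
        // floorMod keeps the slot non-negative even when hashCode() is negative.
        int slot = Math.floorMod(calendarId.hashCode(), servers.size());
        return servers.get(slot);
    }

    public static void main(String[] args) {
        CalendarRouter router = new CalendarRouter(List.of("search1", "search2", "search3"));
        // Both calls go to the same server, so reads and writes for this calendar
        // all see one copy of its index.
        System.out.println(router.route("alice-calendar"));
        System.out.println(router.route("alice-calendar"));
    }
}
```

Because every read and write for a given calendar lands on one machine, there is no cross-server coherence to manage. The trade-off is that changing the server list remaps most calendars under this simple modulo scheme.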

