lucene-java-user mailing list archives

From "Chris Lu" <chris...@gmail.com>
Subject Re: Scaling up to several machines with Lucene
Date Thu, 28 Jun 2007 17:38:46 GMT
Basically, for a scalable solution you need to separate your web app from
your search tier. Searching is a separate concern, and keeping it separate
lets you develop new kinds of search as new requirements come in.

Technorati's approach is very similar to one of the DBSight configurations:
one machine is dedicated to indexing, and one or more other machines are
dedicated to searching. The searching nodes subscribe to the indexing node.
Transferring the index is pretty quick, so this setup scales well.

  (Database) =crawl=> (Indexing node) =replicating index=> (Searching nodes) ==> end user query
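
The "replicating index" step can be sketched roughly as follows. This is only an illustration of a copy-then-rename commit on a searching node, not DBSight's or Technorati's actual code. The function name publish_snapshot, the index-<version> directory layout and the "current" symlink are all assumptions made up for the example, and it assumes a POSIX filesystem where rename is atomic.

```python
import os
import shutil
from pathlib import Path


def publish_snapshot(source: Path, serving_root: Path, version: str) -> Path:
    """Copy an index snapshot beside the live one, then switch to it atomically.

    Hypothetical layout: serving_root/index-<version> holds each snapshot and
    serving_root/current is a symlink to the snapshot searchers should open.
    """
    serving_root.mkdir(parents=True, exist_ok=True)
    staged = serving_root / f"index-{version}.tmp"
    final = serving_root / f"index-{version}"
    current = serving_root / "current"

    # "cp": copy into a staging directory that searchers never open.
    if staged.exists():
        shutil.rmtree(staged)
    shutil.copytree(source, staged)

    # "mv": a rename within one filesystem is atomic, so the snapshot only
    # appears under its final name once the copy is complete.
    os.rename(staged, final)

    # Repoint "current" by building a new symlink and renaming it over the
    # old one, so readers never observe a missing or half-written index.
    tmp_link = serving_root / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(final.name, tmp_link)
    os.replace(tmp_link, current)
    return current
```

A searching node would call something like publish_snapshot(Path("/incoming/index"), Path("/srv/search"), "20070628") after each transfer and then reopen its searcher on /srv/search/current. The old snapshot directories are left in place here; cleaning them up is a separate decision.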

However, if your index is huge, you may need to change your index structure
and split the indexing node into several, so that each indexing node serves
only one specific kind of index. This amounts to slicing the index vertically
and scaling it that way.
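
As a rough sketch of that vertical slicing, assuming every document carries a "kind" field that says which index it belongs to (the IndexingNode and VerticalRouter names and the "kind" key are hypothetical, invented for this example):

```python
from dataclasses import dataclass, field


@dataclass
class IndexingNode:
    """Stand-in for one indexing machine that owns a single kind of index."""
    name: str
    kind: str
    documents: list = field(default_factory=list)

    def add(self, doc: dict) -> None:
        # A real node would feed the document to its own index writer.
        self.documents.append(doc)


class VerticalRouter:
    """Sends each document to the one node responsible for its kind."""

    def __init__(self, nodes):
        self._by_kind = {}
        for node in nodes:
            if node.kind in self._by_kind:
                raise ValueError(f"two nodes claim kind {node.kind!r}")
            self._by_kind[node.kind] = node

    def route(self, doc: dict) -> IndexingNode:
        kind = doc["kind"]
        node = self._by_kind.get(kind)
        if node is None:
            raise ValueError(f"no indexing node for kind {kind!r}")
        node.add(doc)
        return node
```

For example, VerticalRouter([IndexingNode("idx1", "blog"), IndexingNode("idx2", "product")]).route({"kind": "blog", "title": "..."}) would hand the document to idx1. Each slice can then be replicated to its own set of searching nodes.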

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes


On 6/28/07, Mathieu Lecarme <mathieu@garambrogne.net> wrote:
>
> Samuel LEMOINE wrote:
> > I'm keenly interested in this issue too, as I'm working on a
> > distributed architecture for Lucene. I'm only at the very beginning of
> > my study, so I can't help you much, but Hadoop might fit your
> > requirements. It's a sub-project of Lucene aiming to parallelise
> > Lucene.
> > See http://lucene.apache.org/hadoop/about.html but I don't know whether
> > it scales well to very small clusters...
> >
> Reading from an index replicated across several servers is not hard; the
> writing (and locking) part is harder.
> The approach chosen by the Technorati guys is one computer for indexing,
> with rsync replication and a cp-and-mv commit on the search cluster.
> If you need more power for indexing, then use Nutch.
>
> M.
