lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "仇寅" <qiuyi...@software.nju.edu.cn>
Subject Re: Does Lucene support partition-by-keyword indexing?
Date Sun, 02 Mar 2008 16:17:10 GMT
Hi Mathieu,

I hope my previous mail has explained something. My objective is just to do
some simple research and to demonstrate the feasibility, so we can leave
other options alone. And you talked about caching. Yes, that will also be my
concern. Thanks for your advice though.

On Sun, Mar 2, 2008 at 7:44 PM, Mathieu Lecarme <mathieu@garambrogne.net>
wrote:

>
> Le 2 mars 08 à 03:05, 仇寅 a écrit :
>
> > Hi,
> >
> > I agree with your point that it is easier to partition index by
> > document.
> > But the partition-by-keyword approach has much greater scalability
> > over the
> > partition-by-document approach. Each query involves communicating with
> > constant number of nodes; while partition-by-doc requires spreading
> > the
> > query a long all or many of the nodes. And I am actually doing some
> > small
> > research on this. By the way, the documents to be indexed are not
> > necessarily web pages. They are mostly files stored on each node's
> > file
> > system.
> >
> > Node failures are also handled by replicas. The index for each term
> > will be
> > replicated on multiple nodes, whose nodeIDs are near to each other.
> > This
> > mechanism is handled by the underlying DHT system.
> >
> > So any idea how can partition index by keyword in lucene? Thanks.
>
> When you read a file, and tokenize it, you dispatch token in
> differents index, with a unique Document ID.
>
> Can you explain more things about the context of your application?
>
> I don't know why you need P2P. Is it for file sharing? so, index
> should be near document.
> Is it for distributed computed? use central data and hadoop Map/Reduce.
> If you wont a cluster of lucene for heavy querying, use the rsync + mv
> trick of Technorati.
> If you persist with Term dispatching, use it only for caching. Each
> node provides a Term index of their Document. When you search
> something, the parsed query gives you every Term (I can give you code
> for that), you first ask wich node contains that Term, and after, you
> send the Query to this nodes.
>
> M.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Look before you leap
-------------------------------------------
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message