lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mathieu Lecarme <>
Subject Re: Does Lucene support partition-by-keyword indexing?
Date Sun, 02 Mar 2008 11:44:52 GMT

Le 2 mars 08 à 03:05, 仇寅 a écrit :

> Hi,
> I agree with your point that it is easier to partition index by  
> document.
> But the partition-by-keyword approach has much greater scalability  
> over the
> partition-by-document approach. Each query involves communicating with
> constant number of nodes; while partition-by-doc requires spreading  
> the
> query a long all or many of the nodes. And I am actually doing some  
> small
> research on this. By the way, the documents to be indexed are not
> necessarily web pages. They are mostly files stored on each node's  
> file
> system.
> Node failures are also handled by replicas. The index for each term  
> will be
> replicated on multiple nodes, whose nodeIDs are near to each other.  
> This
> mechanism is handled by the underlying DHT system.
> So any idea how can partition index by keyword in lucene? Thanks.

When you read a file, and tokenize it, you dispatch token in  
differents index, with a unique Document ID.

Can you explain more things about the context of your application?

I don't know why you need P2P. Is it for file sharing? so, index  
should be near document.
Is it for distributed computed? use central data and hadoop Map/Reduce.
If you wont a cluster of lucene for heavy querying, use the rsync + mv  
trick of Technorati.
If you persist with Term dispatching, use it only for caching. Each  
node provides a Term index of their Document. When you search  
something, the parsed query gives you every Term (I can give you code  
for that), you first ask wich node contains that Term, and after, you  
send the Query to this nodes.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message