lucene-java-user mailing list archives

From Andrzej Bialecki
Subject Re: Copying a part of index and index structure
Date Fri, 20 Jun 2008 18:28:33 GMT
Anshum wrote:
> Hey Andrzej,
> Could you tell me what research suggests this, and why it is this way?
> My calculation says the average load on each server would go down as I would
> know what server to query for an index term as opposed to querying all
> servers for terms.
> I'm looking for a solution wherein I could break up the index based on any
> criteria and know what index to query for any input (and not query indexes
> that would lead to zero results).

* Ricardo Baeza-Yates, Carlos Castillo, Flavio Junqueira, Vassilis 
Plachouras, Fabrizio Silvestri, 2007: Challenges on Distributed Web 
Retrieval: "The disadvantage of term partitioning is having to build 
initially the entire global index. This does not scale well, and it is 
not useful in actual large scale Web search engines. There are, however, 
some advantages of this approach in the query processing phase. Webber 
et al. show that term partitioning results in lower utilization of 
resources [49]. More specifically, it significantly reduces the number of 
disk accesses and the volume of data exchanged. Document partitioning 
however is still better in terms of throughput, because of an uneven 
distribution of work load in term partitioning."

* Claudine Badue, Ricardo Baeza-Yates, 2001: Distributed Query 
Processing Using Partitioned Inverted Files. Note that their conclusion, 
that global (term) partitioning is more efficient than local (document) 
partitioning, rests on the crucial assumption that the load can be 
distributed efficiently. Other papers indicate that this is a very 
complex issue.

* Claudine Badue, Ramurti Barbosa, Paulo Golgher: Distributed Processing 
of Conjunctive Queries. This paper evaluates the bottlenecks in an 
engine with local index partitioning.

* Justin Zobel, Alistair Moffat, 2006: Inverted Files for Text Search 

* Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Fabrizio 
Silvestri, 2006: Mining Query Logs to Optimize Index Partitioning in 
Parallel Web Search Engines

* Ronny Lempel, Shlomo Moran, 2002: Optimizing Result Prefetching in Web 
Search Engines with Segmented Indices

... and quite a few other papers that I don't remember now ... please do 
a search for "distributed IR" on ACM or Citeseer.
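
To make the trade-off described in these papers concrete, here is a minimal Python sketch. It is not Lucene code: the toy corpus, the number of servers, the `term_server` routing function and the per-query load counting are all assumptions made up for illustration. It only shows that both layouts return the same hits, that document partitioning asks every server on every query, and that term partitioning asks only the servers owning a query term, so its load depends entirely on how the terms happen to be spread.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: doc id -> terms (illustration only).
DOCS = {
    0: ["lucene", "index", "search"],
    1: ["lucene", "query", "search"],
    2: ["index", "partition", "search"],
    3: ["lucene", "partition", "server"],
}
NUM_SERVERS = 2  # assumed cluster size


def term_server(term, n):
    """Deterministic term -> server routing (an arbitrary, assumed scheme)."""
    return sum(map(ord, term)) % n


def build_doc_partitioned(docs, n):
    """Each server indexes only its own subset of documents."""
    shards = [defaultdict(set) for _ in range(n)]
    for doc_id, terms in docs.items():
        for t in terms:
            shards[doc_id % n][t].add(doc_id)
    return shards


def build_term_partitioned(docs, n):
    """Each server holds the complete posting lists of its own terms.
    Note that this needs the postings of the whole collection."""
    shards = [defaultdict(set) for _ in range(n)]
    for doc_id, terms in docs.items():
        for t in terms:
            shards[term_server(t, n)][t].add(doc_id)
    return shards


def search_doc_partitioned(shards, terms, load):
    """Conjunctive query: every server is asked, partial results are merged."""
    hits = set()
    for i, shard in enumerate(shards):
        load[i] += 1
        hits |= set.intersection(*[shard.get(t, set()) for t in terms])
    return hits


def search_term_partitioned(shards, terms, load):
    """Conjunctive query: only servers owning a query term are asked."""
    n = len(shards)
    for s in {term_server(t, n) for t in terms}:
        load[s] += 1
    postings = [shards[term_server(t, n)].get(t, set()) for t in terms]
    return set.intersection(*postings)


QUERIES = [["lucene", "search"], ["search"], ["index", "search"], ["partition"]]

doc_shards = build_doc_partitioned(DOCS, NUM_SERVERS)
term_shards = build_term_partitioned(DOCS, NUM_SERVERS)
doc_load, term_load = Counter(), Counter()

for q in QUERIES:
    a = search_doc_partitioned(doc_shards, q, doc_load)
    b = search_term_partitioned(term_shards, q, term_load)
    print(q, sorted(a), sorted(b))

print("requests per server, document partitioning:", dict(doc_load))
print("requests per server, term partitioning:", dict(term_load))
```

In a sketch like this the term-partitioned layout issues fewer requests in total, but nothing stops popular terms from landing on the same server, which is the uneven work load the first quote refers to.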

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
Contact: info at sigram dot com
