lucene-java-user mailing list archives

From Peter Becker <pbec...@dstc.edu.au>
Subject Re: parallel index building & searching multiple indexes
Date Mon, 11 Aug 2003 20:43:47 GMT
Hi Tom,

Killeen, Tom wrote:

>I am attempting to create approx 10 different Lucene indexes.  I'm trying to
>create them at the same time by running multiple processes and each index is
>written to a new directory.  Once I create more than one process - the
>performance is very, very slow.  
>
As Otis said, disk access is probably the problem: your disk's head has 
to jump between 20 positions at once (10 indexes plus 10 input 
directories). And that is assuming not too much fragmentation.

Creating them on different disks and moving them across afterwards is 
not a problem, as long as you don't store any relative paths in the 
index yourself. Creating them sequentially instead of in parallel might 
be another way to speed things up.
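A minimal sketch of the sequential-versus-parallel choice, using only the Java standard library: each build is submitted to a fixed-size thread pool, so maxConcurrent = 1 gives strictly one-after-another indexing, and a larger value caps how many builds compete for the same disk. buildIndex is a hypothetical placeholder for the real Lucene indexing code (open a writer on the index directory, add the documents, close it), and the directory names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BoundedIndexBuilds {

    // Hypothetical stand-in for the real work: open a writer on indexDir,
    // add every document found under sourceDir, then close the writer.
    // Here it only reports which job ran, so the scheduling can be observed.
    static String buildIndex(String sourceDir, String indexDir) {
        return sourceDir + " -> " + indexDir;
    }

    // Runs every build job using at most maxConcurrent worker threads.
    // maxConcurrent = 1 means strictly sequential builds, which keeps a
    // single disk head from seeking between many input and output directories.
    static List<String> buildAll(List<String[]> jobs, int maxConcurrent) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String[] job : jobs) {
                // job[0] = input directory, job[1] = index directory
                pending.add(pool.submit(() -> buildIndex(job[0], job[1])));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // waits for this build and rethrows its failure
            }
            return results;
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String[]> jobs = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            jobs.add(new String[] {"/data/input" + i, "/data/index" + i});
        }
        for (String line : buildAll(jobs, 1)) {
            System.out.println(line);
        }
    }
}
```

The futures are read in submission order, so results come back in job order whatever the pool size. One concurrent build per physical disk is a reasonable starting point to experiment from.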

>Any sample code out there showing an efficient way to create multiple
>indexes?
>
>Also, Any sample code out there to search the multiple indexes?
>
We actually do both, ignoring the fact that parallel indexing might be 
slower on a single disk, since we don't really know which disks are 
involved. Our program is a small GUI application that runs the indexing 
in the background (see http://tockit.sf.net/docco). You can find our code here:

  http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/toscanaj/docco/

or to be more specific:

  
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/toscanaj/docco/source/org/tockit/docco/

The bits you ask for are in the "index", "indexer" and "query" packages. 
"documenthandler" might also be of interest, since it contains code to 
read text, HTML, XML and OpenOffice documents. There are also plugins 
for POI, PDFBox and Multivalent here:

  http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/toscanaj/docco/plugins/
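For searching several indexes at once, Lucene releases of this era include a MultiSearcher class that wraps several searchers. The standard-library sketch below only illustrates the underlying idea and is not Lucene's API: hits from each index are moved into one shared document-number space by adding an offset (the total document count of all earlier indexes), then ranked together by score. Hit, merge and indexOf are hypothetical names, and the per-index hit lists stand in for the results of searching each index separately.

```java
import java.util.ArrayList;
import java.util.List;

public class MergedSearch {

    // A scored hit; doc is a document number within one index (or, after
    // merging, within the combined numbering of all indexes).
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Merges per-index hit lists into one ranking over a shared document
    // number space: index i's local doc d becomes global doc start_i + d,
    // where start_i is the sum of docCounts[0..i-1].
    static List<Hit> merge(List<List<Hit>> perIndex, int[] docCounts, int n) {
        List<Hit> all = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < perIndex.size(); i++) {
            for (Hit h : perIndex.get(i)) {
                all.add(new Hit(start + h.doc, h.score));
            }
            start += docCounts[i];
        }
        // Highest score first; List.sort is stable, so ties keep index order.
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return all.subList(0, Math.min(n, all.size()));
    }

    // Maps a global doc number back to the index it came from.
    static int indexOf(int globalDoc, int[] docCounts) {
        int start = 0;
        for (int i = 0; i < docCounts.length; i++) {
            if (globalDoc < start + docCounts[i]) return i;
            start += docCounts[i];
        }
        throw new IllegalArgumentException("doc out of range: " + globalDoc);
    }

    public static void main(String[] args) {
        List<List<Hit>> perIndex = List.of(
            List.of(new Hit(0, 0.9f), new Hit(3, 0.4f)), // index 0 holds 5 docs
            List.of(new Hit(1, 0.7f)));                  // index 1 holds 4 docs
        int[] docCounts = {5, 4};
        for (Hit h : merge(perIndex, docCounts, 10)) {
            System.out.println("doc " + h.doc + " (index " + indexOf(h.doc, docCounts)
                + ") score " + h.score);
        }
    }
}
```

One caveat: if each index is searched on its own, its scores reflect that index's own term statistics, so ranking raw scores from different indexes against each other is an approximation.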

The code can be checked out via anonymous CVS, as usual on SourceForge:

  http://sourceforge.net/cvs/?group_id=37081

HTH,
   Peter

>
>thanks, 
>Tom
>  
>
[...unrelated quote...]


