lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Indexing Tips and Hints
Date Mon, 24 Feb 2003 20:59:21 GMT
Hello,

Since you are experimenting anyway and looking for ways to improve
indexing times, could you try replacing the use of
java.io.RandomAccessFile in the FSDirectory implementation with the
attached implementation? It supposedly increases I/O throughput by
orders of magnitude by using partial buffering.
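
For readers without the attachment, here is a minimal sketch of what
"partial buffering" over a random-access file can look like. It is an
illustration only, not the attached class and not part of Lucene: the
name BufferedRandomAccessReader, the read-only restriction, and the
caller-chosen buffer size are all assumptions made for this example.
The idea is that one large read fills an in-memory window, and later
small reads and nearby seeks are served from that window instead of
each going to the operating system.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical read-only wrapper that buffers a window of the file. */
public class BufferedRandomAccessReader implements AutoCloseable {
    private final RandomAccessFile file;
    private final byte[] buf;
    private long bufStart = 0; // file offset of buf[0]
    private int bufLen = 0;    // number of valid bytes in buf
    private long pos = 0;      // logical read position

    public BufferedRandomAccessReader(File f, int bufSize) throws IOException {
        this.file = new RandomAccessFile(f, "r");
        this.buf = new byte[bufSize];
    }

    /** Lazy seek: only records the position; no system call is made here. */
    public void seek(long newPos) {
        pos = newPos;
    }

    public long getFilePointer() {
        return pos;
    }

    /** Reads one byte, or returns -1 at end of file. */
    public int read() throws IOException {
        if (!inBuffer() && !refill()) {
            return -1;
        }
        return buf[(int) (pos++ - bufStart)] & 0xff;
    }

    /** Reads up to len bytes; may return fewer, or -1 at end of file. */
    public int read(byte[] dst, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (!inBuffer() && !refill()) {
            return -1;
        }
        int avail = (int) (bufStart + bufLen - pos);
        int n = Math.min(len, avail);
        System.arraycopy(buf, (int) (pos - bufStart), dst, off, n);
        pos += n;
        return n;
    }

    private boolean inBuffer() {
        return pos >= bufStart && pos < bufStart + bufLen;
    }

    /** Refills the window starting at the current logical position. */
    private boolean refill() throws IOException {
        file.seek(pos);
        int n = file.read(buf, 0, buf.length);
        bufStart = pos;
        bufLen = Math.max(n, 0);
        return n > 0;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Whether a wrapper like this helps in practice depends on the access
pattern: it pays off when many small reads land close together, and it
can cost extra I/O when every read jumps far outside the window.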

Terry Steichen wrote:
> Mike,
> 
> By way of comparison, I've got a collection of about 50,000 XML files, each
> of which averages about 8K.  It takes about 1.25 hours to index (on a 1.8 GHz
> machine).  I use basically the standard configuration (mergeFactor, etc.)
> and I've got about 30 fields per document.  I add about 200 new ones per
> day.  I don't recall how long it takes to index the 200 (I do it
> through a background task), but it takes a couple of minutes to merge the
> new 200-document index with the master index.
> 
> HTH,
> 
> Terry
> 
> ----- Original Message -----
> From: "Michael Barry" <mbarry@cos.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Monday, February 24, 2003 2:00 PM
> Subject: Indexing Tips and Hints
> 
> 
> 
>>All,
>>   I'm in need of some pointers, hints or tips on indexing large
>>collections of data. I know I saw some tips on this list before, but
>>when I tried searching the list, I came up blank.
>>   I have a large collection of XML files (336000 files around 5K
>>apiece) that I'm indexing, and it's taking quite a bit of time (27
>>hours). I've played around with the mergeFactor, RAMDirectories and
>>multiple threads (X number of threads indexing a subset of the data
>>and then merging the indexes at the end) but I cannot seem to bring
>>the time down. I'm probably not doing these things properly, but from
>>what I read I believe I am.  Maybe this is the best I can do with this
>>data, but I would be really grateful to hear how others have tackled
>>this same issue.
>>   As always, pointers to places in the mailing list archive or other
>>places would be appreciated.
>>
>>Thanks, Mike.
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>>


-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)

