lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <>
Subject Re: Indexing Tips and Hints
Date Mon, 24 Feb 2003 21:28:38 GMT
Things to consider:
- disk speed and whether it is busy satisfying other processes'
- CPU speed
- amount or free RAM in the machine and amount of RAM given to your JVM
- the bottleneck - could be a slow XML parser, for instance, profile it

I'm about to submit another Lucene article to  It talks
about the performance of indexing.  I don't know when exactly it will
be published, but when it does I'll send the URL to the list.


--- Michael Barry <> wrote:
> All,
>    I'm in need of some pointers, hints or tips on indexing large
> collections
> of data. I know I saw some tips on this list before but when I tried 
> searching
> the list, I came up blank.
>    I have a large collection of XML files (336000 files around 5K 
> apiece) that I'm
> indexing and its taking quite a bit of time (27 hours). I've played 
> around with the
> mergeFactor, RAMDirectories and multiple threads (X number of threads
> indexing
> a subset of the data and then merging the indexes at the end) but I 
> cannot seem
> to bring the time down. I'm probably not doing these things properly
> but 
> from
> what I read I believe I am.  Maybe this is the best I can do with
> this 
> data but I
> would be really grateful to hear how others have tackled this same
> issue.
>    As always pointers to places in the mailing list archive or other 
> places would be
> appreciated.
> Thanks, Mike.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Do you Yahoo!?
Yahoo! Tax Center - forms, calculators, tips, more

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message