lucene-java-user mailing list archives

From "Terry Steichen" <>
Subject Re: Indexing Tips and Hints
Date Mon, 24 Feb 2003 19:20:47 GMT

By way of comparison, I've got a collection of about 50,000 XML files, each
of which averages about 8K.  It takes about 1.25 hours to index (on a 1.8 GHz
machine).  I use basically the standard configuration (mergeFactor, etc.)
and I've got about 30 fields per document.  I add about 200 new documents per
day.  I don't recall how long it takes to index the 200 (I do it
through a background task), but it takes a couple of minutes to merge the
new 200-document index with the master index.
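
For a rough sense of scale, the figures in this thread can be turned into
throughput numbers. The sketch below is only arithmetic on the counts, sizes
and times quoted here and in the original message below; the class and method
names are made up for illustration and are not part of Lucene. It also assumes
the two setups are otherwise comparable, which the thread does not establish
(hardware, analyzers and field counts may well differ).

```java
// Back-of-the-envelope throughput comparison using only the figures quoted
// in this thread. Class and method names are illustrative, not Lucene APIs.
class IndexingThroughput {

    /** Documents indexed per second, given a document count and elapsed hours. */
    static double docsPerSecond(long docs, double hours) {
        return docs / (hours * 3600.0);
    }

    /** Kilobytes indexed per second, given count, average size in KB, and hours. */
    static double kbPerSecond(long docs, double avgKb, double hours) {
        return (docs * avgKb) / (hours * 3600.0);
    }

    public static void main(String[] args) {
        // 50,000 files of about 8K each, indexed in about 1.25 hours.
        double smallDocs = docsPerSecond(50_000, 1.25);
        double smallKb = kbPerSecond(50_000, 8.0, 1.25);

        // 336,000 files of about 5K each, indexed in about 27 hours.
        double largeDocs = docsPerSecond(336_000, 27.0);
        double largeKb = kbPerSecond(336_000, 5.0, 27.0);

        System.out.printf("50,000-file run:  %.1f docs/s, %.1f KB/s%n", smallDocs, smallKb);
        System.out.printf("336,000-file run: %.1f docs/s, %.1f KB/s%n", largeDocs, largeKb);
        System.out.printf("Ratio (docs/s):   %.1fx%n", smallDocs / largeDocs);
    }
}
```

That works out to roughly 11 documents per second for the 50,000-file
collection against roughly 3.5 per second for the 336,000-file one, so the
27-hour run is about three times slower per document even though its files
are smaller on average.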



----- Original Message -----
From: "Michael Barry" <>
To: "Lucene Users List" <>
Sent: Monday, February 24, 2003 2:00 PM
Subject: Indexing Tips and Hints

> All,
>    I'm in need of some pointers, hints or tips on indexing large amounts
> of data. I know I saw some tips on this list before, but when I tried
> searching the list, I came up blank.
>    I have a large collection of XML files (336,000 files, around 5K
> apiece) that I'm indexing, and it's taking quite a bit of time (27 hours).
> I've played around with the mergeFactor, RAMDirectories and multiple
> threads (X number of threads indexing a subset of the data and then
> merging the indexes at the end) but I cannot seem to bring the time down.
> I'm probably not doing these things properly, but from what I read I
> believe I am.  Maybe this is the best I can do with this data, but I
> would be really grateful to hear how others have tackled this same issue.
>    As always, pointers to places in the mailing list archive or other
> places would be appreciated.
> Thanks, Mike.
