lucene-java-user mailing list archives

From Jochen Franke
Subject Search performance with one index vs. many indexes
Date Fri, 25 Feb 2005 20:42:30 GMT
Topic: Search performance with large numbers of indexes vs. one large index


We are experiencing a performance problem when searching across a
large number of indexes.

We have an application with about

6 million documents
one index of about 7 GB
probably 10 to 15 million distinct words in that index

Building the index from a single DB (where the documents
come from) on two processors takes about 20 hours.

For several reasons (e.g. parallelizing index creation), we
created several indexes by splitting the documents into logical groups.

We first created an artificial benchmark:

10 million documents
500 indexes (about 3 files per index)
10 GB of index data altogether
about 5,000 randomly selected words

Querying these 500 indexes took about 0.4 s per query, only about
twice as long as querying a single index, which was fine for us.

We ran the same test against a single index merged from the 500 indexes.

Lucene's search performance was fine here as well (about 0.2 s per
query on our machine).

We then implemented the "real thing" which is:

6 million documents
800 indexes (about 28 files per index)
about 7 GB total index size
probably 10 to 15 million distinct words in the indexes

Queries now take 4-8 seconds each.
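
As a back-of-the-envelope comparison of the two setups, the sketch below
multiplies out the index and file counts given above. The class and
method names are only illustrative and are not part of Lucene. It says
nothing about where the time actually goes; it only shows how many
index files each setup involves.

```java
public class IndexFileCount {
    // Total number of index files for a setup (hypothetical helper).
    static long totalFiles(int indexes, int filesPerIndex) {
        return (long) indexes * filesPerIndex;
    }

    public static void main(String[] args) {
        long benchmark = totalFiles(500, 3);   // artificial benchmark
        long real = totalFiles(800, 28);       // "real thing"
        System.out.println("benchmark files: " + benchmark);
        System.out.println("real files:      " + real);
        System.out.println("ratio:           " + (double) real / benchmark);
    }
}
```

The real setup has 22,400 files against 1,500 in the benchmark, roughly
15 times as many, while the query time went from 0.4 s to 4-8 s. That
is suggestive, but it does not by itself show that the file count is
the cause.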

The test with the real data in a single index has not finished
yet.

My questions are:

- Is the size of the "wordlist" the problem?

- Would queries be a lot faster if we had a smaller number
of files per index?

- Is 500-1000 still a reasonable number of indexes?

- Is there a more or less linear relationship between
the number of indexes and the execution time of a query
(as all indexes have to be checked and the results have
to be merged)?

- Are there any parameters that could be configured for
this use case?

- Should we implement any specialized classes specific to our use case?
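
To make the "check all indexes and merge the results" step from the
questions above concrete, here is a minimal sketch in plain Java. It
does not use Lucene; the Hit type and mergeTopN method are hypothetical
names made up for illustration. It assumes every index returns its own
list of scored hits and that the overall top n are selected with a
bounded heap, so the merge work alone grows with the number of indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class TopHitsMerge {
    /** One hit from one index (hypothetical type, not a Lucene class). */
    static final class Hit {
        final int indexId;
        final int docId;
        final float score;

        Hit(int indexId, int docId, float score) {
            this.indexId = indexId;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Merges the per-index hit lists into the overall top n by score. */
    static List<Hit> mergeTopN(List<List<Hit>> perIndexHits, int n) {
        // Min-heap on score: the weakest of the current top n is at the head.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (List<Hit> hits : perIndexHits) {   // one pass per index
            for (Hit h : hits) {
                heap.add(h);
                if (heap.size() > n) {
                    heap.poll();                // drop the lowest score
                }
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return result;
    }

    public static void main(String[] args) {
        List<List<Hit>> perIndex = Arrays.asList(
                Arrays.asList(new Hit(0, 1, 0.9f), new Hit(0, 2, 0.4f)),
                Arrays.asList(new Hit(1, 7, 0.7f)),
                Arrays.asList(new Hit(2, 3, 0.8f), new Hit(2, 5, 0.1f)));
        for (Hit h : mergeTopN(perIndex, 3)) {
            System.out.println(h.indexId + ":" + h.docId + " " + h.score);
        }
    }
}
```

With k indexes the outer loop runs k times even if most indexes return
no hits, so at least that part of the cost is linear in the number of
indexes. Whether this dominates the time spent opening and reading
the index files is exactly what the question asks.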

Jochen Franke
