lucene-java-user mailing list archives

From "Andreas Guther" <>
Subject RE: Lucene index performance
Date Fri, 22 Jun 2007 14:00:57 GMT
Hi Li,

Sorry for taking so long to answer your questions.

We came up with splitting our index into smaller units after we realized
that we had to deal with an index of the size of many GB.  Updating and
optimizing such large files becomes a bottleneck.  We partitioned our
index based on when the indexed units were created.  Updates usually
happen only on current units and rarely on units for previous years.
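As a sketch of that partitioning idea (the directory naming and the
routing helper are my own invention, not Andreas's actual layout),
routing units to a per-year index might look like this:

```java
import java.nio.file.Path;
import java.time.Year;

// Hypothetical router: one index directory per creation year, so that
// updates and optimize() only ever touch the current year's small index
// instead of one huge multi-GB index.
public class IndexRouter {
    private final Path indexRoot;

    public IndexRouter(Path indexRoot) {
        this.indexRoot = indexRoot;
    }

    // Partition directory for the year an indexed unit was created.
    public Path partitionFor(int creationYear) {
        return indexRoot.resolve("index-" + creationYear);
    }

    // Updates usually go only to the current year's partition.
    public Path currentPartition() {
        return partitionFor(Year.now().getValue());
    }
}
```

Older partitions stay untouched and never need re-optimizing.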

In terms of performance I think there is very little difference; as
stated in another response, it really depends on your hardware.

All index directories are located on the same box and drive.

The documents are not distributed into several files.  I suppose you are
not talking about a Lucene document but rather about an indexed unit.
It really depends on how you organize your index, but my experience is
not to split one indexed unit into parts.  When I started to index our
units we separated meta data from aggregated units, like for example a
book's meta information (ISBN etc.) and its pages.  Each page (or
aggregated unit) was a single Lucene document.  This made it somewhat
difficult to assemble the information as the UI dictated it, so we went
back to treating one unit and its aggregates as a single Lucene
document, which made the reading faster.
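For what it's worth, the "one unit per document" approach can be
sketched like this (field names are illustrative, not our actual
schema): the book's meta data and all of its pages become fields of a
single flattened record, standing in for one Lucene Document:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UnitFlattener {
    // Flatten one unit (a book) and its aggregates (its pages) into the
    // fields of a single document, instead of one document per page.
    public static Map<String, String> toSingleDocument(
            String isbn, String title, List<String> pages) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("isbn", isbn);
        fields.put("title", title);
        // All pages concatenated into one searchable content field.
        fields.put("content", String.join("\n", pages));
        return fields;
    }
}
```

One read then yields everything the UI needs, instead of one read per
page.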


-----Original Message-----
From: [] 
Sent: Tuesday, June 19, 2007 8:05 PM
Subject: RE: Lucene index performance

Hi Andreas,
	I am very interested in the multiple index file index/search.
Can you kindly help me with the following questions?
1) Why do you use multiple index files? How much is the performance gain
for both indexing and searching? Someone reported that there is no big
performance difference unless the number of indices is huge, like 1000.
2) Are these index files located on a single machine or distributed
across multiple machines?
3) How do you distribute the documents into several index files?

Thanks a lot,

-----Original Message-----
From: Andreas Guther [] 
Sent: Monday, June 18, 2007 4:00 AM
Subject: Re: Lucene index performance

Searching on multiple index files is incredibly fast.  We have 10
index folders with different sizes.  All folders together have a size of
many GB.  Results usually come back within less than 50 ms.  Getting
results out of the index, i.e. reading documents, is expensive and you
will have to spend time here to get good performance.  You will need to
look into
- TopDocs
- Extracting results in an ordered way, i.e. sort by index and within an
index by document id.  This will help to minimize disk head jumps and
gave me a tremendous boost.
- Extracting only what you need (using a special read filter whose name
I do not recall right now; I do not have access to my sources at the
moment of writing this)
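The ordered-extraction point above can be sketched as follows (the Hit
class is a stand-in for whatever your search results carry, not Lucene
API): sort hits by index first and by document id within each index,
then read the stored documents in that order so the disk head moves
mostly forward:

```java
import java.util.Arrays;
import java.util.Comparator;

public class OrderedRead {
    // A search hit: which index partition it came from, and its doc id.
    static final class Hit {
        final int indexId;
        final int docId;
        Hit(int indexId, int docId) {
            this.indexId = indexId;
            this.docId = docId;
        }
    }

    // Sort by index, then by doc id within the index, before reading
    // the stored documents -- sequential reads minimize head jumps.
    static Hit[] readOrder(Hit[] hits) {
        Hit[] sorted = hits.clone();
        Arrays.sort(sorted, Comparator
                .comparingInt((Hit h) -> h.indexId)
                .thenComparingInt(h -> h.docId));
        return sorted;
    }
}
```

After sorting, iterate the array and fetch each document from its index
in turn.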


On 6/17/07, Mark Miller <> wrote:
> Lee Li Bin wrote:
> > Hi,
> >
> > I would like to know how's the performance during indexing and
> of
> > results on a large index files would be like.
> >
> Fast.
> > And is it possible to create multiple index files and search across
> multiple
> > index files?
> Yes.
> >  If possible, may I know how could it be done?
> >
> Check out MultiSearcher.
> > Thanks a lot.
> >
> > To unsubscribe, e-mail:
> > For additional commands, e-mail:

