lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: Re[2]: OutOfMemory with search(Query, Sort)
Date Wed, 05 Apr 2006 00:29:47 GMT

: > sort by filePath field which can be 100 bytes at average meaning 400M
: > RAM for the cache
: Well, it's probably not quite that bad...
: For string sorting, a FieldCache.StringIndex is used.
: It contains a sorted String[num_unique_terms_in_field], and an int[maxDoc]
: So if 10 documents share a large string field value, that value will
: only be in the fieldCache once.

yeah, but in his case he's dealing with filepaths -- i'm guessing that
each document represents a file, and no two files will have the same path.

some benefit may be gained in spliting the filepath field up into a
dirpath field and a filename field, and then sortinging on "dirpath,
filename" .. this should reduce the size quite a bit if the number of
unique files is significantly greater then the number of unique
directories -- of course how much it helps also depends greatly on wether
your fielnames are really long compared to your directory paths.

In general, Yonik's suggestion is really the best way to go.  as i
understand it the only reason the StringIndex FieldCache maintains the
list of strings permenantly is for use in a MultiSearcher, so if you
aren't worried about that i think it would work very nicely.

it would also make a really great PATCH, espcially if IndexSearcher got a
new option that let you tell it to use this new scaled down
Sort/FieldCache option for strings because you weren't using it in a

: If you are just using an IndexSearcher (no multisearchers), then the
: String[] isn't strictly needed... only the ordering (the int[]) is
: needed from the StringIndex.  One option is to create your own
: FieldCache that doesn't create/store that String[].


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message