lucene-dev mailing list archives

From "John Wang (JIRA)" <>
Subject [jira] Commented: (LUCENE-2252) stored field retrieve slow
Date Sun, 07 Feb 2010 03:58:27 GMT


John Wang commented on LUCENE-2252:

bq. I still think 4 bytes/doc is too much (its too much wasted ram for virtually no gain)

That depends on the application. On modern machines (at least the ones we are using,
e.g. a MacBook Pro) we can afford it :) I am not sure I agree with "virtually no gain" if
you look at the numbers I posted. IMHO, the gain is significant.

I hate to get into a subjective argument on this though.

bq. I dont understand why you need something like a custom segment file to do this, why cant
you just simply use Directory to load this particular file into memory for your use case?

Having a custom segment lets me avoid getting into this subjective argument about what
is too much memory or how large the gain is, since that just depends on my application, right?

Furthermore, for the question at hand, even if we do use the Directory implementation Uwe suggested,
it is not optimal. For my use case, the cost of the seek/read for the count on the data file
is very wasteful. And even for getting the position, I can do a random access into an array
instead of an in-memory seek, read, and parse.
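As a rough illustration of the array-lookup point above (a minimal sketch, not actual Lucene code; the class and method names are hypothetical), preloading the per-document offsets from the field index file into an array at reader-open time turns "find where this document's stored fields live" into a plain array access:

```java
// Hedged sketch of the idea discussed above: preload the per-document
// offsets from the field index file into an array once, so locating a
// document's stored fields becomes an O(1) array lookup instead of a
// disk seek + read. All names here are hypothetical, not Lucene API.
public class InMemoryFieldIndex {

    // offsets[doc] = byte position of doc's stored fields in the data file
    private final long[] offsets;

    // In Lucene this data would be decoded from the .fdx file on disk;
    // here we simply accept the already-decoded offsets.
    public InMemoryFieldIndex(long[] offsets) {
        this.offsets = offsets;
    }

    // O(1) lookup with no seek on the index file.
    public long offsetForDoc(int doc) {
        return offsets[doc];
    }

    // The RAM cost being debated on this thread: a few bytes per document
    // (8 here with longs, 4 if ints suffice for the file size).
    public long memoryFootprintBytes() {
        return (long) offsets.length * Long.BYTES;
    }
}
```

For a 5M-document index this is the ~20-40MB footprint mentioned in the issue description, traded for eliminating one seek per document load.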

The very simple store mechanism we have written outside of Lucene shows a gain of >85x (yes,
8500%) over Lucene stored fields. We would, however, like to take advantage of some of
the good stuff already in Lucene, e.g. the merge mechanism (which is very nicely done), delete
handling, etc.

> stored field retrieve slow
> --------------------------
>                 Key: LUCENE-2252
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.0
>            Reporter: John Wang
> IndexReader.document() on a stored field is rather slow. I ran a simple multi-threaded
test and profiled it:
> 40+% time is spent in getting the offset from the index file
> 30+% time is spent in reading the count (e.g. number of fields to load)
> Although I ran it on my laptop, where the disk isn't that great, there still seems to be
much room for improvement, e.g. loading the field index file into memory (for a 5M-doc index, the
extra memory footprint is 20MB, peanuts compared to other stuff being loaded)
> A related note: are there plans to have custom segments as part of flexible indexing?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

