lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Performance issues with the default field compression
Date Wed, 09 Apr 2014 20:38:25 GMT
Hi Alex,

Indeed, one or several documents (the number depends on the size of
your documents) need to be fully decompressed in order to read a
single field of a single document.
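
A toy model may make this concrete. It is not Lucene's CompressingStoredFieldsReader. It uses java.util.zip's Deflate instead of the LZ4 compression the default format uses, and the names (ChunkDemo, writeChunk, readDoc) are hypothetical. Several serialized documents are compressed together as one block, so reading any one of them means inflating the block at least up to the end of that document, including every document stored before it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy model of a chunk-compressed stored fields file (hypothetical layout, NOT Lucene's format;
// Deflate is swapped in for LZ4).
final class ChunkDemo {

    /** Several serialized documents compressed as one block, plus per-document offsets. */
    static final class Chunk {
        final byte[] compressed;
        final int[] docStarts; // docStarts[i] = offset of doc i; last entry = total length
        Chunk(byte[] compressed, int[] docStarts) { this.compressed = compressed; this.docStarts = docStarts; }
    }

    /** The requested document plus how many bytes had to be inflated to get it. */
    static final class ReadResult {
        final String doc;
        final int bytesInflated;
        ReadResult(String doc, int bytesInflated) { this.doc = doc; this.bytesInflated = bytesInflated; }
    }

    static Chunk writeChunk(List<String> docs) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        int[] starts = new int[docs.size() + 1];
        for (int i = 0; i < docs.size(); i++) {
            starts[i] = raw.size();
            byte[] bytes = docs.get(i).getBytes(StandardCharsets.UTF_8);
            raw.write(bytes, 0, bytes.length);
        }
        starts[docs.size()] = raw.size();

        // All documents of the chunk share a single compressed stream.
        Deflater deflater = new Deflater();
        deflater.setInput(raw.toByteArray());
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return new Chunk(out.toByteArray(), starts);
    }

    /** Deflate has no random access: every byte before the target document is inflated too. */
    static ReadResult readDoc(Chunk chunk, int doc) throws DataFormatException {
        int start = chunk.docStarts[doc];
        int end = chunk.docStarts[doc + 1];
        byte[] raw = new byte[end];
        Inflater inflater = new Inflater();
        inflater.setInput(chunk.compressed);
        int inflated = 0;
        while (inflated < end) {
            int n = inflater.inflate(raw, inflated, end - inflated);
            if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary())) {
                break; // truncated stream; not expected for chunks built by writeChunk
            }
            inflated += n;
        }
        inflater.end();
        return new ReadResult(new String(raw, start, end - start, StandardCharsets.UTF_8), inflated);
    }

    public static void main(String[] args) throws DataFormatException {
        Chunk chunk = writeChunk(Arrays.asList("alpha", "bravo", "charlie"));
        ReadResult r = readDoc(chunk, 2);
        System.out.println(r.doc + " (inflated " + r.bytesInflated + " bytes)");
    }
}
```

In this model, reading "charlie" inflates 17 bytes even though the document itself is only 7 bytes long. With large stored values in the same chunk, most of the decompression work goes to data the caller never reads.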

Regarding the stored fields visitor, the default one doesn't return
STOP when the field has been found because other fields with the same
name might be stored further in the stream of stored fields (in case
of a multivalued field). If you know that you have a single field
value, you can write your own field visitor that will return STOP
after the first value has been read. As you noted, this probably has
less impact on performance than the first point that you raised.
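
Here is a sketch of such a single-value visitor, written as a self-contained model so it runs without Lucene on the classpath. Status mirrors the YES/NO/STOP names of Lucene's StoredFieldVisitor.Status, but Visitor, visitDocument and SingleValueVisitor are hypothetical stand-ins. With Lucene 4.x you would instead subclass org.apache.lucene.index.StoredFieldVisitor, override needsField(FieldInfo) and stringField(FieldInfo, String), and pass it to IndexReader.document(int, StoredFieldVisitor); check those signatures against your version, since they have changed between releases. Returning STOP is only safe because the field is known to be single-valued.

```java
import java.util.Arrays;
import java.util.List;

// Self-contained model of the visitor protocol. Status mirrors the names of Lucene's
// StoredFieldVisitor.Status (YES, NO, STOP), but none of these classes are Lucene's.
final class VisitorDemo {

    enum Status { YES, NO, STOP }

    /** One stored (name, value) pair in document order; a multivalued field appears repeatedly. */
    static final class StoredField {
        final String name;
        final String value;
        StoredField(String name, String value) { this.name = name; this.value = value; }
    }

    interface Visitor {
        Status needsField(String name);
        void stringField(String name, String value);
    }

    /** Hypothetical reader loop; returns how many fields it examined before stopping. */
    static int visitDocument(List<StoredField> doc, Visitor visitor) {
        int examined = 0;
        for (StoredField f : doc) {
            examined++;
            Status s = visitor.needsField(f.name);
            if (s == Status.STOP) break;                           // skip the rest of the document
            if (s == Status.YES) visitor.stringField(f.name, f.value);
            // Status.NO: skip this value and keep scanning
        }
        return examined;
    }

    /** Loads one single-valued field, then asks the reader to stop. */
    static final class SingleValueVisitor implements Visitor {
        private final String field;
        String value; // null until the field has been read

        SingleValueVisitor(String field) { this.field = field; }

        @Override public Status needsField(String name) {
            if (value != null) return Status.STOP; // we already have the only value
            return name.equals(field) ? Status.YES : Status.NO;
        }

        @Override public void stringField(String name, String value) { this.value = value; }
    }

    public static void main(String[] args) {
        List<StoredField> doc = Arrays.asList(
            new StoredField(":path", "/content/a"),
            new StoredField("body", "lots of text"),
            new StoredField("title", "A"));
        SingleValueVisitor v = new SingleValueVisitor(":path");
        int examined = visitDocument(doc, v);
        System.out.println(v.value + " after examining " + examined + " fields");
    }
}
```

STOP only saves the scan over the remaining fields: here the field is stored first, so 2 of 3 fields are examined, while if it is stored last nothing is saved. It does not avoid the decompression, which is why it matters less than the first point.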

The default stored fields format is mainly targeted at large indices,
where compression saves disk space and can also make stored fields
retrieval faster, since a larger portion of the stored fields can fit
in the filesystem cache. However, if your index is small and fits
entirely in the filesystem cache, this stored fields format might
indeed have non-negligible overhead.


On Wed, Apr 9, 2014 at 9:17 PM, Alex Parvulescu
<alexparvulescu@apache.org> wrote:
> Hi,
>
> I was investigating some performance issues and during profiling I noticed
> that there is a significant amount of time being spent decompressing fields
> which are unrelated to the actual field I'm trying to load from the lucene
> documents. In our benchmark, which mostly runs a simple full-text search,
> 40% of the time was spent in these parts.
>
> My code does the following: reader.document(id, Set(":path")).get(":path"),
> and this is where the fun begins :)
> I noticed 2 things, please excuse the ignorance if some of the things I
> write here are not 100% correct:
>
>  - all the fields in the document are decompressed before the field filter
> is applied. We noticed this because we store a lot of content in the
> index, so a significant amount of time is lost decompressing data we never
> read. At one point I tried adding the field first, thinking this would save
> some work, but it doesn't seem to make much difference. In the reference
> code, the visitor is only used at the very end. [0]
>
>  - second, and probably of smaller impact, the DocumentStoredFieldVisitor
> could return STOP once there are no more fields left for the visitor to
> visit. I only have one, and it looks like it will #skip through a bunch of
> other stuff before finishing a document. [1]
>
> thanks in advance,
> alex
>
>
> [0]
> https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressingStoredFieldsReader.java?view=markup#l364
>
> [1]
> https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/document/DocumentStoredFieldVisitor.java?view=markup#l100



-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

