lucene-dev mailing list archives

From "Chuck Williams (JIRA)" <>
Subject [jira] Commented: (LUCENE-509) Performance optimization when retrieving a single field from a document
Date Sun, 09 Jul 2006 17:23:30 GMT

Chuck Williams commented on LUCENE-509:

LUCENE-545 does resolve this in a more general way, although the code to get precisely one
field value efficiently is slightly clunky, requiring something like this (for a single-valued
field):

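// Identity comparison below assumes Lucene hands the selector interned field names.
final String seekfield = retrievefield.intern();
String value = reader.document(doc, new FieldSelector() {
    public FieldSelectorResult accept(String field) {
        if (field == seekfield)
            return FieldSelectorResult.LOAD_AND_BREAK;  // load this field, then stop
        else return FieldSelectorResult.NO_LOAD;        // skip every other field
    }
}).get(seekfield);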

Even with this, a Document, a Field and a FieldSelector are created unnecessarily.

There are cases where fast single-field access is important.  E.g., I have cases
where it is necessary to obtain the id field for all results of a query, leading to (an obviously
refactored version of) the above code in a HitCollector.
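
As a rough illustration of why the early exit matters when the same field is fetched for
many hits, here is a self-contained sketch.  It does not use Lucene's classes: Result,
Selector, StoredField, loadSingle and only are hypothetical stand-ins, the string
comparison uses equals() rather than relying on interning, and the visited counter exists
only to make the early exit observable.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleFieldSketch {
    // Stand-in for FieldSelectorResult; only the values used in this sketch.
    enum Result { LOAD, LOAD_AND_BREAK, NO_LOAD }

    // Stand-in for the FieldSelector interface.
    interface Selector { Result accept(String fieldName); }

    // Hypothetical stored field: a name/value pair.
    static final class StoredField {
        final String name;
        final String value;
        StoredField(String name, String value) { this.name = name; this.value = value; }
    }

    // Number of stored fields examined so far (for illustration only).
    static int visited = 0;

    // Scans the stored fields in order and returns the first value the
    // selector asks to load; stops scanning as soon as it sees LOAD_AND_BREAK.
    static String loadSingle(List<StoredField> fields, Selector selector) {
        String loaded = null;
        for (StoredField f : fields) {
            visited++;
            Result r = selector.accept(f.name);
            if (r == Result.NO_LOAD) continue;
            if (loaded == null) loaded = f.value;
            if (r == Result.LOAD_AND_BREAK) break;
        }
        return loaded;
    }

    // Builds a selector for one field name; it can be created once and
    // reused for every hit instead of being allocated per document.
    static Selector only(String wanted) {
        return name -> name.equals(wanted) ? Result.LOAD_AND_BREAK : Result.NO_LOAD;
    }

    public static void main(String[] args) {
        List<StoredField> doc = new ArrayList<>();
        doc.add(new StoredField("id", "42"));
        doc.add(new StoredField("title", "some title"));
        doc.add(new StoredField("body", "some body text"));

        Selector idOnly = only("id");
        System.out.println(loadSingle(doc, idOnly)); // prints 42
        System.out.println(visited);                 // prints 1
    }
}
```

In this sketch the selector is built once and shared, so the per-hit cost is just the scan up
to the wanted field.  It says nothing about the Document and Field allocations mentioned
above, which would still occur with the real API.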

I think some special optimization for the single-field access case makes sense if benchmarks
show it is material, but that it should be integrated with the mechanism of LUCENE-545.



> Performance optimization when retrieving a single field from a document
> -----------------------------------------------------------------------
>          Key: LUCENE-509
>          URL:
>      Project: Lucene - Java
>         Type: Improvement

>   Components: Index
>     Versions: 1.9, 2.0.0
>     Reporter: Steven Tamm
>     Assignee: Otis Gospodnetic
>  Attachments: DocField.patch, DocField_2.patch, DocField_3.patch, DocField_4.patch, DocField_4b.patch
> If you just want to retrieve a single field from a Document, the only way to do it is
> to retrieve all the fields from the Document and then search it.  This patch is an optimization
> that allows you to retrieve a specific field from a document without instantiating a lot of field
> and string objects.  This reduces our memory consumption on a per-query basis by around
> 20% when a lot of documents are returned.
> I've added a lot of comments saying you should only call it if you only ever need one
> field.  There's also a unit test.
