lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-1596) optimize MultiTermEnum/MultiTermDocs
Date Sat, 11 Apr 2009 21:13:15 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-1596:
---------------------------------

    Attachment: LUCENE-1596.patch

Attaching optimization patch.  Results up front:
  random seeks to common terms with term enumerator:  58% improvement
  full iteration over all docs matching relatively unique terms: 1595% improvement

The optimizations:
 - MultiTermEnum keeps track of which segments match the current term.  If termDocs.seek(termEnum) is used,
then MultiTermDocs will only visit the segments that matched the term.
 - MultiTermEnum defers calling next() on sub-enumerators until needed.  This allows MultiTermDocs
to use the faster seek(enum), since the enumerator is still on the correct term.  This also
avoids calls to next() whose results may never be used, as well as unnecessary insertions
into the priority queue.
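A toy Java model of the two ideas above may help (a sketch under stated assumptions, not the patch and not Lucene's API). ToySegmentEnum, ToyMultiTermEnum, ToyMultiTermDocs, the in-memory TreeMap "segments" and the docBase array are all hypothetical. The merged enumerator remembers which sub-enumerators sit on the current term and advances them only on the following next() call, so the docs side can read postings by position from just the segments that contain the term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Toy model only: hypothetical classes, NOT Lucene's MultiTermEnum/MultiTermDocs.
class ToySegmentEnum {
    final int segment;          // index of this segment within the multi-reader
    final List<String> terms;   // this segment's terms, in sorted order
    final List<int[]> docLists; // segment-local doc ids, parallel to terms
    int pos = -1;               // -1 means "before the first term"

    ToySegmentEnum(int segment, TreeMap<String, int[]> postings) {
        this.segment = segment;
        this.terms = new ArrayList<>(postings.keySet());
        this.docLists = new ArrayList<>(postings.values());
    }

    boolean next() { return ++pos < terms.size(); }
    String term() { return terms.get(pos); }
    int[] docs() { return docLists.get(pos); } // positional read: no lookup by term
}

class ToyMultiTermEnum {
    // Sub-enums that still have terms, ordered by their current term.
    final PriorityQueue<ToySegmentEnum> queue =
        new PriorityQueue<ToySegmentEnum>(Comparator.comparing(ToySegmentEnum::term));
    // Sub-enums sitting on the current term, i.e. the segments that contain it.
    final List<ToySegmentEnum> matching = new ArrayList<>();
    String current;

    ToyMultiTermEnum(List<ToySegmentEnum> segments) {
        for (ToySegmentEnum s : segments) if (s.next()) queue.add(s);
    }

    boolean next() {
        // Deferred advance: sub-enums that matched the previous term move on only now,
        // so between calls they stay positioned on the current term.
        for (ToySegmentEnum s : matching) if (s.next()) queue.add(s);
        matching.clear();
        if (queue.isEmpty()) { current = null; return false; }
        current = queue.peek().term();
        // Remove every sub-enum on the smallest term and remember it as a matching segment.
        while (!queue.isEmpty() && queue.peek().term().equals(current)) {
            matching.add(queue.poll());
        }
        return true;
    }
}

class ToyMultiTermDocs {
    final List<Integer> docs = new ArrayList<>();
    int segmentsVisited;

    // Analogue of seek(termEnum): visit only the segments the enum recorded as matching.
    void seek(ToyMultiTermEnum e, int[] docBase) {
        docs.clear();
        segmentsVisited = 0;
        e.matching.sort(Comparator.comparingInt(s -> s.segment)); // keep global doc order
        for (ToySegmentEnum s : e.matching) {
            segmentsVisited++;
            for (int d : s.docs()) docs.add(docBase[s.segment] + d);
        }
    }
}

public class MultiTermSketch {
    static List<String> run() {
        TreeMap<String, int[]> s0 = new TreeMap<>(), s1 = new TreeMap<>(), s2 = new TreeMap<>();
        s0.put("apple", new int[] {0, 1});
        s0.put("cherry", new int[] {1});
        s1.put("banana", new int[] {0});
        s2.put("apple", new int[] {0});
        s2.put("banana", new int[] {1});
        int[] docBase = {0, 2, 3}; // first global doc id of segments 0, 1, 2

        ToyMultiTermEnum termEnum = new ToyMultiTermEnum(Arrays.asList(
            new ToySegmentEnum(0, s0), new ToySegmentEnum(1, s1), new ToySegmentEnum(2, s2)));
        ToyMultiTermDocs termDocs = new ToyMultiTermDocs();
        List<String> out = new ArrayList<>();
        while (termEnum.next()) {
            termDocs.seek(termEnum, docBase);
            out.add(termEnum.current + " -> " + termDocs.docs
                + " (segments visited: " + termDocs.segmentsVisited + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running main prints one line per term, starting with `apple -> [0, 1, 3] (segments visited: 2)`: segment 1 has no "apple", so it is skipped entirely for that term. In this toy, the deferral also means the sub-enumerators positioned on the last term returned are never advanced if the caller stops iterating early, which mirrors the point about calls to next() whose results may never be used.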

Test index: this was obviously stacked to show best-case performance for these optimizations.
 999,999 documents with maxBufferedDocs=10, resulting in 46 segments.  The full iteration
test used relatively unique terms (1 or 2 docs matching each), and the random seeks test used
very common terms (if rare terms are used in this test, the initial seek dominates and swamps
any improvement from deferring the calls to next()).


> optimize MultiTermEnum/MultiTermDocs
> ------------------------------------
>
>                 Key: LUCENE-1596
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1596
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>         Attachments: LUCENE-1596.patch
>
>
> Optimize MultiTermEnum and MultiTermDocs to avoid seeks on TermDocs that don't match
> the term.




