lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-4771) Query-time join collectors could maybe be more efficient
Date Mon, 11 Feb 2013 16:37:14 GMT
Robert Muir created LUCENE-4771:
-----------------------------------

             Summary: Query-time join collectors could maybe be more efficient
                 Key: LUCENE-4771
                 URL: https://issues.apache.org/jira/browse/LUCENE-4771
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/join
            Reporter: Robert Muir


I was looking at these collectors on LUCENE-4765 and I noticed:

* SingleValued collector (SV) pulls FieldCache.getTerms and adds the bytes to a BytesRefHash
on every collect.
* MultiValued collector (MV) pulls FieldCache.getDocTermsOrds, but doesn't use the ords; it just
looks up each value and adds the bytes on every collect.

I think instead it's worth investigating whether SV should use getTermsIndex, and whether both
collectors should just collect their per-segment ords in something like a BitSet[maxOrd].

When asked for the terms at the end in getCollectorTerms(), they could merge these into one
BytesRefHash.
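
A minimal sketch of the idea above, not the join module's actual code. It uses plain JDK types so it runs on its own: the Segment class, its sortedTerms/docToOrd arrays, and setNextSegment() are hypothetical stand-ins for a per-segment terms index, and a TreeSet stands in for the final BytesRefHash. The hot collect() path only sets a bit; term bytes are resolved once at the end.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeSet;

final class OrdJoinCollectorSketch {

    /** Hypothetical stand-in for one segment's terms index. */
    static final class Segment {
        final String[] sortedTerms; // ord -> term, sorted within this segment
        final int[] docToOrd;       // doc -> ord, or -1 if the doc has no value

        Segment(String[] sortedTerms, int[] docToOrd) {
            this.sortedTerms = sortedTerms;
            this.docToOrd = docToOrd;
        }
    }

    private final List<Segment> segments = new ArrayList<>();
    private final List<BitSet> seenOrds = new ArrayList<>();
    private Segment current;
    private BitSet currentOrds;

    /** Called once per segment: allocate one bit per ord in that segment. */
    void setNextSegment(Segment segment) {
        current = segment;
        currentOrds = new BitSet(segment.sortedTerms.length);
        segments.add(segment);
        seenOrds.add(currentOrds);
    }

    /** Hot path: record the ord only, with no term lookup and no hashing. */
    void collect(int doc) {
        int ord = current.docToOrd[doc];
        if (ord >= 0) {
            currentOrds.set(ord);
        }
    }

    /** Resolve ords to terms once per distinct ord and merge across segments. */
    TreeSet<String> getCollectorTerms() {
        TreeSet<String> merged = new TreeSet<>();
        for (int i = 0; i < segments.size(); i++) {
            String[] terms = segments.get(i).sortedTerms;
            BitSet ords = seenOrds.get(i);
            for (int ord = ords.nextSetBit(0); ord >= 0; ord = ords.nextSetBit(ord + 1)) {
                merged.add(terms[ord]);
            }
        }
        return merged;
    }
}
```

Ords are only meaningful within one segment, which is why the sketch keeps one BitSet per segment and merges by term value rather than by ord.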

Of course, if you are going to turn around and execute the query against the same searcher
anyway (is this the typical case?), this could be even more efficient: there would be no need
to hash or instantiate all the terms in memory, since we could postpone the lookups to
SeekingTermSetTermsEnum.accept()/nextSeekTerm(), I think... somehow :)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

