lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
Date Fri, 19 Nov 2010 17:58:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933888#action_12933888 ]

Michael McCandless commented on LUCENE-2694:
--------------------------------------------

Phew that was fast!

Wow, you nuked the terms dict cache :) Nice!

Though it makes me a bit nervous... there'll always be a risk that
we've missed some path through Lucene that does two lookups.  And,
even for external reasons (eg the same query arriving at Lucene again,
looking for the next page or something), the cache is useful.

EG, a straight TermQuery (not spawned by MTQ) is now hitting the terms
dict twice: once inside Sim.idfExplain, where it calls
searcher.docFreq(term), and then again to pull the scorers per sub
reader.  Probably, TermQuery should pull the PerReaderTermState up
front, if it wasn't already handed one?  And then pass the docFreq to
Sim.idfExplain.

Should we add a PerReaderTermState.docFreq(), which just sums up
across all subs?
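
To make the idea concrete, here is a rough, self-contained Java sketch
of resolving a term once per sub reader and then answering docFreq()
from the stored state.  It is only an illustration, not the code in
the attached patch: SegmentTermsDict, the build() and existsIn()
helpers, and the idf() formula are hypothetical stand-ins, and the
"terms dict" is just a Map that counts lookups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these classes come from the LUCENE-2694 patch. */
public class PerReaderTermStateSketch {

    /** Toy stand-in for one segment's terms dictionary; it counts lookups. */
    static final class SegmentTermsDict {
        private final Map<String, Integer> docFreqByTerm;
        int lookups = 0;

        SegmentTermsDict(Map<String, Integer> docFreqByTerm) {
            this.docFreqByTerm = docFreqByTerm;
        }

        /** Returns the term's docFreq in this segment, or null if the term is absent. */
        Integer lookup(String term) {
            lookups++;
            return docFreqByTerm.get(term);
        }
    }

    /** Holds the result of one lookup pass over all sub readers for a single term. */
    static final class PerReaderTermState {
        // Entry i is the docFreq in segment i, or null if the term is absent there.
        private final List<Integer> docFreqPerSegment = new ArrayList<>();

        static PerReaderTermState build(List<SegmentTermsDict> segments, String term) {
            PerReaderTermState state = new PerReaderTermState();
            for (SegmentTermsDict segment : segments) {
                state.docFreqPerSegment.add(segment.lookup(term)); // the only lookup
            }
            return state;
        }

        /** Sums docFreq across all subs without touching the terms dict again. */
        int docFreq() {
            int sum = 0;
            for (Integer df : docFreqPerSegment) {
                if (df != null) {
                    sum += df;
                }
            }
            return sum;
        }

        /** True if a scorer would be needed for this segment. */
        boolean existsIn(int segmentIndex) {
            return docFreqPerSegment.get(segmentIndex) != null;
        }
    }

    /** Placeholder idf formula for illustration; it takes docFreq as an argument. */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        List<SegmentTermsDict> segments = List.of(
            new SegmentTermsDict(Map.of("lucene", 3, "search", 1)),
            new SegmentTermsDict(Map.of("lucene", 2)),
            new SegmentTermsDict(Map.of("index", 4)));

        // Pull the state up front, then reuse it for both idf and scorer setup.
        PerReaderTermState state = PerReaderTermState.build(segments, "lucene");
        double idf = idf(state.docFreq(), 10);

        int totalLookups = 0;
        for (int i = 0; i < segments.size(); i++) {
            totalLookups += segments.get(i).lookups;
            System.out.println("segment " + i + " needs scorer: " + state.existsIn(i));
        }
        System.out.println("docFreq=" + state.docFreq() + " idf=" + idf
            + " lookups=" + totalLookups);
    }
}
```

In this toy, each sub reader's dictionary is consulted exactly once,
and both the idf computation and the per-segment scorer decision are
served from the stored state.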

Does TermState really need field()?  Seems wasteful to have to store
that... eg an MTQ will store many TermStates against the same field.
I think we should keep TermState lean.

Also, I think it shouldn't need that clone method?

I think instead of duplicating docs/docsAndPositions (and soon
bulkPostings) on TermsEnum, once for TermState and once without, we
should just add a seek(TermState)?  And then the single
docs/docsAndPositions/etc. method can be used to get the enum for that
term.  (Likewise for Terms.)  Also, we should remove docFreq and ord
from TermsEnum, since you should get them from TermState?
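
Roughly what I have in mind, as a toy Java sketch.  Everything here is
hypothetical and much simpler than the real classes: ToyTermsEnum is a
sorted array with a binary search standing in for the terms dict, and
this TermState carries only an ord and a docFreq (no field name).  The
point is just the shape of the API: one seek(TermState) plus a single
docs() method, instead of two variants of every postings method.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of seek(TermState); not the real Lucene TermsEnum. */
public class SeekByTermStateSketch {

    /** Lean, field-free handle holding what is needed to reposition an enum. */
    static final class TermState {
        final int ord;      // position of the term in the sorted dictionary
        final int docFreq;  // read from the state, not from the enum

        TermState(int ord, int docFreq) {
            this.ord = ord;
            this.docFreq = docFreq;
        }
    }

    /** Toy terms enum over one segment; counts real dictionary lookups. */
    static final class ToyTermsEnum {
        private final String[] terms;    // sorted term texts
        private final int[][] postings;  // doc ids per term, parallel to terms
        private int current = -1;        // -1 means "not positioned"
        int dictLookups = 0;

        ToyTermsEnum(TreeMap<String, int[]> index) {
            terms = new String[index.size()];
            postings = new int[index.size()][];
            int i = 0;
            for (Map.Entry<String, int[]> e : index.entrySet()) {
                terms[i] = e.getKey();
                postings[i] = e.getValue();
                i++;
            }
        }

        /** Seek by term text: this is the expensive dictionary lookup. */
        boolean seek(String term) {
            dictLookups++;
            int pos = Arrays.binarySearch(terms, term);
            current = pos >= 0 ? pos : -1;
            return pos >= 0;
        }

        /** Seek by previously captured state: no dictionary lookup at all. */
        void seek(TermState state) {
            current = state.ord;
        }

        /** Captures the state of the term the enum is positioned on. */
        TermState termState() {
            requirePositioned();
            return new TermState(current, postings[current].length);
        }

        /** The single docs() method, used after either kind of seek. */
        int[] docs() {
            requirePositioned();
            return postings[current];
        }

        private void requirePositioned() {
            if (current < 0) {
                throw new IllegalStateException("enum is not positioned on a term");
            }
        }
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> index = new TreeMap<>();
        index.put("index", new int[] {1});
        index.put("lucene", new int[] {0, 2, 5});
        index.put("search", new int[] {2});

        // Rewrite phase: look the term up once and keep its state.
        ToyTermsEnum rewriteEnum = new ToyTermsEnum(index);
        rewriteEnum.seek("lucene");
        TermState state = rewriteEnum.termState();

        // Scorer phase: a fresh enum over the same segment is repositioned
        // from the state, so it never consults the dictionary.
        ToyTermsEnum scorerEnum = new ToyTermsEnum(index);
        scorerEnum.seek(state);
        System.out.println("docs=" + Arrays.toString(scorerEnum.docs())
            + " docFreq=" + state.docFreq
            + " scorerLookups=" + scorerEnum.dictLookups);
    }
}
```

A state captured this way is only meaningful for the segment it came
from, which is the reason the per-reader wrapper is needed on top.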

I think IndexReader can offer the sugar methods (that take either
BytesRef term or String field + TermState state).

Also: I tried to run the benchmark on beast but unfortunately there's
a bug somewhere (even though Lucene core tests pass) -- I see
different results for some fuzzy queries.

Nice work!!  Getting to single term lookup for all queries will be awesome!


> MTQ rewrite + weight/scorer init should be single pass
> ------------------------------------------------------
>
>                 Key: LUCENE-2694
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2694
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2694.patch, LUCENE-2694.patch
>
>
> Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
> Once we fix MTQ rewrite to be per-segment, we should take it further and make weight/scorer init also run in the same single pass as rewrite.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

