thanks. it's great.
On Tue, Apr 17, 2012 at 8:16 PM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:
> Hello,
>
> I can't help with the particular question, but can share some experience.
> My task is roughly the same I've found the patch
> https://issues.apache.org/jira/browse/LUCENE-2686 is absolutely useful
> (with one small addition, I'll post it in comments soon). By using it I
> have disjunction summing query with steady subscorers.
>
> Regards
>
> On Tue, Apr 17, 2012 at 2:37 PM, Li Li wrote:
>
>> hi all,
>> I am now hacking the BooleanScorer2 to let it keep the docID() of the
>> leaf scorer(mostly possible TermScorer) the same as the top-level Scorer.
>> Why I want to do this is: When I Collect a doc, I want to know which term
>> is matched(especially for BooleanClause whose Occur is SHOULD). we have
>> discussed some solutions, such as adding bit masks in disjunction scorers.
>> with this method, when we finds a matched doc, we can recursively find
>> which leaf scorer is matched. But we think it's not very efficient and not
>> convenient to use(this is my proposal but not agreed by others in our
>> team). and then we came up with another one: Modifying DisjunctionSumScorer.
>> we analysed the codes and found that the only Scorers used by
>> BooleanScorer2 that will make the children scorers' docID() not equal to
>> parent is an anonymous class inherited from DisjunctionSumScorer. All other
>> ones including SingleMatchScorer, countingConjunctionSumScorer(anonymous),
>> dualConjuctionSumScorer, ReqOptSumScorer and ReqExclScorer are fit our need.
>> The implementation algorithm of DisjunctionSumScorer use a heap to
>> find the smallest doc. after finding a matched doc, the currentDoc is the
>> matched doc and all the scorers in the heap will call nextDoc() so all of
>> the scorers' current docID the nextDoc of currentDoc. if there are N level
>> DisjunctionSumScorer, the leaf scorer's current doc is the n-th next docId
>> of the root of the scorer tree.
>> So we modify the DisjuctionSumScorer and let it behavior as we
>> expected. And then I wrote some TestCase and it works well. And also I
>> wrote some random generated TermScorer and compared the nextDoc(),score()
>> and advance(int) method of original DisjunctionSumScorer and modified one.
>> nextDoc() and score() and exactly the same. But for advance(int target), we
>> found some interesting and strange things.
>> at the beginning, I think if target is less than current docID, it
>> will just return current docID and do nothing. this assumption let my
>> algorithm go wrong. Then I read the codes of TermScorer and found each call
>> of advance(int) of TermScorer will call nextDoc() no matter whether current
>> docID is larger than target or not.
>> So I am confused and then read the javadoc of DocIdSetIterator:
>> ----------------- javadoc of DocIdSetIterator.advance(int
>> target)-------------
>>
>> int org.apache.lucene.search.DocIdSetIterator.advance(int target) throws
>> IOException
>>
>> Advances to the first beyond (see NOTE below) the current whose document
>> number is greater than or equal
>> to target. Returns the current document number or NO_MORE_DOCS if there
>> are no more docs in the set.
>> Behaves as if written:
>> int advance(int target) {
>> int doc;
>> while ((doc = nextDoc()) < target) {
>> }
>> return doc;
>> }
>> Some implementations are considerably more efficient than that.
>> NOTE: when target < current implementations may opt not to advance beyond
>> their current docID().
>> NOTE: this method may be called with NO_MORE_DOCS for efficiency by some
>> Scorers. If your
>> implementation cannot efficiently determine that it should exhaust, it
>> is recommended that you check for
>> that value in each call to this method.
>> NOTE: after the iterator has exhausted you should not call this method,
>> as it may result in unpredicted
>> behavior.
>> --------------------------------------
>> Then I modified my algorithm again and found that
>> DisjunctionSumScorer.advance(int target) has some strange behavior. most of
>> the cases, it will return currentDoc if target < currentDoc. but in some
>> boundary condition, it will not.
>> it's not a bug but let me sad. I thought my algorithm has some bug
>> because it's advance method is not exactly the same as original
>> DisjunctionSumScorer's.
>> ----codes of DisjunctionSumScorer---
>> @Override
>> public int advance(int target) throws IOException {
>> if (scorerDocQueue.size() < minimumNrMatchers) {
>> return currentDoc = NO_MORE_DOCS;
>> }
>> if (target <= currentDoc) {
>> return currentDoc;
>> }
>> ....
>> -------------------
>> for most case if (target <= currentDoc) it will return currentDoc;
>> but if previous advance will make sub scorers exhausted, then if may
>> return NO_MORE_DOCS
>> an example is:
>> currentDoc=-1
>> minimumNrMatchers=1
>> subScorers:
>> TermScorer: docIds: [1, 2, 6]
>> TermScorer: docIds: [2, 4]
>> after first call advance(5)
>> currentDoc=6
>> only first scorer is now in the heap, scorerDocQueue.size()==1
>> then call advance(6)
>> because scorerDocQueue.size() < minimumNrMatchers, it just return
>> NO_MORE_DOCS
>>
>> My question is why the advance(int target) method is defined like this?
>> for the reason of efficient or any other reasons?
>>
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> gedel@yandex.ru
>
>
>
>
>