lucy-dev mailing list archives

From "Marvin Humphrey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCY-111) Matcher
Date Wed, 16 Jun 2010 19:24:23 GMT

    [ https://issues.apache.org/jira/browse/LUCY-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879468#action_12879468 ]

Marvin Humphrey commented on LUCY-111:
--------------------------------------

There are two interface issues in this Matcher implementation which we should
revisit at some point.  We should not attempt to resolve them prior to the
initial release, because they will require extended benchmarking and
experimentation.

First, this implementation of Matcher uses 0 as a sentinel rather than
Integer.MAX_VALUE a la Lucene.  Lucy uses 0 to represent "invalid doc id", and
since Next() and Advance() return doc ids, a return value of 0 can be treated
as "false" to indicate that the Matcher is exhausted.  That yields an
intuitive iterator interface:

{code:none}
while ( my $doc_id = $matcher->next ) {
    # Process $doc_id here.  The loop ends when next() returns 0.
}
{code}
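To make the zero-sentinel convention concrete, here is a minimal C sketch.  It
is not the attached Matcher.c: ArrayMatcher, ArrayMatcher_next(),
ArrayMatcher_advance(), and sum_doc_ids() are hypothetical names, and the only
assumption carried over from the comment is that doc ids start at 1, leaving 0
free to mean "exhausted".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Matcher: walks a sorted array of doc ids.
 * Doc ids start at 1, leaving 0 free to mean "invalid / exhausted". */
typedef struct {
    const int32_t *doc_ids;   /* sorted ascending, all >= 1 */
    size_t         count;
    size_t         pos;       /* index of the next candidate */
} ArrayMatcher;

void
ArrayMatcher_init(ArrayMatcher *self, const int32_t *doc_ids, size_t count) {
    assert(count == 0 || doc_ids[0] >= 1);   /* 0 is reserved */
    self->doc_ids = doc_ids;
    self->count   = count;
    self->pos     = 0;
}

/* Returns the next matching doc id, or 0 once the Matcher is exhausted. */
int32_t
ArrayMatcher_next(ArrayMatcher *self) {
    return self->pos < self->count ? self->doc_ids[self->pos++] : 0;
}

/* Returns the first remaining doc id >= target, or 0 if there is none. */
int32_t
ArrayMatcher_advance(ArrayMatcher *self, int32_t target) {
    int32_t doc_id;
    while ((doc_id = ArrayMatcher_next(self)) != 0) {
        if (doc_id >= target) {
            return doc_id;
        }
    }
    return 0;
}

/* Since 0 reads as "false", the C loop mirrors the Perl idiom. */
int32_t
sum_doc_ids(ArrayMatcher *matcher) {
    int32_t sum = 0;
    int32_t doc_id;
    while ((doc_id = ArrayMatcher_next(matcher))) {
        sum += doc_id;
    }
    return sum;
}
```

Advance() here is a simple linear scan for clarity; a real Matcher would
typically skip ahead more cleverly, but the return convention is the same.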

Lucene chose Integer.MAX_VALUE in order to optimize certain constructs;
furthermore, 0 is a valid doc ID in Lucene, so using it as a sentinel isn't
an option there.  For the extended discussion, see LUCENE-1614.
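One benefit often attributed to a MAX_VALUE sentinel is that an exhausted
iterator compares greater than any target, so leapfrog loops such as
conjunctions terminate without a separate "is it exhausted?" branch.  The
sketch below illustrates that property under our own assumptions; it is not a
summary of LUCENE-1614.  MaxIter, MaxIter_advance(), and count_intersection()
are hypothetical, and the NO_MORE_DOCS macro merely borrows the name of
Lucene's DocIdSetIterator.NO_MORE_DOCS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lucene-style sentinel: larger than any real doc id. */
#define NO_MORE_DOCS INT32_MAX

/* Hypothetical cursor over sorted doc ids; 0 is a valid doc id here. */
typedef struct {
    const int32_t *docs;
    size_t         count;
    size_t         pos;
} MaxIter;

void
MaxIter_init(MaxIter *it, const int32_t *docs, size_t count) {
    /* The sentinel must never collide with a real doc id. */
    assert(count == 0 || docs[count - 1] < NO_MORE_DOCS);
    it->docs  = docs;
    it->count = count;
    it->pos   = 0;
}

/* Moves to the first doc >= target (without consuming it) and returns it,
 * or NO_MORE_DOCS once the cursor is exhausted. */
int32_t
MaxIter_advance(MaxIter *it, int32_t target) {
    while (it->pos < it->count && it->docs[it->pos] < target) {
        it->pos++;
    }
    return it->pos < it->count ? it->docs[it->pos] : NO_MORE_DOCS;
}

/* Leapfrog intersection.  An exhausted cursor reports NO_MORE_DOCS, which
 * is >= every target: if b runs out, advancing a to that target exhausts a
 * too, and the loop condition ends things with no special-case branch. */
size_t
count_intersection(MaxIter *a, MaxIter *b) {
    size_t  hits = 0;
    int32_t doc  = MaxIter_advance(a, 0);
    while (doc != NO_MORE_DOCS) {
        int32_t other = MaxIter_advance(b, doc);
        if (other == doc) {
            hits++;
            doc = MaxIter_advance(a, doc + 1);
        }
        else {
            doc = MaxIter_advance(a, other);
        }
    }
    return hits;
}
```

With 0 as the sentinel instead, an exhausted b would report 0, advancing a to
target 0 would leave it where it was, and this loop would spin forever unless
an explicit exhaustion check were added.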

Second, this implementation's Collect() method uses separately iterated
deletions.  The advantage of this stratagem for now is that none of our
low-level Matchers have to worry about deletions.  However, iterated deletions
did not perform as well as random-access deletions in some benchmarks run by
Mike McCandless for Lucene in LUCENE-1476, and they make the Matcher's
iterator somewhat more awkward to use directly: a caller who iterates a
Matcher by hand and wants to skip deleted documents has to filter them out
itself.
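The two deletion strategies can be contrasted with a small C sketch.  This is
a hypothetical illustration, not the attached Collect(): DocList,
collect_iterated(), and collect_random_access() are invented names, doc ids
are assumed to be 1-based with 0 as the exhausted sentinel, and the
random-access variant assumes a simple one-bit-per-doc deletions bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over sorted, 1-based doc ids; next() returns 0 when
 * exhausted.  Used both for the matcher's hits and for the deletions list. */
typedef struct {
    const int32_t *docs;
    size_t         count;
    size_t         pos;
} DocList;

void
DocList_init(DocList *self, const int32_t *docs, size_t count) {
    assert(count == 0 || docs[0] >= 1);   /* 0 is the sentinel */
    self->docs  = docs;
    self->count = count;
    self->pos   = 0;
}

int32_t
DocList_next(DocList *self) {
    return self->pos < self->count ? self->docs[self->pos++] : 0;
}

/* Iterated deletions: advance a second cursor over the deleted ids in
 * lockstep, so the matcher itself never sees deletions. */
size_t
collect_iterated(DocList *matcher, DocList *deletions) {
    size_t  hits    = 0;
    int32_t deleted = DocList_next(deletions);
    int32_t doc_id;
    while ((doc_id = DocList_next(matcher)) != 0) {
        while (deleted != 0 && deleted < doc_id) {
            deleted = DocList_next(deletions);
        }
        if (doc_id != deleted) {
            hits++;   /* a real Collect() would hand doc_id to a collector */
        }
    }
    return hits;
}

/* Random-access deletions: test one bit per doc id for each hit. */
size_t
collect_random_access(DocList *matcher, const uint8_t *deleted_bits) {
    size_t  hits = 0;
    int32_t doc_id;
    while ((doc_id = DocList_next(matcher)) != 0) {
        if (!(deleted_bits[doc_id >> 3] & (1u << (doc_id & 7)))) {
            hits++;
        }
    }
    return hits;
}
```

The iterated version keeps deletion handling out of the matcher entirely, at
the cost of a second cursor advanced in lockstep; the random-access version
replaces that cursor with a single bit test per hit.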

There may be more opportunities for optimization a la LUCENE-1536, as well.

> Matcher
> -------
>
>                 Key: LUCY-111
>                 URL: https://issues.apache.org/jira/browse/LUCY-111
>             Project: Lucy
>          Issue Type: New Feature
>          Components: Core - Search
>            Reporter: Marvin Humphrey
>            Assignee: Marvin Humphrey
>            Priority: Blocker
>         Attachments: Matcher.bp, Matcher.c
>
>
> A Matcher is an object which matches a set of Lucy doc ids, iterating over
> them via Next() and Advance().  It combines the roles of Lucene's
> DocIdSetIterator and Scorer classes.
> Some -- but not all -- Matchers implement a Score() method.  We can refer to
> such Matchers informally as "scorers", but Lucy won't need a Scorer class a la
> Lucene.   In Lucy, Query classes will compile down to Matchers that either
> Score() or don't.  This allows us to perform optimizations on branches of
> compound scorers: compiling "foo AND NOT bar" will produce a scoring Matcher
> for "foo" and a non-scoring Matcher for "bar", since the "bar" branch can
> never contribute to the score.
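The idea that Matchers either Score() or don't can be modeled in C with a
small function-pointer table in which score() may be NULL.  In this
hypothetical sketch (Matcher, LeafMatcher, AndNotMatcher, and total_score()
are illustrative names, not the attached Matcher.bp/Matcher.c), compiling
"foo AND NOT bar" yields a scoring Matcher for "foo" and a non-scoring one
for "bar", and the AND NOT Matcher only ever asks the "foo" branch for a
score.  Each leaf assigns a constant score per hit, purely for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Matcher interface: next() is mandatory and returns 0 when
 * exhausted; score() is NULL for Matchers that only match. */
typedef struct Matcher Matcher;
struct Matcher {
    int32_t (*next)(Matcher *self);
    float   (*score)(Matcher *self);
};

/* Leaf Matcher over a sorted array of 1-based doc ids; every hit gets the
 * same score (weight) when built as a scoring Matcher. */
typedef struct {
    Matcher        base;   /* must be first so Matcher* casts work */
    const int32_t *docs;
    size_t         count;
    size_t         pos;
    float          weight;
} LeafMatcher;

static int32_t
Leaf_next(Matcher *m) {
    LeafMatcher *self = (LeafMatcher *)m;
    return self->pos < self->count ? self->docs[self->pos++] : 0;
}

static float
Leaf_score(Matcher *m) {
    return ((LeafMatcher *)m)->weight;
}

void
LeafMatcher_init(LeafMatcher *self, const int32_t *docs, size_t count,
                 float weight, int scoring) {
    self->base.next  = Leaf_next;
    self->base.score = scoring ? Leaf_score : NULL;
    self->docs       = docs;
    self->count      = count;
    self->pos        = 0;
    self->weight     = weight;
}

/* "must AND NOT must_not": only the required branch contributes scores,
 * so must_not can be a non-scoring Matcher. */
typedef struct {
    Matcher  base;
    Matcher *must;
    Matcher *must_not;
    int32_t  excluded;   /* current must_not doc id; 0 once exhausted */
} AndNotMatcher;

static int32_t
AndNot_next(Matcher *m) {
    AndNotMatcher *self = (AndNotMatcher *)m;
    int32_t doc_id;
    while ((doc_id = self->must->next(self->must)) != 0) {
        /* Catch the exclusion cursor up to the candidate. */
        while (self->excluded != 0 && self->excluded < doc_id) {
            self->excluded = self->must_not->next(self->must_not);
        }
        if (doc_id != self->excluded) {
            return doc_id;
        }
    }
    return 0;
}

static float
AndNot_score(Matcher *m) {
    AndNotMatcher *self = (AndNotMatcher *)m;
    return self->must->score(self->must);   /* must_not is never scored */
}

void
AndNotMatcher_init(AndNotMatcher *self, Matcher *must, Matcher *must_not) {
    self->base.next  = AndNot_next;
    self->base.score = must->score ? AndNot_score : NULL;
    self->must       = must;
    self->must_not   = must_not;
    self->excluded   = must_not->next(must_not);   /* prime the cursor */
}

/* Sums scores over all hits; requires a scoring Matcher. */
float
total_score(Matcher *m) {
    assert(m->score != NULL);
    float total = 0.0f;
    while (m->next(m) != 0) {
        total += m->score(m);
    }
    return total;
}
```

Because the "bar" branch is built without a score() function, the AND NOT
Matcher has no way to fold it into the score, which is the branch
optimization the issue description points to.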

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

