lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment
Date Sun, 10 Oct 2010 05:53:30 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919573#action_12919573 ]

Simon Willnauer commented on LUCENE-2690:
-----------------------------------------

Guys, awesome improvements!! Here are some comments...

* In CutOffTermCollector:
{code} final BytesRefHash pendingTerms = new BytesRefHash(new ByteBlockPool(new RecyclingByteBlockAllocator()));{code}
Since we do not reuse the allocator, we don't need to use the synced one here. There is also no
reset call anywhere to free the allocated blocks. We should just use new BytesRefHash() here.


* BooleanQueryRewrite#rewrite uses a HashMap to keep track of BytesRef and TermFreqBoost.
I wonder if we should make use of the ParallelArray technique we use in the indexing chain
together with a BytesRefHash, which could save us lots of object creation, and GC cost would
be lower too once MTQ gets under load. Those MTQs can create a very large number of objects,
and this seems to be a hot spot. I currently have use cases for direct support of something
like a ParallelArray base class in LUCENE-2186, and it seems we can use it here too.
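
To make the idea concrete, here is a rough, self-contained sketch of the parallel-array layout. It is not the Lucene code: the class name ParallelTermStats and its methods are made up for illustration, the HashMap<String, Integer> is only a stand-in for BytesRefHash (which hands out dense int IDs for byte sequences), and summing frequencies while keeping the first boost for a duplicate term is an assumption, not what the patch does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: per-term statistics live in parallel primitive arrays
 * indexed by a dense term ID, instead of one TermFreqBoost-like object per term.
 */
final class ParallelTermStats {
    // Stand-in for BytesRefHash: maps a term to a dense ID (0, 1, 2, ...).
    private final Map<String, Integer> termIds = new HashMap<>();
    // Parallel arrays: slot i of each array belongs to the term with ID i.
    private int[] docFreqs = new int[4];
    private float[] boosts = new float[4];
    private int size = 0;

    /** Records a term seen in one segment and returns its dense ID. */
    int add(String term, int docFreq, float boost) {
        Integer existing = termIds.get(term);
        if (existing != null) {
            // Same term seen in another segment: accumulate (assumed policy).
            docFreqs[existing] += docFreq;
            return existing;
        }
        int id = size++;
        if (id == docFreqs.length) {
            grow();
        }
        termIds.put(term, id);
        docFreqs[id] = docFreq;
        boosts[id] = boost;
        return id;
    }

    /** Doubles both arrays together so they always stay the same length. */
    private void grow() {
        int newLength = docFreqs.length * 2;
        docFreqs = Arrays.copyOf(docFreqs, newLength);
        boosts = Arrays.copyOf(boosts, newLength);
    }

    int docFreqOf(int id) { return docFreqs[id]; }

    float boostOf(int id) { return boosts[id]; }

    int size() { return size; }

    public static void main(String[] args) {
        ParallelTermStats stats = new ParallelTermStats();
        int foo = stats.add("foo", 3, 1.0f);
        stats.add("bar", 2, 0.5f);
        stats.add("foo", 4, 1.0f); // duplicate from a second segment
        System.out.println("foo docFreq = " + stats.docFreqOf(foo));
        System.out.println("unique terms = " + stats.size());
    }
}
```

The point of the layout is that adding a term costs two array writes rather than a new value object, so the number of allocations grows with the number of array resizes, not with the number of terms.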

* In FloatsUtil#nextAfter I wonder if we need the following lines:  {code}
return new Float(direction)
...
return Double.valueOf(direction).floatValue();
{code} since those methods do nothing else than a (float) direction cast really.
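
A quick way to check this, as a standalone sketch rather than anything from the patch (the class and method names are made up for illustration): Double#floatValue is specified as a narrowing primitive conversion, so going through the wrapper should yield the same bits as the plain cast.

```java
/** Hypothetical check that boxing plus floatValue() equals a plain (float) cast. */
final class NarrowingCheck {
    /** The wrapper-based form quoted from the patch. */
    static float viaBoxing(double direction) {
        return Double.valueOf(direction).floatValue();
    }

    /** The plain narrowing cast suggested as the replacement. */
    static float viaCast(double direction) {
        return (float) direction;
    }

    public static void main(String[] args) {
        double[] samples = {
            0.0, -0.0, 1.0, 0.1, Math.PI, 1e300, -1e300,
            Double.MIN_VALUE, Double.NaN, Double.POSITIVE_INFINITY
        };
        for (double d : samples) {
            // Compare bit patterns so that -0.0 and NaN are handled correctly.
            int boxed = Float.floatToIntBits(viaBoxing(d));
            int cast = Float.floatToIntBits(viaCast(d));
            if (boxed != cast) {
                throw new AssertionError("mismatch for " + d);
            }
        }
        System.out.println("boxing and cast agree on all samples");
    }
}
```

The cast also avoids allocating a wrapper object on each call.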

> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using TopTermsBooleanQueryRewrite), the
auto constant rewrite method and the ScoringBQ rewrite methods using a MultiFields wrapper
on the top-level reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses some additional
data structures (hashed sets/maps) to exclude duplicate terms. All tests currently pass, but
FuzzyQuery's tests should not, because its minimum score handling depends on the terms being
collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

