lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2351) optimize automatonquery
Date Mon, 29 Mar 2010 08:59:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850844#action_12850844
] 

Michael McCandless commented on LUCENE-2351:
--------------------------------------------

OOOH I like this approach!!  It makes the linear decision "local", and bounds (by linearUpperBound)
the region so that we don't have to revisit the decision on every term.  And it enables efficiently
using the suffix :)

And.... it's FAST!  With this fix, the hard query (un*t) on flex is 105 QPS (best of 5, on
5 M doc wikipedia index), vs 62 QPS on trunk.  Yay :)

> optimize automatonquery
> -----------------------
>
>                 Key: LUCENE-2351
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2351
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: Flex Branch
>            Reporter: Robert Muir
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-2351.patch, LUCENE-2351.patch, LUCENE-2351_infinite.patch,
LUCENE-2351_infinite.patch
>
>
> Mike found a few cases in flex where we have some bad behavior with automatonquery.
> The problem is similar to a database query planner, where sometimes simply doing a full
table scan is faster than using an index.
> We can optimize automatonquery a little bit, and get better performance for fuzzy,wildcard,regex
queries.
> Here is a list of ideas:
> * create commonSuffixRef for infinite automata, not just really-bad linear scan cases
> * do a null check rather than populating an empty commonSuffixRef
> * localize the 'linear' case to not seek, but instead scan, when ping-ponging against
loops in the state machine
> * add a mechanism to enable/disable the terms dict cache, e.g. we can disable it for
infinite cases, and maybe fuzzy N>1 also.
> * change the use of BitSet to OpenBitSet or long[] gen for path-tracking
> * optimize the backtracking code where it says /* String is good to go as-is */, this
need not be a full run(), I think...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message