lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2351) optimize automatonquery
Date Sat, 27 Mar 2010 16:28:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850545#action_12850545
] 

Michael McCandless commented on LUCENE-2351:
--------------------------------------------

The attached patch improves sneaky wildcard query "un*t" (on a 5M doc wikipedia index, matching
1058 terms --> 124623 docs) from 39.69 QPS -> 44.85 QPS (best of 5) on flex.  But trunk
is at 63.19 QPS so we still have more to do...

> optimize automatonquery
> -----------------------
>
>                 Key: LUCENE-2351
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2351
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: Flex Branch
>            Reporter: Robert Muir
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-2351.patch
>
>
> Mike found a few cases in flex where we have some bad behavior with automatonquery.
> The problem is similar to a database query planner, where sometimes simply doing a full
table scan is faster than using an index.
> We can optimize automatonquery a little bit, and get better performance for fuzzy,wildcard,regex
queries.
> Here is a list of ideas:
> * create commonSuffixRef for infinite automata, not just really-bad linear scan cases
> * do a null check rather than populating an empty commonSuffixRef
> * localize the 'linear' case to not seek, but instead scan, when ping-ponging against
loops in the state machine
> * add a mechanism to enable/disable the terms dict cache, e.g. we can disable it for
infinite cases, and maybe fuzzy N>1 also.
> * change the use of BitSet to OpenBitSet or long[] gen for path-tracking
> * optimize the backtracking code where it says /* String is good to go as-is */, this
need not be a full run(), I think...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message