lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-2351) optimize automatonquery
Date Mon, 29 Mar 2010 06:39:29 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir updated LUCENE-2351:
--------------------------------

    Attachment: LUCENE-2351.patch

attached is a new approach:
* rids of linearmode
* adds real support for infinite automata, to prevent useless seeking.

when any loop is encountered in the state machine, a portion of the term dictionary
is calculated based on its transition ranges, and for this portion, the enumeration is 
instead driven from the terms dictionary rather than the state machine.

not really 100% on how i feel about this yet, but it seems right.

> optimize automatonquery
> -----------------------
>
>                 Key: LUCENE-2351
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2351
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: Flex Branch
>            Reporter: Robert Muir
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-2351.patch, LUCENE-2351.patch, LUCENE-2351_infinite.patch,
LUCENE-2351_infinite.patch
>
>
> Mike found a few cases in flex where we have some bad behavior with automatonquery.
> The problem is similar to a database query planner, where sometimes simply doing a full
table scan is faster than using an index.
> We can optimize automatonquery a little bit, and get better performance for fuzzy,wildcard,regex
queries.
> Here is a list of ideas:
> * create commonSuffixRef for infinite automata, not just really-bad linear scan cases
> * do a null check rather than populating an empty commonSuffixRef
> * localize the 'linear' case to not seek, but instead scan, when ping-ponging against
loops in the state machine
> * add a mechanism to enable/disable the terms dict cache, e.g. we can disable it for
infinite cases, and maybe fuzzy N>1 also.
> * change the use of BitSet to OpenBitSet or long[] gen for path-tracking
> * optimize the backtracking code where it says /* String is good to go as-is */, this
need not be a full run(), I think...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message