lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3363) minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton
Date Mon, 08 Aug 2011 14:44:27 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir updated LUCENE-3363:
--------------------------------

    Attachment: LUCENE-3363.patch

patch for the blocktree branch:

The issue with this automaton:
* its already minimal! so all of this work is stupid.
* it only has 2096 states, but sigma=3544, this is what causes the hopcroft blowup.

Originally, AutomatonQuery minimized the automaton in its ctor, but I'm not sure we should
do this, the input automaton could be large and if someone wants to do this, they should do
it themselves? 

I think my original motivation was to try to fend off any adversaries (some crazy worst-case
crap that would make the query slow)... but I think this is obselete.

The patch changes this to determinize() + removeDeadTransitions() + reduce(), the first 2
operations really being all we need, but reduce() might help speed up the intersection.

Note: RegExp already minimizes "incrementally" during its parsing, but this is one op at a
time, so I think there is no problem here. I tested removing this too and replacing it with
det + removeDead + reduce, but it slowed down regex parsing considerably, so I think we should
continue to use minimize here.

Additionally, I optimized wildcardquery here to use the optimized concatenate() to avoid useless
determinize() calls when the LHS is a string, before it was using the concatenate(List) method.

> minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton
> ------------------------------------------------------------------
>
>                 Key: LUCENE-3363
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3363
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/other
>            Reporter: Michael McCandless
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3363.patch
>
>
> Not sure what's up w/ this... if you check out the blocktree branch (LUCENE-3030) and
comment out the @Ignore in TestTermsEnum2.testFiniteVersusInfinite then this should hit OOME:
{[ant test-core -Dtestcase=TestTermsEnum2 -Dtestmethod=testFiniteVersusInfinite -Dtests.seed=-2577608857970454726:-2463580050179334504}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message