Date: Sat, 28 Jun 2014 11:11:24 +0000 (UTC)
From: "Jack Krupansky (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Subject: [jira] [Commented] (LUCENE-5791) QueryParserUtil, big query with wildcards -> runs endlessly and produces heavy load

    [ https://issues.apache.org/jira/browse/LUCENE-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046813#comment-14046813 ]

Jack Krupansky commented on LUCENE-5791:
----------------------------------------

At least consider clear Javadoc on limitations and performance, such as the need to keep wildcard patterns "brief".

Maybe consider a limit of how many wildcards can be used in a single wildcard query. Possibly configurable.

Maybe consider a "trim" mode - if too many wildcards appear, simply trim trailing portions of the pattern to get under the limit. For example, this test case might get trimmed to abc*mno*xyz*.
This would still match all of the intended matches, albeit also matching some unintended cases. Maybe a limit of three wildcards would be reasonable.

Does ? have the same issue, or is it much more linear? Would ???*???*???*??? be as bad as abc*mno*xyz*pqr* ?

Do adjacent ** get collapsed to a single * ?

> QueryParserUtil, big query with wildcards -> runs endlessly and produces heavy load
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-5791
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5791
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/queryparser
>         Environment: Lucene 4.7.2
> Java 6
>            Reporter: Clemens Wyss
>         Attachments: afterdet.png
>
>
> The following "testcase" runs endlessly and produces VERY heavy load.
> ...
> String query = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut "
>     + "labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et "
>     + "ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. "
>     + "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt "
>     + "ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores "
>     + "et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet";
> // Replace every whitespace run with '*', turning the text into one long wildcard term.
> query = query.replaceAll( "\\s+", "*" );
> try {
>     QueryParserUtil.parse( query, new String[] { "test" }, new Occur[] { Occur.MUST }, new KeywordAnalyzer() );
> } catch ( Exception e ) {
>     Assert.fail( e.getMessage() );
> }
> ...
> I don't say this testcase makes "sense", nevertheless the question remains whether this is a bug or a "feature"?
> 99% of the time the thread dump/stack trace looks as follows:
> BasicOperations.determinize(Automaton) line: 680
> Automaton.determinize() line: 759
> SpecialOperations.getCommonSuffixBytesRef(Automaton) line: 165
> CompiledAutomaton.<init>(Automaton, Boolean, boolean) line: 168
> CompiledAutomaton.<init>(Automaton) line: 91
> WildcardQuery(AutomatonQuery).<init>(Term, Automaton) line: 67
> WildcardQuery.<init>(Term) line: 57
> WildcardQueryNodeBuilder.build(QueryNode) line: 42
> WildcardQueryNodeBuilder.build(QueryNode) line: 32
> StandardQueryTreeBuilder(QueryTreeBuilder).processNode(QueryNode, QueryBuilder) line: 186
> StandardQueryTreeBuilder(QueryTreeBuilder).process(QueryNode) line: 125
> StandardQueryTreeBuilder(QueryTreeBuilder).build(QueryNode) line: 218
> StandardQueryTreeBuilder.build(QueryNode) line: 82
> StandardQueryTreeBuilder.build(QueryNode) line: 53
> StandardQueryParser(QueryParserHelper).parse(String, String) line: 258
> StandardQueryParser.parse(String, String) line: 168
> QueryParserUtil.parse(String, String[], BooleanClause$Occur[], Analyzer) line: 119
> IndexingTest.queryParserUtilLimit() line: 1450

--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
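
For illustration, the "trim" mode and the collapsing of adjacent ** raised in the comment could be applied to the pattern string before it reaches the query parser. The sketch below is only one possible reading of that suggestion. The class WildcardTrim and its methods collapseStars and trimToLimit are hypothetical names and not part of Lucene. It also assumes that only * counts toward the limit and that the pattern contains no backslash-escaped wildcards.

```java
// Hypothetical helper, not a Lucene API: pre-processes a wildcard pattern string.
final class WildcardTrim {
    private WildcardTrim() {}

    /** Collapses every run of two or more '*' into a single '*'. */
    static String collapseStars(String pattern) {
        return pattern.replaceAll("\\*{2,}", "*");
    }

    /**
     * Returns the pattern unchanged (apart from collapsing) if it has at most
     * maxStars '*' characters. Otherwise cuts it right after the maxStars-th '*',
     * so the result matches everything the original matched, and possibly more.
     * Assumption: escaped wildcards such as "\*" are not handled.
     */
    static String trimToLimit(String pattern, int maxStars) {
        if (maxStars < 1) {
            throw new IllegalArgumentException("maxStars must be >= 1");
        }
        String collapsed = collapseStars(pattern);
        int seen = 0;   // number of '*' seen so far
        int cut = -1;   // index of the maxStars-th '*'
        for (int i = 0; i < collapsed.length(); i++) {
            if (collapsed.charAt(i) == '*') {
                seen++;
                if (seen == maxStars) {
                    cut = i;
                } else if (seen > maxStars) {
                    // Over the limit: keep everything up to and including the cut '*'.
                    return collapsed.substring(0, cut + 1);
                }
            }
        }
        return collapsed; // within the limit, nothing to trim
    }
}
```

With a limit of three, WildcardTrim.trimToLimit("abc*mno*xyz*pqr*", 3) yields abc*mno*xyz*, the trimmed form mentioned in the comment. Because the result always ends in *, it is a superset of the original pattern, which is the trade-off the comment describes.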