lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: QueryParserUtil, big query with wildcards -> runs endlessly and produces heavy load
Date Fri, 27 Jun 2014 02:13:29 GMT
The test case is "only" parsing this query, not trying to run it,
right?  So it doesn't involve automaton/FST ... just the flexible
query parser code?

It seems bad that flexible QP would take so long, even if the query is
"strange".

Can you open an issue, and maybe attach a thread dump so we can see
where it's spending its time?  Thanks.
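
For reference, a thread dump is usually taken from outside the JVM (for example with the JDK's `jstack <pid>`). If that is inconvenient inside a unit test, the following is a minimal sketch of an in-process alternative that uses only the JDK: it runs a task on a worker thread and, if the task has not finished after a timeout, returns a snapshot of that thread's stack. The class name `StackSampler`, the method `sampleIfSlow`, and the thread name are hypothetical, and the busy loop in `main` is only a stand-in for the `QueryParserUtil.parse(...)` call from the test case.

```java
class StackSampler {

    /**
     * Runs the task on a daemon thread. If it is still running after
     * timeoutMillis (must be > 0), returns a snapshot of the worker's
     * stack; returns an empty string if the task finished in time.
     */
    static String sampleIfSlow(Runnable task, long timeoutMillis) {
        Thread worker = new Thread(task, "parse-under-test");
        worker.setDaemon(true); // do not keep the JVM alive if the task never returns
        worker.start();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return "";
        }
        if (!worker.isAlive()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(worker.getName()).append("\" state=")
          .append(worker.getState()).append('\n');
        for (StackTraceElement frame : worker.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the slow call; in the real test this would be the
        // QueryParserUtil.parse(...) invocation (an assumption, not run here).
        Runnable slowTask = new Runnable() {
            public void run() {
                long end = System.currentTimeMillis() + 2000;
                while (System.currentTimeMillis() < end) {
                    // spin to simulate CPU-bound work
                }
            }
        };
        System.out.println(sampleIfSlow(slowTask, 500));
    }
}
```

Calling `sampleIfSlow` a few times with different timeouts gives a rough picture of where the time goes; a full dump from `jstack` is still preferable for attaching to an issue.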

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jun 26, 2014 at 5:30 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> I suspect you're getting leading wildcard searches as well, which must
> do entire term scans unless you're doing the reverse trick.
>
> Replacing all successive whitespace gives you:
> Lorem*ipsum*dolor*sit*amet,*consetetur*sadipscing*elitr,*sed*diam*nonumy*eirmod*tempor*invidunt*ut*labore*et*dolore*magna*aliquyam*erat,*sed*diam*voluptua.*At*vero*eos*et*accusam*et*justo*duo*dolores*et*ea*rebum.*Stet*clita*kasd*gubergren,*no*sea*takimata*sanctus*est*Lorem*ipsum*dolor*sit*amet.*Lorem*ipsum*dolor*sit*amet,*consetetur*sadipscing*elitr,*sed*diam*nonumy*eirmod*tempor*invidunt*ut*labore*et*dolore*magna*aliquyam*erat,*sed*diam*voluptua.*At*vero*eos*et*accusam*et*justo*duo*dolores*et*ea*rebum.*Stet*clita*kasd*gubergren,*no*sea*takimata*sanctus*est*Lorem*ipsum*dolor*sit*amet
>
> Note, no spaces. Then you're pushing it through the KeywordTokenizer
> which does essentially nothing. What a term!
>
> Your point is valid, however: I don't quite know why this is taking
> so long. But I tend to agree that it's such an edge case that the
> hard-core FST guys would look at it for curiosity's sake only....
>
> Best,
> Erick
>
>
> On Thu, Jun 26, 2014 at 5:34 AM, Jack Krupansky <jack@basetechnology.com> wrote:
>> I'll defer to the hard-core Lucene committers for the technical details,
>> but I would suggest that a very large term with dozens of wildcards is a
>> "known limitation" (albeit not well-documented). IOW, to use wildcards in
>> Lucene in a performant manner, they need to be "brief".
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Clemens Wyss DEV
>> Sent: Thursday, June 26, 2014 3:17 AM
>> To: java-user@lucene.apache.org
>> Subject: QueryParserUtil, big query with wildcards -> runs endlessly and
>> produces heavy load
>>
>>
>> The following "testcase" runs endlessly and produces VERY heavy load.
>> ...
>> String query = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut "
>>     + "labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et "
>>     + "ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. "
>>     + "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt "
>>     + "ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores "
>>     + "et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet";
>> // Replace every run of whitespace with a single wildcard.
>> query = query.replaceAll( "\\s+", "*" );
>> try
>> {
>>     QueryParserUtil.parse( query, new String[] { "test" }, new Occur[] { Occur.MUST }, new KeywordAnalyzer() );
>> }
>> catch ( Exception e )
>> {
>>     Assert.fail( e.getMessage() );
>> }
>> ...
>> I'm not saying this test case makes "sense"; nevertheless, the question
>> remains whether this is a bug or a "feature".
>>
>> Context: Lucene 4.7.2, Java 6
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
