lucene-java-user mailing list archives

From "Delalande, Thierry" <Thierry.Delala...@uk.daiwacm.com>
Subject RE: Short circuit AND or subquerying in lucene for performance
Date Thu, 16 Feb 2012 20:56:25 GMT
Thanks, Uwe, for your explanation.

That matches what I understood: the term scan happens first.
Is there a way to run a subquery in Lucene, i.e. to run a query only on
the result of a first query, so that the whole index is not scanned?
Is it worth forwarding this request to the developers? Do you think it
is feasible to implement a short-circuit operator in which a term is
evaluated "late", only if the expression to its left evaluates to true,
so that the index is not scanned in its entirety?

Thanks in advance for your help
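
The "subquery" idea asked about above can be pictured with a toy model. This
is only a sketch in plain Java, not Lucene API: the class PostFilterSketch,
the map FIELD2_BY_DOC and the method postFilter are hypothetical names, and
the sketch assumes that the field2 value of a document can be looked up by
document id. Only the hits of the first query are tested against the suffix,
so the term dictionary of field2 is never scanned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PostFilterSketch {
    // Hypothetical per-document lookup: doc id -> field2 value.
    static final Map<Integer, String> FIELD2_BY_DOC = new HashMap<>();
    static {
        FIELD2_BY_DOC.put(1, "crowbar");
        FIELD2_BY_DOC.put(2, "foobar");
        FIELD2_BY_DOC.put(3, "sidebar");
        FIELD2_BY_DOC.put(4, "zebra");
        FIELD2_BY_DOC.put(7, "sidebar");
    }

    // Keep only those hits of the first query whose field2 value ends with the suffix.
    static List<Integer> postFilter(int[] firstQueryHits, String suffix) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : firstQueryHits) {
            String value = FIELD2_BY_DOC.get(doc);
            if (value != null && value.endsWith(suffix)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assume field1:foo matched documents 2, 4 and 7.
        int[] fooHits = {2, 4, 7};
        System.out.println(postFilter(fooHits, "bar")); // prints [2, 7]
    }
}
```

The work here grows with the number of hits of the first query rather than
with the number of terms in field2, so it would only pay off when the first
clause is selective.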

-----Original Message-----
From: Uwe Schindler [mailto:uwe@thetaphi.de] 
Sent: 15 February 2012 21:16
To: java-user@lucene.apache.org
Subject: RE: Short circuit AND or subquerying in lucene for performance

> : Basically for queries such as field1:foo AND field2:*bar, I think it
> : would be highly beneficial to restrict evaluation of the second field on
> : the result of the first to avoid scanning the index in its entirety due
> : to the leading wildcard.
> 
> This is exactly how the BooleanQuery class in Lucene works.
> 
> Please note the logic in ConjunctionScorer and BooleanScorer2 (how much
> optimizing can be done depends on whether all of the clauses are required
> or not)

The problem here is more the leading wildcard query. The terms are scanned
before the scoring/result collection occurs (partly during query rewrite,
partly as a bitset before the scorer starts, depending on term density). The
problem is that short-circuiting in BS2 (BooleanScorer2) only occurs once the
wildcard bitsets are already calculated. For wildcard queries there is no
possibility to optimize the document collection, because *every* matching
term has to be scanned and its termdocs retrieved.

Uwe
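
The ordering described above can be illustrated with a toy inverted index.
This is a simplified model in plain Java, not Lucene's actual classes: the
names WildcardCostSketch, FIELD2, expandSuffix and conjunction are
hypothetical. It assumes a term dictionary sorted by term, where a prefix
could seek directly but a suffix such as *bar forces a visit to every term.
The conjunction only runs after that expansion has finished, so it cannot
reduce the scan.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

public class WildcardCostSketch {
    // Toy term dictionary for field2: term -> doc ids containing it.
    static final TreeMap<String, int[]> FIELD2 = new TreeMap<>();
    static {
        FIELD2.put("crowbar", new int[] {1, 7});
        FIELD2.put("foobar", new int[] {2});
        FIELD2.put("sidebar", new int[] {3, 7});
        FIELD2.put("zebra", new int[] {4});
    }

    // Counts how many dictionary terms the wildcard expansion had to look at.
    static int termsVisited = 0;

    // Expand a leading wildcard (*suffix) into a bitset of matching documents.
    // Every term is visited because sort order does not help with suffixes.
    static BitSet expandSuffix(String suffix) {
        BitSet docs = new BitSet();
        for (Map.Entry<String, int[]> entry : FIELD2.entrySet()) {
            termsVisited++;
            if (entry.getKey().endsWith(suffix)) {
                for (int doc : entry.getValue()) {
                    docs.set(doc);
                }
            }
        }
        return docs;
    }

    // AND step: runs only after the wildcard bitset already exists.
    static BitSet conjunction(int[] firstClauseDocs, BitSet wildcardDocs) {
        BitSet result = new BitSet();
        for (int doc : firstClauseDocs) {
            if (wildcardDocs.get(doc)) {
                result.set(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] fooDocs = {7}; // assume field1:foo matches a single document
        BitSet barDocs = expandSuffix("bar");
        System.out.println("terms visited: " + termsVisited);           // 4
        System.out.println("matches: " + conjunction(fooDocs, barDocs)); // {7}
    }
}
```

Even though the first clause matches only one document, all four terms are
visited before the intersection starts.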


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

