lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: How to get all matched terms in a PrefixQuery
Date Wed, 14 Sep 2016 18:57:59 GMT
Also please realize that PrefixQuery, at default settings, will
sometimes (often, depending on how you use it) bypass BooleanQuery and
do the "term at a time" rewrite, which foils your effort.

You could force PrefixQuery to always use BooleanQuery
(setRewriteMethod), but this can cause horrific performance if the
prefix matches many terms, and you'd have to increase BooleanQuery's
default max number of clauses limit, which is not advisable since that
would cause even more horrific performance.
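
To make the difference between the two rewrite strategies concrete, here is a toy model that uses only java.util. It is a sketch under stated assumptions, not Lucene's implementation: the index contents, the class and method names, and the maxClauses parameter are all hypothetical stand-ins for the real term dictionary, rewrite methods, and clause limit.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class PrefixMatchSketch {
    // Toy inverted index: term -> doc ids (hypothetical data).
    static final TreeMap<String, int[]> INDEX = new TreeMap<>();
    static {
        INDEX.put("apple", new int[] {0, 2});
        INDEX.put("brace", new int[] {1});
        INDEX.put("bread", new int[] {0, 1});
        INDEX.put("brick", new int[] {2});
        INDEX.put("cat", new int[] {1});
    }

    // Range scan over the sorted term dictionary for every term with the prefix.
    static SortedMap<String, int[]> termsWithPrefix(String prefix) {
        return INDEX.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    // "Term at a time": OR each term's postings into one bitset.
    // The result says which docs matched, but not which term matched them.
    static BitSet termAtATime(String prefix) {
        BitSet hits = new BitSet();
        for (int[] docs : termsWithPrefix(prefix).values()) {
            for (int doc : docs) {
                hits.set(doc);
            }
        }
        return hits;
    }

    // Boolean-style: each term stays a separate clause, so the matching
    // terms per doc are still recoverable. maxClauses is a hypothetical
    // stand-in for a clause limit.
    static Map<Integer, List<String>> termsPerDoc(String prefix, int maxClauses) {
        SortedMap<String, int[]> terms = termsWithPrefix(prefix);
        if (terms.size() > maxClauses) {
            throw new IllegalStateException("too many clauses: " + terms.size());
        }
        Map<Integer, List<String>> out = new TreeMap<>();
        for (Map.Entry<String, int[]> e : terms.entrySet()) {
            for (int doc : e.getValue()) {
                out.computeIfAbsent(doc, d -> new ArrayList<>()).add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(termAtATime("br"));     // {0, 1, 2}
        System.out.println(termsPerDoc("br", 10)); // {0=[bread], 1=[brace, bread], 2=[brick]}
    }
}
```

In this model the bitset path touches each term once and keeps no per-term state, while the per-clause path must hold one clause per matching term, which is why a prefix that matches many terms runs into the limit.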

> Thought you should be aware of LUCENE-6229

Yes, please go comment on the issue, if it's useful to you.  This API
is controversial because it exposes the inner workings of how queries
rewrite/score.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Sep 14, 2016 at 11:16 AM, Terry Smith <shebiki@gmail.com> wrote:
> Rajnish,
>
> Thought you should be aware of LUCENE-6229
> <https://issues.apache.org/jira/browse/LUCENE-6229> which discusses the
> possibility of removing the Scorer.getChildren API.
>
> --Terry
>
>
> On Tue, Sep 13, 2016 at 11:10 PM, Rajnish kamboj <rajnishk7.info@gmail.com>
> wrote:
>
>> Thanks Mike
>>
>> I would rather go with the first approach, using the Scorer.getChildren
>> API (will try).
>> I had thought of the second approach, but you are right, it is costly.
>>
>> Regards
>> Raj
>>
>> On Wednesday 14 September 2016, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>> > You can't do this very easily, unfortunately.
>> >
>> > The way PrefixQuery runs is to find (globally, across the index) all
>> > terms that have that prefix.  If there are enough of them, it goes
>> > term by term marking the documents in a bitset, and then iterates that
>> > bitset in the end.  So the information of which term matched which
>> > document is long gone.
>> >
>> > If there are few enough terms, it makes a BooleanQuery with N SHOULD
>> > clauses, and in that limited case, since the child clauses are all
>> > visiting the same document when it's collected, you might be able to
>> > use the Scorer.getChildren API in a custom Collector to see (per doc
>> > collected) which terms are "on" that one document.
>> >
>> > You could alternatively store term vectors (but these are slow and
>> > costly) and load them for each document and iterate the matched prefix
>> > terms by creating a PrefixTermsEnum.
>> >
>> > Mike McCandless
>> >
>> > http://blog.mikemccandless.com
>> >
>> >
>> > On Tue, Sep 13, 2016 at 11:25 AM, Rajnish kamboj
>> > <rajnishk7.info@gmail.com> wrote:
>> > > Hi
>> > >
>> > > How can I get all matched terms of a document in PrefixQuery?
>> > >
>> > > Term t2 = new Term("contents", "br");
>> > > PrefixQuery query = new PrefixQuery(t2);
>> > >
>> > > Suppose I have a few documents with 1000 different terms.
>> > > The search shows me the documents in which it finds the br words.
>> > >
>> > > Now, how can I get all the br words in the document?
>> > >
>> > >
>> > >
>> > > Thanks
>> > > Raj
>> >
>>
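
The term-vector alternative mentioned in the thread amounts to seeking into one document's sorted term list and reading terms until the prefix stops matching. The sketch below models that idea with java.util only; the NavigableSet stands in for a stored term vector, and the class and method names are hypothetical. It does not use Lucene's term vector or PrefixTermsEnum classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class TermVectorPrefixSketch {
    // Return the terms of one document (sorted) that start with the prefix.
    static List<String> matchedTerms(NavigableSet<String> termVector, String prefix) {
        List<String> matched = new ArrayList<>();
        // tailSet positions us at the first term >= prefix, like a seek.
        for (String term : termVector.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            matched.add(term);
        }
        return matched;
    }

    public static void main(String[] args) {
        NavigableSet<String> doc =
            new TreeSet<>(Arrays.asList("apple", "brace", "bread", "cat"));
        System.out.println(matchedTerms(doc, "br")); // [brace, bread]
    }
}
```

The cost noted in the thread still applies to the real thing: the per-document term list has to be stored at index time and loaded for every hit before this scan can run.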


