lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Filtering question
Date Wed, 11 Mar 2015 22:48:36 GMT
Hi,

> Thanks for the suggestion, I tried to use a BooleanQuery with clause1 =
> termquery and clause2 = ConstantScoreQuery(MyNDVFilter), joined by
> SHOULD. I also applied the term filter at the top level (as before).
> Unfortunately it doesn't work in that the MyNDVFilter still receives null
> acceptDocs and therefore has no option but to scan the whole index.

Sorry,
I just noticed that you are using TermFilter, not TermsFilter. TermFilter does not support random
access (via bits()), so the filtered docs cannot be passed down through acceptDocs.
In addition, the SHOULD clause means that the ConstantScoreQuery has to try all documents,
because there is nothing else that could drive the query.
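
To illustrate why a null acceptDocs is so costly, here is a minimal plain-Java sketch. It is only a model, not Lucene code: AcceptDocsSketch, scan and expensiveCheck are hypothetical names, and java.util.BitSet stands in for a random-access Bits instance. It simply counts how often the expensive per-document check (think of a doc-values lookup) has to run.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Toy model of a filter that may or may not receive acceptDocs. */
final class AcceptDocsSketch {
    /** Number of times the expensive per-document check ran in the last scan. */
    int checks;

    /**
     * Returns the matching doc IDs in [0, maxDoc).
     * With acceptDocs == null every document must be checked;
     * otherwise only the documents whose bit is set are checked.
     */
    BitSet scan(int maxDoc, BitSet acceptDocs, IntPredicate expensiveCheck) {
        checks = 0;
        BitSet matches = new BitSet(maxDoc);
        if (acceptDocs == null) {
            // Nothing narrows the candidates: walk the whole index.
            for (int doc = 0; doc < maxDoc; doc++) {
                checks++;
                if (expensiveCheck.test(doc)) {
                    matches.set(doc);
                }
            }
        } else {
            // Random access available: visit only the accepted documents.
            for (int doc = acceptDocs.nextSetBit(0);
                 doc >= 0 && doc < maxDoc;
                 doc = acceptDocs.nextSetBit(doc + 1)) {
                checks++;
                if (expensiveCheck.test(doc)) {
                    matches.set(doc);
                }
            }
        }
        return matches;
    }
}
```

In this model, a scan over 1000 documents with three accepted docs runs the check three times instead of 1000 times, which is roughly the saving that is lost when the filter above it cannot expose bits().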

An alternative approach (in Lucene 4.10 or 5.0) would be to add the term filter to the BooleanQuery
as a MUST clause, wrapped as ConstantScoreQuery(TermQuery) with boost=0. In that case it can drive
the query and does not affect scoring. In later Lucene versions you may use the new BooleanClause.Occur
type "FILTER", which can add any query as a filter. Filters will be deprecated once this is ready.
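
What "driving the query" means can be modelled as a leap-frog intersection of two sorted doc ID lists. Again this is only a plain-Java sketch under assumptions, not Lucene's scorer code: LeapFrogSketch, advance and intersect are hypothetical names, and the linear advance stands in for whatever skipping the real postings support.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy leap-frog intersection of two sorted, duplicate-free doc ID lists. */
final class LeapFrogSketch {
    /** Returns the first index i >= from with sorted[i] >= target, or sorted.length. */
    static int advance(int[] sorted, int from, int target) {
        int i = from;
        while (i < sorted.length && sorted[i] < target) {
            i++;
        }
        return i;
    }

    /** Each side jumps to the other side's current doc until both agree. */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);      // both clauses match this doc
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]);   // a is behind: leap to b's doc
            } else {
                j = advance(b, j, a[i]);   // b is behind: leap to a's doc
            }
        }
        return out;
    }
}
```

The point of the model is that a sparse required clause lets the other side skip most of its documents, whereas a SHOULD-only query gives the expensive clause nothing to leap to.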

> My goal is to slowly transform a particular field from StringField to
> BinaryDocValues so that during the transition a doc may hold the value either
> in the old location or the new. Therefore a query must be able to say
>     oldField:"foo" OR newField:"foo"
> Where oldField is a StringField and newField is a BinaryDocValues.

Why do you want to do this? If you want to query the field like this, using DocValues is a bad
idea. If you also want DocValues for something else, you should place both in your index:
the indexed term for queries/filters and the doc values for e.g. sorting. You can use the
same field name for both.

> I must add that a full reindex all in one go is currently not an option, so the
> solution must support this mixed mode.

Uwe

> Any thoughts on how this could be best achieved ..?
> 
> Thanks
> 
> Chris
> 
> > On 11 Mar 2015, at 19:15, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> > Hi,
> >
> > In fact the FilteredQuery(MatchAllDocsQuery,...) with the filter
> > should have been rewritten to a ConstantScoreQuery already, but for
> > some unknown reason, Mike McCandless removed it in
> > https://issues.apache.org/jira/browse/LUCENE-5418
> > Because of this it's better to do it like I said before (use
> > ConstantScoreQuery).
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> >> -----Original Message-----
> >> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> >> Sent: Wednesday, March 11, 2015 8:07 PM
> >> To: java-user@lucene.apache.org
> >> Subject: RE: Filtering question
> >>
> >> Hi,
> >>
> >> BooleanQuery:
> >> -- Clause 1: TermQuery
> >> -- Clause 2: FilteredQuery
> >> ----- Branch 1: MatchAllDocsQuery()
> >> ----- Branch 2: MyNDVFilter
> >>
> >>
> >> Why does it look like this? Clause 2 should simply be
> >> ConstantScoreQuery(MyNDVFilter). In that case the BooleanQuery will
> >> execute more efficiently; with 2 MUST clauses it will leap-frog.
> >>
> >> The reason for this behavior is the way FilteredQuery executes: A
> >> filter is seen as cheap, so it is applied down low. If it supports
> >> Bits()
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

