lucene-java-user mailing list archives

From Nicola Buso <nb...@ebi.ac.uk>
Subject Re: TermInSetQuery keep terms order in results
Date Mon, 02 Jul 2018 11:10:05 GMT
Hi Michael,

I have an index that contains the terms of the TermInSetQuery, but the
score provided at query time, represented by the order of a List of
terms, is not known at indexing time; it depends on other calculations
done at runtime. What do you mean by indexing the ordinals?

I was wondering whether I could wrap each TermQuery in a BoostQuery,
boosting by the ordinals I have, and build a disjunction query over all
the terms. How much slower than TermInSetQuery would that be?
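
The ordinal-based boosting idea can be sketched roughly as follows. This is a minimal sketch in plain Java of the ordinal-to-boost mapping only; the class `OrdinalBoosts` and its methods are hypothetical names, not part of Lucene. It assumes the first term in the list should rank highest, so the term at 0-based position i in a list of n terms gets boost n - i. Each boost would then be handed to a BoostQuery wrapping the corresponding TermQuery, with every clause added as SHOULD to a BooleanQuery; that wiring is left out here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): maps a ranked term list to boosts. */
final class OrdinalBoosts {
    private OrdinalBoosts() {}

    /**
     * Boost for the term at 0-based position {@code ordinal} in a list of
     * {@code size} terms. Earlier terms get larger boosts: size, size-1, ..., 1.
     */
    static float boostFor(int ordinal, int size) {
        if (ordinal < 0 || ordinal >= size) {
            throw new IllegalArgumentException("ordinal out of range: " + ordinal);
        }
        return (float) (size - ordinal);
    }

    /**
     * Returns term -> boost, iterating in the same order as the input list.
     * If a term occurs more than once, its first (highest) boost is kept.
     */
    static Map<String, Float> boostsInOrder(List<String> rankedTerms) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        int size = rankedTerms.size();
        for (int i = 0; i < size; i++) {
            boosts.putIfAbsent(rankedTerms.get(i), boostFor(i, size));
        }
        return boosts;
    }
}
```

With 100,000 terms the boosts remain exactly representable as float (integers up to 2^24 are), but a BooleanQuery of that size would exceed the default BooleanQuery.getMaxClauseCount() of 1024 unless the limit is raised. The final order also depends on how each clause scores, which is presumably why a field without norms is suggested later in the thread.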


Nicola



On Mon, 2018-07-02 at 06:41 -0400, Michael Sokolov wrote:
> Since you have the terms ordered, why not index their ordinals, and
> then sort by that?
> 
> On Mon, Jul 2, 2018, 6:16 AM Nicola Buso <nbuso@ebi.ac.uk> wrote:
> > Hi Uwe,
> > 
> > as said, the sorting is calculated elsewhere upfront and the terms are
> > provided to Lucene in the calculated order (in any case in an
> > unordered Set, as required by the query API).
> > 
> > I would like an API that keeps the input order; otherwise I end up with
> > the usual problem that I can't re-order afterward, because accessing the
> > results in a paginated way makes this operation impossible.
> > 
> > 
> > Nicola
> > 
> > On Mon, 2018-06-25 at 21:49 +0200, Uwe Schindler wrote:
> > > Hi Nicola,
> > > 
> > > if you sort it elsewhere, why do you care about sort order then? What
> > > you see as result is simple: As there is nothing available for
> > > scoring, a constant score query returns the results in index order.
> > > That's wanted. There is no way to change this "default" order for a
> > > TermInSetQuery because it's missing information.
> > > 
> > > Uwe
> > > 
> > > -----
> > > Uwe Schindler
> > > Achterdiek 19, D-28357 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > > 
> > > > -----Original Message-----
> > > > From: Nicola Buso <nbuso@ebi.ac.uk>
> > > > Sent: Monday, June 25, 2018 5:09 PM
> > > > To: Uwe Schindler <uwe@thetaphi.de>; java-user@lucene.apache.org
> > > > Subject: Re: TermInSetQuery keep terms order in results
> > > > 
> > > > Hi Uwe,
> > > > 
> > > > thanks for the reply. TermInSetQuery covers most of my use case:
> > > > - thousands of term values (even 100,000)
> > > > - no need for scoring, because it's calculated elsewhere
> > > > - intersect with normal full text query for further filtering
> > > > 
> > > > If I use TermQuery, do I risk hitting the
> > > > BooleanQuery.getMaxClauseCount() limit?
> > > > 
> > > > Cheers,
> > > > 
> > > > 
> > > > Nicola
> > > > 
> > > > 
> > > > 
> > > > On Mon, 2018-06-25 at 16:52 +0200, Uwe Schindler wrote:
> > > > > Hi,
> > > > > 
> > > > > the TermInSetQuery is a so-called Constant Score Query. It is more
> > > > > meant as a filter, so you would need some "real" fulltext query in
> > > > > parallel. See the term-in-set query more like the SQL "IN" operator.
> > > > > It can be used to pass lots of identifiers to filter results (e.g.
> > > > > when you apply access rights or group policies for filtering users to
> > > > > your main query as a filter).
> > > > > 
> > > > > As it is a "set", which is by default unordered, the order of terms
> > > > > in the set is undefined. Internally TermInSetQuery reorders the terms
> > > > > to improve processing speed.
> > > > > 
> > > > > If you need scoring, use TermQuery wrapped by a BooleanQuery. Then
> > > > > you can apply some boosts to some terms to improve order (e.g. boost
> > > > > term queries coming first) and apply on a field without norms.
> > > > > 
> > > > > TermInSetQuery is fast because it neglects scoring and is just good
> > > > > at intersecting the terms dict with the given terms set.
> > > > > 
> > > > > Uwe
> > > > > 
> > > > > -----
> > > > > Uwe Schindler
> > > > > Achterdiek 19, D-28357 Bremen
> > > > > http://www.thetaphi.de
> > > > > eMail: uwe@thetaphi.de
> > > > > 
> > > > > > -----Original Message-----
> > > > > > From: Nicola Buso <nbuso@ebi.ac.uk>
> > > > > > Sent: Monday, June 25, 2018 1:23 PM
> > > > > > To: java-user@lucene.apache.org
> > > > > > Subject: TermInSetQuery keep terms order in results
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > I need to use the TermInSetQuery, but I would like to keep the
> > > > > > results sorted according to the order of the term set provided.
> > > > > > Currently the results seem to come back in document insertion order.
> > > > > > 
> > > > > > Is this already implemented somewhere, or do I need to implement a
> > > > > > CustomScoreQuery to calculate this score?
> > > > > > 
> > > > > > Cheers,
> > > > > > 
> > > > > > 
> > > > > > Nicola
> > > > > > 
> > > > > > 
> > > > > > --
> > > > > > Nicola Buso <nbuso@ebi.ac.uk>
> > > > > > EMBL-EBI
> > > > > > 
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > > 
> > > > > 
> > > > 
> > > 
> > > 
-- 
Nicola Buso <nbuso@ebi.ac.uk>
EMBL-EBI


