lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Lucene custom Query - efficiently and compare retrieve multiple document fields
Date Mon, 12 Feb 2018 18:34:23 GMT
Filtering by one query and scoring by a different query is easy: just put
the filter in a FILTER clause of a BooleanQuery and the scoring query in a
SHOULD clause. Documents that do not match the SHOULD clause will have a
score of zero.

I wonder whether you are looking for something like this:

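Query q = new BooleanQuery.Builder()
  // FILTER: must match, does not contribute to the score
  .add(new FuzzyQuery(new Term("coarse_grained", "search_term")), Occur.FILTER)
  // SHOULD: optional, contributes to the score when it matches
  .add(new FuzzyQuery(new Term("fine_grained", "search_term")), Occur.SHOULD)
  .build();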

It's not clear to me why you need to retain the insertion order: the order
of your values should not matter, should it?
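
To make the order question concrete, here is a minimal plain-Java sketch of the filter-then-score logic described in the quoted message, independent of Lucene. Several things in it are assumptions for illustration only: keywords are treated as equal-length strings, the score is taken as 1 / (1 + smallest distance), and the names HammingSketch, hamming, matches and score are hypothetical. Note also that FuzzyQuery matches on Levenshtein edit distance (at most 2 edits), so the query above only approximates a Hamming threshold.

```java
import java.util.Arrays;
import java.util.List;

class HammingSketch {
    // Hamming distance: number of positions at which two equal-length keywords differ.
    static int hamming(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("keywords must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    // Filter phase: a document matches if its coarse_grained value is within the threshold.
    static boolean matches(String coarse, String query, int threshold) {
        return hamming(coarse, query) <= threshold;
    }

    // Scoring phase (assumed aggregation): the closest fine_grained value decides.
    // Taking the minimum is symmetric, so the order of the values has no effect.
    static double score(List<String> fine, String query) {
        if (fine.isEmpty()) {
            return 0.0;
        }
        int best = Integer.MAX_VALUE;
        for (String value : fine) {
            best = Math.min(best, hamming(value, query));
        }
        return 1.0 / (1 + best);
    }

    public static void main(String[] args) {
        List<String> fine = Arrays.asList("10010", "00000");
        if (matches("10110", "10011", 2)) {
            System.out.println("score = " + score(fine, "10011"));
        }
    }
}
```

With a symmetric aggregation such as a minimum or a sum, the values can be read back in any order and the score is unchanged. Insertion order only matters if the scoring function treats positions differently.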

On Mon, 12 Feb 2018 at 11:23, Dominik Safaric <dominiksafaric@gmail.com>
wrote:

> In particular, I have a document schema as follows:
>
> {
>   "images": [{
>     "image_id": 1,
>     "features": {
>       "coarse_grained": <keyword>,
>       "fine_grained": [*<keyword>*]
>     }
>   }]
> }
>
> In the first pass, using a custom Query instance, I'd like to hit documents
> by matching the *coarse_grained* field. A document matches if the Hamming
> distance between the value of its *coarse_grained* field and the value
> passed through the REST API is less than or equal to a set threshold. I'd
> then like to score the hit documents using the *fine_grained* field values,
> which form an array of keywords. Hamming distance serves as the similarity
> measure in this case as well.
>
> What I'm concerned with is the following: in the second (scoring) phase
> I'd like to score documents using all values of the *fine_grained* array of
> keywords. How can I efficiently retrieve these values for each document,
> in the order in which they were inserted?
>
> Thanks in advance,
> Dominik
>
> 2018-02-12 8:56 GMT+01:00 Adrien Grand <jpountz@gmail.com>:
>
> > Whether this is doable is going to depend on what you mean by "match[ing]
> > documents according to criteria X". Can you give an example?
> >
> > On Fri, 9 Feb 2018 at 14:47, Dominik Safaric <dominiksafaric@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I intend to implement a custom Query using Lucene 6.x and, due to the
> > > lack of documentation on this particular topic, I have the following
> > > questions.
> > >
> > > The query is expected to implement a two-phase search, in the sense that
> > > during the first phase it matches documents according to criteria X,
> > > whereas during the second it matches according to criteria Y on another
> > > document field. Can this be accomplished using the TwoPhaseIterator?
> > >
> > > Secondly, the query as expressed through the API will not specify a
> > > single query field, but instead a field that stores an array of
> > > objects. From an implementation point of view, can I use the LeafReader
> > > to retrieve an object that would map to a Java Map, which I can later
> > > use for accessing a certain field within the object? Or is it perhaps
> > > more advisable to get the document instance using the LeafReader's
> > > getDocument(int docID) function, and then load particular fields? I'm
> > > afraid that might hurt overall performance because the documents would
> > > need to be loaded from disk.
> > >
> > > Thanks in advance,
> > > Dominik
