lucene-java-user mailing list archives

From Alan Woodward <a...@flax.co.uk>
Subject Re: Opposite of SpanFirstQuery - Searching for documents by last term in a field
Date Wed, 14 Dec 2016 09:26:56 GMT
I’ve done this before by appending a special token to text fields via a TokenFilter.  It
hasn’t caused a noticeable problem with term stats, and field:* still works because the
token is only added if the document in question actually has data in that particular field.
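
The example below is a minimal plain-Java model of that approach. It is not Lucene code, and the class name, method names and marker string are hypothetical.

- appendEndMarker stands in for what the TokenFilter emits. It adds the marker only when the field has at least one token.
- withinSlopOfEnd mimics an ordered two-clause span-near match over [term, marker]. As I read the ordered span-near slop calculation, the gap is the number of tokens between the term and the marker.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java model of the end-marker idea; this is not Lucene API code.
final class EndMarkerModel {
    // Hypothetical marker term: any string that can never occur in real text.
    static final String END_MARKER = "_end_of_field_";

    private EndMarkerModel() {}

    // What the filter would emit: the field's tokens plus one marker, but only
    // when the field has at least one token, so field:* behaves as before.
    static List<String> appendEndMarker(List<String> tokens) {
        if (tokens.isEmpty()) {
            return Collections.emptyList();
        }
        List<String> out = new ArrayList<>(tokens);
        out.add(END_MARKER);
        return out;
    }

    // Mirrors an ordered span-near match of [term, END_MARKER] with the given
    // slop: the gap is the number of positions between the end of the term
    // span and the start of the marker span, i.e. how many tokens follow it.
    static boolean withinSlopOfEnd(List<String> indexed, String term, int slop) {
        int markerPos = indexed.lastIndexOf(END_MARKER);
        if (markerPos < 0) {
            return false; // empty field: nothing was indexed
        }
        for (int pos = 0; pos < markerPos; pos++) {
            if (indexed.get(pos).equals(term)) {
                int gap = markerPos - (pos + 1);
                if (gap <= slop) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

In Lucene itself, the filter side would be a TokenFilter whose incrementToken() emits one extra token after input.incrementToken() first returns false, provided it saw at least one real token. If offsets are indexed, that extra token should reuse the previous token's end offset, because offsets must not go backwards.

The query side would be something like new SpanNearQuery(new SpanQuery[] { new SpanTermQuery(new Term("body", "dog")), new SpanTermQuery(new Term("body", END_MARKER)) }, 0, true). Here "body" is a hypothetical field name, and slop 0 means "dog" is the last real token.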

Alan Woodward
www.flax.co.uk


> On 14 Dec 2016, at 05:02, Trejkaz <trejkaz@trypticon.org> wrote:
> 
> On Wed, Dec 12, 2012 at 3:04 AM, Ian Lea <ian.lea@gmail.com> wrote:
>> The javadoc for SpanFirstQuery says it is a special case of
>> SpanPositionRangeQuery so maybe you can use the latter directly,
>> although you might need to know the position of the last term which
>> might be a problem.
>> 
>> Alternatives might include reversing the terms and using SpanFirst or
>> adding a special "thisistheend" token to each field and using
>> SpanNearQuery for dog and thisistheend with suitable value for slop
>> and inOrder = true.
>> 
>> Or take the last term and index it in a separate field so you can just
> search for lastterm:dog.
> 
> Idly wondering whether anyone has figured out a good way yet in the
> time elapsed since last asked.
> 
> Here are my problems with the existing ideas:
> 
> 1. (Using SpanPositionRangeQuery) I am not really sure how to get the
> position of the last term.
> 
> 2. (Using a special token) Adding a token to every document skews term
> statistics and requires manually filtering it out of term listings.
> Additionally it ruins certain wildcard queries like field:* since now
> every field will match.
> 
> 3. (Indexing the last term(s) in a separate field) In our case we
> don't know in advance what distance from the end of the content the
> user will enter into the query. They might write:
> 
>  term w/10 end-of-content
>  term w/1000 end-of-content
>  ...
> 
> Other ideas:
> 
> 4. Storing all the content twice initially seems to be a potential
> solution, but starts looking very hard once you combine queries. For
> instance, what about this:
> 
>  (term w/10 start-of-content) w/30 (another-term w/10 end-of-content)
> 
> 5. Put a payload on the last term and then _somehow_ (I have no idea how
> payload queries work yet) use payload queries to do spans from that.
> 
> 
> Is there any good solution to this that people have already figured
> out? Is there another SpanPositionCheckQuery subclass that could be
> written which somehow fetches the last position in the document from
> the acceptPosition method?
> 
> TX
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
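
The closing question asks about a SpanPositionCheckQuery subclass. The example below models the two acceptance rules involved in plain Java. It is again not Lucene code. The class and method names are hypothetical, and AcceptStatus is a local stand-in for the enum that Lucene's acceptPosition returns.

- rangeRule follows the check that SpanPositionRangeQuery makes, as I read its source. SpanFirstQuery is the case rangeStart = 0. The rule shows why "last term" cannot be one fixed range when fields differ in length.
- nearEndRule is hypothetical. It assumes the document's token count for the field is available at query time, for example from a numeric doc-values field written at index time. How a real subclass would fetch that value is left open.

```java
// Plain-Java model of two span acceptance rules; not Lucene code.
final class EndPositionCheckModel {
    // Local stand-in for the three outcomes a position check can return.
    enum AcceptStatus { YES, NO, NO_MORE_IN_CURRENT_DOC }

    private EndPositionCheckModel() {}

    // Range rule for a span [spanStart, spanEnd): accepted when it lies inside
    // [rangeStart, rangeEnd). Spans arrive ordered by start position, so once
    // one starts at or past rangeEnd, no later span in this document can fit.
    static AcceptStatus rangeRule(int spanStart, int spanEnd, int rangeStart, int rangeEnd) {
        if (spanStart >= rangeEnd) {
            return AcceptStatus.NO_MORE_IN_CURRENT_DOC;
        }
        return (spanStart >= rangeStart && spanEnd <= rangeEnd)
                ? AcceptStatus.YES
                : AcceptStatus.NO;
    }

    // Hypothetical near-the-end rule: accepts a span followed by at most
    // maxTokensAfter tokens. fieldLength (tokens in this document's field)
    // must come from somewhere; that is the assumption this sketch makes.
    static AcceptStatus nearEndRule(int spanStart, int spanEnd, int fieldLength, int maxTokensAfter) {
        int tokensAfter = fieldLength - spanEnd;
        // A later span may end closer to the end, so reject with NO rather
        // than NO_MORE_IN_CURRENT_DOC.
        return tokensAfter <= maxTokensAfter ? AcceptStatus.YES : AcceptStatus.NO;
    }
}
```

Take "the quick dog" (fieldLength 3). nearEndRule(2, 3, 3, 0) accepts "dog" as the last token. rangeRule would need the range [2, 3) for this document and a different range for every other field length.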

