lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Re: How to fetch documents for which field is not defined
Date Sun, 16 Jul 2017 08:19:23 GMT
On Sat, Jul 15, 2017 at 8:12 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> That is the "Solr" answer. But it is slow like hell.
>
> In Lucene there is already a native query named FieldValueQuery for this.
> It requires DocValues enabled for the field.
>
> IMHO, the best and fastest variant (also for Solr users) is to add a separate
> multivalued string field named 'fieldnames' where you index the names of all
> fields that have a value. After that you can query on this using the field name.
> Elasticsearch is doing the field name approach for exists/not exists by default.

The catch is, you usually have to analyse a field to determine whether
it has a value. Apparently Elasticsearch's field existence query does
not do this, so it considers blank text to be a value, which is not
the same as what the user expected when they did the query.

We *were* using FieldValueQuery, but since moving to Lucene 6 we have
stopped using UninvertingReader, so that option now only covers fields
that were indexed with DocValues, and fields like "content" aren't
really practical to put in DocValues...

The approach to add a fieldnames field works, but is fiddly at
indexing-time, because now you have to use TokenStream for all fields,
so that you can read one token from each field to test whether there
is one before you add the whole document. I guess it's at least easier
to understand how it works at query-time.
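
To make the indexing-time step concrete, here is a rough sketch of the
check in plain Java. It is not Lucene code: the class FieldNamesSketch,
the method names, and the toy analyze() function are all made up for
illustration, and analyze() only stands in for whatever analysis chain
the field really uses. With Lucene itself, I would expect the same test
to be done by reading the first token from the field's TokenStream
instead of tokenising the whole value as this sketch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Lucene.
final class FieldNamesSketch {

    // Toy stand-in for an analyzer: lowercase, split on anything that is
    // not a letter or digit, and drop empty pieces.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Returns the names of the fields whose value yields at least one token.
    // These are the values that would go into the 'fieldnames' field.
    public static List<String> fieldNamesWithValues(
            Map<String, String> doc,
            Function<String, List<String>> analyzer) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> field : doc.entrySet()) {
            String value = field.getValue();
            if (value != null && !analyzer.apply(value).isEmpty()) {
                names.add(field.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Quarterly report");
        doc.put("content", "   \n\t ");   // blank text: no tokens
        doc.put("author", "...");         // punctuation only: no tokens
        // prints [title]
        System.out.println(fieldNamesWithValues(doc, FieldNamesSketch::analyze));
    }
}
```

The point is that "content" and "author" are present in the document
but do not end up in 'fieldnames', which matches what the user expects
from a "field has no value" query.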

TX
