lucene-java-user mailing list archives

From Erick Erickson
Subject Re: Norm Value of not existing Field
Date Fri, 04 Dec 2009 13:54:11 GMT
The word "Filter" as part of a class is overloaded in Lucene <G>....


The above filter is just a DocIdSet, one bit per document. So
in your example, you're only talking about 12M or so, even if you
create one filter for every field and keep them all around.
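
As a rough check on that figure, here is a minimal sketch of the arithmetic,
assuming one bit per document stored in whole 64-bit words, plus the 20 fields
and 4 million docs from the quoted message below. The class and method names
are made up for illustration; nothing here is Lucene API.

```java
public class FilterMemoryEstimate {

    // Bytes needed for one bit-per-document filter, rounded up to
    // whole 64-bit words (an assumption about the backing storage).
    public static long bytesPerFilter(long numDocs) {
        long words = (numDocs + 63) / 64;
        return words * 8;
    }

    // Total bytes if one such filter is kept around for every field.
    public static long totalBytes(long numDocs, int numFields) {
        return bytesPerFilter(numDocs) * numFields;
    }

    public static void main(String[] args) {
        long perFilter = bytesPerFilter(4_000_000L);
        long total = totalBytes(4_000_000L, 20);
        System.out.println(perFilter + " bytes per filter");
        System.out.println(total + " bytes for 20 filters");
    }
}
```

That comes to 500,000 bytes per filter and 10,000,000 bytes for all 20, which
is in the same ballpark as the estimate above.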

You *might* get some joy from, say, QueryWrapperFilter, although
I don't know if it handles pure wildcard terms (e.g. field:*)...

If that doesn't work out of the box, I *think* you can use TermDocs
with a term like field:"" and just keep marching until next() returns
false, merrily setting your Filter bits for each Doc returned by
the enumerator.....
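
To make the bit-setting idea concrete, here is a minimal sketch in plain Java.
It does NOT call Lucene: the Map of term text to doc ids is a hypothetical
stand-in for whatever you would read from the index for one field, and
java.util.BitSet stands in for the filter bits. All names are invented for
illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

public class FieldPresenceSketch {

    // postings: term text -> ids of the docs containing that term, for ONE
    // field. This map is an assumption standing in for the index's term and
    // doc enumeration; it is not a Lucene data structure.
    public static BitSet docsWithField(Map<String, int[]> postings, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        // March through every term of the field and set a bit per doc seen.
        for (int[] docs : postings.values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Docs whose bit was never set have no term in the field, i.e. the
    // field is missing or empty there.
    public static BitSet docsWithoutField(BitSet withField, int maxDoc) {
        BitSet missing = (BitSet) withField.clone();
        missing.flip(0, maxDoc);
        return missing;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<String, int[]>();
        postings.put("apple", new int[] {0, 2});
        postings.put("pear", new int[] {2, 5});

        int maxDoc = 6; // doc ids 0..5
        BitSet with = docsWithField(postings, maxDoc);
        BitSet without = docsWithoutField(with, maxDoc);

        System.out.println("with field:    " + with);    // {0, 2, 5}
        System.out.println("without field: " + without); // {1, 3, 4}
    }
}
```

The sketch ignores deleted documents, which a real index would also have to
account for.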


On Fri, Dec 4, 2009 at 3:40 AM, Benjamin Heilbrunn wrote:

> Erick, I'm not sure if I understand you right.
> What do you mean by "spinning through all the terms on a field"?
> It would be an option to load all unique terms of a field by using
> TermEnum.
> Then use TermDocs to get the docs for those terms.
> The rest of the docs don't contain a term, so you know that the
> field doesn't exist or is empty on those docs.
> Btw: Is there a distinction in Lucene between empty and nonexistent
> fields?
> The above method would work very well, I think, but it would require
> building and holding an extra data structure.
> My index has about 20 fields and 4 million docs. The overhead would be too
> large.
> I think using the norms array (which is already there for most of
> the fields) would be a nice approach.
> Benjamin
