lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Norm Value of not existing Field
Date Fri, 04 Dec 2009 13:54:11 GMT
The word "Filter" as part of a class is overloaded in Lucene <G>....

See: http://lucene.apache.org/java/2_9_1/api/all/index.html

The above filter is just a DocIdSet, one bit per document. So
in your example (4 million docs, 20 fields), you're only talking
12 MB or so, even if you create one filter for every field and
keep it around.
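
For what it's worth, the back-of-the-envelope arithmetic can be sketched as below. This is only a rough estimate of the raw bit payload: the class and method names are made up for illustration, and the per-object overhead of whatever DocIdSet implementation you actually use is not counted.

```java
// Hypothetical helper, not part of Lucene: estimates the raw size of
// one-bit-per-document filters.
final class FilterMemoryEstimate {

    /** Bytes needed for one bit-per-document filter, rounded up to whole bytes. */
    static long bytesPerFilter(long numDocs) {
        return (numDocs + 7) / 8;
    }

    /** Bytes needed when one such filter is kept around for every field. */
    static long bytesForAllFields(long numDocs, int numFields) {
        return bytesPerFilter(numDocs) * numFields;
    }
}
```

With the figures from the quoted message, bytesPerFilter(4000000) is 500000 (about 0.5 MB per filter) and bytesForAllFields(4000000, 20) is 10000000 (about 10 MB in total), which is in the same ballpark as the estimate above.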

You *might* get some joy from, say, QueryWrapperFilter, although
I don't know if it handles pure wildcard terms (e.g. field:*)...

If that doesn't work out of the box, I *think* you can start a TermEnum
at a term like field:"" and, for each term that still belongs to that
field, use TermDocs and just keep marching until next() returns
false, merrily setting your Filter bits for each doc returned by
the enumerator.
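
A minimal sketch of that bit-setting loop, using only java.util so it runs without an index. The SortedMap here is a toy stand-in for one field's terms and their doc ids, and the class and method names are hypothetical. Against a real index the outer loop would walk a TermEnum and the inner loop a TermDocs; that mapping is my assumption about how it would be wired up, not tested code.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical sketch: models "march through every term of a field and
// set a bit for each doc" with plain collections instead of Lucene classes.
final class FieldPresenceSketch {

    /**
     * Returns one bit per document, set when the document has at least one
     * term in the field. The postings map (term text to doc ids) is a toy
     * stand-in for the index.
     */
    static BitSet docsWithField(SortedMap<String, int[]> postings, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        // tailMap("") starts at the empty string, i.e. the first term of the
        // field, much like seeking an enumerator to field:"".
        for (Map.Entry<String, int[]> entry : postings.tailMap("").entrySet()) {
            for (int doc : entry.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Complement over [0, maxDoc): documents with no term in the field. */
    static BitSet docsWithoutField(SortedMap<String, int[]> postings, int maxDoc) {
        BitSet missing = docsWithField(postings, maxDoc);
        missing.flip(0, maxDoc);
        return missing;
    }
}
```

Note that in this model a document whose field was indexed but produced no terms looks the same as a document without the field, since neither shows up in any term's doc list.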

HTH
Erick


On Fri, Dec 4, 2009 at 3:40 AM, Benjamin Heilbrunn <benhei@gmail.com> wrote:

> Erick, I'm not sure if I understand you right.
> What do you mean by "spinning through all the terms on a field"?
>
> It would be an option to load all unique terms of a field by using
> TermEnum,
> then use TermDocs to get the docs for those terms.
> The rest of the docs don't contain a term, so you know that the
> field doesn't exist or is empty on those docs.
> Btw: Is there a distinction in Lucene between empty and nonexistent
> fields?
>
> The above method would work very well I think, but it would require
> building and holding an extra data structure.
> My index has about 20 fields and 4 million docs. The overhead would be too
> large.
>
> I think - using the norms array (which is already there for most of
> the fields) would be a nice approach.
>
>
> Benjamin
