lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Empty fields ...
Date Wed, 19 Jul 2006 13:48:04 GMT
Try something like

bits = new BitSet(reader.maxDoc());
TermDocs termDocs = reader.termDocs();
termDocs.seek(new Term("<relevant field name here>", ""));
while (termDocs.next()) {
    bits.set(termDocs.doc());
}

I *think* (I'm going from what other folks have written and haven't done this
myself) that the empty string for the Term matches all terms. If not, you
might have to wrap it in an outer loop that walks through all the terms in
the field, something like

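        bits = new BitSet(reader.maxDoc());

        TermDocs termDocs = reader.termDocs();
        // FilteredTermEnum is abstract, so enumerate with reader.terms(),
        // which starts at the first term at or after (field, "").
        TermEnum tEnum = reader.terms(new Term(field, ""));

        for (Term term = null; (term = tEnum.term()) != null; tEnum.next()) {
            // Stop once the enumeration runs past the field we care about.
            if (!term.field().equals(field)) {
                break;
            }
            termDocs.seek(term);

            while (termDocs.next()) {
                bits.set(termDocs.doc());
            }
        }
        termDocs.close();
        tEnum.close();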



That said, it may be best for you to loop through each document and add that
doc to the relevant filters if it has the fields you're interested in. You'd
only be fetching each document once, so it'd only be one loop. I don't know
enough about the relative efficiencies to make a call here; it probably
depends on how many docs you're dealing with. I'd stop at the first solution
that works with acceptable performance unless you expect your corpus to grow
significantly. And since this is done in off hours, there's no pressing
reason to go with the most efficient solution unless it takes too long or
you expect to have orders of magnitude more documents in your index
eventually.

Best
Erick
