lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Iterating TermsEnum for Long field produces zero values at the end
Date Mon, 17 Nov 2014 18:50:47 GMT
Hi,

> It is expected: those are the "prefix" terms, which come after all the full-
> precision numeric terms.
> 
> But I'm not sure why you see 0s ... the bytes should be unique for every term
> you get back from the TermsEnum.

That's easy to explain:

Each lower-precision term at the end matches more than one document in its DocsEnum, but your
loop only reads the first one (Lucene docid 0) and never iterates the remaining entries. A
prefix-coded term has a shift value > 0, and because the low-order bits are stripped from the
right, small long values decode to 0L.
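
To make the decoding step concrete, here is a minimal standalone sketch in plain Java. It does
not call Lucene: decodeAfterShift is a hypothetical helper that mimics the arithmetic described
above (flip the sign bit, strip the low bits, then reverse both steps), and the precisionStep of
16 is an assumption, chosen because it gives exactly three lower-precision terms for a 64-bit
value.

```java
public class PrefixTermDemo {
    // Hypothetical helper, not a Lucene API: models what a term stored with
    // the given shift decodes to.
    public static long decodeAfterShift(long value, int shift) {
        long sortableBits = value ^ 0x8000000000000000L; // flip sign bit
        long stripped = sortableBits >>> shift;          // low bits are dropped here
        return (stripped << shift) ^ 0x8000000000000000L; // decoding cannot restore them
    }

    public static void main(String[] args) {
        int precisionStep = 16; // assumption, see the note above
        for (int shift = 0; shift < 64; shift += precisionStep) {
            StringBuilder line = new StringBuilder("shift=" + shift + ":");
            for (long id = 1; id <= 5; id++) {
                line.append(' ').append(decodeAfterShift(id, shift));
            }
            System.out.println(line);
        }
    }
}
```

With ids 1 to 5, only shift=0 prints the original values; shifts 16, 32 and 48 each print zeros,
which would account for three extra "0 0" lines.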

In general, I would not build this kind of cache from terms; use numeric docvalues instead.
An alternative is FieldCache, which does the right thing automatically. Relying on
the internal implementation of numeric terms is not a good idea.
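
As a rough illustration of why per-document storage avoids the problem, here is a toy model in
plain Java. It is not Lucene's API: the Term class, fromPostings and the column array are all
invented for this sketch. Rebuilding doc -> value from postings only works if the code knows to
skip terms with shift > 0 and to visit every document in each posting list, whereas a
per-document column is already keyed by docid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LookupModelDemo {
    /** Toy stand-in for an indexed term: shift, decoded value, posting list. */
    public static final class Term {
        public final int shift;
        public final long value;
        public final int[] docs;

        public Term(int shift, long value, int... docs) {
            this.shift = shift;
            this.value = value;
            this.docs = docs;
        }
    }

    /** Inverts postings into doc -> value; depends on knowing the term layout. */
    public static long[] fromPostings(List<Term> terms, int maxDoc) {
        long[] ids = new long[maxDoc];
        for (Term t : terms) {
            if (t.shift != 0) {
                continue; // lower-precision prefix term, not a real value
            }
            for (int doc : t.docs) { // visit every posting, not just the first
                ids[doc] = t.value;
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Term> terms = new ArrayList<>();
        for (int doc = 0; doc < 5; doc++) {
            terms.add(new Term(0, doc + 1, doc)); // full-precision terms
        }
        for (int shift = 16; shift < 64; shift += 16) {
            terms.add(new Term(shift, 0L, 0, 1, 2, 3, 4)); // prefix terms
        }
        System.out.println(Arrays.toString(fromPostings(terms, 5)));

        long[] column = {1, 2, 3, 4, 5}; // per-document values, indexed by docid
        System.out.println(column[3]);   // direct lookup, no inversion needed
    }
}
```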

Uwe

> On Mon, Nov 17, 2014 at 10:39 AM, Barry Coughlan
> <b.coughlan2@gmail.com> wrote:
> > Hi all,
> >
> > I'm using 4.10.2. I have a Long "id" field. Each document has one "id"
> > value. I am creating a look-up between Lucene's internal document id
> > and my "id" values by enumerating the inverted index:
> >
> >     private long[] cacheDocIds() throws IOException {
> >         long[] ourIds = new long[reader.maxDoc()];
> >
> >         Bits liveDocs = MultiFields.getLiveDocs(reader);
> >         Fields fields = MultiFields.getFields(reader);
> >         Terms terms = fields.terms("id");
> >
> >         TermsEnum iterator = terms.iterator(null);
> >         BytesRef bytesRef = null;
> >         while ((bytesRef = iterator.next()) != null) {
> >             DocsEnum docsEnum = iterator.docs(liveDocs, null,
> >                     DocsEnum.FLAG_NONE);
> >
> >             int luceneId = docsEnum.nextDoc();
> >             long ourId = NumericUtils.prefixCodedToLong(bytesRef);
> >             System.out.println(luceneId + " " + ourId);
> >             ourIds[luceneId] = ourId;
> >         }
> >
> >         return ourIds;
> >     }
> >
> > With 5 documents (1, 2, 3, 4, 5) I get this output from the above code:
> >
> > 0 1
> > 1 2
> > 2 3
> > 3 4
> > 4 5
> > 0 0
> > 0 0
> > 0 0
> >
> > I don't understand why there are three zeroes at the end.
> >
> > - reader.maxDoc is 5 and no documents have been deleted.
> > - I have tried this with a varying number of documents and there are
> > always three zeroes at the end.
> > - I tried changing version to Lucene 4.10.0 and Lucene 4.9 and the
> > same behavior occurs.
> >
> > I can work around this, but I'm just curious: is this behavior
> > expected?
> >
> > Regards,
> > Barry
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

