lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@gmail.com>
Subject Re: Overall doc-count in TermStats, during flush...
Date Wed, 20 Mar 2013 13:03:31 GMT
The BitSet records which documents have one or more values in this
field, so its cardinality is the number of such documents. Some docs
might not have values in this field.
state.segmentInfo.getDocCount() is the # of docs in the whole segment,
but we are flushing a single field here. We pass down the cardinality
because, since 4.0, we keep the doc count statistic per field in the
index, so we can't use the segment's doc count.
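
To make the difference concrete, here is a minimal sketch, not the actual
Lucene code. It assumes java.util.BitSet as a stand-in for FixedBitSet, and
the class name, method name and the int[][] postings layout are all
hypothetical, chosen only for illustration.

```java
import java.util.BitSet;

// Hypothetical illustration; not part of Lucene.
public class FieldDocCountSketch {

    // Returns how many distinct documents carry at least one term in the field.
    // postingsPerTerm[i] holds the docIDs in which term i occurs (assumed layout).
    static int fieldDocCount(int segmentDocCount, int[][] postingsPerTerm) {
        // One bit per document in the segment, all initially unset.
        BitSet visitedDocs = new BitSet(segmentDocCount);
        for (int[] postings : postingsPerTerm) {
            for (int docID : postings) {
                // Setting the same bit twice is harmless, so a doc that
                // contains several terms is still counted only once.
                visitedDocs.set(docID);
            }
        }
        return visitedDocs.cardinality();
    }

    public static void main(String[] args) {
        int segmentDocCount = 5;          // docs 0..4 exist in the segment
        int[][] postings = {
            {0, 2},                       // term A occurs in docs 0 and 2
            {2, 3}                        // term B occurs in docs 2 and 3
        };
        int perField = fieldDocCount(segmentDocCount, postings);
        // Docs 1 and 4 have no value in this field, so they are never visited.
        System.out.println("segment doc count = " + segmentDocCount
                + ", field doc count = " + perField);
    }
}
```

In this sketch the segment has 5 docs but only 3 of them have the field, so
passing the segment's doc count would overstate the per-field statistic.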

hope that helps

simon

On Wed, Mar 20, 2013 at 1:12 PM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
> This is an internal code I came across in lucene today and unable to
> decipher it.
>
> FreqProxTermsWriterPerField.java
>
> void flush(String fieldName, FieldsConsumer consumer, final SegmentWriteState state)
> {
> .............
> FixedBitSet visitedDocs = new FixedBitSet(state.segmentInfo.getDocCount());
>   for (int i = 0; i < numTerms; i++)
>   {
>     .............
>     visitedDocs.set(docID);
>     .........
>     termsConsumer.finishTerm(text, new TermStats(docFreq, writeTermFreq ? totTF : -1));
>     // We plan to pass the state.segmentInfo.getDocCount() in TermStats, above.
>     // Is it wrong to do this here?
>   }
> // Once all terms are over
> termsConsumer.finish(writeTermFreq ? sumTotalTermFreq : -1, sumDocFreq, visitedDocs.cardinality());
> // Why are we doing cardinality() instead of getDocCount() here?
> // Can there be un-visited docs during a flush?
> }
>
> Can someone help me understand this?
>
> --
> Ravi
