lucene-java-user mailing list archives

From Riccardo Tasso <riccardo.ta...@gmail.com>
Subject Re: Count terms for IntPoint field
Date Thu, 01 Mar 2018 22:50:30 GMT
Ok, I've studied the documentation.

First of all, what I needed for most of my fields (StringField, TextField)
is:

MultiFields.getTerms(reader, field.name).size();

which counts the distinct terms for the field.
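
A minimal sketch of that count as a helper, assuming Lucene 7.x on the
classpath. The class and method names (TermCounter, countDistinctTerms) are
hypothetical, chosen only for illustration. One caveat worth guarding
against: Terms.size() is documented to return -1 when the count is not
available, which can happen when the reader spans several segments, so the
sketch falls back to walking the TermsEnum in that case.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;

// Hypothetical helper, not part of Lucene.
public final class TermCounter {

    private TermCounter() {}

    /** Number of distinct terms indexed for the field; 0 if the field has no postings. */
    public static long countDistinctTerms(IndexReader reader, String field) throws IOException {
        Terms terms = MultiFields.getTerms(reader, field);
        if (terms == null) {
            // Field is missing or was not indexed with postings (e.g. points or doc values only).
            return 0;
        }
        long size = terms.size();
        if (size >= 0) {
            return size;
        }
        // size() returned -1 (count not available): enumerate the merged terms instead.
        long count = 0;
        TermsEnum termsEnum = terms.iterator();
        while (termsEnum.next() != null) {
            count++;
        }
        return count;
    }
}
```

The fallback is linear in the number of terms, so on a large index it may
be worth caching the result per reader.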

For PointFields the hint was right: PointValues.size() is what I need.
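
For reference, a short sketch of the point count, again assuming Lucene 7.x.
PointCounter and its method names are hypothetical; the two static
PointValues methods it calls sum the per-segment values across the reader.
Note that size counts every indexed point, duplicates included, so it is
closer to a total occurrence count than to a number of distinct values.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.PointValues;

// Hypothetical helper, not part of Lucene.
public final class PointCounter {

    private PointCounter() {}

    /** Total number of points indexed for the field across all segments (duplicates included). */
    public static long countPoints(IndexReader reader, String field) throws IOException {
        return PointValues.size(reader, field);
    }

    /** Number of documents that have at least one point for the field. */
    public static int countDocsWithPoints(IndexReader reader, String field) throws IOException {
        return PointValues.getDocCount(reader, field);
    }
}
```

Both methods return 0 for a field that has no points.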

For DocValues my question doesn't make sense, since there is no inverted
index for those fields.

Another edge case is the stored-only field. For this one too, I think
Lucene cannot provide a count.

Riccardo

2018-03-01 19:52 GMT+01:00 Riccardo Tasso <riccardo.tasso@gmail.com>:

> Thanks, probably for DocValues I can use DocValuesStatsCollector
> and DocValuesStats.
>
> 2018-03-01 2:13 GMT+01:00 Adrien Grand <jpountz@gmail.com>:
>
>> You probably want to look at PointValues.size(), which gives you the
>> number of indexed points. Doc values do not support index statistics,
>> however.
>>
>> On Wed, Feb 28, 2018 at 21:47, Riccardo Tasso <riccardo.tasso@gmail.com>
>> wrote:
>>
>> > Hello,
>> >  I'm porting an application from Lucene 4 to Lucene 7.
>> >
>> > I've converted a field from IntField to IntPoint, and at query and
>> > indexing time everything is OK.
>> >
>> > When I call the method:
>> >
>> > reader.getSumTotalTermFreq(field);
>> >
>> > it returns zero for my IntPoint field. I understand that IntPoint is
>> > stored in a specific data structure (the block k-d tree), but how could
>> > I obtain the same result as in the previous version?
>> >
>> > Which is the best way to count the "number of terms" also for IntPoint?
>> >
>> > Can I also find the equivalent of "top terms", i.e. the list of the
>> > most frequent values for a given field with their counts?
>> >
>> > Would it be the same if I used NumericDocValuesField?
>> >
>> > Thanks,
>> >  Riccardo
>> >
>>
>
>
