lucenenet-user mailing list archives

From Shad Storhaug <s...@shadstorhaug.com>
Subject RE: TermDistinctValuesCollector
Date Tue, 08 Nov 2016 08:12:43 GMT
Hi Julian,

Just to be sure we are on the same page, the grouping functionality was posted as a pull request
mere hours ago https://github.com/apache/lucenenet/pull/193. This doesn't yet exist in the
master branch or on NuGet. But since most of these types are missing from the master branch
and it sounds like you are compiling fine, you are probably on the right page (just need to
check). If not, you should pull down that branch and compile it.

I can't tell you exactly how the functionality works, but I can point you to the tests. Unfortunately,
the tests are very complex and there are not many of them so I am not sure how helpful they
will be. At least they will give you some idea of what is required for a common grouping scenario.
The tests are at: https://github.com/NightOwl888/lucenenet/tree/grouping/src/Lucene.Net.Tests.Grouping.


Perhaps someone else can give you some better insight on how the functionality works - you
might want to try the Lucene (Java) user group if you are having trouble finding detailed
documentation. The API is very similar in .NET.
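
As a rough illustration of the "first phase" wording in the Javadoc quoted below, here is a
minimal sketch in plain Java, with no Lucene dependency, of what a two-pass distinct count
could look like: one pass picks the top N groups, and a second pass collects the distinct
countField values for only those groups. The class, method, and field names are all
hypothetical, and this is an assumption about the general idea, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, Lucene-free sketch of a two-pass "distinct values per group" count.
class TwoPhaseDistinctSketch {

    // Stand-in for a matching document: one value for the group field, one for the count field.
    static final class Doc {
        final String group;
        final String count;

        Doc(String group, String count) {
            this.group = group;
            this.count = count;
        }
    }

    // Phase 1: keep the first topN distinct group values, in the order the hits arrive.
    static List<String> firstPhaseTopGroups(List<Doc> hits, int topN) {
        Set<String> seen = new LinkedHashSet<>();
        for (Doc d : hits) {
            if (seen.size() == topN) {
                break;
            }
            seen.add(d.group);
        }
        return new ArrayList<>(seen);
    }

    // Phase 2: for the groups chosen in phase 1 only, gather the distinct count-field values.
    static Map<String, Set<String>> secondPhaseDistinct(List<Doc> hits, List<String> groups) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String g : groups) {
            result.put(g, new TreeSet<>());
        }
        for (Doc d : hits) {
            Set<String> values = result.get(d.group);
            if (values != null) { // hits in groups that were not selected are ignored
                values.add(d.count);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> hits = Arrays.asList(
                new Doc("a", "x"), new Doc("b", "y"), new Doc("a", "x"),
                new Doc("a", "z"), new Doc("c", "w"));

        List<String> top = firstPhaseTopGroups(hits, 2);
        System.out.println(secondPhaseDistinct(hits, top)); // prints {a=[x, z], b=[y]}

        // With no groups from a first phase there is nothing to count.
        System.out.println(secondPhaseDistinct(hits, new ArrayList<String>())); // prints {}
    }
}
```

If the real collector behaves the same way, passing an empty groups collection would leave
nothing to count, which would match the empty result described below.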

Thanks,
Shad Storhaug (NightOwl888)

-----Original Message-----
From: Julian Ohrt [mailto:julian.ohrt@aploris.com] 
Sent: Tuesday, November 8, 2016 2:43 PM
To: user@lucenenet.apache.org
Subject: TermDistinctValuesCollector

Hi:

I am trying to use the class TermDistinctValuesCollector from Core namespace Lucene.Net.Search.Grouping.
Studying http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/search/grouping/term/TermDistinctValuesCollector.html
I found a short explanation for the three parameters of the constructor:

groupField - The field to group by
countField - The field to count distinct values for
groups - The top N groups, collected during the first phase search

I am still not sure what the group is (is it the field for which the content is counted?),
why a lucene field is needed for counting, and what the "first phase search" is.
I tried something like this:

Collection<GroupCount> groups = new Collection<GroupCount>();
TermDistinctValuesCollector collector = new TermDistinctValuesCollector("groupField", "countField", groups);
mIndexSearcher.Search(query, null, collector);

Of course to no avail: groups is empty. It did not give me any result at all.

Is there any documentation I missed? I did not even find any second hit for "TermDistinctValuesCollector"
in the lucenenet repository except for TermDistinctValuesCollector.cs itself. Not even a unit
test.

A short example of how to use it would be awesome. But I'd also like to understand how it works
(should work) internally. What are member variables ordSet, groupFieldTermIndex, etc. used
for? Not knowing the internals of lucene I just don't understand the source code.

Thanks a lot!
Julian


