lucene-dev mailing list archives

From "Martijn van Groningen (JIRA)" <>
Subject [jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0
Date Sat, 04 Jun 2011 10:43:47 GMT


Martijn van Groningen updated SOLR-2564:

    Attachment: SOLR-2564.patch

Hi Yonik,

It is good to know that you took a look at the patch!

bq. in the QueryComponent, why the change to set the GET_SCORES flag based on the sort(s)?
Yes, I did this because I used this flag to set Grouping.needScores. I also used needScores
to indicate whether the scores need to be cached. However, I have changed this in the updated
patch, and this check is no longer tied to setting the GET_SCORES flag.

bq. I'm not a fan of this new style for matching request parameters to enums...
We can choose to leave out the upper-casing. Solr users would then need to make sure that
parameter options are spelled correctly. Would that be all right?
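
To make the trade-off concrete, here is a minimal sketch of the two styles. The enum and
its constants are hypothetical and are not taken from the patch; only the standard
Enum.valueOf behaviour is assumed.

```java
import java.util.Locale;

public class EnumParamSketch {
    // Hypothetical enum for illustration; the option names in the patch may differ.
    enum CountMode { UNGROUPED, GROUPED }

    // Lenient: upper-case the raw value so "grouped" and "GROUPED" both match.
    static CountMode parseLenient(String raw) {
        return CountMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    // Strict: the value must match the enum constant exactly,
    // otherwise Enum.valueOf throws IllegalArgumentException.
    static CountMode parseStrict(String raw) {
        return CountMode.valueOf(raw);
    }

    public static void main(String[] args) {
        System.out.println(parseLenient("grouped"));
        try {
            parseStrict("grouped");
        } catch (IllegalArgumentException e) {
            System.out.println("strict parsing rejected: grouped");
        }
    }
}
```

With the strict variant a request that passes a lower-case value fails instead of being
silently accepted, which is the burden on users mentioned above.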

bq. "Accuracy" seems a bit mis-named?
Maybe another name is more descriptive. Maybe style or method?

bq. The parameter "group.totalCount" I would expect to return the total count of something,
not control the pre/post faceting thing?
The Javadoc is mixed up with group.docSet. I also think that group.groupCount is a better name.
I changed this in the new patch.

bq. What does "group.docSet" do?
Currently nothing. I plan to use it when I finish LUCENE-3097. Basically it will decide whether
the docset (for FacetComponent and StatsComponent) is based on plain documents or groups.
Since you can have more than one Command (Field / Function / Query), it will then select the
first CommandField or CommandFunction. I'm not sure how we should handle the case where
there is more than one command.

bq. I'm not sure we should default group.cache to true
The query time can really be reduced with this option, but yes, it requires more memory. If
the cache collector threshold is met, the array is immediately set to null during the search,
so gc might be able to clean it up during the search. Also, Solr users get a message in the
response. Somehow I forgot to move that from SOLR-2524, but it is in the updated patch now.
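
The "drop the array once the threshold is met" behaviour can be sketched roughly as
follows. This is a simplified, hypothetical class written for illustration; it is not the
collector used in the patch, and it caches doc ids only.

```java
import java.util.Arrays;

public class CacheThresholdSketch {
    private final int maxDocsToCache; // hypothetical threshold, in documents
    private int[] cachedDocs;         // becomes null once the threshold is exceeded
    private int size;

    CacheThresholdSketch(int maxDocsToCache) {
        this.maxDocsToCache = maxDocsToCache;
        this.cachedDocs = new int[Math.min(16, maxDocsToCache)];
    }

    void collect(int doc) {
        if (cachedDocs == null) {
            return; // caching was already abandoned for this search
        }
        if (size == maxDocsToCache) {
            cachedDocs = null; // release the reference so gc can reclaim it mid-search
            return;
        }
        if (size == cachedDocs.length) {
            int newLength = Math.min(maxDocsToCache, Math.max(1, size * 2));
            cachedDocs = Arrays.copyOf(cachedDocs, newLength);
        }
        cachedDocs[size++] = doc;
    }

    // False means the second pass has to re-run the query instead of replaying the cache.
    boolean isCached() {
        return cachedDocs != null;
    }

    public static void main(String[] args) {
        CacheThresholdSketch cache = new CacheThresholdSketch(3);
        for (int doc = 0; doc < 4; doc++) {
            cache.collect(doc);
        }
        System.out.println("cached: " + cache.isCached());
    }
}
```

A caller would check isCached() after the first pass and, when it returns false, report
that to the user and execute the search again for the second pass.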

bq. we could dump group.cache and have a single group.cacheMB parameter that uses 0 as no
cache, -1 as maximum needed (solr uses -1 in this manner in other places too)
Makes sense; grouping is then at least consistent with the rest of Solr. I made it default to
-1 for now.
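
A minimal sketch of how a single group.cacheMB value could be interpreted. The helper
name is hypothetical, and treating every negative value (not just -1) as "maximum needed"
is an assumption of this sketch.

```java
public class CacheMbSketch {
    // Hypothetical helper: maps the group.cacheMB value to a byte limit.
    //   0        -> 0 (caching disabled)
    //   negative -> Long.MAX_VALUE (no limit; only -1 is named in the discussion)
    //   positive -> that many megabytes, expressed in bytes
    static long cacheLimitBytes(int cacheMB) {
        if (cacheMB == 0) {
            return 0L;
        }
        if (cacheMB < 0) {
            return Long.MAX_VALUE;
        }
        return cacheMB * 1024L * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(cacheLimitBytes(0));
        System.out.println(cacheLimitBytes(-1));
        System.out.println(cacheLimitBytes(64));
    }
}
```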

bq. FYI: there's a nocommit in there misspelled as "No commit"
I have removed that.

bq. It wasn't necessary before, and there are advantages to preserving information (like the fact
that someone said "no limit" vs a specific number) until as late as possible. That was previously
handled by getMax() in, and I still see it being called... so it should be OK?
I've removed this if statement and made sure that getMax(...) is used wherever it is needed.
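
For readers following along, the idea of keeping "no limit" as-is and resolving it only at
the point of use can be sketched like this. The signature and the convention that a
negative length means "no limit" are assumptions of this sketch and may not match the
getMax(...) in the patch.

```java
public class MaxSketch {
    // Hypothetical resolver: turns offset/len into a concrete upper bound.
    // A negative len is read as "no limit" and resolves to max.
    static int getMax(int offset, int len, int max) {
        int v = len < 0 ? max : offset + len;
        // v < 0 catches int overflow of offset + len; v > max clamps to what is available.
        if (v < 0 || v > max) {
            v = max;
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(getMax(0, -1, 50));   // "no limit" resolves to max
        System.out.println(getMax(10, 5, 50));   // a specific window
        System.out.println(getMax(10, 100, 50)); // clamped to max
    }
}
```

Because the caller's original value is not overwritten up front, code further down can
still tell "no limit" apart from a specific number.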

> Integrating grouping module into Solr 4.0
> -----------------------------------------
>                 Key: SOLR-2564
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>         Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch
> Since work on the grouping module is going well, I think it is time to wire this up in Solr.
> Besides the current grouping features Solr provides, Solr will then also support second
> pass caching and total count based on groups.

This message is automatically generated by JIRA.