lucene-dev mailing list archives

From Martijn v Groningen <martijn.v.gronin...@gmail.com>
Subject Re: Grouping on Long type uses function query?
Date Wed, 30 Nov 2011 07:44:36 GMT
Actually, a DocTermsIndex entry can take quite a lot of memory. I believe
that when you have a lot of unique strings, DocTermsIndex uses more
memory than when you have a small number of unique field values with
many documents per value.
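To make the trade-off concrete, here is a back-of-envelope model. It is only a sketch under stated assumptions, not Lucene's real memory accounting. It assumes a long entry costs 8 bytes per document. It assumes a DocTermsIndex-style entry costs one packed ord per document, plus each unique term's bytes, plus a guessed fixed per-term overhead. The 10-byte average term length and the 8-byte overhead are hypothetical numbers.

```java
/**
 * Back-of-envelope FieldCache memory model. All constants are assumptions
 * for illustration; this is not how Lucene actually accounts for memory.
 */
public class FieldCacheMemoryEstimate {

    // Hypothetical fixed cost per unique term (e.g. its offset into the term bytes).
    static final int ASSUMED_PER_TERM_OVERHEAD_BYTES = 8;

    /** A long entry: one 8-byte value per document. */
    static long longCacheBytes(int maxDoc) {
        return 8L * maxDoc;
    }

    /** Bits per packed ord needed to store values 0..uniqueTerms (0 assumed to mean "no value"). */
    static int bitsPerOrd(long uniqueTerms) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(uniqueTerms));
    }

    /** A DocTermsIndex-style entry: packed ords per document plus the unique term bytes. */
    static long termsIndexBytes(int maxDoc, long uniqueTerms, int avgTermBytes) {
        long ordBytes = ((long) maxDoc * bitsPerOrd(uniqueTerms) + 7) / 8; // round up to whole bytes
        long termBytes = uniqueTerms * (avgTermBytes + ASSUMED_PER_TERM_OVERHEAD_BYTES);
        return ordBytes + termBytes;
    }

    public static void main(String[] args) {
        int maxDoc = 10_000_000;
        int assumedAvgTermBytes = 10; // hypothetical encoded length of one long term
        System.out.println("long entry:            " + longCacheBytes(maxDoc)); // 80000000
        System.out.println("terms, 100 unique:     "
                + termsIndexBytes(maxDoc, 100, assumedAvgTermBytes)); // 8751800
        System.out.println("terms, 5000000 unique: "
                + termsIndexBytes(maxDoc, 5_000_000, assumedAvgTermBytes)); // 118750000
    }
}
```

With these assumed numbers, the terms index is about a tenth of the long array when there are only 100 unique values. Once there are millions of unique values, it is larger than the long array. That is the many-unique-values case described in the quoted message.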

I do think that an option that decides whether a second cache entry is
added to the FieldCache (FC) is desirable. The default should be false,
and users who want fast grouping on non-string fields can set this
option to true. I think group.method is a bit vague and isn't
descriptive about what exactly it is doing. It should be an expert
option, maybe something like group.moreRamFasterGroupingNonStringFields=[true|false]
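The cache-entry trade-off behind such an option can be modelled with a toy cache. This is a hypothetical sketch, not Solr or Lucene code. ToyFieldCache, groupAndSort and the field name "price" are invented for illustration. The boolean flag only mirrors the proposed group.moreRamFasterGroupingNonStringFields option. The model assumes a request that both groups and sorts on the same long field.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model (not Solr code) of FieldCache entries created for one long field. */
public class GroupingCacheModel {

    enum EntryType { LONG, DOC_TERMS_INDEX }

    /** Stand-in for the FieldCache: at most one entry per (field, type) pair. */
    static final class ToyFieldCache {
        private final Set<String> entries = new LinkedHashSet<>();

        void load(String field, EntryType type) {
            entries.add(field + ":" + type); // loading the same key again reuses the entry
        }

        int size() { return entries.size(); }

        Set<String> entries() { return entries; }
    }

    /** Groups and sorts on the same field; the flag mimics the hypothetical option. */
    static void groupAndSort(ToyFieldCache cache, String field, boolean moreRamFasterGrouping) {
        if (moreRamFasterGrouping) {
            // Term-based grouping works on ords from a DocTermsIndex-style entry
            cache.load(field, EntryType.DOC_TERMS_INDEX);
        } else {
            // Function-based grouping reads the same long values that sorting uses
            cache.load(field, EntryType.LONG);
        }
        // Sorting on a long field always needs the long entry
        cache.load(field, EntryType.LONG);
    }

    public static void main(String[] args) {
        ToyFieldCache defaults = new ToyFieldCache();
        groupAndSort(defaults, "price", false);
        ToyFieldCache optedIn = new ToyFieldCache();
        groupAndSort(optedIn, "price", true);
        System.out.println(defaults.entries()); // prints [price:LONG]
        System.out.println(optedIn.entries());  // prints [price:DOC_TERMS_INDEX, price:LONG]
    }
}
```

With the default (false), grouping and sorting share one long entry. Opting in trades a second entry for faster Term-based grouping.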

Having the BlockGroupingCollector in Solr would be great. However, the
collector depends on block indexing, which Solr currently doesn't
support, so that needs to be implemented first. I think that to use
the BlockGroupingCollector we would just need two parameters: one that
tells Solr to actually use the BlockGroupingCollector and one that
tells Solr how to query for the parent documents. Maybe something like:
group.block=[true|false] and group.parent.query=[query]
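Here is a rough illustration of why block indexing makes grouping cheap. Suppose each group is indexed as a contiguous block of docIDs and the last document of each block is marked. Then groups fall out of a single in-order pass, with no FieldCache entry at all. This is a simplified sketch, not the BlockGroupingCollector itself. The BitSet stands in for whatever the proposed group.parent.query would match. For illustration, the sketch assumes the marked document is the last one in each block.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Illustration (not Lucene code) of single-pass grouping over contiguous doc blocks. */
public class BlockGroupingSketch {

    /**
     * @param matchingDocs   matching docIDs in increasing order
     * @param lastDocOfBlock bit set at the last docID of each block (hypothetical parent marker)
     * @return the matching docIDs split per block, in index order
     */
    static List<List<Integer>> group(int[] matchingDocs, BitSet lastDocOfBlock) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentBlockEnd = -1;
        for (int doc : matchingDocs) {
            // The block containing doc ends at the next marked docID at or after doc
            int blockEnd = lastDocOfBlock.nextSetBit(doc);
            if (blockEnd != currentBlockEnd && !current.isEmpty()) {
                groups.add(current); // crossed into a new block: close the previous group
                current = new ArrayList<>();
            }
            currentBlockEnd = blockEnd;
            current.add(doc);
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Three blocks: docIDs 0-2, 3-4 and 5-7, with last docs 2, 4 and 7
        BitSet lastDocOfBlock = new BitSet();
        lastDocOfBlock.set(2);
        lastDocOfBlock.set(4);
        lastDocOfBlock.set(7);
        System.out.println(group(new int[] {0, 2, 3, 6, 7}, lastDocOfBlock)); // prints [[0, 2], [3], [6, 7]]
    }
}
```

Because group membership comes from document order alone, this approach needs block indexing support before it can be used.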

Martijn

On 30 November 2011 00:30, Young, Cody <Cody.Young@move.com> wrote:
> Hi Martijn,
>
> Thanks for the response!
>
> Doesn't it take a lot more memory to hold a string field in the FieldCache than a long field?
>
> In our grouping scenario, we have many unique values with a small number of documents per group. I would think that even the double FieldCache memory hit on a long would be less than using a string.
>
> Would this be a suitable place to have a grouping parameter to control the behavior? group.method? I'm looking at using the BlockGroupingCollector as well; perhaps "block" could be another choice?
> The downside is that there are invalid combinations. (You wouldn’t change group.method to anything else if you were using a function to group.)
>
> Thanks,
> Cody
>
> -----Original Message-----
> From: martijn.is.hier@gmail.com [mailto:martijn.is.hier@gmail.com] On Behalf Of Martijn v Groningen
> Sent: Tuesday, November 29, 2011 2:09 PM
> To: dev@lucene.apache.org
> Subject: Re: Grouping on Long type uses function query?
>
> If I remember correctly this was done to avoid insane FieldCache usage.
>
> If the Term-based grouping implementation is used, an entry of type DocTermsIndex is created in the FieldCache for that field. Other search features such as sorting and faceting might then create a second FieldCache entry for the same field; in your case, for example, sorting will put a new entry of type long for this field in the FieldCache. When the Function-based grouping implementations are used this is not the case: only one cache entry, of type long, is put in the FieldCache, and sorting or faceting will reuse it.
>
> The downside of the Function-based grouping implementations is that they are slower than the Term-based implementation.
> At the time this feature was integrated into Solr, the decision was made not to have double FieldCache usage per field, and to use the slower Function-based implementation for non-string fields.
>
> The workaround that doesn't involve coding is to make a copy field of type string, but then you add more fields / data to your index...
>
> On 29 November 2011 22:25, Young, Cody <Cody.Young@move.com> wrote:
>> Hi All,
>>
>> I’m new to Solr development. Since I’m new to the code base, I
>> thought I’d double-check here before creating a JIRA issue. We’re trying
>> to use grouping on a field of type long (on trunk):
>>
>>     <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>>
>> The performance wasn’t what we were looking for, so I took a quick
>> look at the grouping code in Solr and noticed that a string field
>> uses the Term grouping classes (CommandField in
>> /trunk/solr/core/src/java/org/apache/solr/search/Grouping.java).
>> However, when using a long field the Function grouping classes get
>> used (CommandFunc in
>> /trunk/solr/core/src/java/org/apache/solr/search/Grouping.java). When
>> I changed it to use CommandField instead of CommandFunc for the long
>> type, QTime decreased (I only did light testing with simple queries,
>> but it seemed to drop by 50% or so).
>>
>> The functionality still appears to work and the grouping tests pass,
>> but as I’m not very familiar with the Solr code I wasn’t sure if there
>> was a reason for long fields to use CommandFunc instead of CommandField.
>>
>> I’m happy to take a stab at creating a JIRA issue and a patch if this is
>> indeed an issue, but I’ll need some guidance on the best way to fix
>> this (perhaps there is a better check than instanceof StrFieldSource or
>> instanceof LongFieldSource?).
>>
>> The change I made to test this was very simple. I just added:
>>
>> import org.apache.lucene.queries.function.valuesource.LongFieldSource;
>>
>> and, at line 176 of Grouping.java:
>>
>>      } else if (valueSource instanceof LongFieldSource) {
>>          // route long fields to the Term-based CommandField instead of CommandFunc
>>          String field = ((LongFieldSource) valueSource).getField();
>>          CommandField commandField = new CommandField();
>>          commandField.groupBy = field;
>>          gc = commandField;
>>
>> Thanks,
>>
>> Cody
>
> --
> Kind regards,
>
> Martijn van Groningen
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>



-- 
Kind regards,

Martijn van Groningen

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message