lucene-dev mailing list archives

From "Martijn van Groningen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-2205) Grouping performance improvements
Date Thu, 28 Oct 2010 18:16:21 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-2205:
----------------------------------------

    Attachment: SOLR-2205.patch

The code I initially wrote was against the pre-flex code base, so I ported it to work on
trunk. Someone should definitely review the patch to check that all the changes I made are
the right ones.

I tested this patch on my local machine. A search (q=*:*) on an index that holds 10M
documents took around 300 ms, whereas the same query without the code changes took around
2.8 seconds. That is roughly 9 times faster. These numbers are based on a basic search, so
no facets, highlighting, etc.

I found that the following piece of code takes a relatively long time to execute (once it
is executed millions and millions of times, you start to notice):
{code}
filler.fillValue(doc);
groupMap.get(mval);
{code} 

This fragment is used in the TopGroupCollector and Phase2GroupCollector. I put some code in
front of it to cheaply exclude documents that are not competitive. In both classes this check
is cheaper than executing the fragment above.
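
To illustrate the idea, here is a minimal, self-contained sketch of such a pre-check. It is not the code from the patch: the class name, the loadGroupValue helper (a stand-in for filler.fillValue(doc)) and the score-only ranking are assumptions made for the example. Once the top N groups are known, a document whose score does not beat the weakest of them can be skipped before the group value is loaded and looked up in the map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual TopGroupCollector from the patch.
final class CompetitiveGroupSketch {
    private final int topN;
    // Best score seen so far for each group that is currently in the top N.
    final Map<String, Float> bestScoreByGroup = new HashMap<>();
    // Lowest "best score" among the current top groups; only meaningful once topN groups exist.
    private float lowestTopScore = Float.NEGATIVE_INFINITY;
    // Counts how often the expensive path was taken, for demonstration only.
    int expensiveLookups = 0;

    CompetitiveGroupSketch(int topN) {
        this.topN = topN;
    }

    // Stand-in for the expensive part (filler.fillValue(doc) in the original fragment).
    private String loadGroupValue(String[] groupOfDoc, int doc) {
        expensiveLookups++;
        return groupOfDoc[doc];
    }

    void collect(int doc, float score, String[] groupOfDoc) {
        // Cheap pre-check: with topN groups collected, a document that does not beat the
        // weakest group can neither create a new top group nor improve an existing one.
        if (bestScoreByGroup.size() >= topN && score <= lowestTopScore) {
            return;
        }
        String group = loadGroupValue(groupOfDoc, doc);
        Float best = bestScoreByGroup.get(group);
        if (best != null) {
            if (score > best) {
                bestScoreByGroup.put(group, score);
                recomputeLowest();
            }
            return;
        }
        if (bestScoreByGroup.size() >= topN) {
            evictWeakestGroup();
        }
        bestScoreByGroup.put(group, score);
        recomputeLowest();
    }

    private void evictWeakestGroup() {
        String weakest = null;
        float min = Float.POSITIVE_INFINITY;
        for (Map.Entry<String, Float> e : bestScoreByGroup.entrySet()) {
            if (e.getValue() < min) {
                min = e.getValue();
                weakest = e.getKey();
            }
        }
        bestScoreByGroup.remove(weakest);
    }

    private void recomputeLowest() {
        if (bestScoreByGroup.size() < topN) {
            lowestTopScore = Float.NEGATIVE_INFINITY;
            return;
        }
        float min = Float.POSITIVE_INFINITY;
        for (float s : bestScoreByGroup.values()) {
            min = Math.min(min, s);
        }
        lowestTopScore = min;
    }

    public static void main(String[] args) {
        String[] groups = {"a", "b", "a", "c", "b", "c"};
        float[] scores = {5f, 3f, 4f, 1f, 2f, 6f};
        CompetitiveGroupSketch c = new CompetitiveGroupSketch(2);
        for (int doc = 0; doc < groups.length; doc++) {
            c.collect(doc, scores[doc], groups);
        }
        System.out.println(c.bestScoreByGroup + " lookups=" + c.expensiveLookups);
    }
}
```

The linear scans over the map keep the sketch short; a real collector would more likely keep the top groups in an ordered structure. The pre-check itself is a single comparison per document, which is why it pays off when most documents are not competitive.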

Since I ported the code from the pre-flex code base, I needed to make some changes to it to
support grouping by function. The code I initially wrote only needed to support grouping on
a field. Since trunk also supports grouping by function query, I added two methods to
DocValues and implemented them in three subclasses. I don't know if this particular change
is good, but it works. It would be really helpful if someone could give feedback on it.

> Grouping performance improvements
> ---------------------------------
>
>                 Key: SOLR-2205
>                 URL: https://issues.apache.org/jira/browse/SOLR-2205
>             Project: Solr
>          Issue Type: Sub-task
>          Components: search
>    Affects Versions: 4.0
>            Reporter: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: SOLR-2205.patch
>
>
> This issue is dedicated to the performance of the grouping functionality.
> I've noticed that the code does not perform well on large indexes. Doing a search
> (q=*:*) with grouping on an index of around 5M documents took around one second on my
> local development machine. We had to support grouping on an index that holds around 50M
> documents per machine, so we made some changes and were able to happily serve that number
> of documents. Patch will follow soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

