lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-4465) Configurable Collectors
Date Mon, 18 Mar 2013 22:23:16 GMT

     [ https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-4465:
---------------------------------

    Description: 
This ticket provides a patch to add pluggable collectors to Solr. This patch was generated
and tested with Solr 4.1.

This is how the patch functions:

Collectors are plugged into Solr in the solrconfig.xml using the new collectorFactory element.
For example:

<collectorFactory name="default" class="solr.CollectorFactory"/>
<collectorFactory name="sum" class="solr.SumCollectorFactory"/>

The elements above define two collector factories. The first one is the "default" collectorFactory.
The class attribute points to org.apache.solr.handler.component.CollectorFactory, which implements
logic that returns the default TopScoreDocCollector and TopFieldCollector. 

To create your own collectorFactory you must subclass the default CollectorFactory and at
a minimum override the getCollector method to return your new collector. 

You can tell Solr which collectorFactory to use at query time using HTTP parameters. All collector
parameters start with the prefix "cl.". All parameters that start with "cl." are gathered
up and added to a CollectorSpec instance which is passed to CollectorFactories.
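
As a rough illustration of the prefix rule above (not the patch's code), the gathering step could look like the sketch below. CollectorSpecSketch and gather() are hypothetical names, and a plain map stands in for the real CollectorSpec, whose actual shape may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the patch's CollectorSpec; the real class may differ.
final class CollectorSpecSketch {
    static final String PREFIX = "cl.";

    // Copy every request parameter whose name starts with "cl." into a new map,
    // preserving the order in which the parameters were supplied.
    // Note that the bare "cl" switch does not start with "cl." and is not copied.
    static Map<String, String> gather(Map<String, String> requestParams) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : requestParams.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                spec.put(e.getKey(), e.getValue());
            }
        }
        return spec;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("cl", "true");
        params.put("cl.topdocs", "default");
        params.put("cl.topdocs.max", "100");
        System.out.println(gather(params));
    }
}
```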

The parameter "cl" turns on pluggable collectors:

cl=true

If cl is not in the parameters, Solr will automatically use the default collectorFactory.


*Pluggable doclist Sorting with Topdocs Collectors*


You can specify two types of pluggable collectors. The first type is the topdocs collector.
For example:

cl.topdocs=<name>

The above param points to the named collectorFactory in the solrconfig.xml to construct the
collector. Topdocs collectorFactories must return collectors that extend the TopDocsCollector
base class. Topdocs collectors are responsible for collecting the doclist.

You can pass parameters to the topdocs collectors by adding "cl." HTTP parameters. By convention
you can pass parameters to the topdocs collector like this:

cl.topdocs.max=100

This parameter will be added to the collector spec because of the "cl." prefix and passed
to the collectorFactory.
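
The choice of topdocs factory described above can be sketched roughly as follows. This is not the patch's code: TopdocsFactoryChoiceSketch and factoryName() are hypothetical, and the handling of a "cl" value other than "true" and of a missing cl.topdocs value are assumptions, since the description only covers the case where cl is absent.

```java
import java.util.Map;

final class TopdocsFactoryChoiceSketch {
    static final String DEFAULT_FACTORY = "default";

    // Decide which named collectorFactory should build the topdocs collector.
    static String factoryName(Map<String, String> params) {
        // Without cl=true, pluggable collectors are off and the default factory is used.
        // Assumption: any value other than "true" is treated like a missing cl.
        if (!"true".equals(params.get("cl"))) {
            return DEFAULT_FACTORY;
        }
        // Assumption: a missing or empty cl.topdocs also falls back to "default".
        String name = params.get("cl.topdocs");
        return (name == null || name.isEmpty()) ? DEFAULT_FACTORY : name;
    }
}
```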

*Pluggable Custom Analytics With Delegating Collectors*

You can also specify any number of delegating collectors with the "cl.delegating" parameter.
Delegating collectors are designed to collect something else besides the doclist. Typically
this would be some type of custom analytic. 

cl.delegating=sum,ave

The parameter above specifies two delegating collectors named sum and ave. Like the topdocs
collectors these point to named collectorFactories in the solrconfig.xml. 

Delegating collector factories must return Collector instances that extend DelegatingCollector.

A sample delegating collector is provided in the patch through the org.apache.solr.handler.component.SumCollectorFactory.

This collectorFactory provides a very simple DelegatingCollector that groups by a field and
sums a column of floats. The sum collector is not designed to be a fully functional sum function
but to be a proof of concept for pluggable analytics through delegating collectors.
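
To make the idea concrete, here is a dependency-free sketch of a collector that groups by a field, sums a float column, and then forwards each document to the next collector in the chain. It is not the SumCollectorFactory code from the patch: DocCollectorSketch and GroupSumCollectorSketch are hypothetical, and plain arrays indexed by document id stand in for the index field values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal stand-in for a collector: receives one matching document id at a time.
interface DocCollectorSketch {
    void collect(int doc);
}

final class GroupSumCollectorSketch implements DocCollectorSketch {
    private final DocCollectorSketch delegate; // next collector in the chain
    private final String[] groupValues;        // group-by field value per doc id (assumed in memory)
    private final float[] columnValues;        // float column value per doc id (assumed in memory)
    private final Map<String, Float> sums = new LinkedHashMap<>();

    GroupSumCollectorSketch(DocCollectorSketch delegate, String[] groupValues, float[] columnValues) {
        this.delegate = delegate;
        this.groupValues = groupValues;
        this.columnValues = columnValues;
    }

    @Override
    public void collect(int doc) {
        // Add this document's column value to the running sum for its group...
        sums.merge(groupValues[doc], columnValues[doc], Float::sum);
        // ...then pass the document on so the doclist is still collected.
        delegate.collect(doc);
    }

    Map<String, Float> sums() {
        return sums;
    }
}
```

Forwarding every document to the delegate is what lets the analytic run alongside the normal doclist collection rather than replacing it.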

To communicate with delegating collectors you need to reference the name and ordinal of the
collector. The ordinal is the collector's zero-based position in the comma-separated list.

For example:

cl.delegating=sum,ave&cl.sum.0.groupby=field1

The "cl.sum.0.groupby" parameter tells the "sum" collector at the 0 ordinal to group by "field1".
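
The name-and-ordinal addressing can be sketched as below. DelegatingParamsSketch and its methods are hypothetical helpers for illustration, not part of the patch; they only show how a "cl.<name>.<ordinal>." prefix could be used to pick out one collector's parameters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DelegatingParamsSketch {
    // Split the comma-separated cl.delegating value into collector names.
    // A name's index in the returned list is its ordinal.
    static List<String> names(String delegating) {
        List<String> out = new ArrayList<>();
        for (String n : delegating.split(",")) {
            String trimmed = n.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    // Return the parameters addressed to cl.<name>.<ordinal>.*, keyed by the
    // remaining suffix (for example "groupby").
    static Map<String, String> paramsFor(Map<String, String> spec, String name, int ordinal) {
        String prefix = "cl." + name + "." + ordinal + ".";
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return out;
    }
}
```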

Delegating collectors are passed a reference to the ResponseBuilder and can place maps with
analytic output directly into the SolrQueryResponse with the add() method.

Maps that are placed in the SolrQueryResponse are automatically added to the outgoing response.


*Distributed Search*

The CollectorFactory also has a method called merge(). This method aggregates the results
from each of the shards during distributed search. The "default" CollectorFactory implements
the default merge logic for merging documents from each shard. If you define a different topdocs
collector you may need to change the default merge method to merge documents in accordance
with how they are being collected at the shard level.

With delegating collectors, you'll need to override the merge method to merge the analytic
outputs from the shards. An example of how this works is provided in the SumCollectorFactory.
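
For a grouped sum, merging shard outputs amounts to adding the per-group sums together. The sketch below shows only that idea: ShardMergeSketch and mergeSums() are hypothetical, and the real merge() signature in the patch is not shown in this description and is likely different.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ShardMergeSketch {
    // Combine per-shard maps of group -> sum into one map by adding the sums
    // of groups that appear on more than one shard.
    static Map<String, Float> mergeSums(List<Map<String, Float>> shardOutputs) {
        Map<String, Float> merged = new TreeMap<>(); // sorted by group for stable output
        for (Map<String, Float> shard : shardOutputs) {
            for (Map.Entry<String, Float> e : shard.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Float::sum);
            }
        }
        return merged;
    }
}
```

A sum merges cleanly because it is additive across shards. An analytic such as an average would need each shard to return both a sum and a count so that the merge can recompute the final value.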

*Testing the Patch With Sample Data*

1) Apply patch to Solr 4.1
2) Load sample data
3) Send the HTTP request:

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true&cl=true&cl.topdocs=default&cl.delegating=sum&cl.sum.0.groupby=manu_id_s&cl.sum.0.column=price

The doclist will be generated by the "default" topdocs collector and the output will include
a map named "cl.sum.0" which will have output from the delegating sum collector.
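
If you would rather build the test request in code than type the URL by hand, a sketch along these lines should produce the same query string. TestQuerySketch is a hypothetical helper; the host, port and collection name are simply the ones from the URL above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class TestQuerySketch {
    // Join parameters as name=value pairs separated by '&', URL-encoding both sides.
    static String queryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
    }

    // The parameters from the sample request, in the same order.
    static Map<String, String> sampleParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("wt", "xml");
        p.put("indent", "true");
        p.put("cl", "true");
        p.put("cl.topdocs", "default");
        p.put("cl.delegating", "sum");
        p.put("cl.sum.0.groupby", "manu_id_s");
        p.put("cl.sum.0.column", "price");
        return p;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/collection1/select";
        System.out.println(base + "?" + queryString(sampleParams()));
    }
}
```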

> Configurable Collectors
> -----------------------
>
>                 Key: SOLR-4465
>                 URL: https://issues.apache.org/jira/browse/SOLR-4465
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 4.1
>            Reporter: Joel Bernstein
>             Fix For: 4.3
>
>         Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch,
SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch,
SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch
>
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

