lucene-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: Distributed Indexing
Date Wed, 09 Feb 2011 21:03:43 GMT
I haven't had time to follow all of this discussion, but this issue might help:
https://issues.apache.org/jira/browse/SOLR-2355

It's an implementation of the basic
http://localhost:8983/solr/update/csv?shards=shard1,shard2...

-Yonik
http://lucidimagination.com

On Mon, Feb 7, 2011 at 8:55 AM, Upayavira <uv@odoko.co.uk> wrote:
> Surely you want to be implementing an UpdateRequestProcessor, rather than a
> RequestHandler.
>
> The ContentStreamHandlerBase, in its handleRequestBody method, gets an
> UpdateRequestProcessor and uses it to process the request. What we need is
> for that handleRequestBody method to, as you have suggested, check the shards
> parameter and, if necessary, call a different UpdateRequestProcessor (a
> DistributedUpdateRequestProcessor).
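
A minimal sketch of the "check the shards parameter" step, written in plain Java so it runs without Solr on the classpath. The class name ShardsParam, the flat Map of request parameters, and the rule that a blank value counts as non-distributed are all assumptions made for illustration; none of this is Solr's actual API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr: decides whether an update request
// should take the distributed path based on its 'shards' parameter.
final class ShardsParam {
    private ShardsParam() {}

    /** Splits a comma-separated 'shards' value, dropping blank entries. */
    static List<String> parse(String shardsValue) {
        if (shardsValue == null) {
            return Collections.emptyList();
        }
        return Arrays.stream(shardsValue.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** A request is treated as distributed only if it names at least one shard. */
    static boolean isDistributed(Map<String, String> params) {
        return !parse(params.get("shards")).isEmpty();
    }
}
```

With something like this, the handler would keep its normal processor when isDistributed returns false and switch to the distributed one otherwise.
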
>
> I don't think we really need it to be configurable at this point. The
> ContentStreamHandlerBase could just use a single hardwired implementation.
> If folks want choice of DistributedUpdateRequestProcessor, it can be added
> later.
>
> For configuration, the DistributedUpdateRequestProcessor should get its
> config from the parent RequestHandler. The configuration I'm most interested
> in is the DistributionPolicy. And that can be done with a
> distributionPolicyClass=solr.IDHashDistributionPolicy request parameter,
> which could potentially be configured in solrconfig.xml as an invariant, or
> provided in the request by the user if necessary.
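
For illustration, an ID-hash distribution policy could look roughly like the sketch below. The thread does not say how IDHashDistributionPolicy hashes; CRC32 over the UTF-8 bytes of the document id is an assumption picked because it is stable across JVMs, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch, not Solr code: maps a document id to one shard so that
// the same id always lands on the same shard for a fixed shard list.
final class IdHashPolicySketch {
    private final List<String> shards;

    IdHashPolicySketch(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shards = new ArrayList<>(shards); // defensive copy
    }

    /** Picks a shard by hashing the id (assumed hash: CRC32 of the UTF-8 bytes). */
    String shardFor(String docId) {
        CRC32 crc = new CRC32();
        crc.update(docId.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is a valid index.
        int index = (int) (crc.getValue() % shards.size());
        return shards.get(index);
    }
}
```

Note that a plain modulo scheme like this reassigns most ids whenever the number of shards changes.
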
>
> So, I'd avoid another "thing" that needs to be configured unless there are
> real benefits to it (which there don't seem to me to be right now).
>
> Upayavira
>
> On Sun, 06 Feb 2011 23:08 +0000, "Alex Cowell" <alxcwll@gmail.com> wrote:
>
> Hey,
>
> We're making good progress, but our DistributedUpdateRequestHandler is
> having a bit of an identity crisis, so we thought we'd ask what other
> people's opinions are. The current situation is as follows:
>
> We've added a method to ContentStreamHandlerBase to check if an update
> request is distributed or not (based on the presence/validity of the
> 'shards' parameter). So a non-distributed request will proceed as normal but
> a distributed request would be passed on to the
> DistributedUpdateRequestHandler to deal with.
>
> The reason this choice is made in the ContentStreamHandlerBase is so that
> the DistributedUpdateRequestHandler can use the URL the request came in on
> to determine where to distribute update requests. E.g. an update request is
> sent to:
> http://localhost:8983/solr/update/csv?shards=shard1,shard2...
> then the DistributedUpdateRequestHandler knows to send requests to:
> shard1/update/csv
> shard2/update/csv
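
The mapping from the incoming URL to per-shard targets can be sketched as below. This assumes the handler path ("/update/csv") has already been extracted from the request URL and that each shard entry is a base address to which the path is appended; ShardUrlSketch and targets are hypothetical names, not Solr classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Solr code: derives one target per shard by
// appending the handler path the request originally came in on.
final class ShardUrlSketch {
    private ShardUrlSketch() {}

    static List<String> targets(String handlerPath, List<String> shards) {
        // Normalise so that exactly one '/' joins the shard and the path.
        String path = handlerPath.startsWith("/") ? handlerPath : "/" + handlerPath;
        List<String> out = new ArrayList<>();
        for (String shard : shards) {
            String base = shard.endsWith("/")
                    ? shard.substring(0, shard.length() - 1)
                    : shard;
            out.add(base + path);
        }
        return out;
    }
}
```

Called with "/update/csv" and the shards shard1 and shard2, this yields shard1/update/csv and shard2/update/csv, matching the example above.
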
>
> Alternatively, if the request wasn't distributed, it would simply be handled
> by whichever request handler "/update/csv" uses.
>
> Herein lies the problem. The DistributedUpdateRequestHandler is not really a
> request handler in the same way as the CSVRequestHandler or
> XmlUpdateRequestHandlers are. If anything, it's more like a "plugin" for the
> various existing update request handlers, to allow them to deal with
> distributed requests - a "distributor" if you will. It isn't designed to be
> able to receive and handle requests directly.
>
> We would like this "DistributedUpdateRequestHandler" to be defined in the
> solrconfig to allow flexibility for setting up multiple different
> DistributedUpdateRequestHandlers with different ShardDistributionPolicies
> etc., and also to allow us to get the appropriate instance from the core in
> the code. There seem to be two paths for doing this:
>
> 1. Leave it as an implementation of SolrRequestHandler and hope the user
> doesn't directly send update requests to it (i.e. a request to
> http://localhost:8983/solr/<distrib update handler path> would most likely
> cripple something). So it would be defined in the solrconfig something like:
> <requestHandler name="distrib-update"
> class="solr.DistributedUpdateRequestHandler" />
>
> 2. Create a new plugin type for the solrconfig, say
> "updateRequestDistributor" which would involve creating a new interface for
> the DistributedUpdateRequestHandler to implement, then registering it with
> the core. It would be defined in the solrconfig something like:
> <updateRequestDistributor name="distrib-update"
> class="solr.DistributedUpdateRequestHandler">
>   <lst name="defaults">
>     <str name="policy">solr.HashedDistributionPolicy</str>
>   </lst>
> </updateRequestDistributor>
>
> This would mean that it couldn't directly receive requests, but that an
> instance could still easily be retrieved from the core to handle the
> distribution of update requests.
>
> Any thoughts on the above issue (or a more succinct, descriptive name for
> the class) are most welcome!
>
> Alex
>
> ---
> Enterprise Search Consultant at Sourcesense UK,
> Making Sense of Open Source
