lucene-dev mailing list archives

From Alex Cowell <alxc...@gmail.com>
Subject Re: Distributed Indexing
Date Sun, 06 Feb 2011 23:08:21 GMT
Hey,

We're making good progress, but our DistributedUpdateRequestHandler is
having a bit of an identity crisis, so we thought we'd ask what other
people's opinions are. The current situation is as follows:

We've added a method to ContentStreamHandlerBase that checks whether an update
request is distributed (based on the presence and validity of the 'shards'
parameter). A non-distributed request proceeds as normal, while a distributed
request is passed on to the DistributedUpdateRequestHandler to deal with.
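As a rough illustration of that check, here is a minimal sketch. It assumes "valid" means the 'shards' value names at least one non-blank, comma-separated shard. The class and method names (DistributedCheck, isDistributed) are hypothetical, and a plain Map stands in for Solr's request parameters so the example runs on its own:

```java
import java.util.Map;

// Hypothetical helper, not the actual patch. In Solr this logic would sit in
// ContentStreamHandlerBase and read the request's params instead of a Map.
final class DistributedCheck {
    private DistributedCheck() {}

    /**
     * Returns true if the request carries a usable 'shards' parameter,
     * i.e. at least one non-blank, comma-separated shard address.
     */
    static boolean isDistributed(Map<String, String> params) {
        String shards = params.get("shards");
        if (shards == null) {
            return false; // no parameter: handle locally as before
        }
        for (String shard : shards.split(",")) {
            if (!shard.trim().isEmpty()) {
                return true; // at least one shard named: distribute
            }
        }
        return false; // present but blank (or only commas): treat as local
    }
}
```

A request with no 'shards' parameter, or with an empty one, falls through to the normal local handler.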

We make this choice in ContentStreamHandlerBase so that the
DistributedUpdateRequestHandler can use the URL the request came in on to
determine where to distribute update requests. E.g., if an update request is
sent to:
http://localhost:8983/solr/update/csv?shards=shard1,shard2...
then the DistributedUpdateRequestHandler knows to send requests to:
shard1/update/csv
shard2/update/csv

Alternatively, if the request wasn't distributed, it would simply be handled
by whichever request handler "/update/csv" uses.
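The URL mapping in the example above can be sketched as follows. This is a minimal illustration only. It assumes the handler path ("/update/csv") has already been extracted from the incoming request and that a shard address plus that path is the target. ShardTargets and targetsFor are hypothetical names, and scheme/host handling is deliberately ignored:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps the path an update arrived on to one target per shard.
final class ShardTargets {
    private ShardTargets() {}

    /**
     * Appends the handler path (e.g. "/update/csv") to each shard listed in
     * the comma-separated 'shards' value, so "shard1,shard2" becomes
     * "shard1/update/csv" and "shard2/update/csv".
     */
    static List<String> targetsFor(String handlerPath, String shardsParam) {
        List<String> targets = new ArrayList<>();
        for (String shard : shardsParam.split(",")) {
            String s = shard.trim();
            if (s.isEmpty()) {
                continue; // skip blanks left by stray commas
            }
            // Avoid a double slash if the shard address ends with '/'.
            if (s.endsWith("/")) {
                s = s.substring(0, s.length() - 1);
            }
            targets.add(s + handlerPath);
        }
        return targets;
    }
}
```

Because the path is copied from the incoming request, the same distributor works unchanged for /update/csv, /update/xml, and so on. That is the reason for making the decision in the shared base class rather than in each handler.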

Herein lies the problem. The DistributedUpdateRequestHandler is not really a
request handler in the same way as the CSVRequestHandler or
XmlUpdateRequestHandlers are. If anything, it's more like a "plugin" for the
various existing update request handlers, to allow them to deal with
distributed requests - a "distributor" if you will. It isn't designed to be
able to receive and handle requests directly.

We would like this "DistributedUpdateRequestHandler" to be defined in
solrconfig, both to allow flexibility in setting up multiple
DistributedUpdateRequestHandlers with different ShardDistributionPolicies,
etc., and to let us get the appropriate instance from the core in the code.
There seem to be two ways of doing this:

1. Leave it as an implementation of SolrRequestHandler and hope the user
doesn't send update requests to it directly (i.e., a request to
http://localhost:8983/solr/<distrib update handler path> would most likely
cripple something). It would be defined in solrconfig something like:
<requestHandler name="distrib-update"
class="solr.DistributedUpdateRequestHandler" />

2. Create a new plugin type for the solrconfig, say
"updateRequestDistributor" which would involve creating a new interface for
the DistributedUpdateRequestHandler to implement, then registering it with
the core. It would be defined in the solrconfig something like:
<updateRequestDistributor name="distrib-update"
class="solr.DistributedUpdateRequestHandler">
  <lst name="defaults">
    <str name="policy">solr.HashedDistributionPolicy</str>
  </lst>
</updateRequestDistributor>

This would mean that it couldn't directly receive requests, but that an
instance could still easily be retrieved from the core to handle the
distribution of update requests.
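To make option 2 more concrete, here is a minimal sketch of what the new plugin type and a hashed policy might look like. Every name here is hypothetical: UpdateRequestDistributor, ShardDistributionPolicy, PolicyDistributor, and the method signatures are assumptions for illustration, not existing Solr APIs. Document IDs stand in for full documents, and hashing uses String.hashCode, which may not be what solr.HashedDistributionPolicy would actually do:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical policy interface: picks the shard a document belongs to. */
interface ShardDistributionPolicy {
    String shardFor(String docId, List<String> shards);
}

/** Hypothetical hash-based policy: the same id always maps to the same shard. */
final class HashedDistributionPolicy implements ShardDistributionPolicy {
    @Override
    public String shardFor(String docId, List<String> shards) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(docId.hashCode(), shards.size());
        return shards.get(index);
    }
}

/** Hypothetical plugin type: not a request handler, so it cannot receive requests. */
interface UpdateRequestDistributor {
    /** Groups document ids into one batch per target shard. */
    Map<String, List<String>> distribute(List<String> docIds, List<String> shards);
}

/** Distributor whose policy would come from the "policy" default in solrconfig. */
final class PolicyDistributor implements UpdateRequestDistributor {
    private final ShardDistributionPolicy policy;

    PolicyDistributor(ShardDistributionPolicy policy) {
        this.policy = policy;
    }

    @Override
    public Map<String, List<String>> distribute(List<String> docIds, List<String> shards) {
        // LinkedHashMap keeps shards in the order they first receive a document.
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (String id : docIds) {
            batches.computeIfAbsent(policy.shardFor(id, shards), k -> new ArrayList<>()).add(id);
        }
        return batches;
    }
}
```

Since UpdateRequestDistributor does not extend SolrRequestHandler, the core would never route an HTTP request to it. Handlers would instead look it up by name and call distribute(...), which is the main point of option 2.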

Any thoughts on the above issue (or a more succinct, descriptive name for
the class) are most welcome!

Alex
