lucene-dev mailing list archives

From "Upayavira" ...@odoko.co.uk>
Subject Re: Distributed Indexing
Date Mon, 07 Feb 2011 13:55:38 GMT
Surely you want to be implementing an UpdateRequestProcessor,
rather than a RequestHandler.

The ContentStreamHandlerBase, in its handleRequestBody method,
gets an UpdateRequestProcessor and uses it to process the
request. What we need is for that handleRequestBody method to, as
you have suggested, check the shards parameter and, if necessary,
call a different UpdateRequestProcessor (a
DistributedUpdateRequestProcessor).
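
A minimal sketch of that check, for illustration only. It does not use
the real Solr classes: UpdateProcessor, LocalUpdateProcessor,
DistributedUpdateProcessor and ProcessorChooser are hypothetical
stand-ins, and the request parameters are modelled as a plain Map.

```java
import java.util.Map;

// Hypothetical stand-in for an update processor; not the Solr interface.
interface UpdateProcessor {
    String process(String doc);
}

// Handles the update on the local index (the existing behaviour).
final class LocalUpdateProcessor implements UpdateProcessor {
    @Override
    public String process(String doc) {
        return "local:" + doc;
    }
}

// Would forward the update to the listed shards; here it only reports them.
final class DistributedUpdateProcessor implements UpdateProcessor {
    private final String[] shards;

    DistributedUpdateProcessor(String[] shards) {
        this.shards = shards;
    }

    @Override
    public String process(String doc) {
        return "distributed(" + String.join(",", shards) + "):" + doc;
    }
}

final class ProcessorChooser {
    // Pick the processor based on the presence of a non-empty 'shards' parameter.
    static UpdateProcessor choose(Map<String, String> params) {
        String shards = params.get("shards");
        if (shards == null || shards.trim().isEmpty()) {
            return new LocalUpdateProcessor();
        }
        return new DistributedUpdateProcessor(shards.trim().split("\\s*,\\s*"));
    }
}
```

The handler itself stays unchanged apart from this one decision, which
is the point of hardwiring a single distributed implementation for now.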

I don't think we really need it to be configurable at this point.
The ContentStreamHandlerBase could just use a single hardwired
implementation. If folks want a choice of
DistributedUpdateRequestProcessor, it can be added later.

For configuration, the DistributedUpdateRequestProcessor should
get its config from the parent RequestHandler. The configuration
I'm most interested in is the DistributionPolicy. And that can be
done with a distributionPolicyClass=solr.IDHashDistributionPolicy
request parameter, which could potentially be configured in
solrconfig.xml as an invariant, or provided in the request by the
user if necessary.
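
To make the idea concrete, here is a rough sketch of what an ID-hash
policy and its lookup from the distributionPolicyClass parameter might
look like. The DistributionPolicy interface, its shardFor method and
the PolicyResolver helper are assumptions made for illustration; none
of them exist in Solr. Falling back to the ID-hash policy when the
parameter is absent is also an assumption.

```java
import java.util.Map;

// Hypothetical policy interface: maps a document id to a shard index.
interface DistributionPolicy {
    int shardFor(String docId, int numShards);
}

// Picks a shard from the hash of the document id.
final class IdHashDistributionPolicy implements DistributionPolicy {
    @Override
    public int shardFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // floorMod keeps the result in [0, numShards) even for negative hashes.
        return Math.floorMod(docId.hashCode(), numShards);
    }
}

final class PolicyResolver {
    static final String PARAM = "distributionPolicyClass";

    // Invariants (as configured in solrconfig.xml) win over request parameters.
    static DistributionPolicy resolve(Map<String, String> invariants,
                                      Map<String, String> request) {
        String name = invariants.containsKey(PARAM)
                ? invariants.get(PARAM)
                : request.get(PARAM);
        // Assumption: default to the ID-hash policy when nothing is specified.
        if (name == null || name.equals("solr.IDHashDistributionPolicy")) {
            return new IdHashDistributionPolicy();
        }
        throw new IllegalArgumentException("Unknown distribution policy: " + name);
    }
}
```

The same document id always maps to the same shard for a fixed shard
count, so later updates to a document reach the shard that already
holds it.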

So, I'd avoid another "thing" that needs to be configured unless
there are real benefits to it (which there don't seem to me to be
right now).

Upayavira

On Sun, 06 Feb 2011 23:08 +0000, "Alex Cowell"
<alxcwll@gmail.com> wrote:

  Hey,
  We're making good progress, but our
  DistributedUpdateRequestHandler is having a bit of an identity
  crisis, so we thought we'd ask what other people's opinions
  are. The current situation is as follows:
  We've added a method to ContentStreamHandlerBase to check if
  an update request is distributed or not (based on the
  presence/validity of the 'shards' parameter). So a
  non-distributed request will proceed as normal but a
  distributed request would be passed on to the
  DistributedUpdateRequestHandler to deal with.
  The reason this choice is made in the ContentStreamHandlerBase
  is so that the DistributedUpdateRequestHandler can use the URL
  the request came in on to determine where to distribute update
  requests. E.g. an update request is sent to:
  [1]http://localhost:8983/solr/update/csv?shards=shard1,shard2...
  then the DistributedUpdateRequestHandler knows to send
  requests to:
  shard1/update/csv
  shard2/update/csv
  Alternatively, if the request wasn't distributed, it would
  simply be handled by whichever request handler "/update/csv"
  uses.
  Herein lies the problem. The DistributedUpdateRequestHandler
  is not really a request handler in the same way as the
  CSVRequestHandler or XmlUpdateRequestHandler is. If
  anything, it's more like a "plugin" for the various existing
  update request handlers, to allow them to deal with
  distributed requests - a "distributor" if you will. It isn't
  designed to be able to receive and handle requests directly.
  We would like this "DistributedUpdateRequestHandler" to be
  defined in the solrconfig to allow flexibility for setting up
  multiple different DistributedUpdateRequestHandlers with
  different ShardDistributionPolicies etc. and also to allow us
  to get the appropriate instance from the core in the code.
  There seem to be two paths for doing this:
  1. Leave it as an implementation of SolrRequestHandler and
  hope the user doesn't directly send update requests to it (i.e.
  a request to [2]http://localhost:8983/solr/<distrib update
  handler path> would most likely cripple something). So it
  would be defined in the solrconfig something like:
  <requestHandler name="distrib-update"
  class="solr.DistributedUpdateRequestHandler" />
  2. Create a new plugin type for the solrconfig, say
  "updateRequestDistributor" which would involve creating a new
  interface for the DistributedUpdateRequestHandler to
  implement, then registering it with the core. It would be
  defined in the solrconfig something like:
  <updateRequestDistributor name="distrib-update"
  class="solr.DistributedUpdateRequestHandler">
    <lst name="defaults">
      <str name="policy">solr.HashedDistributionPolicy</str>
    </lst>
  </updateRequestDistributor>
  This would mean that it couldn't directly receive requests,
  but that an instance could still easily be retrieved from the
  core to handle the distribution of update requests.
  Any thoughts on the above issue (or a more succinct,
  descriptive name for the class) are most welcome!
  Alex
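
The URL fan-out described in the quoted message (a request to
/update/csv with shards=shard1,shard2 being forwarded to
shard1/update/csv and shard2/update/csv) could be sketched roughly as
follows. ShardTargets and its targets method are hypothetical names
used only for this illustration, and the shard values are treated as
opaque strings exactly as they appear in the example.

```java
import java.util.ArrayList;
import java.util.List;

final class ShardTargets {
    // Build one target per shard by appending the path the request came in on.
    static List<String> targets(String handlerPath, String shardsParam) {
        List<String> out = new ArrayList<>();
        if (shardsParam == null) {
            return out; // not a distributed request
        }
        String path = handlerPath.startsWith("/") ? handlerPath : "/" + handlerPath;
        for (String shard : shardsParam.split(",")) {
            String s = shard.trim();
            if (s.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            if (s.endsWith("/")) {
                s = s.substring(0, s.length() - 1); // avoid a double slash
            }
            out.add(s + path);
        }
        return out;
    }
}
```

Because the targets are derived from the incoming path, the same logic
serves /update/csv, /update and any other update handler path without
per-handler configuration.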

References

1. http://localhost:8983/solr/update/csv?shards=shard1,shard2.
2. http://localhost:8983/solr/
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source

