lucene-dev mailing list archives

From "Upayavira" ...@odoko.co.uk>
Subject Re: Distributed Indexing
Date Mon, 07 Feb 2011 13:38:40 GMT
I'm saying that deterministic policies are a requirement that
*some* people will want. Others might want a random spread. Thus,
I'd have deterministic based on ID and random as the two initial
implementations.
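To make the two initial implementations concrete, here is a minimal Java sketch. ShardPolicy, IdHashPolicy and RandomPolicy are hypothetical names for illustration only; they are not taken from Solr or from the SOLR-2341 patch. Hashing the ID means a later update for an existing document reaches the shard that already holds it, while the random policy only balances load.

```java
import java.util.List;
import java.util.Random;

// Hypothetical interface, not part of Solr or the SOLR-2341 patch.
interface ShardPolicy {
    // Returns the index (0..numShards-1) of the shard that should receive the document.
    int shardFor(String docId, int numShards);
}

// Deterministic: the same ID always maps to the same shard. String.hashCode()
// is specified by the Java platform, so the mapping is stable across JVMs.
class IdHashPolicy implements ShardPolicy {
    @Override
    public int shardFor(String docId, int numShards) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(docId.hashCode(), numShards);
    }
}

// Random: spreads documents evenly on average, but re-sending the same ID
// may place it on a different shard than before.
class RandomPolicy implements ShardPolicy {
    private final Random random;

    RandomPolicy(Random random) {
        this.random = random;
    }

    @Override
    public int shardFor(String docId, int numShards) {
        return random.nextInt(numShards);
    }
}

class ShardPolicyDemo {
    public static void main(String[] args) {
        ShardPolicy byId = new IdHashPolicy();
        ShardPolicy random = new RandomPolicy(new Random());
        // "doc-1" appears twice: the ID policy gives the same shard both times.
        for (String id : List.of("doc-1", "doc-2", "doc-1")) {
            System.out.println(id + " -> id policy: " + byId.shardFor(id, 3)
                    + ", random policy: " + random.shardFor(id, 3));
        }
    }
}
```

One consequence of the modulo step: changing the number of shards remaps most IDs, so a deterministic policy on its own does not address rebalancing.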

Upayavira
NB. In case folks haven't worked it out already, I have been
tasked to mentor this group of students in this work, and had the
fortune to be able to point them to a task I've already thought a
lot about myself, but had no time to do :-)

On Sun, 06 Feb 2011 21:57 +0000, "William Mayor"
<mail@williammayor.co.uk> wrote:

  Hi

Good call about the policies being deterministic; we should've
thought of that earlier.

We've changed the patch to include this and I've removed the
random assignment one (for obvious reasons).

Take a look and let me know what's to do.
([1]https://issues.apache.org/jira/browse/SOLR-2341)

Cheers

William

On Thu, Feb 3, 2011 at 5:00 PM, Upayavira <[2]uv@odoko.co.uk>
wrote:


On Thu, 03 Feb 2011 15:12 +0000, "Alex Cowell"
<[3]alxcwll@gmail.com> wrote:

  Hi all,
  Just a couple of questions that have arisen.
  1. For handling non-distributed update requests (shards param
  is not present or is invalid), our code currently
  * assumes the user would like the data indexed, so gets the
    request handler assigned to "/update"
  * executes the request using core.execute() for the SolrCore
    associated with the original request

  Is this what we want it to do and is using core.execute() from
  within a request handler a valid method of passing on the
  update request?


Take a look at how it is done in
handler.component.SearchHandler.handleRequestBody(). I'd say try
to follow as similar an approach as possible. E.g. it is the
SearchHandler that does much of the work, branching depending on
whether it found a shards parameter.
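A rough sketch of that branch applied to updates. It models the request parameters as a plain Map rather than Solr's SolrParams, and it treats a shards value with no non-empty entries as "invalid"; both are assumptions made for illustration, and UpdateRouting is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; it does not use Solr's request or handler classes.
class UpdateRouting {
    // Splits a comma-separated "shards" value into trimmed, non-empty entries.
    // Returns an empty list when the parameter is missing or has no usable entries.
    static List<String> parseShards(Map<String, String> params) {
        String raw = params.get("shards");
        if (raw == null) {
            return List.of();
        }
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Branches the way the thread describes: local indexing when there are no
    // valid shards, otherwise a distributed update to the listed shards.
    static String route(Map<String, String> params) {
        List<String> shards = parseShards(params);
        if (shards.isEmpty()) {
            // Fallback: hand the request to the locally registered "/update" handler.
            return "local";
        }
        return "distributed to " + shards;
    }

    public static void main(String[] args) {
        System.out.println(route(Map.of()));
        // prints "local"
        System.out.println(route(Map.of("shards", "host1:8983/solr, host2:8983/solr")));
        // prints "distributed to [host1:8983/solr, host2:8983/solr]"
    }
}
```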

  2. We have partially implemented an update processor which
  actually generates and sends the split update requests to each
  specified shard (as designated by the policy). As it stands,
  the code shares a lot in common with the HttpCommComponent
  class used for distributed search. Should we look at "opening
  up" the HttpCommComponent class so it could be used by our
  request handler as well or should we continue with our current
  implementation and worry about that later?


I agree that you are going to want to implement an
UpdateRequestProcessor. However, it would seem to me that, unlike
search, you're not going to want to bother with the existing
processor and associated component chain; you're going to want to
replace the processor with a distributed version.
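A minimal sketch of a processor that replaces the chain rather than extending it, under heavy simplification. DocProcessor and DistributedDocProcessor are hypothetical stand-ins (real Solr processors receive command objects such as AddUpdateCommand; here a document is just its ID). The policy is passed in as a function from document ID to shard index, and the sender is a placeholder for whatever HTTP transport is eventually used. It also covers the split step from question 2: documents are grouped so that each shard receives one request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.ToIntFunction;

// Hypothetical, simplified stand-in for the UpdateRequestProcessor idea.
interface DocProcessor {
    void processAdd(String docId);
    void finish();
}

// Replaces the local chain instead of wrapping it: documents are never passed
// to a "next" processor. Each one is assigned a shard by the policy and
// buffered; finish() sends every shard's batch as a single request.
class DistributedDocProcessor implements DocProcessor {
    private final List<String> shardUrls;
    private final ToIntFunction<String> policy;            // docId -> shard index
    private final BiConsumer<String, List<String>> sender; // hypothetical transport
    private final Map<String, List<String>> batches = new LinkedHashMap<>();

    DistributedDocProcessor(List<String> shardUrls,
                            ToIntFunction<String> policy,
                            BiConsumer<String, List<String>> sender) {
        this.shardUrls = shardUrls;
        this.policy = policy;
        this.sender = sender;
    }

    @Override
    public void processAdd(String docId) {
        String shard = shardUrls.get(policy.applyAsInt(docId));
        batches.computeIfAbsent(shard, k -> new ArrayList<>()).add(docId);
    }

    @Override
    public void finish() {
        batches.forEach(sender); // one call per shard that received documents
        batches.clear();
    }

    public static void main(String[] args) {
        List<String> shards = List.of("http://shard0/solr", "http://shard1/solr");
        DocProcessor p = new DistributedDocProcessor(
                shards,
                id -> Math.floorMod(id.hashCode(), shards.size()),
                (url, ids) -> System.out.println(url + " <- " + ids));
        for (String id : List.of("a", "bb", "c")) {
            p.processAdd(id);
        }
        p.finish();
    }
}
```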

As to the HttpCommComponent, I'd suggest you make your own
educated decision. How similar is the class? Could one serve both
needs effectively?

  3. Our update processor uses a
  MultiThreadedHttpConnectionManager to send parallel updates to
  shards, can anyone give some appropriate values to be used for
  the defaultMaxConnectionsPerHost and maxTotalConnections
  params? Won't the values used for distributed search be a
  little high for distributed indexing?


You are right, these will likely be lower for distributed
indexing, however I'd suggest not worrying about it for now, as
it is easy to tweak later.
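A sketch of parallel dispatch in which the concurrency cap is a single constructor argument, so it stays easy to tweak later. It swaps in a fixed-size java.util.concurrent thread pool as a stand-in for the connection limits; it is not the HttpClient API. In Commons HttpClient 3.x the real limits are set on the HttpConnectionManagerParams returned by MultiThreadedHttpConnectionManager.getParams(), via setDefaultMaxConnectionsPerHost and setMaxTotalConnections. ParallelShardDispatcher and the send function are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch: at most maxConcurrentRequests sends run at once.
class ParallelShardDispatcher implements AutoCloseable {
    private final ExecutorService pool;

    ParallelShardDispatcher(int maxConcurrentRequests) {
        this.pool = Executors.newFixedThreadPool(maxConcurrentRequests);
    }

    // Submits one send per shard, then waits for all of them. Responses are
    // returned in the same order as the shards list.
    List<String> sendAll(List<String> shards, Function<String, String> send) throws Exception {
        List<Future<String>> futures = new ArrayList<>();
        for (String shard : shards) {
            futures.add(pool.submit(() -> send.apply(shard)));
        }
        List<String> responses = new ArrayList<>();
        for (Future<String> f : futures) {
            responses.add(f.get()); // rethrows (wrapped) any failure from a shard
        }
        return responses;
    }

    @Override
    public void close() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        try (ParallelShardDispatcher d = new ParallelShardDispatcher(2)) {
            // The lambda stands in for an HTTP update request to each shard.
            System.out.println(d.sendAll(List.of("s0", "s1", "s2"), s -> "ok:" + s));
            // prints "[ok:s0, ok:s1, ok:s2]"
        }
    }
}
```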

Upayavira

---
Enterprise Search Consultant at Sourcesense UK,
Making Sense of Open Source

References

1. https://issues.apache.org/jira/browse/SOLR-2341
2. mailto:uv@odoko.co.uk
3. mailto:alxcwll@gmail.com