lucene-dev mailing list archives

From Alex Cowell <>
Subject Re: Distributed Indexing
Date Mon, 14 Feb 2011 15:04:21 GMT
I've uploaded a patch of what we've done so far:

It's still very much a work in progress, and there are some obvious issues
being resolved at the moment (such as the inefficiency of waiting for all
the docs to be processed before distributing them in one batch, and the
handling of shard replicas), but any feedback is welcome.

As it stands, you can distribute add and commit requests using the
HashedDistributionPolicy by simply specifying a 'shards' request parameter.
Support for a user-specified distribution policy (either as a param in the
URL or defined in the solrconfig, as Upayavira suggested) is in the works too.
Regarding that, I figure the priority for determining which policy to use
would be (highest to lowest):

1. Param in the URL
2. Specified in the solrconfig
3. Hard-coded default to fall back on

That way if a user changed their mind about which distribution policy they
wanted to use, they could override the default policy with their chosen one
as a request parameter.
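
To make the fallback order concrete, here is a minimal sketch of the lookup, together with one plausible way a hashed policy could pick a shard. None of this is taken from the patch: the class name, method names, default value and shard strings are all hypothetical, and hashing the document id modulo the shard count is only an assumption about what a hashed policy might do.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these names are illustrative and not from the patch or Solr.
final class DistributionPolicySketch {

    // Assumed hard-coded fallback, named after the policy mentioned above.
    static final String DEFAULT_POLICY = "HashedDistributionPolicy";

    // Resolve the policy name: URL param first, then solrconfig, then the default.
    static String resolvePolicy(String urlParam, String configValue) {
        if (urlParam != null && !urlParam.trim().isEmpty()) {
            return urlParam.trim();
        }
        if (configValue != null && !configValue.trim().isEmpty()) {
            return configValue.trim();
        }
        return DEFAULT_POLICY;
    }

    // Assumed hashing scheme: map a document id onto one of the given shards.
    // floorMod keeps the index non-negative even when hashCode() is negative.
    static String shardFor(String docId, List<String> shards) {
        int index = Math.floorMod(docId.hashCode(), shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        // Made-up shard addresses, standing in for the 'shards' parameter.
        List<String> shards = Arrays.asList("host1:8983/solr", "host2:8983/solr");
        System.out.println(resolvePolicy(null, "MyPolicy"));
        System.out.println(shardFor("doc-42", shards));
    }
}
```

With this ordering, a blank or missing URL param falls through to the configured value, and a blank or missing configured value falls through to the default.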

So far the code has only been acceptance tested. There is a test class,
but it's a bit messy, so once that's tidied up and improved a little
more I'll include it in the next patch.

> I haven't had time to follow all of this discussion, but this issue might
> help:

Thanks - very interesting! It's reassuring to see our implementation has
been following a similar structure.

There seem to be some nuances which we have yet to encounter, such as
the way you've implemented the processCommit() method to wait for all the
adds/deletes to complete before sending the commits. Were these things you
knew in advance would need to be dealt with?
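
For anyone following along, the wait-then-commit ordering can be sketched roughly as below. This is not the code from the linked issue or from our patch: the class and method names are hypothetical, and the "requests" are just log entries standing in for real calls to each shard. It only illustrates the idea of blocking on every outstanding add before any commit goes out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; log entries stand in for real add/commit requests.
final class CommitAfterAddsSketch {

    // Sends adds asynchronously, waits for all of them, then records commits.
    // Returns the recorded events in the order they happened.
    static List<String> run(List<String> shards, int addsPerShard) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        final List<String> log =
                Collections.synchronizedList(new ArrayList<String>());
        List<Future<?>> pending = new ArrayList<Future<?>>();
        try {
            for (String shard : shards) {
                for (int i = 0; i < addsPerShard; i++) {
                    final String event = "add:" + shard + ":" + i;
                    // Each add runs on the pool; keep its Future to wait on later.
                    pending.add(pool.submit(() -> { log.add(event); }));
                }
            }
            // Block until every outstanding add has finished.
            for (Future<?> f : pending) {
                f.get();
            }
            // Only now is it safe to send the commit to each shard.
            for (String shard : shards) {
                log.add("commit:" + shard);
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("add failed before commit", e);
        } finally {
            pool.shutdown();
        }
        return new ArrayList<String>(log);
    }
}
```

The adds may complete in any order relative to each other, but every commit entry is recorded after the last add.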

