lucene-solr-user mailing list archives

From: Jamie Johnson <jej2...@gmail.com>
Subject: Re: Configuring the DistributedUpdateProcessor
Date: Sat, 03 Dec 2011 03:59:27 GMT
So I just tried this out, seems like it does the things I asked about.

Really really cool stuff, it's progressed quite a bit in the time
since I took a snapshot of the branch.

Last question: how do you change numShards?  Is there a command you
can use to do this now?  I understand there will be implications for
the hashing algorithm, but once the hash ranges are stored in ZK (is
there a separate JIRA for this, or does this fall under 2358?) I assume
it would be a relatively simple index split (JIRA 2595?) plus an update
of the hash ranges in Solr, essentially splitting the range between the
new and the existing shard.  Is that right?

On Fri, Dec 2, 2011 at 10:08 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> I think I see it... so if I understand this correctly, you specify
> numShards as a system property, and as new nodes come up they check ZK
> to see whether they should become a new shard or a replica, based on
> whether numShards has been met.  A few questions: if a master goes down,
> does a replica get promoted?  If a new shard needs to be added, is it
> just a matter of starting a new Solr instance with a higher numShards?
> (Understanding that index rebalancing does not happen automatically now,
> but presumably it could.)
>
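For illustration, the shard-vs-replica decision described above boils down to
something like the following Java sketch. This is not the actual SolrCloud
assignment code; the class and method names are made up:

    // Illustrative only: a node starts a new shard until numShards is
    // reached, after which it joins an existing shard as a replica.
    public final class ShardAssignmentSketch {

        /**
         * @param numShards  value of the numShards system property
         * @param shardsInZk number of shards currently registered in ZooKeeper
         * @return true if the node should start a new shard, false if it
         *         should become a replica of an existing shard
         */
        public static boolean shouldStartNewShard(int numShards, int shardsInZk) {
            return shardsInZk < numShards;
        }
    }
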
> On Fri, Dec 2, 2011 at 9:56 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> How does it determine the number of shards to create?  How many
>> replicas to create?
>>
>> On Fri, Dec 2, 2011 at 4:30 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>> Ah, okay - you are setting the shards in solr.xml - that's still an option
>>> to force a node to a particular shard - but if you take that out, shards
>>> will be auto-assigned.
>>>
>>> By the way, because of the version code, distrib deletes don't work at the
>>> moment - will get to that next week.
>>>
>>> - Mark
>>>
>>> On Fri, Dec 2, 2011 at 1:16 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>
>>>> So I'm a fool.  I did set numShards; the issue was so trivial it's
>>>> embarrassing.  I did indeed have it set up as a replica - the shard
>>>> names in solr.xml were both shard1.  This works as I expected now.
>>>>
>>>> On Fri, Dec 2, 2011 at 1:02 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>> >
>>>> > They are unused params, so removing them wouldn't help anything.
>>>> >
>>>> > You might just want to wait till we are further along before playing
>>>> > with it.
>>>> >
>>>> > Or if you submit your full self-contained test, I can see what's going
>>>> > on (e.g. it's still unclear if you have started setting numShards?).
>>>> >
>>>> > I can do a similar set of actions in my tests and it works fine. The
>>>> > only reason I could see things working like this is if it thinks you
>>>> > have one shard - a leader and a replica.
>>>> >
>>>> > - Mark
>>>> >
>>>> > On Dec 2, 2011, at 12:41 PM, Jamie Johnson wrote:
>>>> >
>>>> >> Glad to hear I don't need to set shards/self, but removing them didn't
>>>> >> seem to change what I'm seeing.  Doing this still results in 2
>>>> >> documents, 1 on 8983 and 1 on 7574.
>>>> >>
>>>> >>        String key = "1";
>>>> >>
>>>> >>        // add the doc through 8983 via the distributed update chain
>>>> >>        SolrInputDocument solrDoc = new SolrInputDocument();
>>>> >>        solrDoc.setField("key", key);
>>>> >>        solrDoc.addField("content_mvtxt", "initial value");
>>>> >>
>>>> >>        SolrServer server = servers.get("http://localhost:8983/solr/collection1");
>>>> >>
>>>> >>        UpdateRequest ureq = new UpdateRequest();
>>>> >>        ureq.setParam("update.chain", "distrib-update-chain");
>>>> >>        ureq.add(solrDoc);
>>>> >>        ureq.setAction(ACTION.COMMIT, true, true);
>>>> >>        server.request(ureq);
>>>> >>        server.commit();
>>>> >>
>>>> >>        // update the same doc through 7574, also via the chain
>>>> >>        solrDoc = new SolrInputDocument();
>>>> >>        solrDoc.addField("key", key);
>>>> >>        solrDoc.addField("content_mvtxt", "updated value");
>>>> >>
>>>> >>        server = servers.get("http://localhost:7574/solr/collection1");
>>>> >>
>>>> >>        ureq = new UpdateRequest();
>>>> >>        ureq.setParam("update.chain", "distrib-update-chain");
>>>> >>        ureq.add(solrDoc);
>>>> >>        ureq.setAction(ACTION.COMMIT, true, true);
>>>> >>        server.request(ureq);
>>>> >>        server.commit();
>>>> >>
>>>> >>        // commit explicitly on 8983 as well
>>>> >>        server = servers.get("http://localhost:8983/solr/collection1");
>>>> >>        server.commit();
>>>> >>        System.out.println("done");
>>>> >>
>>>> >> On Fri, Dec 2, 2011 at 10:48 AM, Mark Miller <markrmiller@gmail.com> wrote:
>>>> >>> So I dunno. You are running a zk server and running in zk mode, right?
>>>> >>>
>>>> >>> You don't need to / shouldn't set a shards or self param. The shards
>>>> >>> are figured out from ZooKeeper.
>>>> >>>
>>>> >>> You always want to use the distrib-update-chain. Eventually it will
>>>> >>> probably be part of the default chain and auto turn on in zk mode.
>>>> >>>
>>>> >>> If you are running in zk mode attached to a zk server, this should work
>>>> >>> no problem. You can add docs to any server and they will be forwarded
>>>> >>> to the correct shard leader and then versioned and forwarded to
>>>> >>> replicas.
>>>> >>>
>>>> >>> You can also use the CloudSolrServer solrj client - that way you don't
>>>> >>> even have to choose a server to send docs to - in which case if it went
>>>> >>> down you would have to choose another manually - CloudSolrServer
>>>> >>> automatically finds one that is up through ZooKeeper. Eventually it
>>>> >>> will also be smart and do the hashing itself so that it can send
>>>> >>> directly to the shard leader that the doc would be forwarded to anyway.
>>>> >>>
>>>> >>> - Mark
>>>> >>>
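A rough SolrJ sketch of the CloudSolrServer approach described above. The
ZooKeeper address/port and the exact constructor and setter signatures are
assumptions - check the solrj API on the branch for the real ones:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudClientSketch {
        public static void main(String[] args) throws Exception {
            // Point the client at ZooKeeper rather than a specific Solr node;
            // "localhost:9983" (an embedded ZK address) is assumed here.
            CloudSolrServer server = new CloudSolrServer("localhost:9983");
            server.setDefaultCollection("collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.setField("key", "1");
            doc.addField("content_mvtxt", "initial value");

            UpdateRequest ureq = new UpdateRequest();
            ureq.setParam("update.chain", "distrib-update-chain");
            ureq.add(doc);
            ureq.process(server);   // a live node is picked via ZooKeeper
            server.commit();
        }
    }
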
>>>> >>> On Fri, Dec 2, 2011 at 12:09 AM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>> >>>
>>>> >>>> Really just trying to do a simple add and update test; the missing
>>>> >>>> chain is just proof of my not understanding exactly how this is
>>>> >>>> supposed to work.  I modified the code to this:
>>>> >>>>
>>>> >>>>        String key = "1";
>>>> >>>>
>>>> >>>>        // add the doc through 8983 via the distributed update chain
>>>> >>>>        SolrInputDocument solrDoc = new SolrInputDocument();
>>>> >>>>        solrDoc.setField("key", key);
>>>> >>>>        solrDoc.addField("content_mvtxt", "initial value");
>>>> >>>>
>>>> >>>>        SolrServer server = servers.get("http://localhost:8983/solr/collection1");
>>>> >>>>
>>>> >>>>        UpdateRequest ureq = new UpdateRequest();
>>>> >>>>        ureq.setParam("update.chain", "distrib-update-chain");
>>>> >>>>        ureq.add(solrDoc);
>>>> >>>>        ureq.setParam("shards",
>>>> >>>>                "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
>>>> >>>>        ureq.setParam("self", "foo");
>>>> >>>>        ureq.setAction(ACTION.COMMIT, true, true);
>>>> >>>>        server.request(ureq);
>>>> >>>>        server.commit();
>>>> >>>>
>>>> >>>>        // update the same doc through 7574, also via the chain
>>>> >>>>        solrDoc = new SolrInputDocument();
>>>> >>>>        solrDoc.addField("key", key);
>>>> >>>>        solrDoc.addField("content_mvtxt", "updated value");
>>>> >>>>
>>>> >>>>        server = servers.get("http://localhost:7574/solr/collection1");
>>>> >>>>
>>>> >>>>        ureq = new UpdateRequest();
>>>> >>>>        ureq.setParam("update.chain", "distrib-update-chain");
>>>> >>>>        // ureq.deleteById("8060a9eb-9546-43ee-95bb-d18ea26a6285");
>>>> >>>>        ureq.add(solrDoc);
>>>> >>>>        ureq.setParam("shards",
>>>> >>>>                "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
>>>> >>>>        ureq.setParam("self", "foo");
>>>> >>>>        ureq.setAction(ACTION.COMMIT, true, true);
>>>> >>>>        server.request(ureq);
>>>> >>>>        // server.add(solrDoc);
>>>> >>>>        server.commit();
>>>> >>>>
>>>> >>>>        // commit explicitly on 8983 as well
>>>> >>>>        server = servers.get("http://localhost:8983/solr/collection1");
>>>> >>>>        server.commit();
>>>> >>>>        System.out.println("done");
>>>> >>>>
>>>> >>>> but I'm still seeing the doc appear on both shards.  After the first
>>>> >>>> commit I see the doc on 8983 with "initial value".  After the second
>>>> >>>> commit I see the updated value on 7574 and the old on 8983.  After the
>>>> >>>> final commit the doc on 8983 gets updated.
>>>> >>>>
>>>> >>>> Is there something wrong with my test?
>>>> >>>>
>>>> >>>> On Thu, Dec 1, 2011 at 11:17 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>> >>>>> Getting late - didn't really pay attention to your code I guess - why
>>>> >>>>> are you adding the first doc without specifying the distrib update
>>>> >>>>> chain? This is not really supported. It's going to just go to the
>>>> >>>>> server you specified - even with everything set up right, the update
>>>> >>>>> might then go to that same server or the other one depending on how
>>>> >>>>> it hashes. You really want to just always use the distrib update
>>>> >>>>> chain.  I guess I don't yet understand what you are trying to test.
>>>> >>>>>
>>>> >>>>> Sent from my iPad
>>>> >>>>>
>>>> >>>>> On Dec 1, 2011, at 10:57 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>> >>>>>
>>>> >>>>>> Not sure offhand - but things will be funky if you don't specify the
>>>> >>>>>> correct numShards.
>>>> >>>>>>
>>>> >>>>>> The instance to shard assignment should be using numShards to
>>>> >>>>>> assign. But then the hash to shard mapping actually goes on the
>>>> >>>>>> number of shards it finds registered in ZK (it doesn't have to, but
>>>> >>>>>> really these should be equal).
>>>> >>>>>>
>>>> >>>>>> So basically you are saying, I want 3 partitions, but you are only
>>>> >>>>>> starting up 2 nodes, and the code is just not happy about that I'd
>>>> >>>>>> guess. For the system to work properly, you have to fire up at least
>>>> >>>>>> as many servers as numShards.
>>>> >>>>>>
>>>> >>>>>> What are you trying to do? 2 partitions with no replicas, or one
>>>> >>>>>> partition with one replica?
>>>> >>>>>>
>>>> >>>>>> In either case, I think you will have better luck if you fire up at
>>>> >>>>>> least as many servers as the numShards setting. Or lower the
>>>> >>>>>> numShards setting.
>>>> >>>>>>
>>>> >>>>>> This is all a work in progress by the way - what you are trying to
>>>> >>>>>> test should work if things are set up right though.
>>>> >>>>>>
>>>> >>>>>> - Mark
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> On Dec 1, 2011, at 10:40 PM, Jamie Johnson wrote:
>>>> >>>>>>
>>>> >>>>>>> Thanks for the quick response.  With that change (have not done
>>>> >>>>>>> numShards yet) shard1 got updated.  But now when executing the
>>>> >>>>>>> following queries I get information back from both, which doesn't
>>>> >>>>>>> seem right
>>>> >>>>>>>
>>>> >>>>>>> http://localhost:7574/solr/select/?q=*:*
>>>> >>>>>>> <doc><str name="key">1</str><str name="content_mvtxt">updated value</str></doc>
>>>> >>>>>>>
>>>> >>>>>>> http://localhost:8983/solr/select?q=*:*
>>>> >>>>>>> <doc><str name="key">1</str><str name="content_mvtxt">updated value</str></doc>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> On Thu, Dec 1, 2011 at 10:21 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>> >>>>>>>> Hmm... sorry bout that - so my first guess is that right now we
>>>> >>>>>>>> are not distributing a commit (easy to add, just have not done it).
>>>> >>>>>>>>
>>>> >>>>>>>> Right now I explicitly commit on each server for tests.
>>>> >>>>>>>>
>>>> >>>>>>>> Can you try explicitly committing on server1 after updating the
>>>> >>>>>>>> doc on server 2?
>>>> >>>>>>>>
>>>> >>>>>>>> I can start distributing commits tomorrow - been meaning to do it
>>>> >>>>>>>> for my own convenience anyhow.
>>>> >>>>>>>>
>>>> >>>>>>>> Also, you want to pass the sys property numShards=1 on startup. I
>>>> >>>>>>>> think it defaults to 3. That will give you one leader and one
>>>> >>>>>>>> replica.
>>>> >>>>>>>>
>>>> >>>>>>>> - Mark
>>>> >>>>>>>>
>>>> >>>>>>>> On Dec 1, 2011, at 9:56 PM, Jamie Johnson wrote:
>>>> >>>>>>>>
>>>> >>>>>>>>> So I couldn't resist, I attempted to do this tonight. I used the
>>>> >>>>>>>>> solrconfig you mentioned (as is, no modifications), I set up a 2
>>>> >>>>>>>>> shard cluster in collection1, I sent 1 doc to 1 of the shards,
>>>> >>>>>>>>> updated it and sent the update to the other.  I don't see the
>>>> >>>>>>>>> modifications though, I only see the original document.  The
>>>> >>>>>>>>> following is the test
>>>> >>>>>>>>>
>>>> >>>>>>>>> public void update() throws Exception {
>>>> >>>>>>>>>
>>>> >>>>>>>>>     String key = "1";
>>>> >>>>>>>>>
>>>> >>>>>>>>>     // add the initial doc directly to the 8983 instance
>>>> >>>>>>>>>     SolrInputDocument solrDoc = new SolrInputDocument();
>>>> >>>>>>>>>     solrDoc.setField("key", key);
>>>> >>>>>>>>>     solrDoc.addField("content", "initial value");
>>>> >>>>>>>>>
>>>> >>>>>>>>>     SolrServer server = servers.get("http://localhost:8983/solr/collection1");
>>>> >>>>>>>>>     server.add(solrDoc);
>>>> >>>>>>>>>     server.commit();
>>>> >>>>>>>>>
>>>> >>>>>>>>>     // send an update for the same key through the 7574 instance,
>>>> >>>>>>>>>     // this time via the distributed update chain
>>>> >>>>>>>>>     solrDoc = new SolrInputDocument();
>>>> >>>>>>>>>     solrDoc.addField("key", key);
>>>> >>>>>>>>>     solrDoc.addField("content", "updated value");
>>>> >>>>>>>>>
>>>> >>>>>>>>>     server = servers.get("http://localhost:7574/solr/collection1");
>>>> >>>>>>>>>
>>>> >>>>>>>>>     UpdateRequest ureq = new UpdateRequest();
>>>> >>>>>>>>>     ureq.setParam("update.chain", "distrib-update-chain");
>>>> >>>>>>>>>     ureq.add(solrDoc);
>>>> >>>>>>>>>     ureq.setParam("shards",
>>>> >>>>>>>>>             "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
>>>> >>>>>>>>>     ureq.setParam("self", "foo");
>>>> >>>>>>>>>     ureq.setAction(ACTION.COMMIT, true, true);
>>>> >>>>>>>>>     server.request(ureq);
>>>> >>>>>>>>>     System.out.println("done");
>>>> >>>>>>>>> }
>>>> >>>>>>>>>
>>>> >>>>>>>>> key is my unique field in schema.xml
>>>> >>>>>>>>>
>>>> >>>>>>>>> What am I doing wrong?
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Thu, Dec 1, 2011 at 8:51 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>> >>>>>>>>>> Yes, the ZK method seems much more flexible.  Adding a new shard
>>>> >>>>>>>>>> would be simply updating the range assignments in ZK.  Where is
>>>> >>>>>>>>>> this currently on the list of things to accomplish?  I don't
>>>> >>>>>>>>>> have time to work on this now, but if you (or anyone) could
>>>> >>>>>>>>>> provide direction I'd be willing to work on this when I had
>>>> >>>>>>>>>> spare time.  I guess a JIRA detailing where/how to do this could
>>>> >>>>>>>>>> help.  Not sure if the design has been thought out that far
>>>> >>>>>>>>>> though.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> On Thu, Dec 1, 2011 at 8:15 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>> >>>>>>>>>>> Right now let's say you have one shard - everything there
>>>> >>>>>>>>>>> hashes to range X.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Now you want to split that shard with an Index Splitter.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> You divide range X in two - giving you two ranges - then you
>>>> >>>>>>>>>>> start splitting. This is where the current Splitter needs a
>>>> >>>>>>>>>>> little modification. You decide which doc should go into which
>>>> >>>>>>>>>>> new index by rehashing each doc id in the index you are
>>>> >>>>>>>>>>> splitting - if its hash is greater than X/2, it goes into
>>>> >>>>>>>>>>> index1 - if it's less, index2. I think there are a couple
>>>> >>>>>>>>>>> current Splitter impls, but one of them does something like:
>>>> >>>>>>>>>>> give me an id - now if the ids in the index are above that id,
>>>> >>>>>>>>>>> go to index1, if below, index2. We need to instead do a quick
>>>> >>>>>>>>>>> hash rather than a simple id compare.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Why do you need to do this on every shard?
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> The other part we need that we don't have is to store hash
>>>> >>>>>>>>>>> range assignments in ZooKeeper - we don't do that yet because
>>>> >>>>>>>>>>> it's not needed yet. Instead we currently just calculate that
>>>> >>>>>>>>>>> on the fly (too often at the moment - on every request :) I
>>>> >>>>>>>>>>> intend to fix that of course).
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> At the start, zk would say: for range X, go to this shard.
>>>> >>>>>>>>>>> After the split, it would say: for range less than X/2 go to
>>>> >>>>>>>>>>> the old node, for range greater than X/2 go to the new node.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> - Mark
>>>> >>>>>>>>>>>
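To make the rehash-and-split idea above concrete, a rough Java sketch of the
per-document decision a hash-based Splitter would make. The hash function and
the split point are placeholders, not the branch's actual code:

    // Illustrative only: decide which half of a split shard a doc belongs to
    // by hashing its id, instead of comparing raw ids.
    public final class HashSplitSketch {

        private final int splitPoint; // the "X/2" boundary of the old range

        public HashSplitSketch(int splitPoint) {
            this.splitPoint = splitPoint;
        }

        // Placeholder hash; a real splitter would reuse Solr's doc-id hashing.
        private int hash(String docId) {
            return docId.hashCode();
        }

        /** @return true if the doc goes to index1 (upper half), false for index2. */
        public boolean goesToIndex1(String docId) {
            return hash(docId) > splitPoint;
        }
    }
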
>>>> >>>>>>>>>>> On Dec 1, 2011, at 7:44 PM, Jamie Johnson wrote:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>> Hmmm... this doesn't sound like the hashing algorithm that's
>>>> >>>>>>>>>>>> on the branch, right?  The algorithm you're mentioning sounds
>>>> >>>>>>>>>>>> like there is some logic which is able to tell that a
>>>> >>>>>>>>>>>> particular range should be distributed between 2 shards
>>>> >>>>>>>>>>>> instead of 1.  So it seems like a trade-off between
>>>> >>>>>>>>>>>> repartitioning the entire index (on every shard) and having a
>>>> >>>>>>>>>>>> custom hashing algorithm which is able to handle the situation
>>>> >>>>>>>>>>>> where 2 or more shards map to a particular range.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> On Thu, Dec 1, 2011 at 7:34 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> On Dec 1, 2011, at 7:20 PM, Jamie Johnson wrote:
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> I am not familiar with the index splitter that is in
>>>> >>>>>>>>>>>>>> contrib, but I'll take a look at it soon.  So the process
>>>> >>>>>>>>>>>>>> sounds like it would be to run this on all of the current
>>>> >>>>>>>>>>>>>> shards' indexes based on the hash algorithm.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Not something I've thought deeply about myself yet, but I
>>>> >>>>>>>>>>>>> think the idea would be to split as many as you felt you
>>>> >>>>>>>>>>>>> needed to.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> If you wanted to keep the full balance always, this would
>>>> >>>>>>>>>>>>> mean splitting every shard at once, yes. But this depends on
>>>> >>>>>>>>>>>>> how many boxes (partitions) you are willing/able to add at a
>>>> >>>>>>>>>>>>> time.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> You might just split one index to start - now its hash range
>>>> >>>>>>>>>>>>> would be handled by two shards instead of one (if you have 3
>>>> >>>>>>>>>>>>> replicas per shard, this would mean adding 3 more boxes).
>>>> >>>>>>>>>>>>> When you needed to expand again, you would split another
>>>> >>>>>>>>>>>>> index that was still handling its full starting range. As you
>>>> >>>>>>>>>>>>> grow, once you split every original index, you'd start again,
>>>> >>>>>>>>>>>>> splitting one of the now half ranges.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> Is there also an index merger in contrib which could be used
>>>> >>>>>>>>>>>>>> to merge indexes?  I'm assuming this would be the process?
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> You can merge with IndexWriter.addIndexes (Solr also has an
>>>> >>>>>>>>>>>>> admin command that can do this). But I'm not sure where this
>>>> >>>>>>>>>>>>> fits in?
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> - Mark
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 7:18 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>> Not yet - we don't plan on working on this until a lot of
>>>> >>>>>>>>>>>>>>> other stuff is working solid at this point. But someone
>>>> >>>>>>>>>>>>>>> else could jump in!
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> There are a couple ways to go about it that I know of:
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> A more long term solution may be to start using micro
>>>> >>>>>>>>>>>>>>> shards - each index starts as multiple indexes. This makes
>>>> >>>>>>>>>>>>>>> it pretty fast to move micro shards around as you decide to
>>>> >>>>>>>>>>>>>>> change partitions. It's also less flexible, as you are
>>>> >>>>>>>>>>>>>>> limited by the number of micro shards you start with.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> A more simple and likely first step is to use an index
>>>> >>>>>>>>>>>>>>> splitter. We already have one in lucene contrib - we would
>>>> >>>>>>>>>>>>>>> just need to modify it so that it splits based on the hash
>>>> >>>>>>>>>>>>>>> of the document id. This is super flexible, but splitting
>>>> >>>>>>>>>>>>>>> will obviously take a little while on a huge index. The
>>>> >>>>>>>>>>>>>>> current index splitter is a multi pass splitter - good
>>>> >>>>>>>>>>>>>>> enough to start with, but with most files under codec
>>>> >>>>>>>>>>>>>>> control these days, we may be able to make a single pass
>>>> >>>>>>>>>>>>>>> splitter soon as well.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Eventually you could imagine using both options - micro
>>>> >>>>>>>>>>>>>>> shards that could also be split as needed. Though I still
>>>> >>>>>>>>>>>>>>> wonder if micro shards will be worth the extra complications
>>>> >>>>>>>>>>>>>>> myself...
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Right now though, the idea is that you should pick a good
>>>> >>>>>>>>>>>>>>> number of partitions to start given your expected data ;)
>>>> >>>>>>>>>>>>>>> Adding more replicas is trivial though.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> - Mark
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 6:35 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> Another question: is there any support for repartitioning
>>>> >>>>>>>>>>>>>>>> of the index if a new shard is added?  What is the
>>>> >>>>>>>>>>>>>>>> recommended approach for handling this?  It seemed that
>>>> >>>>>>>>>>>>>>>> the hashing algorithm (and probably any) would require the
>>>> >>>>>>>>>>>>>>>> index to be repartitioned should a new shard be added.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 6:32 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>> Thanks, I will try this first thing in the morning.
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 3:39 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> I am currently looking at the latest solrcloud branch and was
>>>> >>>>>>>>>>>>>>>>>>> wondering if there was any documentation on configuring the
>>>> >>>>>>>>>>>>>>>>>>> DistributedUpdateProcessor?  What specifically in solrconfig.xml
>>>> >>>>>>>>>>>>>>>>>>> needs to be added/modified to make distributed indexing work?
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> Hi Jamie - take a look at solrconfig-distrib-update.xml in
>>>> >>>>>>>>>>>>>>>>>> solr/core/src/test-files
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> You need to enable the update log, add an empty replication
>>>> >>>>>>>>>>>>>>>>>> handler def, and an update chain with
>>>> >>>>>>>>>>>>>>>>>> solr.DistributedUpdateProcessFactory in it.
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> --
>>>> >>>>>>>>>>>>>>>>>> - Mark
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> http://www.lucidimagination.com
>>>> >>>>>>>>>>>>>>>>>>
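For reference, a rough solrconfig.xml sketch of the three pieces listed above
(update log, empty replication handler def, and a distributed update chain).
The element layout, property names, and processor ordering here are
assumptions based on later released Solr example configs - the
solrconfig-distrib-update.xml test file mentioned above is the authoritative
reference:

    <!-- 1. enable the update (transaction) log -->
    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
    </updateHandler>

    <!-- 2. an (empty) replication handler definition -->
    <requestHandler name="/replication" class="solr.ReplicationHandler" />

    <!-- 3. an update chain containing the distributed update processor
            (class name as given in the message above) -->
    <updateRequestProcessorChain name="distrib-update-chain">
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.DistributedUpdateProcessFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>
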
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> --
>>>> >>>>>>>>>>>>>>> - Mark
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> http://www.lucidimagination.com
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> - Mark Miller
>>>> >>>>>>>>>>>>> lucidimagination.com
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> - Mark Miller
>>>> >>>>>>>>>>> lucidimagination.com
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>> - Mark Miller
>>>> >>>>>>>> lucidimagination.com
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>
>>>> >>>>>> - Mark Miller
>>>> >>>>>> lucidimagination.com
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> - Mark
>>>> >>>
>>>> >>> http://www.lucidimagination.com
>>>> >>>
>>>> >
>>>> > - Mark Miller
>>>> > lucidimagination.com
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
