lucene-dev mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: SolrCloud and not all shards having an instance, indexing still works (kinda).
Date Sun, 11 Nov 2012 20:14:26 GMT
That's not quite what I'm seeing. Following the SolrCloud wiki example, I
started only the first (bootstrap) instance and then tried to index all the
docs in the exampledocs directory. At the end of that, I had 9 docs in my
collection1.

Now bring up the second shard and index them all again. At this point, the
first shard has 18 docs in it, which is the count you get if you do it right
the first time.

I tried going through the xml files, posting them one by one and checking
counts, and I started to suspect that the docs in each xml file are getting
indexed up until the first doc that doesn't hash into shard1, at which point
the post errors out.
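The suspected behavior can be modeled in a few lines. This is a minimal Python sketch, not Solr code: it assumes each file is sent as one sequential update that stops at the first document routed to a down shard, and it uses SHA-256 as a stand-in for the router's hash (SolrCloud's own router uses MurmurHash3). `shard_for` and `post_file` are hypothetical names.

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Hypothetical router: map an id to one of num_shards equal, contiguous
    slices of the unsigned 32-bit hash range (SHA-256 stands in for MurmurHash3)."""
    h = int.from_bytes(hashlib.sha256(doc_id.encode("utf-8")).digest()[:4], "big")
    return (h * num_shards) >> 32

def post_file(doc_ids, live_shards, num_shards=2):
    """Simulate posting one file's docs in order, aborting at the first doc
    whose target shard is down. Returns (indexed_ids, failed_id or None)."""
    indexed = []
    for doc_id in doc_ids:
        if shard_for(doc_id, num_shards) not in live_shards:
            # The request errors out here (e.g. a 503); later docs are never sent.
            return indexed, doc_id
        indexed.append(doc_id)
    return indexed, None
```

With only shard 0 live, `post_file` keeps just the prefix of each file up to its first shard-1 document, so the total indexed depends on document order within each file rather than being an even split.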

Note: I'm not sure this is something we should "fix", since if I bring up
the second shard and re-index _everything_, the index is consistent.

If we _were_ to change behavior, I should think it'd be something like
"check that all the shards are reachable and fail immediately without
indexing anything if not". Or "keep indexing docs even if some fail". Or
maybe "rollback on error".
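The three alternatives differ only in what happens to a batch when a target shard is unreachable. Below is a hypothetical Python sketch of each; these are the options floated here, not existing Solr settings, and the SHA-256 router is a stand-in for SolrCloud's MurmurHash3-based one. `index_batch` and the policy names are illustrative.

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    # Hypothetical stand-in router: SHA-256 prefix split into equal 32-bit ranges.
    h = int.from_bytes(hashlib.sha256(doc_id.encode("utf-8")).digest()[:4], "big")
    return (h * num_shards) >> 32

def index_batch(doc_ids, live_shards, num_shards, policy):
    """Return (indexed, failed) for one batch under a hypothetical policy:
    'fail_fast', 'best_effort', or 'rollback'."""
    doc_ids = list(doc_ids)
    if policy == "fail_fast":
        # Check that every shard is reachable before indexing anything.
        if any(s not in live_shards for s in range(num_shards)):
            return [], doc_ids
        return doc_ids, []
    if policy == "best_effort":
        # Keep going: index what routes to live shards, report the rest.
        indexed = [d for d in doc_ids if shard_for(d, num_shards) in live_shards]
        failed = [d for d in doc_ids if shard_for(d, num_shards) not in live_shards]
        return indexed, failed
    if policy == "rollback":
        # Index in order; on the first failure, discard the whole batch.
        pending = []
        for d in doc_ids:
            if shard_for(d, num_shards) not in live_shards:
                return [], doc_ids
            pending.append(d)
        return pending, []
    raise ValueError(f"unknown policy: {policy!r}")
```

`best_effort` matches the behavior Mark expected in his reply; `fail_fast` and `rollback` both leave the index untouched, but `rollback` only aborts when some doc in the batch actually routes to a down shard.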

But I don't think I particularly like that "fix", since something is better
than nothing. So, false alarm, I guess. I was being thrown by the change in
the number of docs in shard1 when I brought the second shard up and
re-indexed...
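Per-shard counts like the ones above are easier to compare when each core is queried on its own. A minimal Python sketch, assuming the example's single-node core URL `http://localhost:8983/solr/collection1`; `distrib=false` should make Solr answer from the local core only instead of fanning out to the collection. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def count_url(core_url: str) -> str:
    """Build a select URL that returns only one core's doc count.
    q=*:* matches everything, rows=0 skips the docs themselves, and
    distrib=false keeps the query on this core."""
    params = {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)

def num_found(response_body: str) -> int:
    # The JSON response writer reports the hit count at response.numFound.
    return json.loads(response_body)["response"]["numFound"]

def core_doc_count(core_url: str) -> int:
    # Performs a live HTTP request against a running Solr node.
    with urlopen(count_url(core_url)) as resp:
        return num_found(resp.read().decode("utf-8"))
```

For example, `core_doc_count("http://localhost:8983/solr/collection1")` run before and after bringing up the second shard shows how shard1's count changes.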



On Sun, Nov 11, 2012 at 11:22 AM, Mark Miller <markrmiller@gmail.com> wrote:

> If a shard is down, I think some docs still get indexed? The ones that hash
> to the shard that is up. Approximately half should fail, give or take.
>
> - Mark
>
> On Sun, Nov 11, 2012 at 11:06 AM, Erick Erickson
> <erickerickson@gmail.com> wrote:
> > Should have said that this is a 4x build from this morning (11-Nov)
> >
> >
> > On Sun, Nov 11, 2012 at 11:05 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> >>
> >> Sorry, I'm a bit under the gun so can't look over JIRAs as carefully as
> >> I'd like. But this seems odd.
> >>
> >> Setup:
> >> Start a 2-shard SolrCloud setup with this command (straight from the example):
> >> java -Dbootstrap_confdir=./solr/collection1/conf \
> >>   -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
> >>
> >> DO NOT start any other instances.
> >>
> >> Now go to exampledocs and index everything. I see a lot of output like:
> >> POSTing file money.xml
> >> SimplePostTool: WARNING: Solr returned an error #503 Service Unavailable
> >> SimplePostTool: WARNING: IOException while reading response:
> >> java.io.IOException: Server returned HTTP response code: 503 for URL:
> >> http://localhost:8983/solr/update
> >>
> >> which is fine; half my cluster isn't there.
> >>
> >> Trying to query on the collection returns errors, also fine.
> >>
> >> What's surprising is that when I look at the admin page for the
> >> collection, it shows 9 documents successfully indexed.
> >>
> >> When I shut down the cloud instance and started a plain old (non-SolrCloud)
> >> instance, there were 9 documents in my index.
> >>
> >> Is this intended behavior or should I raise a JIRA?
> >>
> >> Thanks,
> >> Erick
> >
> >
>
>
>
> --
> - Mark
>
>
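Mark's "approx half" estimate follows from hash-range routing: if each of numShards shards owns an equal slice of the hash range and ids hash uniformly, the expected share of docs landing on down shards is (down shards) / numShards. A minimal Python sketch to check that, assuming a SHA-256 stand-in for the MurmurHash3-based router SolrCloud 4.x applies to the document id; the helper names are hypothetical.

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    # Hypothetical stand-in router: equal contiguous slices of a 32-bit hash range.
    h = int.from_bytes(hashlib.sha256(doc_id.encode("utf-8")).digest()[:4], "big")
    return (h * num_shards) >> 32

def failure_fraction(doc_ids, live_shards, num_shards):
    """Fraction of docs whose target shard is not live, i.e. the docs that
    would be rejected if everything routed to live shards were still indexed."""
    doc_ids = list(doc_ids)
    down = sum(1 for d in doc_ids if shard_for(d, num_shards) not in live_shards)
    return down / len(doc_ids)
```

With 10,000 synthetic ids (`doc0` through `doc9999`), two shards and only shard 0 live, the fraction comes out very close to 0.5; with four shards and one down it lands near 0.25.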
