lucene-solr-user mailing list archives

From Vineet Mishra <clearmido...@gmail.com>
Subject Re: Inconsistent Behavior of Solr Cloud
Date Mon, 16 Jun 2014 08:06:09 GMT
Hi Erick,

Thanks for your response. I got it resolved. I think the indexes were
not properly distributed, and I also saw some uneven behavior while
indexing. To elaborate:

I had three shards in my collection. I started indexing with
EmbeddedSolrServer and indexed around 50 million documents (15 GB index size
without replication). After that I indexed another 50 million into a different
directory for the next shard, but when I checked the indexing stats the next
day (after it had been running for probably 15 hours or so), it was still
running and the index had grown to 60 GB. I didn't understand why so much
disk space was used for the same amount of data (50 million documents) that
I had indexed previously. Eventually I stopped the process, as I couldn't get
better updates, and copied the indexes to the next shard.

# When I queried later with *:* I got a response of 69 million
documents (it was supposed to be 100 million).
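
For anyone wanting to narrow down a count like this, one way is to ask each
core for its own document count instead of the merged cloud-wide number.
Below is a minimal sketch in Python that only builds the query URLs. It
relies on Solr's distrib=false request parameter, which restricts a query to
the core that receives it. The host, ports and core names in CORE_URLS are
hypothetical placeholders, not the ones from my setup, and num_found assumes
the standard JSON response layout (response.numFound).

```python
from urllib.parse import urlencode

# Hypothetical core URLs, one per shard; replace with the real ones
# shown in the admin UI.
CORE_URLS = [
    "http://localhost:8080/solr/collection1_shard1_replica1",
    "http://localhost:8081/solr/collection1_shard2_replica1",
    "http://localhost:8082/solr/collection1_shard3_replica1",
]


def per_core_count_url(core_url, query="*:*"):
    """Build a URL that counts matches on one core only.

    rows=0 skips returning documents; distrib=false stops the core
    from fanning the query out to the other shards.
    """
    params = {"q": query, "rows": 0, "distrib": "false", "wt": "json"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def num_found(response_json):
    """Extract the hit count from a parsed Solr JSON response."""
    return response_json["response"]["numFound"]


if __name__ == "__main__":
    for url in CORE_URLS:
        print(per_core_count_url(url))
```

Fetching each printed URL and adding up the numFound values gives the
per-shard breakdown, which can then be compared with the total from a normal
distributed *:* query.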

## I am not sure where the other 30 million went, but the problem started
once I again indexed the remaining 30 million, which were missing from the
query at #, into the next shard.

I have read somewhere that the consistency of the cloud is broken if
different shards hold documents with the same value for the unique ID field.
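
To illustrate why that would matter, here is a small self-contained sketch in
Python. It does not talk to Solr at all. It models each shard as a plain list
of ids (the shard names and ids are made up) and shows that once the same id
sits on two shards, the sum of the per-shard counts no longer matches the
number of distinct documents. As I understand it, a distributed query is in
that same position when it merges shard results, which would fit the
changing counts I saw.

```python
def shard_counts(shards):
    """Return the number of ids held by each shard."""
    return {name: len(ids) for name, ids in shards.items()}


def cross_shard_duplicates(shards):
    """Map each id found on more than one shard to the shards holding it."""
    first_seen = {}   # id -> first shard where it was seen
    duplicates = {}   # id -> set of all shards holding it
    for name, ids in shards.items():
        for doc_id in ids:
            owner = first_seen.setdefault(doc_id, name)
            if owner != name:
                duplicates.setdefault(doc_id, {owner}).add(name)
    return duplicates


if __name__ == "__main__":
    # Made-up example data: id "b" was indexed into two shards.
    shards = {
        "shard1": ["a", "b"],
        "shard2": ["b", "c"],
        "shard3": ["d"],
    }
    total = sum(shard_counts(shards).values())
    distinct = len({doc_id for ids in shards.values() for doc_id in ids})
    print("sum of per-shard counts:", total)
    print("distinct ids:", distinct)
    print("ids on more than one shard:", cross_shard_duplicates(shards))
```

On a real collection the id lists would have to come from the individual
cores, so this is only a way to reason about the symptom, not a diagnostic
tool.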

With this I have a few things to clarify:
* Was the inconsistent behavior caused by the step I took at ##?
* If the inconsistency was caused by ##, then why were all 100 million
documents not present after #?
* When the same set of data was previously indexed in just 15 GB, why did
the index for the next 50 million grow to 60 GB?
* To index a huge amount of data into SolrCloud in reasonable time, what
approach should be taken if EmbeddedSolrServer is not a good choice?

Looking forward to your response.

Thanks!


On Sat, Jun 14, 2014 at 12:31 AM, Erick Erickson <erickerickson@gmail.com>
wrote:

> It seems like for some reason you have shards that are not reachable.
> What does your cloud stat in the admin UI tell you when you don't get
> all the docs back?
>
> Best,
> Erick
>
> On Fri, Jun 13, 2014 at 1:37 AM, Vineet Mishra <clearmidoubt@gmail.com>
> wrote:
> > Hi All,
> >
> > I am having a Cloud setup with 3 Shards and 2 Replica running on 3 Tomcats
> > with 3 External Zookeeper, all running on single machine.
> > I have Indexed around 70 Mln Documents that seems to be querying back fine.
> > When I index another 30 Mln to same, the result are vague as with the query
> > *:* its sometimes returning 2 Shards result and sometime all the shards
> > result.
> > So to make it clear if I query with *:* to the 100Mln index its should
> > return back 100Mln docs, but sometimes its returning 70Mln and sometimes
> > 100Mln(Actual Result) with the same query.
> >
> > This is just not case with the *:* query but even if I query with the id
> >
> > q=id:123
> >
> > its sometimes coming with the result and sometimes not.
> >
> > Looking for possible solution.
> >
> > Thanks!
>
