lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: No. of records mismatch
Date Mon, 17 Aug 2015 16:15:08 GMT
A couple of things:
1> Be a little careful when looking at deletedDocs, maxDocs and numDocs
after you're done. Deleted (or updated) docs are "merged away" as
segments merge. deletedDocs isn't a count of all docs that _have_
been deleted; it's just a count of the docs that have been
deleted/updated but not yet merged away.

2> bq: "I do not see any deletions." The "Deleted 0 documents" in the DIH
status isn't a count of unique IDs replaced, it's the number of explicit
deletions. Having that as 0 doesn't mean no docs were overwritten.

3> bq: "Yes, it is not absolutely unique, but I do not think it is at this
1-to-6 ratio". Check your assumption here. Assuming the source is a
database, compare the total row count with the count of distinct values
of whatever field maps to your <uniqueKey>.

4> Is this a sharded setup? It shouldn't matter; you should get a
full count unless you are explicitly adding &distrib=false. Just
checking.

5> If none of that is the problem, let's see your config etc.
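A minimal sketch of the check in 3>, assuming the source rows can be queried with plain SQL. It uses an in-memory SQLite table, and the table name `records` and key column `policy_id` are hypothetical stand-ins for the real schema. If COUNT(*) is well above COUNT(DISTINCT key), rows sharing a key overwrite each other in Solr, so after a clean import numFound would be expected to land near the distinct count rather than near "Total Rows Fetched".

```python
import sqlite3


def key_counts(conn, table, key_column):
    """Return (total_rows, distinct_keys) for the column mapped to <uniqueKey>.

    Table and column names are interpolated into the SQL, so pass only
    trusted identifiers. Note that COUNT(DISTINCT ...) ignores NULL keys.
    """
    total, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {key_column}) FROM {table}"
    ).fetchone()
    return total, distinct


if __name__ == "__main__":
    # Hypothetical stand-in for the real source table: 6 rows, 2 distinct keys.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (policy_id TEXT, payload TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)",
        [("A", "1"), ("A", "2"), ("A", "3"), ("B", "4"), ("B", "5"), ("B", "6")],
    )
    total, distinct = key_counts(conn, "records", "policy_id")
    print(f"{total} rows, {distinct} distinct keys, {total / distinct:.1f} rows per key")
    # prints: 6 rows, 2 distinct keys, 3.0 rows per key
```

Against the real database, the same SELECT can be run directly in its own client; only the identifiers change.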

Best,
Erick

On Sun, Aug 16, 2015 at 11:57 PM, davidphilip cherian
<davidphilipcherian@gmail.com> wrote:
> Hi,
>
> You should check whether there were deletions by navigating to the Solr admin
> Core Admin page. Example URL:
> http://localhost:8983/solr/#/~cores/test_shard1_replica1. Check
> numDocs, maxDocs and deletedDocs. If numDocs remains equal to maxDocs, then
> you can confirm that there were no updates (as recommended by Upayavira).
>
> HTH
>
> On Mon, Aug 17, 2015 at 4:41 AM, Pattabiraman, Meenakshisundaram <
> Pattabiraman.Meenakshisundaram@aig.com> wrote:
>
>> "You almost certainly have a non-unique ID field."
>> Yes, it is not absolutely unique, but I do not think it is at this 1-to-6
>> ratio.
>>
>> "Try it with a clean index, and then review the number of deleted
>> documents (updates are a delete then insert action) "
>> I tried on a new instance; same effect. I do not see any deletions. Is
>> there a way to determine from the logs whether the behavior is
>> due to non-uniqueness? That would serve as assurance.
>> Thanks
>>
>> <str name="Total Rows Fetched">6843469</str>
>> <str name="Total Documents Processed">6843469</str>
>> <str name="Total Documents Skipped">0</str>
>> <str name="Full Dump Started">2015-08-16 21:22:24</str>
>> <str name="">
>> Indexing completed. Added/Updated: 6843469 documents. Deleted 0 documents.
>> </str>
>> <str name="Committed">2015-08-16 22:31:47</str>
>>
>> Whereas querying '*:*' returns:
>>     "params":{
>>       "q":"*:*"}},
>>   "response":{"numFound":1143108,"start":0,"docs":[
>>
>> -----Original Message-----
>> From: Upayavira [mailto:uv@odoko.co.uk]
>> Sent: Sunday, August 16, 2015 3:18 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: No. of records mismatch
>>
>> You almost certainly have a non-unique ID field. Some documents are
>> overwritten during indexing. Try it with a clean index, and then review the
>> number of deleted documents (updates are a delete then insert action).
>> Deletes are calculated with maxDocs minus numDocs.
>>
>> Upayavira
>>
>> On Sun, Aug 16, 2015, at 07:18 PM, Pattabiraman, Meenakshisundaram
>> wrote:
>> > I did a dataimport with 'clean' set to false.
>> > The DIH status upon completion was:
>> >
>> > <str name="status">idle</str>
>> > <str name="importResponse"/>
>> > <lst name="statusMessages">
>> > <str name="Total Requests made to DataSource">1</str>
>> > <str name="Total Rows Fetched">6843427</str>
>> > <str name="Total Documents Processed">6843427</str>
>> > <str name="Total Documents Skipped">0</str>
>> > <str name="Full Dump Started">2015-08-16 16:50:54</str>
>> > <str name="">
>> > Indexing completed. Added/Updated: 6843427 documents. Deleted 0
>> > documents.
>> > </str>
>> > Whereas when I query using 'query?q=*:*&rows=0', I get the following
>> > count:
>> > {
>> >   "responseHeader":{
>> >     "status":0,
>> >     "QTime":1,
>> >     "params":{
>> >       "q":"*:*",
>> >       "rows":"0"}},
>> >   "response":{"numFound":1616376,"start":0,"docs":[]
>> >   }}
>> >
>> > There is a difference of 5 million records. Can anyone help me
>> > understand the behavior? The logs look fine.
>> > Thanks
>>
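To make Upayavira's "maxDocs minus numDocs" rule and Erick's merge caveat concrete, here is a toy Python model of overwrite-by-uniqueKey indexing. It is an illustration under simplifying assumptions, not Solr's or Lucene's actual implementation: the `ToyIndex` class is hypothetical, a re-added key simply marks the old copy deleted, and `merge()` stands in for a merge that purges every deleted doc at once. The stat names mirror the ones used in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy model of overwrite-by-uniqueKey indexing (not Solr's real code).

    Every add appends a new doc. If the key already exists, the older copy
    is only marked deleted; it still counts toward maxDocs until merge().
    """

    docs: list = field(default_factory=list)  # [key, is_deleted] in add order
    live: dict = field(default_factory=dict)  # key -> position of its live copy

    def add(self, key):
        if key in self.live:
            self.docs[self.live[key]][1] = True  # overwrite = delete + add
        self.live[key] = len(self.docs)
        self.docs.append([key, False])

    def merge(self):
        """Physically drop deleted docs, as a full merge would."""
        self.docs = [d for d in self.docs if not d[1]]
        self.live = {key: i for i, (key, _) in enumerate(self.docs)}

    def stats(self):
        max_docs = len(self.docs)  # live + deleted-but-not-yet-merged
        num_docs = len(self.live)  # what q=*:* would report as numFound here
        return {"numDocs": num_docs, "maxDocs": max_docs,
                "deletedDocs": max_docs - num_docs}


if __name__ == "__main__":
    idx = ToyIndex()
    for key in ["A", "A", "A", "B", "B", "B"]:  # 6 rows fetched, 2 distinct keys
        idx.add(key)
    print(idx.stats())  # {'numDocs': 2, 'maxDocs': 6, 'deletedDocs': 4}
    idx.merge()
    print(idx.stats())  # {'numDocs': 2, 'maxDocs': 2, 'deletedDocs': 0}
```

In this model, four overwrites show up as deletedDocs only until the merge runs; afterwards deletedDocs is 0 even though docs were replaced. None of them would appear in an explicit-deletion count such as DIH's "Deleted 0 documents", which is why a clean index and an early look at the stats are suggested above.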
