lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Duplicate Documents
Date Sat, 12 Sep 2015 17:48:20 GMT
On 9/12/2015 10:51 AM, Mr Havercamp wrote:
> Unfortunately, <uniqueKey/> has never changed. The issue can take some time
> to show itself, although I think there were logic issues with the way I
> update documents in my index.
> 
> I first do a full purge and reindex of all items without issue.
> 
> Over time, I only index items that are new or have changed since the initial
> reindex. However, duplicates start to appear, which is strange because
> I use a combination of <uniqueKey/> plus overwrite="true", which should
> guarantee uniqueness.
> 
> However, I have been using the /admin/luke lastModified date to check for
> items which have been added/updated after this date but have just realized
> that lastModified will only change if I a) reindex everything or b) call
> optimize, so I have been retrieving items which have already been added to
> the index. I think explicitly storing the last run time (in a file/db
> field) will ensure I only retrieve those items which have changed since the
> last index. This will also go a long way to solving the duplication issue.
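The plan of storing the last run time explicitly can be sketched as below. This is a minimal sketch under stated assumptions: the state-file location, the `incremental_run` and `send` names, and the item shape (an "id" plus a timezone-aware "modified" datetime) are all hypothetical, not part of the original setup. The timestamp is captured at the *start* of a run so that items modified while the run is in progress are picked up next time rather than missed. Re-sending an item that was already indexed is harmless, because Solr replaces a document whose uniqueKey matches.

```python
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical location for the persisted timestamp; a DB field would work too.
STATE_FILE = os.path.join(tempfile.gettempdir(), "last_index_run.txt")


def load_last_run(path=STATE_FILE):
    """Return the stored last-run time, or None if no run has been recorded."""
    try:
        with open(path, encoding="utf-8") as f:
            return datetime.fromisoformat(f.read().strip())
    except FileNotFoundError:
        return None


def save_last_run(ts, path=STATE_FILE):
    """Persist the run timestamp as ISO-8601 text."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(ts.isoformat())


def select_changed(items, since):
    """Items whose 'modified' time is after 'since' (all items if since is None)."""
    if since is None:
        return list(items)
    return [item for item in items if item["modified"] > since]


def incremental_run(items, send, path=STATE_FILE):
    # Record the start time *before* selecting, so anything modified during
    # this run is selected again next time instead of slipping through.
    run_started = datetime.now(timezone.utc)
    changed = select_changed(items, load_last_run(path))
    send(changed)  # hypothetical callback that posts the documents to Solr
    save_last_run(run_started, path)
    return changed
```

The first run (no state file) sends everything, which matches the "full purge and reindex" step; later runs send only items modified after the previous run began.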

Solr will already overwrite when the uniqueKey matches (case sensitive);
you do not need to tell it to do so explicitly.  In virtually all
situations where people use the overwrite parameter, they are specifying
"false" ... so I wonder if perhaps there's a bug when it is explicitly
set to "true".  Can you do a full purge and reindex with the overwrite
parameter removed from all requests?
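To make the overwrite semantics concrete, here is a toy in-memory model. It is an illustration only, not how Solr is implemented internally; the `ToyIndex` class is hypothetical. It shows the three behaviours described above. With the default (overwrite on), an add whose key matches exactly replaces the earlier document. Overwrite "false" lets duplicates accumulate. Keys that differ only in case ("doc1" vs "DOC1") count as different documents, which is one way apparent duplicates can appear.

```python
class ToyIndex:
    """Toy model of uniqueKey handling; NOT Solr's actual implementation."""

    def __init__(self):
        self.docs = []  # each doc is a dict carrying an "id" (the uniqueKey)

    def add(self, doc, overwrite=True):
        if overwrite:
            # Default behaviour: drop any existing doc whose key matches exactly.
            # The comparison is case sensitive, so "doc1" and "DOC1" differ.
            self.docs = [d for d in self.docs if d["id"] != doc["id"]]
        self.docs.append(doc)

    def count(self, key):
        """How many stored docs carry exactly this key."""
        return sum(1 for d in self.docs if d["id"] == key)


idx = ToyIndex()
idx.add({"id": "doc1", "v": 1})
idx.add({"id": "doc1", "v": 2})                   # replaces the first doc1
idx.add({"id": "DOC1", "v": 1})                   # different key: a second doc
idx.add({"id": "doc1", "v": 3}, overwrite=False)  # duplicate doc1 is kept
```

After these calls the model holds two documents with key "doc1" (only because of the explicit overwrite=False) and one with "DOC1".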

The XMLLoader code looks pretty straightforward, so I don't really
expect that removing the overwrite parameter will help, but like Erick,
I cannot see any obvious problem in the info you've shared so far.  I'm
taking shots in the dark.
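For the suggested test (reindexing with the overwrite parameter removed), the XML update message that XMLLoader parses can be built with the attribute either present or omitted. The following is a minimal sketch: `build_add_xml` and the field names are hypothetical. It relies only on Solr's XML `<add>` format, where `overwrite` is an optional attribute on the `<add>` element. Leaving the attribute off means Solr applies its default of overwriting on uniqueKey.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs, overwrite=None):
    """Build a Solr XML <add> update message (hypothetical helper).

    overwrite=None omits the attribute entirely, leaving Solr's default
    (overwrite on uniqueKey); True/False emit it explicitly.
    """
    add = ET.Element("add")
    if overwrite is not None:
        add.set("overwrite", "true" if overwrite else "false")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Each field becomes <field name="...">value</field>
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


print(build_add_xml([{"id": "doc1", "title": "Hello"}]))
```

Calling `build_add_xml(docs)` versus `build_add_xml(docs, overwrite=True)` yields the two request bodies to compare. The only difference is the `overwrite="true"` attribute on `<add>`.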

Thanks,
Shawn

