lucene-solr-user mailing list archives

From Norberto Meijome <>
Subject Re: Duplicate content
Date Tue, 15 Jul 2008 08:37:12 GMT
On Tue, 15 Jul 2008 13:15:41 +0530
"Sunil" <> wrote:

> 1) I don't want duplicate content.

Solr uses the field you declare as the unique key (the <uniqueKey> element in
schema.xml) to decide whether a document should replace an existing one or be
added as a new one. The rest of the fields are in your hands.
You could devise a setup whereby the document id is generated by hashing all
the other fields in your schema, so that identical content always produces the
same id and a distinct id always means distinct content (of course, for a
meaning of 'uniqueness' that is "different bytes" ;) )
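A minimal Python sketch of the hashing idea, assuming each document is a plain dict of JSON-serialisable field values and that the unique key field is called "id" (both assumptions; content_id and with_content_id are hypothetical helper names, not part of any Solr client). Fields are serialised with sorted keys so the order in which they were set does not change the hash; SHA-1 is just one possible digest.

```python
import hashlib
import json


def content_id(doc: dict, id_field: str = "id") -> str:
    """Return a hex SHA-1 digest of every field except the id field.

    Fields are serialised as canonical JSON (sorted keys, fixed separators),
    so two documents with the same field values hash to the same id no matter
    in which order the fields were added. (Hypothetical helper.)
    """
    content = {k: v for k, v in doc.items() if k != id_field}
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def with_content_id(doc: dict, id_field: str = "id") -> dict:
    """Return a copy of doc whose id field is set to its content hash."""
    return {**doc, id_field: content_id(doc, id_field)}


if __name__ == "__main__":
    a = with_content_id({"title": "Hello", "body": "World"})
    b = with_content_id({"body": "World", "title": "Hello"})
    print(a["id"] == b["id"])  # True: same content, same id
```

Because the id field itself is excluded from the hash, re-hashing an already-tagged document gives the same id. Note that the order of values inside a multi-valued (list) field still affects the hash.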

> 2) I don't want to overwrite old content with new content.
> That is, if I add duplicate content to Solr and the content already
> exists, the old content should not be overwritten.

Before inserting a new document, query the index for it (for example, by its
unique key); if you get a result back, don't insert it. I don't know of any
other way.
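A sketch of the query-before-insert approach, again in Python. InMemoryIndex is a hypothetical stand-in for a Solr core so the example runs on its own; against a real server, exists() would correspond to a query on the unique key (e.g. q=id:<value>, checking numFound) and add() to an ordinary add request, which by default replaces any document with the same key.

```python
from typing import Dict


class InMemoryIndex:
    """Hypothetical stand-in for a Solr core, keyed by the unique key field.

    add() mimics Solr's default behaviour of replacing any existing
    document that has the same unique key.
    """

    def __init__(self, id_field: str = "id") -> None:
        self.id_field = id_field
        self.docs: Dict[str, dict] = {}

    def exists(self, doc_id: str) -> bool:
        # Against Solr this would be a query such as q=id:<value>,
        # treating numFound > 0 as "already indexed".
        return doc_id in self.docs

    def add(self, doc: dict) -> None:
        # Overwrites any document with the same key, like a plain add.
        self.docs[doc[self.id_field]] = doc


def add_if_absent(index: InMemoryIndex, doc: dict) -> bool:
    """Query first; add the document only if its key is not indexed yet.

    Returns True if the document was added, False if it was skipped.
    """
    if index.exists(doc[index.id_field]):
        return False  # leave the old document untouched
    index.add(doc)
    return True


if __name__ == "__main__":
    idx = InMemoryIndex()
    print(add_if_absent(idx, {"id": "1", "body": "old"}))  # True: added
    print(add_if_absent(idx, {"id": "1", "body": "new"}))  # False: skipped
    print(idx.docs["1"]["body"])  # old
```

Keep in mind that check-then-insert is not atomic: two clients adding the same document at the same moment can both see no match and both insert. If the id is derived from the content, this check also catches exact duplicates.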

{Beto|Norberto|Numard} Meijome

"The real voyage of discovery consists not in seeking new landscapes, but in
having new eyes." Marcel Proust

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been