lucene-solr-user mailing list archives

From Norberto Meijome <free...@meijome.net>
Subject Re: Duplicate content
Date Tue, 15 Jul 2008 08:37:12 GMT
On Tue, 15 Jul 2008 13:15:41 +0530
"Sunil" <sunil@truesparrow.com> wrote:

> 1) I don't want duplicate content.

Solr uses the field you declare as the unique key (uniqueKey in schema.xml) to
decide whether an incoming document replaces an existing one or is added as
new. The rest of the fields are in your hands. You could devise a setup whereby
the document id is generated by hashing all the other fields in your schema, so
that a unique document id means unique content (for a meaning of 'uniqueness'
that is "different bytes", of course ;) ).

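A minimal sketch of the hashing idea, in Python. The function name content_id,
the choice of SHA-256 and the "id" field name are my own assumptions for
illustration; nothing here is provided by Solr itself. Fields are hashed in
sorted order so the same content always yields the same id.

```python
import hashlib


def content_id(doc, exclude=("id",)):
    """Return a hex digest derived from every field except the id itself.

    `doc` maps field names to a value or a list of values (multi-valued field).
    """
    digest = hashlib.sha256()
    for name in sorted(doc):  # fixed order, so field order does not matter
        if name in exclude:
            continue
        values = doc[name]
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            for part in (name, str(value)):
                data = part.encode("utf-8")
                # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
                digest.update(len(data).to_bytes(8, "big"))
                digest.update(data)
    return digest.hexdigest()


doc = {"title": "Hello", "body": "World"}
doc["id"] = content_id(doc)  # send this as the unique key value
```

Two documents get the same id only if every hashed field matches byte for
byte, so a change in whitespace or case still counts as new content.
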
> 2) I don't want to overwrite old content with new one. 
> 
> Means, if I add duplicate content in solr and the content already
> exists, the old content should not be overwritten.

Before inserting a new document, query the index; if you get a result back,
don't insert. I don't know of any other way.
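
A rough sketch of that check-then-insert flow, in Python. The base URL, the
"id" field name and the helper names (select_url, count_matches,
add_if_absent) are assumptions for illustration. It relies on the standard
select handler with wt=json; the actual add is left to a callable you supply.

```python
import json
import urllib.parse
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumed location of your Solr instance


def select_url(base_url, id_field, doc_id):
    """Build a query URL that asks only for the number of matching documents."""
    # Quote the value as a phrase so query-syntax characters are taken literally.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{id_field}:"{escaped}"', "rows": "0", "wt": "json"}
    return base_url + "/select?" + urllib.parse.urlencode(params)


def count_matches(doc_id, id_field="id", base_url=SOLR_URL):
    """Ask Solr how many documents already carry this id."""
    with urllib.request.urlopen(select_url(base_url, id_field, doc_id)) as resp:
        return json.load(resp)["response"]["numFound"]


def add_if_absent(doc, count, add, id_field="id"):
    """Call add(doc) only when count(id) reports no existing document."""
    if count(doc[id_field]) > 0:
        return False  # already indexed: keep the old content
    add(doc)
    return True
```

Pass count_matches as `count` and your own posting function as `add`. Keep in
mind that the check and the add are two separate requests, so two clients can
both pass the check at the same time, and that a query only sees committed
documents, so something added but not yet committed will not be found.
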

b
_________________________
{Beto|Norberto|Numard} Meijome

"The real voyage of discovery consists not in seeking new landscapes, but in
having new eyes." Marcel Proust

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
