lucene-general mailing list archives

From khartnjava <forpdfsend...@gmail.com>
Subject How to remove duplicates from Lucene index?
Date Sun, 09 Jan 2011 11:30:17 GMT

Hello.
I have an index with 3 fields:
path_to_file - stored, not analyzed - unique path to file
file_content - stored, not analyzed - file's content
file_content_int - analyzed - file's content
How can I find and delete documents that have duplicate values in the file_content field?
I found http://open.vinayras.com/lucene_duplicate_remover
but it doesn't work with Lucene 3.x.
Sorry for my English.
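One possible approach, sketched under assumptions: because both path_to_file and file_content are stored, you can walk every live document, remember the first path seen for each distinct content, and collect the paths of later documents whose content repeats. The class below (DuplicateFinder, a hypothetical name) shows only that grouping step in plain Java, so it runs without Lucene on the classpath. It fingerprints content with SHA-256 so full file bodies are not kept in memory. To wire it to a Lucene 3.x index, you would presumably fill the map by looping i from 0 to IndexReader.maxDoc() - 1, skipping IndexReader.isDeleted(i), and reading document(i).get("path_to_file") and get("file_content"). You would then call IndexWriter.deleteDocuments(new Term("path_to_file", path)) for each returned path, followed by commit(). That glue is untested here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds duplicate file_content values, keeping the first path per content.
class DuplicateFinder {

    /** Short fingerprint of the content, so whole file bodies are not held as map keys. */
    static String fingerprint(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Takes (path_to_file -> file_content) in index order and returns the paths
     * whose content already appeared under an earlier path. Those are the
     * documents to delete; the first occurrence of each content is kept.
     */
    static List<String> pathsToDelete(Map<String, String> contentByPath) {
        Map<String, String> firstPathByFingerprint = new HashMap<>();
        List<String> toDelete = new ArrayList<>();
        for (Map.Entry<String, String> e : contentByPath.entrySet()) {
            String fp = fingerprint(e.getValue());
            // putIfAbsent returns the existing path if this content was seen before.
            if (firstPathByFingerprint.putIfAbsent(fp, e.getKey()) != null) {
                toDelete.add(e.getKey());
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, standing in for index order.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("/a.txt", "hello");
        docs.put("/b.txt", "world");
        docs.put("/c.txt", "hello");
        docs.put("/d.txt", "world");
        docs.put("/e.txt", "other");
        System.out.println(pathsToDelete(docs)); // [/c.txt, /d.txt]
    }
}
```

Deleting by the unique path_to_file term, rather than by internal document number, avoids relying on document ids staying stable between reading and writing. A SHA-256 collision between different files is possible in principle but practically negligible. If you need certainty, compare the stored contents of the matching pair before deleting.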
-- 
View this message in context: http://lucene.472066.n3.nabble.com/How-to-remove-dublicates-from-Lucene-index-tp2220756p2220756.html
Sent from the Lucene - General mailing list archive at Nabble.com.
