lucene-general mailing list archives

From Артур Хуснутдинов <forpdfsend...@gmail.com>
Subject How to remove duplicates from a Lucene index?
Date Mon, 10 Jan 2011 16:08:34 GMT
Hello.
I have an index with 3 fields:
path_to_file - stored, not analyzed - unique path to the file
file_content - stored, not analyzed - the file's content
file_content_int - analyzed - the file's content
How can I find and delete documents with duplicate values in the file_content field?
I found http://open.vinayras.com/lucene_duplicate_remover
but it does not work with Lucene 3.x.
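
To show what is meant by a duplicate here, below is a rough sketch in plain Java with no Lucene calls. It assumes the stored path_to_file and file_content values have already been read out of the index into a map; the names DuplicateFinder and findDuplicatePaths are made up for illustration. It keeps the first path seen for each distinct content and reports the rest as candidates for deletion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: works on values already read from the index.
final class DuplicateFinder {

    // Returns the paths whose content was already seen under an earlier path.
    // Iteration order of the map decides which copy is kept (the first one).
    static List<String> findDuplicatePaths(Map<String, String> pathToContent) {
        Set<String> seenContent = new HashSet<String>();
        List<String> duplicatePaths = new ArrayList<String>();
        for (Map.Entry<String, String> entry : pathToContent.entrySet()) {
            // Set.add returns false when the content is already present.
            if (!seenContent.add(entry.getValue())) {
                duplicatePaths.add(entry.getKey());
            }
        }
        return duplicatePaths;
    }

    public static void main(String[] args) {
        // Assumed sample data standing in for (path_to_file, file_content) pairs.
        Map<String, String> docs = new LinkedHashMap<String, String>();
        docs.put("/a.txt", "hello");
        docs.put("/b.txt", "world");
        docs.put("/c.txt", "hello");
        System.out.println(findDuplicatePaths(docs)); // prints [/c.txt]
    }
}
```

Since path_to_file is unique and not analyzed, each reported path could presumably then be removed with IndexWriter.deleteDocuments(new Term("path_to_file", path)). For large files, keeping a digest of the content in the set instead of the full string would use less memory.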
Sorry for my English.

-- 
Best regards, ArtUrlWWW
