lucene-general mailing list archives

From ny1984 <nalanyun...@yahoo.com>
Subject DuplicateFilter Problem
Date Wed, 07 Jan 2009 12:19:54 GMT

Hi everyone,

I have a problem with Lucene's DuplicateFilter. I have some PDF files,
indexed page by page, with 3 fields (id, title and content). Different
pages of the same PDF store the same id and title; only the content is
different. I want to search for a string and eliminate hits that share the
same id. For some documents DuplicateFilter works perfectly, but for other
documents it returns 0 results. By the way, if I search for the string in
title, it returns the correct results, but if I search in content, 0
results are returned. I have added my code below. I could not find the
problem. Please help me with this issue. Thank you...

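        // Imports assumed for this snippet (DuplicateFilter ships in the
        // contrib queries module)
        import org.apache.lucene.analysis.Analyzer;
        import org.apache.lucene.analysis.StopAnalyzer;
        import org.apache.lucene.index.IndexReader;
        import org.apache.lucene.queryParser.QueryParser;
        import org.apache.lucene.search.*;

        String directory = "C:/indexes/";
        Query queryd = null;

        // Open the index and wrap it in a searcher
        IndexReader reader = IndexReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);

        // Parse the query against the "content" field
        Analyzer sanalyzer = new StopAnalyzer();
        QueryParser parser = new QueryParser("content", sanalyzer);

        queryd = parser.parse("point");
        // Filter duplicates on "id" (keepMode = 1, processingMode = 1)
        DuplicateFilter df = new DuplicateFilter("id", 1, 1);
        Hits ehits = searcher.search(queryd, df);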
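
To make the intended result concrete, here is a minimal sketch in plain Java of "one hit per id", applied to pages that already matched a query. It does not use Lucene and is not how DuplicateFilter works internally; the `PageHit` class, the `firstPerId` helper and the sample data are all hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {

    // Hypothetical stand-in for one indexed PDF page (not a Lucene class).
    static final class PageHit {
        final String id;
        final String title;
        final int page;

        PageHit(String id, String title, int page) {
            this.id = id;
            this.title = title;
            this.page = page;
        }
    }

    // Keep only the first matching page for each id, preserving hit order.
    static List<PageHit> firstPerId(List<PageHit> hits) {
        Map<String, PageHit> byId = new LinkedHashMap<>();
        for (PageHit hit : hits) {
            byId.putIfAbsent(hit.id, hit);
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        // Assumed sample data: pages that matched a query on "content".
        List<PageHit> hits = new ArrayList<>();
        hits.add(new PageHit("doc1", "Manual", 3));
        hits.add(new PageHit("doc1", "Manual", 7));
        hits.add(new PageHit("doc2", "Report", 2));

        for (PageHit hit : firstPerId(hits)) {
            System.out.println(hit.id + " " + hit.title + " page " + hit.page);
        }
    }
}
```

With this sample data, two pages of doc1 and one page of doc2 match, and the expected outcome is one entry for doc1 and one for doc2.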

-- 
View this message in context: http://www.nabble.com/DuplicateFilter-Problem-tp21330217p21330217.html
Sent from the Lucene - General mailing list archive at Nabble.com.

