lucene-java-user mailing list archives

From Scott Selvia <ssel...@gmail.com>
Subject Re: Exact Phrase Search returning in correct results
Date Wed, 11 Jun 2014 17:04:18 GMT
Thank you, I will give it a try.

On Jun 11, 2014, at 12:58 PM, Allison, Timothy B. <tallison@mitre.org> wrote:

> StandardAnalyzer with that configuration drops stop words at both index and search time.
> So, in effect, you really are just searching for "becomes". If your use case requires you
> to be able to search stop words, consider passing CharArraySet.EMPTY_SET as the stop-word
> set to the StandardAnalyzer's constructor.
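
The effect described above can be reproduced without Lucene. What follows is only a rough plain-Java sketch under stated assumptions: the class StopWordPhraseDemo, its methods, and the SAMPLE_STOP_WORDS list are hypothetical names made up for illustration, the tokenizer is a crude stand-in for StandardAnalyzer, and token positions are ignored. It shows that once "it" is dropped on both sides, the phrase "it becomes" reduces to the single term "becomes" and matches both lines from the original post, while an empty stop set lets only the expected line match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; it does not call Lucene.
class StopWordPhraseDemo {

    // Hypothetical stop list for this sketch (a few common English stop words).
    static final Set<String> SAMPLE_STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "it", "of", "the", "to"));

    // Lowercase the text, split on non-letters, and drop tokens found in stopWords.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!raw.isEmpty() && !stopWords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // True if the analyzed phrase occurs as a contiguous run in the analyzed line.
    static boolean phraseMatches(String line, String phrase, Set<String> stopWords) {
        List<String> lineTokens = analyze(line, stopWords);
        List<String> phraseTokens = analyze(phrase, stopWords);
        return !phraseTokens.isEmpty()
                && Collections.indexOfSubList(lineTokens, phraseTokens) >= 0;
    }

    public static void main(String[] args) {
        String expected =
                "When, in the course of human events, it becomes necessary for one people to dissolve the";
        String unexpected =
                "powers from the consent of the governed. That whenever any form of government becomes";
        Set<String> noStopWords = Collections.emptySet();

        // With stop words removed, the query is effectively just "becomes".
        System.out.println(phraseMatches(expected, "it becomes", SAMPLE_STOP_WORDS));
        System.out.println(phraseMatches(unexpected, "it becomes", SAMPLE_STOP_WORDS));

        // With an empty stop set, "it" is kept and must precede "becomes".
        System.out.println(phraseMatches(expected, "it becomes", noStopWords));
        System.out.println(phraseMatches(unexpected, "it becomes", noStopWords));
    }
}
```

In Lucene itself the same analyzer configuration has to be used for both indexing and searching, and an index built with the default stop set would need to be rebuilt before stop words become searchable.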
> 
> 
> 
> -----Original Message-----
> From: Scott Selvia [mailto:sselvia@gmail.com] 
> Sent: Wednesday, June 11, 2014 12:48 PM
> To: java-user@lucene.apache.org
> Subject: Exact Phrase Search returning in correct results
> 
> I'm having an issue searching for an exact phrase with Lucene 4.7.  My use case loaded
> the Declaration of Independence into a Lucene search database.  I search for "it becomes"
> and I get two hits; one for "it, becomes" and another for a line that just has "becomes"
> at the end of the line.
> 
> Expected:
> 
> "When, in the course of human events, it becomes necessary for one people to dissolve the"
> 
> Not Expected:
> 
> "powers from the consent of the governed. That whenever any form of government becomes"
> 
> Below is my load code and search code:
> 
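> // FSDirectory.open takes a java.io.File in Lucene 4.7, so the path is wrapped in a File.
> // Assumption: idxLinesFile (not declared in the original post) refers to this same path.
> File idxLinesFile = new File("test lucene index");
> Directory idxLinesDir = FSDirectory.open(idxLinesFile);
> Analyzer analyzerLines = new StandardAnalyzer(Version.LUCENE_47);
> IndexWriterConfig iwcLines = new IndexWriterConfig(Version.LUCENE_47, analyzerLines);
> // Append to an existing index, otherwise create a new one.
> iwcLines.setOpenMode(idxLinesFile.exists()
>         ? IndexWriterConfig.OpenMode.CREATE_OR_APPEND
>         : IndexWriterConfig.OpenMode.CREATE);
> 
> IndexWriter writerLines = new IndexWriter(idxLinesDir, iwcLines);
> 
> for (int i = 0; i < arrayListOfLines.size(); i++)
> {
>     Document docLine = new Document();
>     // Untokenized key of the form "page:line", each part zero-padded to six digits.
>     docLine.add(new StringField("docIndex",
>             String.format("%06d", pageNumber) + ":" + String.format("%06d", i), Field.Store.YES));
>     // Analyzed line text; this is the field the phrase query searches.
>     docLine.add(new TextField("lineText", arrayListOfLines.get(i), Field.Store.YES));
> 
>     // The original post passed "docLines" here, which is not the variable declared above.
>     writerLines.addDocument(docLine);
> }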
> 
> // Search Code
> 
> Directory idxDir = FSDirectory.open(idxFile);
> IndexReader reader = DirectoryReader.open(idxDir);
> IndexSearcher searcher = new IndexSearcher(reader);
> Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
> QueryParser parser = new QueryParser(Version.LUCENE_47, "lineText", analyzer);
> parser.setDefaultOperator(QueryParser.AND_OPERATOR);
> parser.setPhraseSlop(0);
> 
> Query query = parser.createPhraseQuery("lineText", "it becomes");                
> TotalHitCountCollector collector = new TotalHitCountCollector();
> searcher.search(query, collector);
> TopDocs results = searcher.search(query, Math.max(1, collector.getTotalHits()));
> ScoreDoc[] hits = results.scoreDocs;
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 



