lucene-java-user mailing list archives

From Milind <mili...@gmail.com>
Subject Re: Can't get case insensitive keyword analyzer to work
Date Mon, 11 Aug 2014 17:49:44 GMT
It does look like the lowercasing is working.

The following code

        Document theDoc = theIndexReader.document(0);
        System.out.println(theDoc.get("sn"));
        IndexableField theField = theDoc.getField("sn");
        TokenStream theTokenStream = theField.tokenStream(theAnalyzer);
        System.out.println(theTokenStream);

produces the following output
    SN345-B21
    LowerCaseFilter@5f70bea5 term=sn345-b21,bytes=[73 6e 33 34 35 2d 62 32 31],startOffset=0,endOffset=9
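
As a quick cross-check, the bytes in that output can be decoded with plain JDK code, independent of Lucene. This is only a sketch; the class and method names (TermBytes, decode) are made up for illustration and are not part of any library:

```java
import java.nio.charset.StandardCharsets;

class TermBytes {
    // Hypothetical helper: converts space-separated hex bytes into a UTF-8 string.
    static String decode(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Each part is one byte written as two hex digits.
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Bytes copied from the token stream output above.
        System.out.println(decode("73 6e 33 34 35 2d 62 32 31"));
    }
}
```

This prints sn345-b21, so the token the analyzer produces for the stored value is the lowercased form.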

But the search does not work.  Does anything obvious jump out at anyone?


On Sat, Aug 9, 2014 at 4:39 PM, Milind <milindr@gmail.com> wrote:

> I looked at a couple of examples on how to get keyword analyzer to be case
> insensitive but I think I missed something since it's not working for me.
>
> In the code below, I'm indexing text in upper case and searching in lower
> case.  But I get back no hits.  Do I need to do something more while
> indexing?
>
>     private static class LowerCaseKeywordAnalyzer extends Analyzer
>     {
>         @Override
>         protected TokenStreamComponents createComponents(String theFieldName,
>                                                          Reader theReader)
>         {
>             KeywordTokenizer theTokenizer = new KeywordTokenizer(theReader);
>             TokenStreamComponents theTokenStreamComponents =
>                 new TokenStreamComponents(
>                         theTokenizer,
>                         new LowerCaseFilter(Version.LUCENE_46, theTokenizer));
>             return theTokenStreamComponents;
>         }
>     }
>
>     private static void addDocment(IndexWriter theWriter,
>                                       String theFieldName,
>                                       String theValue,
>                                       boolean storeTokenized)
>         throws Exception
>     {
>           Document theDocument = new Document();
>           FieldType theFieldType = new FieldType();
>           theFieldType.setStored(true);
>           theFieldType.setIndexed(true);
>           theFieldType.setTokenized(storeTokenized);
>           theDocument.add(new Field(theFieldName, theValue, theFieldType));
>           theWriter.addDocument(theDocument);
>     }
>
>
>     static void testLowerCaseKeywordAnalyzer()
>         throws Exception
>     {
>         Version theVersion = Version.LUCENE_46;
>         Directory theIndex = new RAMDirectory();
>
>         Analyzer theAnalyzer = new LowerCaseKeywordAnalyzer();
>
>         IndexWriterConfig theConfig = new IndexWriterConfig(theVersion,
>                                                             theAnalyzer);
>         IndexWriter theWriter = new IndexWriter(theIndex, theConfig);
>         addDocment(theWriter, "sn", "SN345-B21", false);
>         addDocment(theWriter, "sn", "SN445-B21", false);
>         theWriter.close();
>
>         QueryParser theParser = new QueryParser(theVersion, "sn", theAnalyzer);
>         Query theQuery = theParser.parse("sn:sn345-b21");
>         IndexReader theIndexReader = DirectoryReader.open(theIndex);
>         IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
>         TopScoreDocCollector theCollector = TopScoreDocCollector.create(10, true);
>         theSearcher.search(theQuery, theCollector);
>         ScoreDoc[] theHits = theCollector.topDocs().scoreDocs;
>         System.out.println("Number of results found: " + theHits.length);
>     }
>
> --
> Regards
> Milind
>



-- 
Regards
Milind
