lucene-java-user mailing list archives

From Milind <mili...@gmail.com>
Subject Re: Can't get case insensitive keyword analyzer to work
Date Tue, 12 Aug 2014 01:38:50 GMT
I found the problem.  But it makes no sense to me.

If I set the field type to be tokenized, the search works.  But if I set it
to not be tokenized, the search fails.  That is, I have to pass true as
storeTokenized, which ends up in this call:
    theFieldType.setTokenized(storeTokenized);

I want the field to be stored as un-tokenized.  But it seems that I don't
need to do that.  The LowerCaseKeywordAnalyzer works if the field is
tokenized, but not if it's un-tokenized!

How can that be?
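
A possible explanation, sketched as a toy model rather than real Lucene code:
assume that with setTokenized(false) the analyzer is skipped at index time, so
the raw value is indexed as one term exactly as given, while QueryParser still
runs the analyzer on the query text.  The class and method names below
(TokenizedFlagSketch, analyze, index, matches) are made up for illustration and
are not part of the Lucene API.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy model only: this is NOT Lucene code, just the suspected behaviour in miniature. */
public class TokenizedFlagSketch {

    /** Stand-in for LowerCaseKeywordAnalyzer: the whole value becomes one lower-cased term. */
    static String analyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** Stand-in for indexing (assumption): the analyzer runs only when the field is tokenized. */
    static void index(Set<String> terms, String value, boolean tokenized) {
        terms.add(tokenized ? analyze(value) : value);
    }

    /** Stand-in for QueryParser plus search: the query text is always analyzed. */
    static boolean matches(Set<String> terms, String queryText) {
        return terms.contains(analyze(queryText));
    }

    public static void main(String[] args) {
        Set<String> untokenized = new HashSet<>();
        index(untokenized, "SN345-B21", false);   // indexed term stays "SN345-B21"

        Set<String> tokenized = new HashSet<>();
        index(tokenized, "SN345-B21", true);      // indexed term becomes "sn345-b21"

        System.out.println("untokenized hit: " + matches(untokenized, "sn345-b21"));
        System.out.println("tokenized hit:   " + matches(tokenized, "sn345-b21"));
    }
}
```

In this model the untokenized index misses the lower-case query and the
tokenized one matches it, which is the same pattern I am seeing.  If the
assumption holds, then since KeywordTokenizer emits the whole value as a single
token anyway, a tokenized field with this analyzer would still behave like an
un-tokenized keyword field, only lower-cased.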


On Mon, Aug 11, 2014 at 1:49 PM, Milind <milindr@gmail.com> wrote:

> It does look like the lowercase is working.
>
> The following code
>
>         Document theDoc = theIndexReader.document(0);
>         System.out.println(theDoc.get("sn"));
>         IndexableField theField = theDoc.getField("sn");
>         TokenStream theTokenStream = theField.tokenStream(theAnalyzer);
>         System.out.println(theTokenStream);
>
> produces the following output
>     SN345-B21
>     LowerCaseFilter@5f70bea5 term=sn345-b21,bytes=[73 6e 33 34 35 2d 62 32 31],startOffset=0,endOffset=9
>
> But the search does not work.  Anything obvious popping out for anyone?
>
>
> On Sat, Aug 9, 2014 at 4:39 PM, Milind <milindr@gmail.com> wrote:
>
>> I looked at a couple of examples on how to get keyword analyzer to be
>> case insensitive but I think I missed something since it's not working for
>> me.
>>
>> In the code below, I'm indexing text in upper case and searching in lower
>> case.  But I get back no hits.  Do I need to do something more while
>> indexing?
>>
>>     private static class LowerCaseKeywordAnalyzer extends Analyzer
>>     {
>>         @Override
>>         protected TokenStreamComponents createComponents(String theFieldName, Reader theReader)
>>         {
>>             KeywordTokenizer theTokenizer = new KeywordTokenizer(theReader);
>>             TokenStreamComponents theTokenStreamComponents =
>>                 new TokenStreamComponents(
>>                         theTokenizer,
>>                         new LowerCaseFilter(Version.LUCENE_46, theTokenizer));
>>             return theTokenStreamComponents;
>>         }
>>     }
>>
>>     private static void addDocment(IndexWriter theWriter,
>>                                       String theFieldName,
>>                                       String theValue,
>>                                       boolean storeTokenized)
>>         throws Exception
>>     {
>>           Document theDocument = new Document();
>>           FieldType theFieldType = new FieldType();
>>           theFieldType.setStored(true);
>>           theFieldType.setIndexed(true);
>>           theFieldType.setTokenized(storeTokenized);
>>           theDocument.add(new Field(theFieldName, theValue, theFieldType));
>>           theWriter.addDocument(theDocument);
>>     }
>>
>>
>>     static void testLowerCaseKeywordAnalyzer()
>>         throws Exception
>>     {
>>         Version theVersion = Version.LUCENE_46;
>>         Directory theIndex = new RAMDirectory();
>>
>>         Analyzer theAnalyzer = new LowerCaseKeywordAnalyzer();
>>
>>         IndexWriterConfig theConfig = new IndexWriterConfig(theVersion,
>>                                                             theAnalyzer);
>>         IndexWriter theWriter = new IndexWriter(theIndex, theConfig);
>>         addDocment(theWriter, "sn", "SN345-B21", false);
>>         addDocment(theWriter, "sn", "SN445-B21", false);
>>         theWriter.close();
>>
>>         QueryParser theParser = new QueryParser(theVersion, "sn", theAnalyzer);
>>         Query theQuery = theParser.parse("sn:sn345-b21");
>>         IndexReader theIndexReader = DirectoryReader.open(theIndex);
>>         IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
>>         TopScoreDocCollector theCollector = TopScoreDocCollector.create(10, true);
>>         theSearcher.search(theQuery, theCollector);
>>         ScoreDoc[] theHits = theCollector.topDocs().scoreDocs;
>>         System.out.println("Number of results found: " + theHits.length);
>>     }
>>
>> --
>> Regards
>> Milind
>>
>
>
>
> --
> Regards
> Milind
>



-- 
Regards
Milind
