lucene-java-user mailing list archives

From Christoph Kaser <christoph.ka...@iconparc.de>
Subject Re: Can't get case insensitive keyword analyzer to work
Date Tue, 12 Aug 2014 07:07:57 GMT
Hello Milind,

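If you don't set the field to be tokenized, no analyzer is applied at
indexing time, and the field's value is indexed "as-is" as a single term,
i.e. case-sensitively ("SN345-B21"). Your query, however, still goes
through the analyzer and is lowercased to "sn345-b21", so the two terms
never match.
It's the analyzer's job to tokenize the input, so if you use an analyzer
that does not split the input into several tokens (like the
KeywordAnalyzer), your input effectively remains "untokenized" even
though the field is marked as tokenized.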
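To make the mismatch concrete, here is a toy model in plain Java. It is not Lucene code and needs no Lucene dependency: the class, its methods and the map-based "index" are hypothetical stand-ins. Untokenized values go into the index raw, while both tokenized values and the query text pass through a lowercasing keyword "analyzer", mimicking KeywordTokenizer plus LowerCaseFilter.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Toy model (hypothetical, not Lucene's API) of index-time vs. query-time analysis. */
public class AnalysisMismatchDemo {

    // Stand-in for LowerCaseKeywordAnalyzer: the whole input is one token,
    // which is then lowercased.
    static final UnaryOperator<String> LOWERCASE_KEYWORD =
            text -> text.toLowerCase(Locale.ROOT);

    // Toy inverted index for a single field: term -> ids of matching documents.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void addDocument(int docId, String value, boolean tokenized) {
        // Untokenized: the raw value is indexed as one term, no analyzer runs.
        // Tokenized: the value goes through the analyzer first.
        String term = tokenized ? LOWERCASE_KEYWORD.apply(value) : value;
        postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
    }

    int search(String queryText) {
        // The query parser always runs the analyzer over the query text.
        String term = LOWERCASE_KEYWORD.apply(queryText);
        return postings.getOrDefault(term, Collections.emptySet()).size();
    }

    public static int hits(boolean tokenized) {
        AnalysisMismatchDemo index = new AnalysisMismatchDemo();
        index.addDocument(0, "SN345-B21", tokenized);
        index.addDocument(1, "SN445-B21", tokenized);
        return index.search("sn345-b21");
    }

    public static void main(String[] args) {
        System.out.println("untokenized field, hits: " + hits(false)); // prints 0
        System.out.println("tokenized field, hits:   " + hits(true));  // prints 1
    }
}
```

Passing `true` in this model corresponds to calling `theFieldType.setTokenized(true)` in the original code: with a keyword tokenizer the value still ends up as a single term, but it is now lowercased at both indexing and query time, so the terms agree.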

Regards
Christoph

On 12.08.2014 at 03:38, Milind wrote:
> I found the problem.  But it makes no sense to me.
>
> If I set the field type to be tokenized, it works.  But if I set it to not
> be tokenized the search fails.  i.e. I have to pass in true to the method.
>      theFieldType.setTokenized(storeTokenized);
>
> I want the field to be stored as un-tokenized.  But it seems that I don't
> need to do that.  The LowerCaseKeywordAnalyzer works if the field is
> tokenized, but not if it's un-tokenized!
>
> How can that be?
>
>
> On Mon, Aug 11, 2014 at 1:49 PM, Milind <milindr@gmail.com> wrote:
>
>> It does look like the lowercasing is working.
>>
>> The following code
>>
>>          Document theDoc = theIndexReader.document(0);
>>          System.out.println(theDoc.get("sn"));
>>          IndexableField theField = theDoc.getField("sn");
>>          TokenStream theTokenStream = theField.tokenStream(theAnalyzer);
>>          System.out.println(theTokenStream);
>>
>> produces the following output
>>      SN345-B21
>>      LowerCaseFilter@5f70bea5 term=sn345-b21,bytes=[73 6e 33 34 35 2d 62 32 31],startOffset=0,endOffset=9
>>
>> But the search does not work.  Anything obvious popping out for anyone?
>>
>>
>> On Sat, Aug 9, 2014 at 4:39 PM, Milind <milindr@gmail.com> wrote:
>>
>>> I looked at a couple of examples of how to make the keyword analyzer
>>> case insensitive, but I think I missed something, since it's not working
>>> for me.
>>>
>>> In the code below, I'm indexing text in upper case and searching in lower
>>> case.  But I get back no hits.  Do I need to do something more while
>>> indexing?
>>>
>>>      private static class LowerCaseKeywordAnalyzer extends Analyzer
>>>      {
>>>          @Override
>>>          protected TokenStreamComponents createComponents(String theFieldName, Reader theReader)
>>>          {
>>>              KeywordTokenizer theTokenizer = new KeywordTokenizer(theReader);
>>>              TokenStreamComponents theTokenStreamComponents =
>>>                  new TokenStreamComponents(
>>>                          theTokenizer,
>>>                          new LowerCaseFilter(Version.LUCENE_46, theTokenizer));
>>>              return theTokenStreamComponents;
>>>          }
>>>      }
>>>
>>>      private static void addDocment(IndexWriter theWriter,
>>>                                     String theFieldName,
>>>                                     String theValue,
>>>                                     boolean storeTokenized)
>>>          throws Exception
>>>      {
>>>          Document theDocument = new Document();
>>>          FieldType theFieldType = new FieldType();
>>>          theFieldType.setStored(true);
>>>          theFieldType.setIndexed(true);
>>>          theFieldType.setTokenized(storeTokenized);
>>>          theDocument.add(new Field(theFieldName, theValue, theFieldType));
>>>          theWriter.addDocument(theDocument);
>>>      }
>>>
>>>
>>>      static void testLowerCaseKeywordAnalyzer()
>>>          throws Exception
>>>      {
>>>          Version theVersion = Version.LUCENE_46;
>>>          Directory theIndex = new RAMDirectory();
>>>
>>>          Analyzer theAnalyzer = new LowerCaseKeywordAnalyzer();
>>>
>>>          IndexWriterConfig theConfig = new IndexWriterConfig(theVersion,
>>>                                                              theAnalyzer);
>>>          IndexWriter theWriter = new IndexWriter(theIndex, theConfig);
>>>          addDocment(theWriter, "sn", "SN345-B21", false);
>>>          addDocment(theWriter, "sn", "SN445-B21", false);
>>>          theWriter.close();
>>>
>>>          QueryParser theParser = new QueryParser(theVersion, "sn", theAnalyzer);
>>>          Query theQuery = theParser.parse("sn:sn345-b21");
>>>          IndexReader theIndexReader = DirectoryReader.open(theIndex);
>>>          IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
>>>          TopScoreDocCollector theCollector = TopScoreDocCollector.create(10, true);
>>>          theSearcher.search(theQuery, theCollector);
>>>          ScoreDoc[] theHits = theCollector.topDocs().scoreDocs;
>>>          System.out.println("Number of results found: " + theHits.length);
>>>      }
>>>
>>> --
>>> Regards
>>> Milind
>>>
>> --
>> Regards
>> Milind
>>
>


-- 
------------------------------------------------------------------------

Because individuality is the best standard

Dipl.-Inf. Christoph Kaser

IconParc GmbH
Sophienstraße 1
80333 München

iconparc.de

Tel: +49 - 89- 15 90 06 - 21
Fax: +49 - 89- 15 90 06 - 19

Management: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer.
HRB 121830, Amtsgericht München (Munich Local Court)

