lucene-java-user mailing list archives

From jm <jmugur...@gmail.com>
Subject Re: analyzer not working properly when indexing
Date Wed, 21 Apr 2010 10:59:48 GMT
OK, got it. I had upgraded my analyzer to the new API, but the upgrade was not correct...

thanks
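
For anyone who hits the same symptom: the thread does not show CustomStopAnalyzer, so the following is only a guess at the kind of mistake involved, not the actual bug. In Lucene 3.0 an Analyzer has two token-producing entry points, tokenStream() and reusableTokenStream(), and IndexWriter obtains its tokens through reusableTokenStream(), while a display helper may call tokenStream(). If the two paths disagree, the displayed tokens can look right while the indexed terms differ. The sketch below is a toy model in plain Java with no Lucene classes; ToyAnalyzer, BROKEN, index() and hitCount() are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: plain Java, no Lucene classes. All names are hypothetical. */
public class AnalyzerMismatchSketch {

    /** Stand-in for an analyzer that exposes two token-producing methods. */
    interface ToyAnalyzer {
        List<String> tokenStream(String text);          // path used by the display helper
        List<String> reusableTokenStream(String text);  // path used by the toy indexer
    }

    /** Splits on whitespace and keeps digits, so "mail77" stays "mail77". */
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Keeps only runs of letters, so "mail77" becomes "mail". */
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Deliberately inconsistent analyzer: the two paths disagree. */
    static final ToyAnalyzer BROKEN = new ToyAnalyzer() {
        public List<String> tokenStream(String text) { return whitespaceTokens(text); }
        public List<String> reusableTokenStream(String text) { return letterTokens(text); }
    };

    /** Builds a one-document "index": the set of terms from the indexing path. */
    static Set<String> index(ToyAnalyzer analyzer, String text) {
        return new HashSet<String>(analyzer.reusableTokenStream(text));
    }

    /** Exact, unanalyzed lookup, in the spirit of a term query: 1 hit or 0. */
    static int hitCount(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term) ? 1 : 0;
    }

    public static void main(String[] args) {
        // The display path looks fine...
        System.out.println("display path: " + BROKEN.tokenStream("mail77"));
        // ...but the indexing path dropped the digits.
        Set<String> indexed = index(BROKEN, "mail77");
        System.out.println("mail77 " + hitCount(indexed, "mail77"));
        System.out.println("mail " + hitCount(indexed, "mail"));
    }
}
```

In this toy model the display path shows mail77 while the exact lookup finds only mail, which is the same pattern as in my original report. Comparing the output of both entry points of the real analyzer on the same input would be one way to check for this.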

On Wed, Apr 21, 2010 at 11:45 AM, Ian Lea <ian.lea@gmail.com> wrote:
> OK, so it does indeed look like a problem with your analyzer, as you suspected.
>
> You could confirm that by using e.g. WhitespaceAnalyzer instead.  Then
> maybe post the code for your custom analyzer, or step through in a
> debugger or however you prefer to debug code.
>
>
> --
> Ian.
>
>
> On Wed, Apr 21, 2010 at 8:20 AM, jm <jmuguruza@gmail.com> wrote:
>> I am using a TermQuery so no analyzer used...
>> protected static int getHitCount(Directory directory, String fieldName,
>>        String searchString) throws IOException {
>>        // Open the index read-only.
>>        IndexSearcher searcher = new IndexSearcher(directory, true);
>>        // TermQuery matches the exact term; searchString is not analyzed.
>>        Term t = new Term(fieldName, searchString);
>>        Query query = new TermQuery(t);
>>        int hitCount = searcher.search(query, 1).totalHits;
>>        searcher.close();
>>        return hitCount;
>> }
>>
>> Yes, I have written the index to disk, and luke shows the words
>> without the numbers...
>>
>>
>> On Tue, Apr 20, 2010 at 7:09 PM, Ian Lea <ian.lea@gmail.com> wrote:
>>> Are you using the same analyzer for searching, in your unshown
>>> getHitCount() method?
>>>
>>> There is lots of good advice in the FAQ under "Why am I getting no
>>> hits / incorrect hits?".  And/or write the index to disk and use Luke
>>> to check that the correct content is being indexed.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Tue, Apr 20, 2010 at 4:58 PM, jm <jmuguruza@gmail.com> wrote:
>>>> I am encountering a strange issue. I have a CustomStopAnalyzer. If I
>>>> do this (supporting code taken from AnalyzerUtils in LIA3 source code
>>>> Mike uploaded):
>>>>        Analyzer customStopAnalyzer = new CustomStopAnalyzer();
>>>>        AnalyzerUtils.displayTokensWithFullDetails(customStopAnalyzer,
>>>> "mail77");
>>>>
>>>> I get what I expect:
>>>> 1: [mail77:0->6:word]
>>>>
>>>> But when I am actually indexing docs, words containing numbers
>>>> lose the numbers.
>>>>        directory = new RAMDirectory();
>>>>        writer = new IndexWriter(directory, customStopAnalyzer,
>>>> IndexWriter.MaxFieldLength.UNLIMITED);
>>>>        doc = new Document();
>>>>        doc.add((Fieldable) new Field("contents", "mail77",
>>>> Field.Store.NO, Field.Index.ANALYZED));
>>>>        writer.addDocument(doc);
>>>>        writer.close();
>>>>        hitCount = getHitCount(directory, "contents", "mail77");
>>>>        System.out.println("mail77 " + hitCount);
>>>>
>>>> This writes
>>>> mail77 0
>>>> If I search for "mail", I get one hit. I am using Lucene 3.0.1. Where
>>>> should I start looking? I assume in CustomStopAnalyzer, but the fact
>>>> that displayTokensWithFullDetails() shows the right output puzzles
>>>> me.
>>>>
>>>> thanks
>>>> javier
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org