lucene-java-user mailing list archives

From Matthew Hall <mh...@informatics.jax.org>
Subject Re: No hits while searching!
Date Mon, 01 Jun 2009 17:51:00 GMT
Just build your own.

Here's exactly what you are looking for:

(Mind you, I just whipped this out and didn't compile it, so there
could be minor syntax errors.)

You will also, obviously, have to add your own package declaration and
imports.

So anyhow, the really neat thing about Lucene is that you can do exactly
what we just did here: chain tokenizers and filters together in almost
any way you want and build custom analyzers out of them.

It's a good thing to become familiar with, because I can nearly promise
you that this analyzer will ALSO turn out to be insufficient for your
needs.

Anyhow, hope this helps.

Matt

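import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

/**
 * Custom Lowercase Analyzer
 *
 * This analyzer tokenizes on whitespace and then lowercases each token.
 *
 * @author mhall
 */
public class LowerCaseAnalyzer extends Analyzer {

    public LowerCaseAnalyzer() {
        super();
    }

    /**
     * Worker for this Analyzer.
     *
     * Chains WhitespaceTokenizer -> LowerCaseFilter, so punctuation inside a
     * token (e.g. an apostrophe in a name) is kept and only the case is folded.
     */
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new LowerCaseFilter(new WhitespaceTokenizer(reader));
    }
}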
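
To illustrate why this chain treats names with apostrophes differently from SimpleAnalyzer, here is a small plain-Java model with no Lucene dependency. The class and method names are hypothetical; it assumes WhitespaceTokenizer splits only on whitespace, SimpleAnalyzer's tokenizer splits on every non-letter character, and both chains then lowercase each token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Plain-Java model (not Lucene code) of two analysis chains:
 *  - whitespace split + lowercase, like the LowerCaseAnalyzer above
 *  - letter-only split + lowercase, like SimpleAnalyzer
 */
public class AnalyzerChainModel {

    // Split on runs of whitespace, then lowercase each token.
    static List<String> whitespaceLowercase(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Keep only runs of letters (any non-letter ends a token), then lowercase.
    static List<String> lettersLowercase(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String name = "Mary O'Brien";
        System.out.println(whitespaceLowercase(name)); // [mary, o'brien]
        System.out.println(lettersLowercase(name));    // [mary, o, brien]
    }
}
```

In this model the whitespace chain keeps "o'brien" as a single token, while the letter-based chain breaks it into "o" and "brien".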

vanshi wrote:
> Thanks Matt & Sithu. Yes, it was due to the stop word analyzer... for now I'm
> using a simple analyzer temporarily, though I know even the simple analyzer
> cannot handle quotes in names. Can somebody please point me towards how to
> handle quotes in names in a query using a lowercase analyzer?
>
> thanks,
> Vanshi
>
> Matthew Hall-7 wrote:
>   
>> Yeah, he's gotta be.
>>
>> You might be better off using something like a lowercase analyzer here,
>> since punctuation in a name is likely important.
>>
>> Matt
>>
>> Sudarsan, Sithu D. wrote:
>>     
>>>  
>>>
>>> Do you use stopword filtering?
>>>
>>> Sincerely,
>>> Sithu D Sudarsan
>>>
>>> -----Original Message-----
>>> From: vanshi [mailto:nilu.thakur@gmail.com] 
>>> Sent: Monday, June 01, 2009 11:39 AM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: No hits while searching!
>>>
>>>
>>> Thanks Erick, I was able to get this to work... as you said, Luke is a
>>> great tool for looking into what gets stored in the index. In my case,
>>> though, I was searching before the index was created, so I was getting
>>> zero hits.
>>>
>>> On a side note, I'm seeing strange output with a prefix query... it only
>>> works when I have 3 or more letters in the first name/last name. Any idea
>>> what is going on here? Please see the log output below.
>>>
>>> 02:05:20,996 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms in PhysicianQuerybuilder with exactName=true
>>> 02:05:20,996 INFO  [PhysicianQueryBuilder] Before running Prefix query, First name: ang
>>> 02:05:20,996 INFO  [PhysicianQueryBuilder] Before running  Prefix query, Last name: john
>>> 02:05:21,012 INFO  [LuceneIndexService] the query is: +(FIRST_NAME_EXACT:ang*) +(LAST_NAME_EXACT:john*)
>>> 02:05:21,012 INFO  [LuceneIndexService] Result Size: 1
>>>
>>> 02:06:03,578 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms in PhysicianQuerybuilder with exactName=true
>>> 02:06:03,578 INFO  [PhysicianQueryBuilder] Before running term query, First name: a
>>> 02:06:03,578 INFO  [PhysicianQueryBuilder] Before running term query, Last name: johns
>>> 02:06:03,578 INFO  [LuceneIndexService] the query is: +() +(LAST_NAME_EXACT:johns*)
>>> 02:06:03,578 INFO  [LuceneIndexService] Result Size: 0
>>>
>>> 02:08:01,548 INFO  [PhysicianQueryBuilder] Entered addTypeSpecificTerms in PhysicianQuerybuilder with exactName=true
>>> 02:08:01,548 INFO  [PhysicianQueryBuilder] Before running term query, First name: an
>>> 02:08:01,548 INFO  [PhysicianQueryBuilder] Before running term query, Last name: johns
>>> 02:08:01,548 INFO  [LuceneIndexService] the query is: +() +(LAST_NAME_EXACT:johns*)
>>> 02:08:01,580 INFO  [LuceneIndexService] Result Size: 0
>>>
>>> As you can see, the query works with first name=ang but not with first
>>> name=a or an.
>>>
>>> I appreciate all your input.
>>>
>>> Vanshi
>>>
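
The empty "+()" clause in the log is consistent with the stop-word explanation confirmed in vanshi's later reply: "a" and "an" are English stop words, so analysis removes them and leaves nothing to search for. Assuming a required clause with no terms matches no documents, this also fits the "Result Size: 0" lines. Below is a plain-Java sketch (not Lucene code) of that effect. The stop list is a small illustrative subset, and prefixClause is a hypothetical helper that only renders a clause in the same shape as the log line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Plain-Java model (not Lucene code) of stop-word analysis:
 * lowercase, split on non-letters, then drop stop words.
 */
public class StopWordPrefixModel {

    // Illustrative subset of Lucene's default English stop list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the", "to"));

    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Hypothetical helper: one required clause with a prefix term per surviving
    // token, rendered in the same shape as the log output.
    static String prefixClause(String field, String input) {
        StringBuilder sb = new StringBuilder("+(");
        List<String> tokens = analyze(input);
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(tokens.get(i)).append('*');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(prefixClause("FIRST_NAME_EXACT", "ang") + " "
                + prefixClause("LAST_NAME_EXACT", "john"));
        // +(FIRST_NAME_EXACT:ang*) +(LAST_NAME_EXACT:john*)
        System.out.println(prefixClause("FIRST_NAME_EXACT", "a") + " "
                + prefixClause("LAST_NAME_EXACT", "johns"));
        // +() +(LAST_NAME_EXACT:johns*)
    }
}
```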
>>> Erick Erickson wrote:
>>>> The most common issue with this kind of thing is that UN_TOKENIZED
>>>> implies no case folding. So if your case differs, you won't get a match.
>>>>
>>>> That aside, the very first thing I'd do is get a copy of Luke (google
>>>> Lucene Luke) and examine the index to see if what's in your index is
>>>> what you *think* is in there.
>>>>
>>>> The second thing I'd do is look at query.toString() to see what the
>>>> actual query is. You can even paste the output of toString() into Luke
>>>> and see what happens.
>>>>
>>>> I'm not sure what buildMultiTermPrefixQuery is all about, but I assume
>>>> you have a good reason for using it. But the other strategy I use for
>>>> this kind of "what happened?" question is to peel back to simpler cases
>>>> until I get what I expect, then build back up until it breaks.....
>>>>
>>>> But really, get a copy of Luke; it's a wonderful tool that'll give you
>>>> lots of insight into what's *really* going on...
>>>>
>>>> Best
>>>> Erick
>>>>
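
Erick's first point can be modeled in plain Java (no Lucene dependency; all names are hypothetical). An UN_TOKENIZED field stores the whole value as one term and a TermQuery compares it verbatim, so "Johnson" in the index does not match "johnson" in the query. One common remedy, assumed here rather than taken from the thread, is to apply the same normalization, such as lowercasing, yourself both when adding the field and when building the term.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Plain-Java model (not Lucene code) of an UN_TOKENIZED field: the whole
 * value is kept as a single term, and a term query matches only when the
 * query term is exactly equal to it.
 */
public class UntokenizedFieldModel {

    private final Set<String> terms = new HashSet<>();
    private final boolean normalize;

    UntokenizedFieldModel(boolean normalize) {
        this.normalize = normalize;
    }

    // Hypothetical normalization, applied identically at index and query time.
    private String prepare(String value) {
        return normalize ? value.toLowerCase(Locale.ROOT) : value;
    }

    void index(String value) {
        terms.add(prepare(value));
    }

    boolean termQueryMatches(String queryTerm) {
        return terms.contains(prepare(queryTerm));
    }

    public static void main(String[] args) {
        UntokenizedFieldModel raw = new UntokenizedFieldModel(false);
        raw.index("Johnson");
        System.out.println(raw.termQueryMatches("johnson"));    // false: case differs

        UntokenizedFieldModel folded = new UntokenizedFieldModel(true);
        folded.index("Johnson");
        System.out.println(folded.termQueryMatches("johnson")); // true
    }
}
```

The key design point is symmetry: whatever normalization is applied when indexing must also be applied to the query term, or exact matches silently fail.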
>>>> On Wed, May 27, 2009 at 12:43 AM, vanshi <nilu.thakur@gmail.com> wrote:
>>>>
>>>>> In my web application, I need search functionality on first name and
>>>>> last name in 2 different ways: one search must be based on a 'Metaphone
>>>>> Analyzer', giving all similar-sounding names as results, and the other
>>>>> search should be an exact match on either first name or last name. The
>>>>> "name sounds like" search has already been coded, and I need to add an
>>>>> exact-match search to the application. For this, I have a Lucene index
>>>>> built from fields in database tables, which already had the name fields
>>>>> indexed with the metaphone analyzer. I added 2 more fields to the
>>>>> existing document, which index first name/last name as UN_TOKENIZED.
>>>>> When searching for an exact match, I create a term query against the
>>>>> newly created UN_TOKENIZED fields, as shown in the code snippets below;
>>>>> however, this is not getting any hits. Is there anything conceptually
>>>>> wrong with this?
>>>>>
>>>>> //creating fields for the document
>>>>> FIRST_NAME(Field.Store.NO, Field.Index.TOKENIZED),
>>>>> FIRST_NAME_EXACT(Field.Store.NO, Field.Index.UN_TOKENIZED),
>>>>> LAST_NAME(Field.Store.NO, Field.Index.TOKENIZED),
>>>>> LAST_NAME_EXACT(Field.Store.NO, Field.Index.UN_TOKENIZED),
>>>>>
>>>>> //name sounds like analyzer class....used while Indexing and searching
>>>>> public class NameSoundsLikeAnalyzer extends Analyzer {
>>>>>        PerFieldAnalyzerWrapper wrapper;
>>>>>
>>>>>        /**
>>>>>         * FIRST_NAME and LAST_NAME use the metaphone analyzer. Any other
>>>>>         * field passed to this analyzer (e.g. FIRST_NAME_EXACT or
>>>>>         * LAST_NAME_EXACT) falls back to the default StopAnalyzer.
>>>>>         */
>>>>>        public NameSoundsLikeAnalyzer() {
>>>>>                wrapper = new PerFieldAnalyzerWrapper(new StopAnalyzer());
>>>>>                wrapper.addAnalyzer(
>>>>>                        PhysicianDocumentBuilder.PhysicianFieldInfo.FIRST_NAME.toString(),
>>>>>                        new MetaphoneReplacementAnalyzer());
>>>>>
>>>>>                wrapper.addAnalyzer(
>>>>>                        PhysicianDocumentBuilder.PhysicianFieldInfo.LAST_NAME.toString(),
>>>>>                        new MetaphoneReplacementAnalyzer());
>>>>>        }
>>>>>
>>>>>        /**
>>>>>         * @see PerFieldAnalyzerWrapper#tokenStream(String, Reader)
>>>>>         */
>>>>>        @Override
>>>>>        public TokenStream tokenStream(String fieldName, Reader reader) {
>>>>>                return wrapper.tokenStream(fieldName, reader);
>>>>>        }
>>>>>
>>>>> }
>>>>>
>>>>> //lastly the query builder
>>>>> if (physicianQuery.getExactNameSearch()) {
>>>>>        // UN_TOKENIZED fields hold the raw value, so these terms must match exactly (including case)
>>>>>        if (StringUtils.isNotEmpty(physicianQuery.getFirstNameStartsWith())) {
>>>>>                TermQuery term = new TermQuery(new Term(FIRST_NAME_EXACT.toString(),
>>>>>                                physicianQuery.getFirstNameStartsWith()));
>>>>>                query.add(term, MUST);
>>>>>        }
>>>>>
>>>>>        if (StringUtils.isNotEmpty(physicianQuery.getLastNameStartsWith())) {
>>>>>                TermQuery term = new TermQuery(new Term(LAST_NAME_EXACT.toString(),
>>>>>                                physicianQuery.getLastNameStartsWith()));
>>>>>                query.add(term, MUST);
>>>>>        }
>>>>> } else {
>>>>>        //we want metaphone search
>>>>>        if (StringUtils.isNotEmpty(physicianQuery.getFirstNameStartsWith())) {
>>>>>                query.add(buildMultiTermPrefixQuery(FIRST_NAME.toString(),
>>>>>                                physicianQuery.getFirstNameStartsWith()), MUST);
>>>>>        }
>>>>>
>>>>>        if (StringUtils.isNotEmpty(physicianQuery.getLastNameStartsWith())) {
>>>>>                query.add(buildMultiTermPrefixQuery(LAST_NAME.toString(),
>>>>>                                physicianQuery.getLastNameStartsWith()), MUST);
>>>>>        }
>>>>> }
>>>>>
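
A likely reason the exact-match fields were affected by stop words is visible in NameSoundsLikeAnalyzer above: PerFieldAnalyzerWrapper uses its default analyzer (here StopAnalyzer) for any field that has no analyzer registered, and only FIRST_NAME and LAST_NAME are registered. The plain-Java sketch below models that dispatch. It is not Lucene code, the analyzers are simplified stand-ins (soundsLikeStub is not real Metaphone), and it assumes the query-building code passes the *_EXACT field names through this analyzer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/**
 * Plain-Java model (not Lucene code) of per-field analyzer dispatch with a
 * default: fields without a registered analyzer use the fallback.
 */
public class PerFieldDispatchModel {

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldDispatchModel(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Stand-in for a stop analyzer: lowercase, split on non-letters, drop a few stop words.
    static List<String> stopLike(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"))
                .filter(t -> !t.isEmpty())
                .filter(t -> !Arrays.asList("a", "an", "the").contains(t))
                .collect(Collectors.toList());
    }

    // Stand-in for the metaphone analyzer: only tags the lowercased input.
    static List<String> soundsLikeStub(String text) {
        return Arrays.asList("PHONETIC(" + text.toLowerCase(Locale.ROOT) + ")");
    }

    public static void main(String[] args) {
        PerFieldDispatchModel wrapper = new PerFieldDispatchModel(PerFieldDispatchModel::stopLike);
        wrapper.addAnalyzer("FIRST_NAME", PerFieldDispatchModel::soundsLikeStub);
        wrapper.addAnalyzer("LAST_NAME", PerFieldDispatchModel::soundsLikeStub);

        System.out.println(wrapper.analyze("FIRST_NAME", "An"));       // [PHONETIC(an)]
        System.out.println(wrapper.analyze("FIRST_NAME_EXACT", "An")); // [] (fell back to stopLike)

        // Registering a lowercase-only analyzer for the exact field avoids the fallback.
        wrapper.addAnalyzer("FIRST_NAME_EXACT", text -> Arrays.asList(text.toLowerCase(Locale.ROOT)));
        System.out.println(wrapper.analyze("FIRST_NAME_EXACT", "An")); // [an]
    }
}
```

Registering something like Matt's LowerCaseAnalyzer for the *_EXACT fields is one way, under these assumptions, to keep short names such as "An" from being dropped.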
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/No-hits-while-searching%21-tp23735920p23735920.html
>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> -- 
>> Matthew Hall
>> Software Engineer
>> Mouse Genome Informatics
>> mhall@informatics.jax.org
>> (207) 288-6012
>>

