lucene-java-user mailing list archives

From Samarendra Pratap <samarz...@gmail.com>
Subject Re: about analyzer for searching location
Date Mon, 19 Apr 2010 10:31:34 GMT
Well... you are 50% right.

When you write

    Query q = qp.parse("\"united states\"");

the parser does search for two separate tokens, "united" and "states", but
it also checks that they occur consecutively. So the search above finds
documents in which the token "states" comes immediately after "united".

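Note that since it compares token positions rather than the raw text, it may
also find documents where non-tokenizable characters (such as punctuation) or
stop words occur between "united" and "states", e.g. "united and states"
(here "and" is a stop word). Whether a removed stop word still counts as a
gap depends on the position-increment settings of the analyzer and the query
parser.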

A TermQuery, on the other hand, behaves the way you described in your reply:
it searches for a single token "united states", which is not what you want.
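
The difference can be sketched the same way in plain Java. Again this is
only a toy model with made-up names (TermLookupSketch, indexedTerms,
termMatches), not Lucene's API: the field text is split into single-word
terms at index time, while an exact term lookup compares its argument
verbatim against those terms, so a lookup for the two-word string finds
nothing.

```java
import java.util.*;

// Toy model of an exact term lookup. This is NOT Lucene code;
// all names are hypothetical and exist only for illustration.
class TermLookupSketch {
    // Splits the text into lower-cased single-word terms, roughly the way
    // a word-based analyzer would at index time.
    static Set<String> indexedTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!tok.isEmpty()) {
                terms.add(tok);
            }
        }
        return terms;
    }

    // An exact lookup: the term is compared as-is, with no analysis applied.
    static boolean termMatches(Set<String> terms, String term) {
        return terms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> terms = indexedTerms("United States");
        System.out.println(termMatches(terms, "united"));        // true
        System.out.println(termMatches(terms, "states"));        // true
        System.out.println(termMatches(terms, "united states")); // false
    }
}
```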



On Mon, Apr 19, 2010 at 3:33 PM, Ian.huang <yiwong2001@hotmail.com> wrote:

> Does a token "united states" exist in the index when using the standard
> analyzer? My understanding is that "united" and "states" are stored
> separately in the index, not as "united states". So if I build a query like
> Query q = qp.parse("\"united states\""); it would not return any results.
> Am I right?
>
> Ian
>
> --------------------------------------------------
> From: "Samarendra Pratap" <samarzone@gmail.com>
> Sent: Friday, April 16, 2010 9:02 PM
> To: <java-user@lucene.apache.org>
> Subject: Re: about analyzer for searching location
>
>> Hi. I don't think you need a different analyzer. Read about PhraseQuery:
>> http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/PhraseQuery.html
>>
>> If you are using the parse() method of QueryParser, enclose the search
>> string in extra double quotes, which must of course be escaped.
>>
>> Query q = qp.parse("\"united states\"");
>>
>>
>> 2010/4/15 Ian.huang <yiwong2001@hotmail.com>
>>
>>> Hi All,
>>>
>>> I am implementing an address search with Hibernate Search, which is
>>> based on Lucene. The class definition is as follows:
>>>
>>> @Indexed
>>> public class Address implements Cloneable
>>> {
>>> @DocumentId
>>> private int id;
>>> @Field
>>> private String addrCountry;
>>> private String addrDesc;
>>> @Field
>>> private String addrLineOne;
>>> private String addrLineTwo;
>>> @Field
>>> private String addrCity;
>>> ......
>>>
>>> As you can see, addrCountry, addrLineOne and addrCity are the searchable
>>> fields. I am using the default analyzer for both indexing and searching,
>>> so I think a country name like United States would be indexed as two
>>> terms, united and states.
>>>
>>> In addition, during search, a keyword such as United States or Salt
>>> Lake City would be tokenized into two or three single words.
>>>
>>> As a result, any address whose fields contain united or city would be
>>> returned, such as United Kingdom, when I actually want only United States.
>>>
>>> My expected results are as follows:
>>>
>>> if someone searches for "united" it should return "united states" and
>>> "united kingdom".
>>>
>>> if someone searches for "united states" it should return "united states",
>>> and not "united kingdom".
>>>
>>> I hope an analyzer can generate terms with multiple words, say, keep
>>> united states as the single term united states. I think StandardAnalyzer
>>> would analyze united states into united and states?
>>>
>>> A different example: if the search keyword is parking lot in Salt Lake
>>> City, the generated search terms need to be parking lot and Salt Lake
>>> City, not parking, lot, salt, lake and city.
>>>
>>> I wonder if any analyzer can help me implement this requirement. A
>>> dictionary-based solution would be better, so that I can manage the
>>> search terms that may consist of multiple words.
>>>
>>> thanks
>>>
>>> Ian
>>>
>>
>>
>>
>>
>> --
>> Regards,
>> Samar
>>
>>


-- 
Regards,
Samar
