lucene-general mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Searching using dash in words/abbreviations
Date Fri, 12 Dec 2008 19:52:19 GMT
Jenny - yes, it is possible to tune Lucene's analysis process very  
precisely.  Look at the Analyzer you're using, and consider  
customizing it to your needs.  An analyzer has a tokenizer followed by  
token filters - there are a lot of reusable components built into  
Lucene's API to choose from and configure.

	Erik
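
To make the tokenizer-plus-filters idea concrete, here is a minimal sketch in plain Java. It deliberately does not use Lucene's classes; the class name DashAwareAnalysis, its methods, and the small stop list are all hypothetical and exist only for illustration. It assumes splitting on whitespace alone is acceptable, so embedded dashes survive, and that stop words are removed only when they stand alone as whole tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DashAwareAnalysis {
    // Hypothetical stop list, for illustration only (not Lucene's default set).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is"));

    // Tokenizer stage: split on whitespace only, so "A-B-C" stays one token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Filter stage 1: lowercase, then strip leading/trailing characters that
    // are not ASCII letters or digits. Embedded dashes are left untouched.
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT)
                .replaceAll("^[^a-z0-9]+|[^a-z0-9]+$", "");
    }

    // Full chain: tokenizer, then normalization, then stop-word removal.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : tokenize(text)) {
            String term = normalize(raw);
            // Filter stage 2: drop a term only if the whole term is a stop word.
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // "A-B-C" is kept as "a-b-c"; the standalone "a" and "is" are dropped.
        System.out.println(analyze("A-B-C Inc. is a company"));
    }
}
```

In Lucene itself the equivalent would be a custom Analyzer that chains a tokenizer with token filters, and the same analysis would normally need to be applied at both index time and query time for the terms to match.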

On Dec 12, 2008, at 2:46 PM, Jenny Brown wrote:

> Is it possible to configure Lucene such that it doesn't tokenize on
> embedded dashes, and thus doesn't consider the "A" a stop word because
> it's not standing alone?  I do believe the combination of dash
> handling and stop words is why the query is causing problems for my
> user.
>
>
> On Fri, Dec 12, 2008 at 1:32 PM, Daniel Naber
> <lucenelist2007@danielnaber.de> wrote:
>> On Friday, 12 December 2008, Jenny Brown wrote:
>>
>>> I'm trying to search for company ABC Inc. in places where it may be
>>> mentioned as A-B-C Inc.  Lucene is doing something with those dashes,
>>> though, that prevents me from getting accurate results.
>>
>> "A" (even in "A-B-C" I think) is a stopword with StandardAnalyzer's  
>> default settings, which might cause problems. Please also check out
>> these hints from the FAQ:
>>
>> http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71
>>
>> Regards
>> Daniel
>>
>> --
>> http://www.danielnaber.de
>>

