lucene-java-user mailing list archives

From "Rob Outar" <rou...@ideorlando.org>
Subject RE: Searching for hyphenated terms
Date Thu, 13 Mar 2003 16:32:13 GMT
I had similar problems, which I solved with this Analyzer. It keeps the
entire field value as a single token and lowercases it:

 
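    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharTokenizer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;

    // The class name is illustrative; the original post showed only the method.
    public class LowerCaseKeywordAnalyzer extends Analyzer {

        public TokenStream tokenStream(String field, final Reader reader) {

            // Do not split the field: every character counts as a token
            // character, so hyphens, dots and spaces stay inside the token.
            TokenStream t = new CharTokenizer(reader) {
                protected boolean isTokenChar(char c) {
                    return true;
                }
            };

            // Lowercase the token so that searches are case-insensitive.
            t = new LowerCaseFilter(t);
            return t;
        }
    }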
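
To see what the analyzer above does to a value such as AAA-ADOG without
needing Lucene on the classpath, here is a rough plain-Java sketch. It is
not Lucene code: the class and method names (KeywordLowerCaseSketch,
keywordLowerCase, splitOnPunctuation) are made up for illustration, and the
second method is only an assumed stand-in for an analyzer that breaks terms
on punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class KeywordLowerCaseSketch {

    // Mimics the analyzer above: the whole value is one token, lowercased.
    // An empty value yields no tokens.
    static List<String> keywordLowerCase(String value) {
        List<String> tokens = new ArrayList<>();
        if (!value.isEmpty()) {
            tokens.add(value.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Hypothetical contrast: lowercase, then break on anything that is
    // not a letter or a digit, which separates hyphenated terms.
    static List<String> splitOnPunctuation(String value) {
        List<String> tokens = new ArrayList<>();
        String lower = value.toLowerCase(Locale.ROOT);
        for (String part : lower.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(keywordLowerCase("AAA-ADOG"));   // [aaa-adog]
        System.out.println(splitOnPunctuation("AAA-ADOG")); // [aaa, adog]
    }
}
```

With the single-token form, the indexed term and the search term only match
if both sides go through the same normalization, so the query text would
need the same lowercasing before it is compared against the index.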

Thanks,
 
Rob 


-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Thursday, March 13, 2003 11:22 AM
To: Lucene Users List
Subject: Re: Searching for hyphenated terms


Make a custom Analyzer.  They are super simple to write.
Take pieces of WhitespaceAnalyzer and StandardAnalyzer.

Otis

--- "Sieretzki, Dionne R, SOLGV" <dhutton@att.com> wrote:
> I have seen some previous postings about "Escape woes" and "Hyphens
> not matching", but I haven't seen any resolutions to an issue I've
> been trying to work out.  
> 
> I don't want my search field to be case sensitive, so I used
> StandardAnalyzer.  The search field also has corresponding entries
> that may or may not contain hyphens or other special characters. If
> the field is not tokenized, very few search terms result in matches. 
> It appears that terms are only matched if a wildcard is used, such
> as:
> 
> Entered: ADOG  / Actual Query is: adog / No match on an exact term
> Entered: ADOG* / Actual Query is: ADOG* /  Match found
> Entered: AAA-ADOG / Actual Query is: aaa -adog / No match 
> Entered: "AAA-ADOG" / Actual Query is: "aaa adog" / No match 
> Entered: AAA?ADOG /  Actual Query is: aaa?adog / Match found
> Entered: DOG.2  / Actual Query is: dog.2 / No match 
> Entered: DOG?2 / Actual Query is: DOG?2 /  Match found
> 
> 
> If the field is tokenized, then even more mixed results are produced.
> 
> Entered: ADOG / Actual Query is: adog / Match found for exact term
> Entered: ADOG* / Actual Query is: ADOG* / No match
> Entered: AAA-ADOG / Actual Query is: aaa -adog / Match found
> Entered: "AAA-ADOG" / Actual Query is: "aaa adog" / Match found
> Entered: DOG.2 / Actual Query is: adog.2  / Match found
> Entered: AAA-DOG-BBB / Actual Query is: aaa -dog -bbb / No match
> Entered: " AAA-DOG-BBB" / Actual Query is: "aaa dog bbb" / No match
> Entered: ADOG-I40 / Actual Query is: adog -i40 / Incorrect matches
> Entered: "ADOG-I40" / Actual Query is: adog-i40 / Match found for
> exact term
> 
> 
> Can anyone recommend the right Analyzer to use that isn't case
> sensitive and matches on both hyphenated and non-hyphenated terms?
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

