lucene-java-user mailing list archives

From karl wettin <ka...@snigel.net>
Subject Re: hypens
Date Mon, 17 Apr 2006 22:51:17 GMT

On 17 Apr 2006, at 18:59, John Powers wrote:

> Hello,
>
> If I have a user search for "b-trunk" I would like them to be able to
> find "b-trunk" (with hyphen). I would also like someone searching for
> "b trunk" to also find "b-trunk".

If you don't care about spans, make a filter that rebuilds the token  
at index time. It's a bit quick and dirty, but I do things like this  
in some cases without any major problems.

The code below builds [btrunk] [b-trunk] [b] [trunk].

I would not recommend doing this without first considering how it affects
your index and your queries.



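import java.io.IOException;
import java.util.LinkedList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class LowASCIIDashWordFilter extends TokenFilter {

     // Matches a term made of two runs of word characters joined by one dash.
     private static final Pattern p = Pattern.compile("(\\w+)-(\\w+)");

     // Filled on the first call to next(); holds every output token.
     private LinkedList<Token> buf;

     /** Construct a token stream filtering the given input. */
     public LowASCIIDashWordFilter(TokenStream input) {
         super(input);
     }

     /** Returns the next token in the stream, or null at EOS. */
     public Token next() throws IOException {
         if (buf == null) {
             // Quick and dirty: read the whole input stream into memory up front.
             buf = new LinkedList<Token>();
             Token next = input.next();
             while (next != null) {
                 Matcher m = p.matcher(next.termText());
                 if (m.matches()) {
                     // Matcher offsets are relative to the term text, so shift
                     // them by the token's start offset in the source.
                     int base = next.startOffset();
                     buf.add(new Token(m.group(1) + m.group(2),
                             base + m.start(1), base + m.end(2), "composite dashword"));
                     buf.add(next); // the original term, e.g. b-trunk
                     buf.add(new Token(m.group(1),
                             base + m.start(1), base + m.end(1), "left dashword"));
                     buf.add(new Token(m.group(2),
                             base + m.start(2), base + m.end(2), "right dashword"));
                 } else {
                     buf.add(next); // pass all other terms through unchanged
                 }
                 next = input.next();
             }
         }
         return buf.isEmpty() ? null : buf.removeFirst();
     }

}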
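
To check which terms should come out for a given input without setting up
an index, here is a minimal sketch that mirrors the expansion on plain
strings, in the order listed above ([btrunk] [b-trunk] [b] [trunk]). It
does not use Lucene at all; the class name DashWordExpansion and the
method expand are hypothetical, made up for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: mirrors the filter's term expansion on plain strings. */
final class DashWordExpansion {

    // Same pattern as the filter: one dash between two runs of word characters.
    private static final Pattern DASH_WORD = Pattern.compile("(\\w+)-(\\w+)");

    /** Returns the terms expected in the index for a single input term. */
    static List<String> expand(String term) {
        List<String> terms = new ArrayList<>();
        Matcher m = DASH_WORD.matcher(term);
        if (m.matches()) {
            terms.add(m.group(1) + m.group(2)); // composite, e.g. btrunk
            terms.add(term);                     // original, e.g. b-trunk
            terms.add(m.group(1));               // left part, e.g. b
            terms.add(m.group(2));               // right part, e.g. trunk
        } else {
            terms.add(term);                     // anything else is kept as-is
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(expand("b-trunk")); // prints [btrunk, b-trunk, b, trunk]
        System.out.println(expand("trunk"));   // prints [trunk]
        System.out.println(expand("a-b-c"));   // prints [a-b-c]
    }
}
```

Note the last case: matches() requires the whole term to fit the pattern,
and \w does not include the dash, so a term with two or more dashes is not
split at all. If you need those handled, the pattern has to change.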


