lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <>
Subject Re: Synonyms and Aliases query
Date Tue, 27 Mar 2007 16:46:18 GMT
See below...

On 3/27/07, daveburns <> wrote:
> Hi,
> afriad I'm a noobie at Luncene but read Otis/Eriks book and was hoping
> someone can answer a quick question on the AliasAnalyzer (Chap 4). I want
> to
> build a search for names (Companies/surname, firstname etc) but need to
> match thing s like
> Robert= bob, bobby, rob etc (or margaret=peggy etc).
> 1) The problem is my search returns all Robert or Bob hit's if I search on
> bob but if I search on Robert only Robert hits come back. I though the
> index
> would automatically provide my required 2-way matching?

Lucene only gives you back what you put into the index. In this case,
did you index "robert" when the token was "bob"? In other words,
did you index all the variants of 'robert' every time you found any variant?

Let's say the three variants are 'bob' 'rob' and 'robert'. Whenever you
encounter any of those three variants, you need to add the *other*
two to your index as well as the original variant.

2) The second problem is to do with Company names. Can I create the above
> alias analyzer to match I.B.M=IBM=International Business Machines? i.e.
> multiple words to a single word.

Well, if you make your own query analyzer (subclassing the relevant one),
can always substitute the one-word version for the three word version. Be
careful about doing it the other way unless you make very sure it's a
That is, tokenizing "International Business Machines" would give you
three token and you'd be searching on the OR of them (assuming default
queryparser behavior). So you'd be better off having your analyzer turn
"international business machines" into IBM. You might be able to get
away with enclosing multiple-word entries in quotes...


> Dave.
> --
> View this message in context:
> Sent from the Lucene - Java Users mailing list archive at
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message