lucene-java-user mailing list archives

From mark harwood
Subject Re: get original term for synonym
Date Wed, 14 Nov 2007 13:10:22 GMT
It would be useful to have more details about the query input and the expected highlights you

So given your 'zone-indeling' example document and the index-time tokenisation you described,
which of the following queries would you expect to match and what would you want highlighted
in each case?
1) zone
2) zone-indeling
3) "zone indeling"
4) zone-somethingElse

My assumption here is that you are using the standard Lucene Query parser and that query 3
will therefore be a phrase query. 


----- Original Message ----
From: Matthijs Bierman
Sent: Wednesday, 14 November, 2007 11:51:07 AM
Subject: Re: get original term for synonym

Hi Mark,

Your solution would be correct if the synonym were a true two-way
synonym. Unfortunately, this is not the case. My analyzer takes care of
decomposition of specific Dutch words (where a "-" is used to create
compound words). For example: 'zone-indeling' would create synonyms for
'zone'-> 'zone-indeling' and 'indeling'->'zone-indeling'.
When analyzing 'zone' it will therefore not point back to
'zone-indeling' (this information is simply not available). Putting all
the results from the indexing process into a file or lucene document
(thus creating a 'lookup' index) would probably make the lookup process
rather slow, or make application startup too long (due to HashMap

Maybe you can do something with offsets?
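
One way the offsets idea could work, as a rough sketch only: if every token the decomposition emits carries the character offsets of the whole compound, then a hit on 'zone' can be mapped back to 'zone-indeling' by slicing the stored text, with no separate lookup index. The code below does not use the Lucene API; the class `CompoundOffsetsSketch`, the `Tok` type, and the methods `decompose` and `original` are hypothetical names for illustration, and it assumes the original field text is available at highlight time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plain Java, not Lucene classes. It only shows the
// offset bookkeeping a decomposing analyzer could do.
final class CompoundOffsetsSketch {

    // A token plus the character span of the source text it was derived from.
    static final class Tok {
        final String text;
        final int start; // inclusive offset into the original text
        final int end;   // exclusive offset into the original text

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Emits the compound itself, then each "-"-separated part.
    // Every emitted token keeps the offsets of the WHOLE compound.
    static List<Tok> decompose(String word, int wordStart) {
        List<Tok> out = new ArrayList<>();
        int wordEnd = wordStart + word.length();
        out.add(new Tok(word, wordStart, wordEnd));
        if (word.indexOf('-') >= 0) {
            for (String part : word.split("-")) {
                if (!part.isEmpty()) {
                    out.add(new Tok(part, wordStart, wordEnd));
                }
            }
        }
        return out;
    }

    // Recovers the original surface form from the stored text via the offsets.
    static String original(String storedText, Tok t) {
        return storedText.substring(t.start, t.end);
    }

    public static void main(String[] args) {
        String text = "de zone-indeling van het gebied";
        int start = text.indexOf("zone-indeling");
        for (Tok t : decompose("zone-indeling", start)) {
            System.out.println(t.text + " -> " + original(text, t));
        }
    }
}
```

With this arrangement a match on the part 'zone' or 'indeling' highlights the full compound, because all three tokens share one span. Whether that is the desired highlight for each of the queries listed above is a separate question.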


To unsubscribe, e-mail:
For additional commands, e-mail:
