lucene-solr-user mailing list archives

From Pierre Auslaender <>
Subject Re: French synonyms & Online synonyms
Date Tue, 30 Sep 2008 18:35:01 GMT
True, synonyms can be grouped in cliques based on the strength of their 
"resemblance" given a specific context.

But what I'm indexing is the text content of TV programs produced by a 
public television broadcaster, so the context is very large and 
non-specific. What I want is to find "automobile" for "car", "motorcycle" 
for "bike", "pub" for "restaurant", "woman" for "lady", and the like.

There actually are free on-line resources for most European languages 
(English included, of course). Check these out:

Would you mind commenting on the following plan for a special synonym 
analyser?
1/ We would start with an empty synonyms file.
2/ For each indexing request, the analyser looks up synonyms in the 
file. If it finds any, it proceeds normally.
3/ Otherwise, it checks an online resource for synonyms, updates the 
synonyms file, and proceeds.
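
Steps 1/ to 3/ could be sketched roughly as below. This is only an illustration in Python, not a Solr or Lucene analyser: the class name `CachingSynonymFile` and the `fetch_remote` callback (a stand-in for whatever online resource is used) are hypothetical. The only thing borrowed from Solr is the comma-separated line layout of its synonyms file.

```python
from pathlib import Path
from typing import Callable, Dict, List


class CachingSynonymFile:
    """Synonym lookup backed by a local file, filled on demand from a remote source."""

    def __init__(self, path: Path, fetch_remote: Callable[[str], List[str]]):
        self.path = path
        # Hypothetical callback standing in for the online synonym resource.
        self.fetch_remote = fetch_remote
        self.entries: Dict[str, List[str]] = {}
        # Step 1: the file may be missing or empty at first.
        if path.exists():
            for line in path.read_text(encoding="utf-8").splitlines():
                if line.startswith("#"):
                    continue  # skip comment lines
                words = [w.strip() for w in line.split(",") if w.strip()]
                if words:
                    self.entries[words[0]] = words[1:]

    def lookup(self, term: str) -> List[str]:
        # Step 2: a term already in the file is answered locally.
        if term in self.entries:
            return self.entries[term]
        # Step 3: otherwise ask the online resource and append the result,
        # so the next request for this term never leaves the machine.
        synonyms = self.fetch_remote(term)
        self.entries[term] = synonyms
        with self.path.open("a", encoding="utf-8") as f:
            f.write(", ".join([term] + synonyms) + "\n")
        return synonyms
```

A term with no online synonyms is still written to the file on its own line, so it is not fetched again on the next request.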

If you think this is workable, there are two problems left: which terms 
to look up for online synonyms, and how to select the "synonymity" clique.

For the first issue, I would definitely only search for synonyms of 
nouns, verbs and adjectives, so some stemming is required initially.
For the second issue, I'd have a cut-off value for the strength of 
"resemblance", if this information is available, and/or use the 
frequency of the synonyms in the SOLR index as a measure.
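
A rough sketch of that selection, again in Python and under stated assumptions: the online resource is assumed to return a strength score between 0 and 1 for each candidate, and `doc_freq` is assumed to be a plain mapping from term to the number of indexed documents containing it. The function name, both thresholds and their defaults are hypothetical.

```python
from typing import Dict, List, Tuple


def select_synonyms(
    candidates: List[Tuple[str, float]],
    doc_freq: Dict[str, int],
    min_strength: float = 0.5,
    min_doc_freq: int = 1,
) -> List[str]:
    """Keep candidates that are strong enough and actually occur in the index."""
    kept = [
        (word, strength)
        for word, strength in candidates
        if strength >= min_strength and doc_freq.get(word, 0) >= min_doc_freq
    ]
    # Most frequent in the index first; strength breaks ties.
    kept.sort(key=lambda pair: (-doc_freq.get(pair[0], 0), -pair[1]))
    return [word for word, _ in kept]
```

With candidates for "rocket" such as `[("arugula", 0.9), ("missile", 0.7), ("projectile", 0.3)]` and an index where "arugula" never occurs, only "missile" would be kept: "arugula" fails the frequency test and "projectile" fails the strength cut-off.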

Building the synonyms file that way would make the system quicker over 
time, and for a specific domain (chemistry, biology, sports, etc) the 
process would be auto-adaptive - perhaps with some human help from time 
to time.


Walter Underwood wrote:
> Synonyms are domain-specific, so general-purpose lists are not very useful.
> Ultraseek shipped a British-American synonym list as an example, but even
> that wasn't very general. One of our customers was a chemical company and
> was very surprised when the search "rocket fuel" suggested "arugula",
> even though "rocket" is a perfectly good synonym for "arugula".
> wunder
> On 9/30/08 10:14 AM, "Otis Gospodnetic" <> wrote:
>> Pierre,
>> 1) I don't know, but a good place to check and see what previous answers to
>> this question were is
>> 2) I don't think there is such a thing, but I also don't think there are sites
>> that make this data freely available (answer to 1?)
>>  Otis
>> --
>> Sematext -- -- Lucene - Solr - Nutch
>> ----- Original Message ----
>>> From: Pierre Auslaender <>
>>> To:
>>> Sent: Tuesday, September 30, 2008 11:28:40 AM
>>> Subject: French synonyms & Online synonyms
>>> Hello,
>>> I'm sure these questions have been raised a million times, I'll try one
>>> more:
>>> 1/ Is there any general-purpose, free, French synonyms file out there?
>>> 2/ Is there a Solr or Lucene analyser class that could tap an on-line
>>> resource for synonyms at index-time? And by the same token, maintain and
>>> complete a synonyms text file?
>>> Thanks for the great work on SOLR and for the liveliness of this list.
>>> Pierre
