lucene-solr-user mailing list archives

From Pierre Auslaender <pauslan...@yahoo.com>
Subject Re: French synonyms & Online synonyms
Date Tue, 30 Sep 2008 18:35:01 GMT
True, synonyms can be grouped in cliques based on the strength of their 
"resemblance" given a specific context.

But what I'm indexing is the text content of TV programs produced by a 
public television broadcaster, so the context is very large and 
non-specific. What I want is to find "automobile" for "car", "motorcycle" 
for "bike", "pub" for "restaurant", "woman" for "lady", and the like.

There actually are free on-line resources for most European languages 
(English included, of course). Check these out:
http://dico.isc.cnrs.fr/dico_html/en/index.html
http://www.crisco.unicaen.fr/alexandria2.html

Would you mind commenting on the following plan for a special synonym 
analyzer?
1/ We would start with an empty synonyms file.
2/ For each indexing request, the analyser looks up each term in the 
file. If it finds synonyms, it proceeds normally.
3/ Otherwise, it checks an online resource for synonyms, updates the 
synonyms file, and proceeds.
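
The three steps above can be sketched roughly as follows. This is only an 
illustration of the lookup-then-fallback idea, not a Solr analyzer: the 
function names, the `fetch_online` callable and the one-line-per-term file 
layout ("term, syn1, syn2") are all assumptions made for the example, and a 
real implementation would also have to deal with concurrency and with 
reloading the file inside Solr.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def load_synonyms(path: Path) -> Dict[str, List[str]]:
    """Read lines of the form 'term, syn1, syn2' into a dict keyed by term."""
    table: Dict[str, List[str]] = {}
    if not path.exists():
        return table  # step 1: we start with an empty synonyms file
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        head, *rest = [part.strip() for part in line.split(",")]
        table[head] = rest
    return table


def synonyms_for(
    term: str,
    table: Dict[str, List[str]],
    path: Path,
    fetch_online: Callable[[str], Iterable[str]],  # hypothetical online lookup
) -> List[str]:
    """Return synonyms for a term, consulting the online resource only on a miss."""
    if term in table:
        return table[term]  # step 2: already in the file, proceed normally
    found = list(fetch_online(term))  # step 3: ask the online resource
    table[term] = found
    # Append the result to the file. A term with no synonyms is written too,
    # so that the online resource is not queried again for it.
    with path.open("a", encoding="utf-8") as out:
        out.write(", ".join([term, *found]) + "\n")
    return found
```

With this layout the online resource is hit at most once per distinct term, 
which is what would make the system quicker over time.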

If you think this is workable, there are two problems left: which terms 
to look up for online synonyms, and how to select the "synonymity" clique.

For the first issue, I would definitely only search for synonyms of 
nouns, verbs and adjectives, so some stemming is required initially.
For the second issue, I'd have a cut-off value for the strength of 
"resemblance", if this information is available, and/or use the 
frequency of the synonyms in the SOLR index as a measure.
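
A minimal sketch of that selection step, under stated assumptions: the 
online resource is assumed to return (word, strength) pairs with the 
strength possibly missing, and `doc_freq` is assumed to be a plain mapping 
from term to its document frequency, obtained from the index by some other 
means. The function name and the default thresholds are made up for the 
example.

```python
from typing import Dict, List, Optional, Tuple


def select_synonyms(
    candidates: List[Tuple[str, Optional[float]]],
    doc_freq: Dict[str, int],
    min_strength: float = 0.5,  # assumed cut-off, to be tuned
    min_doc_freq: int = 1,      # assumed: keep only words present in the index
) -> List[str]:
    """Keep candidates that pass the strength cut-off and occur in the index."""
    kept = []
    for word, strength in candidates:
        # Apply the strength cut-off only when the resource provides a strength.
        if strength is not None and strength < min_strength:
            continue
        # Drop candidates that are too rare (or absent) in the index.
        if doc_freq.get(word, 0) < min_doc_freq:
            continue
        kept.append(word)
    # Most frequent synonyms first.
    kept.sort(key=lambda w: doc_freq.get(w, 0), reverse=True)
    return kept


candidates = [("automobile", 0.9), ("wagon", 0.2), ("auto", None), ("jalopy", 0.8)]
doc_freq = {"automobile": 40, "auto": 55, "wagon": 100}
print(select_synonyms(candidates, doc_freq))  # ['auto', 'automobile']
```

Here "wagon" fails the strength cut-off and "jalopy" never occurs in the 
index, so neither would be written to the synonyms file.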

Building the synonyms file that way would make the system quicker over 
time, and for a specific domain (chemistry, biology, sports, etc) the 
process would be auto-adaptive - perhaps with some human help from time 
to time.

Thanks,
Pierre

Walter Underwood wrote:
> Synonyms are domain-specific, so general-purpose lists are not very useful.
>
> Ultraseek shipped a British-American synonym list as an example, but even
> that wasn't very general. One of our customers was a chemical company and
> was very surprised when the search "rocket fuel" suggested "arugula",
> even though "rocket" is a perfectly good synonym for "arugula".
>
> wunder
>
> On 9/30/08 10:14 AM, "Otis Gospodnetic" <otis_gospodnetic@yahoo.com> wrote:
>
>   
>> Pierre,
>>
>> 1) I don't know, but a good place to check and see what previous answers to
>> this question were is markmail.org
>> 2) I don't think there is such a thing, but I also don't think there are sites
>> that make this data freely available (answer to 1?)
>>
>>  Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> ----- Original Message ----
>>     
>>> From: Pierre Auslaender <pauslander@yahoo.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Tuesday, September 30, 2008 11:28:40 AM
>>> Subject: French synonyms & Online synonyms
>>>
>>> Hello,
>>>
>>> I'm sure these questions have been raised a million times, I'll try one
>>> more:
>>>
>>> 1/ Is there any general-purpose, free, French synonyms file out there?
>>>
>>> 2/ Is there a Solr or Lucene analyser class that could tap an on-line
>>> resource for synonyms at index-time? And by the same token, maintain and
>>> complete a synonyms text file?
>>>
>>> Thanks for the great work on SOLR and for the liveliness of this list.
>>>
>>> Pierre
>>>       
>
>
>   
