lucene-java-user mailing list archives

From <dar...@ontrenet.com>
Subject Re: multi-term synonym expansion
Date Tue, 06 Jul 2010 14:37:10 GMT

How does the synonym filter work internally? I configured it with a very
large synonym file (90,000 lines), running Solr in GlassFish, and it started
fine, but when I queried, it hung and ran out of memory. The file wasn't big
enough to exhaust the heap. I was never able to get it to run smoothly.

On Tue, 6 Jul 2010 06:40:54 -0700 (PDT), Ahmet Arslan <iorixxx@yahoo.com>
wrote:
>> My custom SKOSAnalyzer already performs synonym expansion
>> based on the labels defined in a given SKOS model. But now I
>> have the problem that real-world thesauri often define
>> (multi-term) synonyms for multi-term words. Here is an
>> example that defines the abbreviation "UN" as a synonym for
>> "United Nations":
>> 
>> <skos:Concept rdf:about="http://www.cs.univie.ac.at/thesaurus/concept/6">
>>   <skos:prefLabel>United Nations</skos:prefLabel>
>>   <skos:altLabel>UN</skos:altLabel>
>> </skos:Concept>
>> 
>> At the end the analyzer should add the term UN at the right
>> position in the index. Taking the example above, a sentence
>> "I work for the United Nations" should appear in the index
>> as 
>> 
>> 2: [work: 2-> 6]
>> 5: [united nations: 15->29] [un: 15->29]
>> 
>> ...so that a query "I work for the UN" also matches the
>> document.
>> 
>> What is the best way to implement that? With a
>> TokenFilter I can work through the sentence token by token
>> (using incrementToken()) and check if there is a synonym
>> available. How can I analyze token sequences in a given
>> text? Do I need to implement a custom tokenizer that
>> recognizes entities based on a given dictionary?
>> 
>> I am grateful for any suggestions or advice.
> 
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> can handle multi-word synonyms. This may help.
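
To make the multi-word matching asked about above concrete, here is a minimal
sketch in plain Java. It is not how SynonymFilterFactory is implemented, and
it does not use the Lucene TokenFilter API. The class name
MultiTermSynonymSketch, the method expand, and the flat phrase-to-synonym map
are all hypothetical, chosen only for illustration. It assumes the tokens are
already split and lowercased, looks ahead for the longest phrase found in the
map, and puts the synonym at the same position as the matched phrase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiTermSynonymSketch {

    // Hypothetical helper: returns one list of terms per token position.
    // A matched phrase and its synonym share one position, mirroring the
    // "[united nations] [un]" layout from the question.
    static List<List<String>> expand(List<String> tokens,
                                     Map<String, String> synonyms,
                                     int maxPhraseLength) {
        List<List<String>> positions = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String phrase = null;
            int matched = 0;
            // Try the longest candidate phrase first, then shorter ones.
            int longest = Math.min(maxPhraseLength, tokens.size() - i);
            for (int len = longest; len >= 1; len--) {
                String candidate = String.join(" ", tokens.subList(i, i + len));
                if (synonyms.containsKey(candidate)) {
                    phrase = candidate;
                    matched = len;
                    break;
                }
            }
            List<String> terms = new ArrayList<>();
            if (phrase != null) {
                // Emit the phrase and its synonym at the same position.
                terms.add(phrase);
                terms.add(synonyms.get(phrase));
                i += matched;
            } else {
                // No synonym starts here: pass the token through unchanged.
                terms.add(tokens.get(i));
                i++;
            }
            positions.add(terms);
        }
        return positions;
    }

    public static void main(String[] args) {
        Map<String, String> synonyms = new HashMap<>();
        synonyms.put("united nations", "un");
        List<String> tokens =
                Arrays.asList("i", "work", "for", "the", "united", "nations");
        System.out.println(expand(tokens, synonyms, 2));
    }
}
```

In a real TokenFilter the look-ahead would need to buffer tokens and save
their attribute state, and the synonym would be emitted with a position
increment of 0. The sketch leaves that out and only shows the longest-match
idea.
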
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
