lucene-java-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: multi-term synonym expansion
Date Tue, 06 Jul 2010 13:40:54 GMT
> My custom SKOSAnalyzer already performs synonym expansion
> based on the labels defined in a given SKOS model. But now I
> have the problem that real-world thesauri often define
> (multi-term) synonyms for multi-term words. Here is an
> example that defines the abbreviation "UN" as a synonym for
> "United Nations":
> 
> <skos:Concept rdf:about="http://www.cs.univie.ac.at/thesaurus/concept/6">
>   <skos:prefLabel>United Nations</skos:prefLabel>
>   <skos:altLabel>UN</skos:altLabel>
> </skos:Concept>
> 
> In the end, the analyzer should add the term UN at the right
> position in the index. Taking the example above, the sentence
> "I work for the United Nations" should appear in the index
> as:
> 
> 2: [work: 2-> 6]
> 5: [united nations: 15->29] [un: 15->29]
> 
> ...so that a query "I work for the UN" also matches the
> document.
> 
> What is the best way to implement that? With a
> TokenFilter I can work through the sentence token by token
> (using incrementToken()) and check if there is a synonym
> available. How can I analyze token sequences in a given
> text? Do I need to implement a custom tokenizer that
> recognizes entities based on a given dictionary?
> 
> I am grateful for any suggestions or advice.

Solr's SynonymFilterFactory can handle multi-word synonyms, which may help:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
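
To make the idea concrete, here is a minimal sketch in plain Java of how a filter can match token sequences against a phrase dictionary and inject the synonym at the position of the first matched token. It is only an illustration and is not the Solr or Lucene implementation: the class MultiWordSynonyms, its Token type, and the expand() method are hypothetical names, and the input is assumed to be already lowercased and tokenized. A position increment of 0 means "same position as the previous token", which is roughly what lets a query for "un" match where "united" was indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of Lucene or Solr.
public class MultiWordSynonyms {

    /** A term plus its position increment (0 = same position as the previous token). */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Scans the token list left to right. At each position it looks for the
     * longest phrase (up to maxPhraseLength tokens) that has a dictionary entry,
     * keeps the original tokens, and adds the synonym stacked on the first one.
     */
    static List<Token> expand(List<String> tokens,
                              Map<List<String>, String> synonyms,
                              int maxPhraseLength) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            String synonym = null;
            // Try the longest candidate phrase first.
            for (int len = Math.min(maxPhraseLength, tokens.size() - i); len >= 1; len--) {
                String candidate = synonyms.get(tokens.subList(i, i + len));
                if (candidate != null) {
                    matched = len;
                    synonym = candidate;
                    break;
                }
            }
            if (synonym == null) {
                out.add(new Token(tokens.get(i), 1));
                i++;
            } else {
                out.add(new Token(tokens.get(i), 1));
                out.add(new Token(synonym, 0)); // stacked on the first phrase token
                for (int k = 1; k < matched; k++) {
                    out.add(new Token(tokens.get(i + k), 1));
                }
                i += matched;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<List<String>, String> synonyms = new HashMap<>();
        synonyms.put(Arrays.asList("united", "nations"), "un");

        List<String> tokens = Arrays.asList("i", "work", "for", "the", "united", "nations");
        // Prints: [i(+1), work(+1), for(+1), the(+1), united(+1), un(+0), nations(+1)]
        System.out.println(expand(tokens, synonyms, 2));
    }
}
```

In a real TokenFilter the same lookahead would require buffering tokens between incrementToken() calls. With the Solr filter, a synonyms file line such as "United Nations, UN" expresses the same mapping.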


      
