lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wettin <karl.wet...@gmail.com>
Subject Re: Searching Diacritics
Date Mon, 27 Aug 2007 18:42:24 GMT

27 aug 2007 kl. 20.30 skrev anorman:

>
> I've tried to implement an analyzer with little different then using:
>
> result = new ISOLatin1AccentFilter(result);  in the TokenStream  
> method.
>
> Everything appears to work, however my search will not work for any  
> word
> with diacritics with that change.  Without using it will find words  
> such as
> "cèdulas" but not "cedulas", with the change it will find neither,  
> it just
> appears to be stripping it out altogether.

Do you use the same analyzer when searching as when creating the index?

-- 
karl


>
> Any suggestions?
>
>
>
>
> anorman wrote:
>>
>> Can I do this at search time rather than index time?  Below is my  
>> code
>> that is handling the searching, where would I utilize such a filter?
>>
>> Thanks for the help!
>>
>>
>>
>>
>> package search.lucene.search;
>> import org.apache.lucene.document.Document;
>> import java.io.IOException;
>> import java.util.ArrayList;
>> import java.util.List;
>>
>> import org.apache.lucene.analysis.Analyzer;
>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>> import org.apache.lucene.analysis.ISOLatin1AccentFilter;
>> import org.apache.lucene.queryParser.ParseException;
>> import org.apache.lucene.queryParser.QueryParser;
>> import org.apache.lucene.search.Hits;
>> import org.apache.lucene.search.IndexSearcher;
>> import org.apache.lucene.search.Query;
>>
>> import search.lucene.index.IndexManager;
>>
>> /**
>>  * This class is used to search the
>>  * Lucene index and return search results
>>  */
>>
>> public class SearchManager {
>>
>>
>> private String searchWord;
>>
>>     private IndexManager indexManager;
>>
>>     private Analyzer analyzer;
>>
>>     public SearchManager(String searchWord){
>>         this.searchWord   = searchWord;
>>         this.indexManager = new IndexManager();
>>         this.analyzer = new StandardAnalyzer();
>>     }
>>
>>     /**
>>      * do search
>>      */
>>     public List search(){
>>         List searchResult = new ArrayList();
>>             	
>>         IndexSearcher indexSearcher = null;
>>
>>         try{
>>             indexSearcher = new IndexSearcher 
>> (indexManager.getIndexDir());
>>         }catch(IOException ioe){
>>             ioe.printStackTrace();
>>         }
>>
>>         QueryParser queryParser = new QueryParser 
>> ("content",analyzer);
>>         Query query = null;
>>         try {
>>             query = queryParser.parse(searchWord);
>>         } catch (ParseException e) {
>>           e.printStackTrace();
>>         }
>>
>>         if(null != query && null != indexSearcher){			
>>             try {
>>                 Hits hits = indexSearcher.search(query);
>>                 for(int i = 0; i < hits.length(); i ++){
>> 					
>> 					Document doc = hits.doc(i);
>>       				System.out.println(doc.get("filename"));
>>
>> 					SearchResultBean resultBean = new SearchResultBean();
>>
>>
>> resultBean.setXMLId(hits.doc(i).get("id"));
>> 					resultBean.setXMLTitle(hits.doc(i).get("title"));
>> 					resultBean.setXMLAuthor(hits.doc(i).get("author"));
>> 					resultBean.setXMLAbstract(hits.doc(i).get("abstract"));
>> 					resultBean.setScore(hits.score(i));
>> 					
>> 					searchResult.add(resultBean);
>>                 }
>>             } catch (IOException e) {
>>                 e.printStackTrace();
>>             }
>>         }
>>         return searchResult;
>>
>>     }
>>
>>
>>
>>
>>
>>
>> thomas arni-2 wrote:
>>>
>>> You can extend the DefaultAnalyzer.
>>> The only thing you have to do, is to rewrite the method  
>>> tokenStream like
>>> this:
>>>
>>>   /** Constructs a {@link StandardTokenizer} filtered by a {@link
>>>   StandardFilter}, a {@link LowerCaseFilter} and a {@link  
>>> StopFilter}. */
>>>   public TokenStream tokenStream(String fieldName, Reader reader) {
>>>     TokenStream result = new StandardTokenizer(reader);
>>>     result = new StandardFilter(result);
>>>     result = new LowerCaseFilter(result);
>>>     result = new StopFilter(result, stopSet);
>>>     result = new ISOLatin1AccentFilter(result);
>>>     return result;
>>>   }
>>>
>>>
>>> anorman wrote:
>>>> This looks like exactly what I want.  Would I implement this  
>>>> along with
>>>> another analyzer such as the standard or stand alone?  Does  
>>>> anyone have
>>>> any
>>>> code examples of implementing such a thing?
>>>>
>>>> Thanks,
>>>> Albert
>>>>
>>>>
>>>>
>>>>
>>>> karl wettin-3 wrote:
>>>>
>>>>> 27 aug 2007 kl. 16.03 skrev anorman:
>>>>>
>>>>>
>>>>>> I have a searchable index of documents which contain french and
>>>>>> spanish
>>>>>> diacritics (è, é, À) etc.  I would like to make the content
>>>>>> searchable so
>>>>>> that when a user searches for a word such as "Amèrique" or  
>>>>>> "Amerique"
>>>>>> (without diacritic) then it returns the same results.
>>>>>>
>>>>>> Has anyone set up something similar?
>>>>>>
>>>>> ISOLatin1AccentFilter
>>>>>
>>>>> -- 
>>>>> karl
>>>>> ------------------------------------------------------------------ 
>>>>> ---
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/Searching- 
> Diacritics-tf4335454.html#a12354962
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message