lucene-java-user mailing list archives

From anorman <anor...@mun.ca>
Subject Re: Searching Diacritics
Date Tue, 28 Aug 2007 11:54:26 GMT

That was the problem. It works perfectly now!

Thanks for the help!

-Albert
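
The fix confirmed here (Karl's question in the quoted thread below: is the same analyzer used for searching as for indexing?) can be illustrated without Lucene. The following is only a minimal sketch under stated assumptions: it uses a toy in-memory index rather than the Lucene API, `java.text.Normalizer` stands in for `ISOLatin1AccentFilter`, and the names `AccentFoldingSketch`, `TermAnalyzer`, `PLAIN`, `FOLDING`, `buildIndex` and `search` are hypothetical. It shows one plausible reading of the symptom: if the index was built without accent folding but the query is folded, neither spelling can match.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy inverted index showing why index-time and search-time analysis must match. */
final class AccentFoldingSketch {

    /** Hypothetical stand-in for an analyzer: turns raw text into terms. */
    interface TermAnalyzer {
        List<String> analyze(String text);
    }

    /** Lowercases and splits on whitespace; accents are kept. */
    static final TermAnalyzer PLAIN = text -> split(text.toLowerCase(Locale.ROOT));

    /** Like PLAIN, but also removes accents from each term. */
    static final TermAnalyzer FOLDING = text -> split(fold(text.toLowerCase(Locale.ROOT)));

    /** Decomposes accented characters (NFD) and drops the combining marks. */
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    static List<String> split(String s) {
        List<String> terms = new ArrayList<>();
        for (String t : s.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Maps each analyzed term to the ids of the documents that contain it. */
    static Map<String, List<Integer>> buildIndex(TermAnalyzer analyzer, List<String> docs) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.analyze(docs.get(id))) {
                index.computeIfAbsent(term, k -> new ArrayList<>()).add(id);
            }
        }
        return index;
    }

    /** Single-term lookup: analyzes the query and returns the postings of its first term. */
    static List<Integer> search(Map<String, List<Integer>> index,
                                TermAnalyzer analyzer, String query) {
        List<String> terms = analyzer.analyze(query);
        if (terms.isEmpty()) {
            return new ArrayList<>();
        }
        return index.getOrDefault(terms.get(0), new ArrayList<>());
    }

    public static void main(String[] args) {
        // "\u00e8" is "è"; escapes keep the source independent of file encoding.
        List<String> docs = Arrays.asList("las c\u00e8dulas reales");
        String accented = "c\u00e8dulas";
        String unaccented = "cedulas";

        Map<String, List<Integer>> plainIndex = buildIndex(PLAIN, docs);
        Map<String, List<Integer>> foldedIndex = buildIndex(FOLDING, docs);

        System.out.println("plain index, plain query:     "
                + !search(plainIndex, PLAIN, accented).isEmpty() + " / "
                + !search(plainIndex, PLAIN, unaccented).isEmpty());
        System.out.println("plain index, folding query:   "
                + !search(plainIndex, FOLDING, accented).isEmpty() + " / "
                + !search(plainIndex, FOLDING, unaccented).isEmpty());
        System.out.println("folding index, folding query: "
                + !search(foldedIndex, FOLDING, accented).isEmpty() + " / "
                + !search(foldedIndex, FOLDING, unaccented).isEmpty());
    }
}
```

In this sketch the first setup finds only the accented spelling, the mismatched second setup finds neither, and the third finds both. In Lucene terms this corresponds to passing the same analyzer to the index writer and to the query parser, and re-indexing after the analyzer changes.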




karl wettin-3 wrote:
> 
> 
> On 27 Aug 2007, at 20:30, anorman wrote:
> 
>>
>> I've tried to implement an analyzer that differs little from the
>> standard one, apart from adding:
>>
>> result = new ISOLatin1AccentFilter(result);
>>
>> in the tokenStream method.
>>
>> Everything appears to work, but with that change my search no longer
>> finds any word with diacritics.  Without the filter it finds words such
>> as "cèdulas" but not "cedulas"; with the filter it finds neither.  It
>> appears to be stripping the word out altogether.
> 
> Do you use the same analyzer when searching as when creating the index?
> 
> -- 
> karl
> 
> 
>>
>> Any suggestions?
>>
>>
>>
>>
>> anorman wrote:
>>>
>>> Can I do this at search time rather than index time?  Below is the
>>> code that handles the searching; where would I use such a filter?
>>>
>>> Thanks for the help!
>>>
>>>
>>>
>>>
>>> package search.lucene.search;
>>> import org.apache.lucene.document.Document;
>>> import java.io.IOException;
>>> import java.util.ArrayList;
>>> import java.util.List;
>>>
>>> import org.apache.lucene.analysis.Analyzer;
>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>> import org.apache.lucene.analysis.ISOLatin1AccentFilter;
>>> import org.apache.lucene.queryParser.ParseException;
>>> import org.apache.lucene.queryParser.QueryParser;
>>> import org.apache.lucene.search.Hits;
>>> import org.apache.lucene.search.IndexSearcher;
>>> import org.apache.lucene.search.Query;
>>>
>>> import search.lucene.index.IndexManager;
>>>
>>> /**
>>>  * This class is used to search the
>>>  * Lucene index and return search results
>>>  */
>>>
>>> public class SearchManager {
>>>
>>>     private String searchWord;
>>>
>>>     private IndexManager indexManager;
>>>
>>>     // Analyzer used to parse the query string
>>>     private Analyzer analyzer;
>>>
>>>     public SearchManager(String searchWord){
>>>         this.searchWord   = searchWord;
>>>         this.indexManager = new IndexManager();
>>>         this.analyzer = new StandardAnalyzer();
>>>     }
>>>
>>>     /**
>>>      * Runs the query against the "content" field and
>>>      * returns one SearchResultBean per hit.
>>>      */
>>>     public List search(){
>>>         List searchResult = new ArrayList();
>>>
>>>         IndexSearcher indexSearcher = null;
>>>
>>>         try{
>>>             indexSearcher = new IndexSearcher(indexManager.getIndexDir());
>>>         }catch(IOException ioe){
>>>             ioe.printStackTrace();
>>>         }
>>>
>>>         QueryParser queryParser = new QueryParser("content", analyzer);
>>>         Query query = null;
>>>         try {
>>>             query = queryParser.parse(searchWord);
>>>         } catch (ParseException e) {
>>>             e.printStackTrace();
>>>         }
>>>
>>>         // Search only if both the searcher and the query were created
>>>         if(null != query && null != indexSearcher){
>>>             try {
>>>                 Hits hits = indexSearcher.search(query);
>>>                 for(int i = 0; i < hits.length(); i ++){
>>>                     Document doc = hits.doc(i);
>>>                     System.out.println(doc.get("filename"));
>>>
>>>                     SearchResultBean resultBean = new SearchResultBean();
>>>                     resultBean.setXMLId(doc.get("id"));
>>>                     resultBean.setXMLTitle(doc.get("title"));
>>>                     resultBean.setXMLAuthor(doc.get("author"));
>>>                     resultBean.setXMLAbstract(doc.get("abstract"));
>>>                     resultBean.setScore(hits.score(i));
>>>
>>>                     searchResult.add(resultBean);
>>>                 }
>>>             } catch (IOException e) {
>>>                 e.printStackTrace();
>>>             }
>>>         }
>>>         return searchResult;
>>>     }
>>> }
>>>
>>>
>>>
>>>
>>>
>>>
>>> thomas arni-2 wrote:
>>>>
>>>> You can extend the StandardAnalyzer.
>>>> The only thing you have to do is override the tokenStream method like
>>>> this:
>>>>
>>>>   /** Constructs a {@link StandardTokenizer} filtered by a {@link
>>>>   StandardFilter}, a {@link LowerCaseFilter}, a {@link StopFilter}
>>>>   and an {@link ISOLatin1AccentFilter}. */
>>>>   public TokenStream tokenStream(String fieldName, Reader reader) {
>>>>     TokenStream result = new StandardTokenizer(reader);
>>>>     result = new StandardFilter(result);
>>>>     result = new LowerCaseFilter(result);
>>>>     result = new StopFilter(result, stopSet);
>>>>     result = new ISOLatin1AccentFilter(result);
>>>>     return result;
>>>>   }
>>>>
>>>>
>>>> anorman wrote:
>>>>> This looks like exactly what I want.  Would I use this together with
>>>>> another analyzer, such as the standard one, or on its own?  Does
>>>>> anyone have any code examples of implementing such a thing?
>>>>>
>>>>> Thanks,
>>>>> Albert
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> karl wettin-3 wrote:
>>>>>
>>>>>> On 27 Aug 2007, at 16:03, anorman wrote:
>>>>>>
>>>>>>
>>>>>>> I have a searchable index of documents which contain French and
>>>>>>> Spanish diacritics (è, é, À, etc.).  I would like to make the
>>>>>>> content searchable so that when a user searches for a word such as
>>>>>>> "Amèrique" or "Amerique" (without the diacritic), the same results
>>>>>>> are returned.
>>>>>>>
>>>>>>> Has anyone set up something similar?
>>>>>>>
>>>>>> ISOLatin1AccentFilter
>>>>>>
>>>>>> -- 
>>>>>> karl
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/Searching-Diacritics-tf4335454.html#a12354962
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>>
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Searching-Diacritics-tf4335454.html#a12366572
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



