lucene-java-user mailing list archives

From anorman <anor...@mun.ca>
Subject Re: Searching Diacritics
Date Mon, 27 Aug 2007 16:47:39 GMT

Can I do this at search time rather than index time?  Below is my code that
handles the searching; where would I use such a filter?

Thanks for the help!




package search.lucene.search;
import org.apache.lucene.document.Document;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

import search.lucene.index.IndexManager;

/**
 * This class is used to search the 
 * Lucene index and return search results
 */

public class SearchManager {

    private String searchWord;
    
    private IndexManager indexManager;
    
    private Analyzer analyzer;
    
    public SearchManager(String searchWord){
        this.searchWord   = searchWord;
        this.indexManager = new IndexManager();
        this.analyzer = new StandardAnalyzer();
    }
    
    /**
     * do search
     */
    public List search(){
        List searchResult = new ArrayList();
            	
        IndexSearcher indexSearcher = null;

        try{
            indexSearcher = new IndexSearcher(indexManager.getIndexDir());
        }catch(IOException ioe){
            ioe.printStackTrace();
        }

        QueryParser queryParser = new QueryParser("content",analyzer);
        Query query = null;
        try {
            query = queryParser.parse(searchWord);
        } catch (ParseException e) {
          e.printStackTrace();
        }
        
        if (null != query && null != indexSearcher) {
            try {
                Hits hits = indexSearcher.search(query);
                for (int i = 0; i < hits.length(); i++) {
                    // Fetch each hit's stored document once and reuse it.
                    Document doc = hits.doc(i);
                    System.out.println(doc.get("filename"));

                    SearchResultBean resultBean = new SearchResultBean();
                    resultBean.setXMLId(doc.get("id"));
                    resultBean.setXMLTitle(doc.get("title"));
                    resultBean.setXMLAuthor(doc.get("author"));
                    resultBean.setXMLAbstract(doc.get("abstract"));
                    resultBean.setScore(hits.score(i));

                    searchResult.add(resultBean);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return searchResult;
    }
}
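
On the search-time question: a query term can only match a term that is
stored in the index in the same form, so folding accents in the query alone
would not find documents whose indexed terms kept their accents. The sketch
below illustrates this with plain Java rather than Lucene. It uses
java.text.Normalizer as a stand-in for ISOLatin1AccentFilter, and the names
FoldingDemo, fold, buildIndex and search are hypothetical, chosen only for
this illustration.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class; it is not part of Lucene.
class FoldingDemo {

    // Stand-in for an accent filter: decompose each character (NFD),
    // drop the combining marks, then lowercase.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Toy inverted index: term -> ids of the documents containing it.
    // foldTerms plays the role of the analyzer chosen at index time.
    static Map<String, List<Integer>> buildIndex(String[] docs, boolean foldTerms) {
        Map<String, List<Integer>> index = new HashMap<String, List<Integer>>();
        for (int id = 0; id < docs.length; id++) {
            for (String word : docs[id].split("\\s+")) {
                String term = foldTerms ? fold(word) : word.toLowerCase(Locale.ROOT);
                List<Integer> postings = index.get(term);
                if (postings == null) {
                    postings = new ArrayList<Integer>();
                    index.put(term, postings);
                }
                if (!postings.contains(id)) {
                    postings.add(id);
                }
            }
        }
        return index;
    }

    // foldQuery plays the role of the analyzer chosen at search time.
    static List<Integer> search(Map<String, List<Integer>> index, String word, boolean foldQuery) {
        String term = foldQuery ? fold(word) : word.toLowerCase(Locale.ROOT);
        List<Integer> hits = index.get(term);
        return hits == null ? new ArrayList<Integer>() : hits;
    }

    public static void main(String[] args) {
        // "\u00e8" is the letter e with a grave accent.
        String[] docs = { "Am\u00e8rique du Nord", "Amerique latine" };
        Map<String, List<Integer>> plain = buildIndex(docs, false);
        Map<String, List<Integer>> folded = buildIndex(docs, true);

        // Folding the query only: document 0 still has the accented term.
        System.out.println(search(plain, "Am\u00e8rique", true));   // prints [1]
        // Folding on both sides: both spellings find both documents.
        System.out.println(search(folded, "Am\u00e8rique", true));  // prints [0, 1]
        System.out.println(search(folded, "Amerique", true));       // prints [0, 1]
    }
}
```

If the same reasoning applies to this index, the accent-folding analyzer
would have to be used both when the documents are indexed and when the
QueryParser is constructed, which means reindexing the existing content.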






thomas arni-2 wrote:
> 
> You can extend the StandardAnalyzer.
> The only thing you have to do is override the tokenStream method like
> this:
> 
>   /** Constructs a {@link StandardTokenizer} filtered by a {@link
>   StandardFilter}, a {@link LowerCaseFilter}, a {@link StopFilter} and an
>   {@link ISOLatin1AccentFilter}. */
>   public TokenStream tokenStream(String fieldName, Reader reader) {
>     TokenStream result = new StandardTokenizer(reader);
>     result = new StandardFilter(result);
>     result = new LowerCaseFilter(result);
>     result = new StopFilter(result, stopSet);
>     result = new ISOLatin1AccentFilter(result);
>     return result;
>   }
> 
> 
> anorman wrote:
>> This looks like exactly what I want.  Would I implement this along with
>> another analyzer such as the standard one, or stand-alone?  Does anyone have
>> any code examples of implementing such a thing?
>>
>> Thanks,
>> Albert
>>
>>
>>
>>
>> karl wettin-3 wrote:
>>   
>>> 27 aug 2007 kl. 16.03 skrev anorman:
>>>
>>>     
>>>> I have a searchable index of documents which contain French and Spanish
>>>> diacritics (è, é, À, etc.).  I would like to make the content searchable so
>>>> that when a user searches for a word such as "Amèrique" or "Amerique"
>>>> (without the diacritic), it returns the same results.
>>>>
>>>> Has anyone set up something similar?
>>>>       
>>> ISOLatin1AccentFilter
>>>
>>> -- 
>>> karl
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>>     
>>
>>   
> 
> 

-- 
View this message in context: http://www.nabble.com/Searching-Diacritics-tf4335454.html#a12353022
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



