lucene-java-user mailing list archives

From "" <>
Subject Re: English and French documents together / analysis, indexing, searching
Date Thu, 20 Jan 2005 18:09:50 GMT
Right now I am using StandardAnalyzer, but the results are not what I'd
hoped for. My understanding is that the same analyzer should be used for
searching as was used for indexing. So even if I can manage to guess the
language during indexing and apply the Snowball analyzer for that
language, I wouldn't be able to use Snowball for searching, because users
want to search through both English and French at once. And I suppose I
would not get the same results if I searched with StandardAnalyzer instead?
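
To illustrate the index-time/search-time concern, here is a minimal sketch in plain Java (no Lucene classes) of a language-neutral normalization step: lowercasing plus accent stripping. The class name AccentFolder and the method fold are hypothetical, chosen only for this example. The idea is that if the same function is applied to document text before indexing and to the query before searching, an unaccented query can match an accented French term without knowing the language. It is not a stemmer and does not replace Snowball.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: folds accents and case so that
// index-time and query-time text are reduced to the same form.
final class AccentFolder {

    static String fold(String text) {
        // NFD splits an accented letter into base letter + combining mark.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Drop the combining marks (Unicode category M), keeping base letters.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Locale.ROOT avoids locale-specific case rules.
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "\u00C9l\u00E8ve" is the French word written with E-acute and e-grave.
        String indexed = fold("\u00C9l\u00E8ve"); // applied while indexing
        String queried = fold("eleve");           // applied while searching
        System.out.println(indexed.equals(queried)); // prints "true"
    }
}
```

Letters that have no canonical decomposition are left unchanged by this approach, so it should be checked against real document text before relying on it.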

Another problem with StandardAnalyzer is that it breaks up some words
that should not be broken (in our case, document identifiers such as
ABC-1234), but that's a secondary issue...
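
For the identifier issue, one possible direction is a tokenization rule that tries an identifier pattern before falling back to ordinary word splitting. The sketch below is plain Java, not a Lucene Tokenizer; the class name IdentifierTokenizer and the assumed identifier shape (ASCII letters, a hyphen, then digits, as in ABC-1234) are hypothetical and would need adjusting to the real identifier formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: keeps identifiers such as ABC-1234 as one token
// while still splitting ordinary hyphenated words.
final class IdentifierTokenizer {

    // First alternative: assumed identifier shape (letters, hyphen, digits).
    // Second alternative: any run of letters or digits (an ordinary word).
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z]+-\\d+|[\\p{L}\\p{N}]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [See, ABC-1234, for, well, known, details]
        System.out.println(tokenize("See ABC-1234 for well-known details"));
    }
}
```

As with any analysis change, the same rule would have to be applied to queries as well as to documents, otherwise a search for ABC-1234 would be split differently from the indexed token.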



Bernhard Messer said the following on 1/20/2005 1:05 PM:

> I think the easiest way is to use Lucene's StandardAnalyzer. If you 
> want to use the Snowball stemmers, you have to add a language guesser 
> to get the language for the particular document before creating the 
> analyzer.
> regards
> Bernhard
> wrote:
>> Greetings everyone
>> I wonder whether there is a solution for analyzing both English and French 
>> documents using the same analyzer.
>> The reason is that we have predominantly English documents, but 
>> there are some French ones, yet it all has to go into the same index
>> and be searchable from the same location during any particular 
>> search. Is there a way to analyze both types of documents with
>> the same analyzer (and which one)?
>> I've looked around and I see there's a Snowball analyzer, but you have 
>> to specify the language of analysis, and I do not know that
>> ahead of time during indexing, nor do I know it most of the time 
>> during searching (users would like to search in both document types).
>> There's also the issue of letter accents in French words and 
>> searching for the same (how are they indexed in the first place, even)?
>> Has anyone dealt with this before, and how did you solve the problem?
>> thanks
>> -pedja
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
