lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Justin Greene <tvxh-l...@spamex.com>
Subject Re: StandardFilter that works for French
Date Thu, 21 Nov 2002 21:26:55 GMT
Alng these same lines, has anyone developed a Spanish filter?  I have looked
but have not turned anything up.

Justin

> -----Original Message-----
> From: Joshua O'Madadhain [mailto:jmadden@ics.uci.edu]
> Sent: Thursday, November 21, 2002 4:00 PM
> To: Lucene Users List
> Cc: Joshua Rhys Taliesin O'Madadhain
> Subject: Re: StandardFilter that works for French
> 
> On Thu, 21 Nov 2002, Konrad Scherer wrote:
> 
> > In French you have 6 words (me, te, se, le/la , ne, de) 
> where the e is
> > replaced with an apostrophe when the following word starts 
> with a vowel.
> > For example me aider becomes m'aider. Currently Lucene 
> indexes m'aider,
> > s'aider, n'aider as different words when in fact they 
> should be analyzed as
> > me aider, se aider, ne aider, etc. So I modified Standard 
> filter to send
> > back these words as two words. I had to add a one Token 
> buffer. I toyed
> > with modifying StandardTokenizer.jj but I was worried about 
> unintended
> > changes in behavior.
> >
> > This change will not effect English indexing. The only 
> change I can think
> > of is that a word like m'lord would be indexed as "me 
> lord". Still it might
> > be better to make a French package and add this to a French Filter.
> 
> There are a number of contractions in English that could be 
> affected if
> you're using the apostrophe as a marker, e.g.: isn't, 
> wouldn't, I'd, he's,
> hasn't.  (Granted, these are often considered stop words.)  
> Thus, I think
> that your idea of incorporating this change into a French 
> filter, rather
> than modifying Standard filter, is a good idea.
> 
> Joshua O'Madadhain
> 
>   jmadden@ics.uci.edu....Obscurium Per 
> Obscurius....www.ics.uci.edu/~jmadden
>    Joshua O'Madadhain: Information Scientist, Musician, 
> Philosopher-At-Tall
> It's that moment of dawning comprehension that I live for.  
> -- Bill Watterson
>  My opinions are too rational and insightful to be those of 
> any organization.
> 
> 
> 
> 
> --
> To unsubscribe, e-mail:   
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: 
> <mailto:lucene-user-help@jakarta.apache.org>
> 

--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message