lucene-java-user mailing list archives

From Geoffrey De Smet <>
Subject Re: Automatic analyzer resolving based on Locale
Date Wed, 09 May 2007 15:13:25 GMT
We'd use a different index for the language of each configured locale;
however, this might have an impact on performance.
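
A minimal sketch of the one-index-per-language layout follows. It assumes each configured language gets its own index directory named `index-<language>` under a common root; the `PerLanguageIndexes` class and that naming scheme are hypothetical. Documents in a given language would be written with that language's analyzer to the returned directory, and queries would run against one or several of these indexes, which is presumably where the performance cost comes in.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper: one index directory per configured language.
class PerLanguageIndexes {
    private final Path root;

    PerLanguageIndexes(Path root) {
        this.root = root;
    }

    /** Returns the directory of the index for the given locale's language. */
    Path indexDirFor(Locale locale) {
        // Only the language part is used, so nl-BE and nl-NL share one Dutch index.
        String language = locale.getLanguage();
        if (language.isEmpty()) {
            throw new IllegalArgumentException("Locale has no language: " + locale);
        }
        return root.resolve("index-" + language); // assumed naming scheme
    }

    public static void main(String[] args) {
        PerLanguageIndexes indexes = new PerLanguageIndexes(Paths.get("search-data"));
        System.out.println(indexes.indexDirFor(Locale.forLanguageTag("nl-BE")));
        System.out.println(indexes.indexDirFor(Locale.ENGLISH));
    }
}
```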

Would this be attainable (maybe some day in Lucene)?

- Use an IndexEverythingAnalyzer for writing,
so "werk", "werkte", "gewerkt" and "en" are indexed as-is when they are

- And then use a DutchAnalyzer for reading,
which, if I ask for "werk", searches for "werk", "werkte" and "gewerkt",
and also ignores stop words like "en" in the query.
An EnglishAnalyzer would instead expand "werk" to "werk", "werkes", "werked", ...

- It might seem a bad idea to mix several languages in the same index,
but in reality little data comes with metadata declaring the
language it is written in.
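
As I understand the proposal, all expansion happens at query time against an index that stores every word form as written. Stemming analyzers in Lucene reduce each token to a stem rather than listing its inflected forms, so a minimal sketch has to do the expansion itself. The `QueryExpander` class, its variant table and its stop list are hypothetical toys that only cover the example words from this message; they are not real Dutch morphology.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy sketch: expand each query term into known word forms and drop stop words.
class QueryExpander {
    private final Map<String, List<String>> variants; // hypothetical variant table
    private final Set<String> stopWords;

    QueryExpander(Map<String, List<String>> variants, Set<String> stopWords) {
        this.variants = variants;
        this.stopWords = stopWords;
    }

    /** Returns the index terms to OR together for a whitespace-separated query. */
    List<String> expand(String query) {
        List<String> terms = new ArrayList<>();
        for (String token : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue; // skip leading blanks and stop words such as "en"
            }
            // Fall back to the token itself when no variants are known.
            terms.addAll(variants.getOrDefault(token, List.of(token)));
        }
        return terms;
    }

    /** Tiny hand-made Dutch example covering only the words used above. */
    static QueryExpander dutchToy() {
        return new QueryExpander(
                Map.of("werk", List.of("werk", "werkte", "gewerkt")),
                Set.of("en", "de", "het"));
    }
}
```

This only works if the indexing side keeps every form, stop words included, exactly as written, which is what the proposed IndexEverythingAnalyzer would do. The expanded terms could then be combined with OR, for example as SHOULD clauses of a BooleanQuery.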

With kind regards,
Geoffrey De Smet

Chris Hostetter wrote:
> : There is nothing canned that I know of. I'm also not sure how this
> : would be used. If you're using a single index, how are you going
> : to index, then search using these analyzers? Or is there some
> : other magic going on?
> I suspect the use case is a "shipped" software product, where you want to
> have one jar that works anywhere, but you want the analyzer used to depend
> on the Locale of the JVM the software is installed in.
> Personally, I would advise against auto-selecting an Analyzer based on the
> runtime Locale. It's a fine approach when dealing with purely transient
> data (i.e., parsing dates input into a form), but it's a bad idea for
> persistent data (i.e., formatting dates to write them to a file), because the
> user could change their Locale, and then the index they built the last time
> they ran your software no longer works.
> Just make it an option that is configurable at install time.
> -Hoss
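
Hoss's suggestion can be sketched as follows, assuming the chosen language is stored in a properties file kept next to the index. The `AnalyzerSettings` class, the `analyzer.language` key and the language-to-class table are hypothetical; only the analyzer class names refer to real Lucene classes, and the sketch returns them as strings rather than instantiating them.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Sketch only: the property key and the language-to-analyzer table are assumptions.
class AnalyzerSettings {
    static final String KEY = "analyzer.language"; // hypothetical property name

    // Assumed mapping from language code to a Lucene analyzer class name.
    static final Map<String, String> ANALYZERS = Map.of(
            "nl", "org.apache.lucene.analysis.nl.DutchAnalyzer",
            "de", "org.apache.lucene.analysis.de.GermanAnalyzer");
    static final String FALLBACK = "org.apache.lucene.analysis.standard.StandardAnalyzer";

    /** Install time only: offer the JVM's Locale as a default the user may override. */
    static String suggestedLanguage() {
        return Locale.getDefault().getLanguage();
    }

    /** Install time: persist the chosen language alongside the index. */
    static void save(String language, Writer out) {
        Properties props = new Properties();
        props.setProperty(KEY, language.toLowerCase(Locale.ROOT));
        try {
            props.store(out, "Fixed when the index was built; changing it requires reindexing");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Every run: read the stored language and never consult Locale.getDefault(). */
    static String analyzerClassFor(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String language = props.getProperty(KEY);
        if (language == null) {
            throw new IllegalStateException("No " + KEY + " recorded for this index");
        }
        return ANALYZERS.getOrDefault(language, FALLBACK);
    }

    public static void main(String[] args) {
        StringWriter stored = new StringWriter();
        save(suggestedLanguage(), stored);   // install time
        Locale.setDefault(Locale.GERMAN);    // the user later switches Locale
        // Still resolves from the stored setting, not from the new default Locale.
        System.out.println(analyzerClassFor(new StringReader(stored.toString())));
    }
}
```

The default Locale is consulted only once, to pre-fill the install-time choice; after that, changing the JVM's Locale does not change which analyzer is used for an existing index.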
