lucene-solr-user mailing list archives

From Bertrand Delacretaz <bdelacre...@apache.org>
Subject Indexing of non-english text with Solr, any known limitations?
Date Wed, 12 Apr 2006 08:46:21 GMT
Hi Solr users,

I'm investigating indexers for a project and have played a bit with both
Solr and Nutch recently. The Solr "RESTful indexing component" concept
fits our needs quite well.

Before I dig too deep, are there any known limitations w.r.t. indexing
of non-English text?

I know Lucene fully supports multi-language indexing, and I've seen  
the cool language identifiers and analysis factories in Nutch, but  
there's little information about multi-language indexing in Solr -  
hence my question.

The project that I'm looking at is currently single-language  
(French), which I assume can be handled by static configuration of  
the appropriate analyzers.

But we might have to make sure we can handle multiple languages  
cleanly in a single index before making a final decision on which  
indexer to use, as here in Switzerland we very often have to handle  
multiple languages.

Thanks for any insights on this subject!

-Bertrand

(brief introduction: I'm a committer on the Cocoon project and an
independent consultant, helping teams build webapps using Cocoon and
other mostly Java-based technologies, more info at
http://www.codeconsult.ch)