lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: DIH, UTF8 and default DIH encoding value
Date Sun, 01 Aug 2010 07:00:24 GMT
Hi Amit,

Anyone can edit any Solr Wiki page. Just create an account (I think the link
for that is in the page footer) and edit.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Amit Nithian <anithian@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Sat, July 31, 2010 4:41:44 PM
> Subject: DIH, UTF8 and default DIH encoding value
> 
> All,
> 
> I am not sure if this is overly obvious or not (it wasn't to me), but in
> trying to index some international characters from XML files using the DIH,
> I found that setting the encoding attribute on the dataSource element to
> "UTF-8" fixed my problem.
> 
> <dataSource type="FileDataSource" encoding="UTF-8"/>
> 
> My question is why the default isn't UTF-8 or, if there is a good reason, can
> the DIH wiki be made more clear that this encoding attribute can affect the
> indexing of international characters? If I can get access to edit this wiki
> page, I can add a section to that effect, perhaps under a troubleshooting
> section?
> 
> Thanks!
> Amit
> 
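
A minimal sketch of why the encoding attribute can matter, assuming (this is not
confirmed in the thread) that a file reader without an explicit encoding falls
back to the JVM's platform default charset. The class name EncodingDemo and the
sample word are made up for illustration. It decodes the same UTF-8 bytes twice,
once as UTF-8 and once as ISO-8859-1, to show how a mismatched charset garbles
non-ASCII characters before they ever reach the index.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Decode raw file bytes with the given charset, as a file reader would.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "café" written with a Unicode escape so the source file's own
        // encoding cannot affect the demonstration.
        String original = "caf\u00e9";

        // Bytes as they would sit on disk in a UTF-8 encoded XML file.
        byte[] onDisk = original.getBytes(StandardCharsets.UTF_8);

        // The charset a reader would use if none is specified (assumption:
        // this is what happens when the encoding attribute is omitted).
        System.out.println("platform default: " + Charset.defaultCharset());

        // Correct charset: the original text round-trips.
        System.out.println("as UTF-8:      " + decode(onDisk, StandardCharsets.UTF_8));

        // Wrong charset: the two-byte sequence for 'é' becomes two characters.
        System.out.println("as ISO-8859-1: " + decode(onDisk, StandardCharsets.ISO_8859_1));
    }
}
```

If the platform default printed by this sketch is not UTF-8 on the indexing
machine, that would be consistent with the behaviour described above, and
setting encoding="UTF-8" on the dataSource element removes the dependence on
the platform setting.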
