lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Can SOLR Index UTF-16 Text
Date Sat, 29 Sep 2012 00:17:15 GMT

: Our SOLR setup  (4.0.BETA on Tomcat 6) works as expected when indexing UTF-8
: files. Recently, however, we noticed that it has issues with indexing
: certain text files eg. UTF-16 files.  See attachment for an example
: (tarred+zipped)
: 
: tesla-utf16.txt
: <http://lucene.472066.n3.nabble.com/file/n4010834/tesla-utf16.txt>  

No attachment came through to the list, and the URL that Nabble seems to 
have generated when you posted your message returns a 404.

In general, the question of "is indexing a UTF-16 file supported" largely 
depends on *how* you are indexing this file -- if it's plain text, are you 
parsing it yourself using some client code and then sending it to Solr? 
Are you using DIH to read it from disk? Are you using 
ExtractingRequestHandler?

Those are all very different ways to index data in Solr -- and which one 
you are using determines how/where the encoding of that file is 
processed.
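
To illustrate the first case (client code that parses the file itself), 
here is a minimal sketch in Python. It assumes the file really is UTF-16 
with a byte order mark, and the field names "id" and "text" are 
hypothetical placeholders for whatever your schema defines. The point is 
only that the client decodes the bytes into a string and re-encodes the 
request body as UTF-8, so Solr never sees the UTF-16 bytes; the exact 
JSON shape your update handler expects may differ.

```python
import json
import os
import tempfile


def read_text_as_unicode(path, encoding="utf-16"):
    # Python's "utf-16" codec uses the BOM to pick the byte order and
    # drops the BOM from the decoded string.
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def build_update_payload(doc_id, text):
    # Serialize one document as a JSON "add" command, encoded as UTF-8.
    # "id" and "text" are assumed field names, not taken from any real schema.
    body = {"add": {"doc": {"id": doc_id, "text": text}}}
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


# Demo: write a small UTF-16 file, read it back, and build the payload.
sample = "Nikola Tesla \u2013 born in Smiljan"
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(sample.encode("utf-16"))  # BOM plus native-endian UTF-16
    path = tmp.name

text = read_text_as_unicode(path)
payload = build_update_payload("tesla-utf16", text)
os.remove(path)

# payload is now UTF-8 bytes, ready to POST to your update handler with
# Content-Type "application/json; charset=utf-8".
```

If you are using DIH or ExtractingRequestHandler instead, the decoding 
happens on the server side and none of the above applies.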


-Hoss
