lucene-solr-user mailing list archives

From Derek Croxton
Subject Indexing odt files
Date Thu, 28 Apr 2011 01:34:18 GMT
Requesting help for someone way outside of his comfort zone. :)

I'm trying to use Solr to index several hundred OpenDocument files.  I 
downloaded and installed the example site and got it to work on the same files.  
I modified to change the MIME type to vnd.oasis.opendocument.text (and I 
also tried x-vnd.oasis.opendocument.text).  When I try to post a file, I get the 
error "Error 400 Invalid UTF-8 start byte 0xba (at char #12, byte #-1)".

As best I can tell, Solr is trying to parse the file as an XML document, not an 
ODT document: when I do a hexdump of the ODT file, I do see a byte 0xba at 
approximately the right position, but it isn't there in the extracted 
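
For anyone who wants to reproduce the hexdump check: an ODT file is a ZIP
archive with the document text stored in a member named content.xml, so the
raw file begins with a binary ZIP header rather than XML text. The sketch
below is only an illustration under stated assumptions. It builds a tiny
stand-in archive in memory instead of reading a real .odt file, and the helper
names (build_sample_odt, looks_like_zip, member_names) are hypothetical and
not part of Solr.

```python
import io
import zipfile


def build_sample_odt():
    """Build a tiny ODT-like ZIP archive in memory (a stand-in for a real .odt)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ODF packages carry a "mimetype" entry as the first member.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        # The document body lives in content.xml inside the archive.
        zf.writestr("content.xml", "<?xml version='1.0' encoding='UTF-8'?><doc/>")
    return buf.getvalue()


def looks_like_zip(data):
    """Return True if the data starts with the ZIP local-file-header signature."""
    return data[:4] == b"PK\x03\x04"


def member_names(data):
    """List the files packed inside the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


odt_bytes = build_sample_odt()
print(looks_like_zip(odt_bytes))   # the raw file is a ZIP container
print(member_names(odt_bytes))     # content.xml is one member of it
print(odt_bytes[:16])              # binary header bytes, not XML text
```

The bytes right after the signature are binary header fields, so a value
such as 0xba can legitimately appear there in the raw file while being absent
from the XML extracted out of the archive. That would be consistent with the
raw ODT bytes being handed to an XML parser.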

I may be overlooking some configuration setting, or who knows what.  I 
understand the Solr setup very poorly.  If anyone can help me, I would be 

