incubator-couchdb-user mailing list archives

From Dave Cottlehuber <d...@muse.net.nz>
Subject Re: CouchDB Invalid JSON UTF-8
Date Wed, 25 Apr 2012 14:23:32 GMT
On 25 April 2012 14:59, Robert Newson <rnewson@apache.org> wrote:
> It sounds like SQLToNoSQLImporter is not converting your data
> correctly. As it's Java, I would take a wild guess and assume the
> characters-to-bytes translation is being done with the platform
> default charset rather than "UTF-8". Since UTF-8 is the default encoding for
> JSON strings, that would be a pretty big oversight.
>
> B.
>
> On 25 April 2012 11:59, Paulo Carvalho <pjcarvalho@gmail.com> wrote:
>> Hello,
>>
>> I am trying SQLToNoSQLImporter to import data into a CouchDB database
>> from a PostgreSQL database.
>>
>> I have configured the import.properties and db-data-config files correctly.
>>
>> When I execute the run.bat command (I am using Windows), I get the
>> following result:
>>
>> 07:50:14,568  INFO DataImporter:134 - Data Configuration loaded successfully
>> 07:50:18,477 ERROR DataImporter:178 - *****  Data import failed. **********
>>  Reason is :
>> org.apache.http.HttpException: HTTP/1.1 400 Bad Request
>>        at net.sathis.export.sql.couch.CouchWriter.post(CouchWriter.java:68)
>>        at net.sathis.export.sql.couch.CouchWriter.writeToNoSQL(CouchWriter.java:52)
>>        at net.sathis.export.sql.DocBuilder.execute(DocBuilder.java:142)
>>        at net.sathis.export.sql.DataImporter.doFullImport(DataImporter.java:174)
>>        at net.sathis.export.sql.DataImporter.doDataImport(DataImporter.java:93)
>>        at net.sathis.export.sql.SQLToNoSQLImporter.main(SQLToNoSQLImporter.java:19)
>>
>> As you can see, the configuration file is loaded correctly. In the
>> CouchDB database log file, I get the following error:
>>
>> [debug] [<0.147.0>] Invalid JSON: {{error,
>>                                       {126,
>>                                        "lexical error: invalid bytes in UTF8 string.\n"}},
>>                                   <<"{\"docs\":[{\"_id\":\"0\",\"label\":\"Pas de taches\"},
>>                                   {\"_id\":\"1\",\"description\":\"Le pourcentage de recouvrement est < 2 %\",\"label\":\"Très peu nombreuses\"},
>>                                   {\"_id\":\"2\",\"description\":\"Le p.......
>>
>> I think the problem happens because the text contained in the table
>> has special characters ("è", etc.).
>>
>> The PostgreSQL database is encoded in UTF-8.
>>
>>
>> To try to solve the problem, I wrote a small JSON file and tried to
>> insert it into my database. My JSON file's content was the following:
>> {"docs":[{"_id":"0","label ":"Pas de taches"}]}
>>
>> The result of inserting it into my database was:
>> {"ok":true,"id":"doc_id","rev":"1-ffaec7bc2aa548ca8e5a9c697ea3eb64"}
>>
>> Next, I changed my JSON file just a little: I added a special character
>> (â):
>> {"docs":[{"_id":"0","label ":"Pas de tâches"}]}
>>
>> The result of inserting this JSON file into the database was:
>> {"error":"bad_request","reason":"invalid_json"}
>>
>>
>>
>> Can anyone help me with this issue?
>>
>> Thank you
>>
>> Best regards.
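
To pick up on Bob's point: if the importer really is doing the
characters-to-bytes conversion with the platform default charset
(windows-1252 on most Windows boxes), the fix is to name UTF-8 explicitly
wherever the JSON string is turned into bytes. I haven't read
CouchWriter.java, so the snippet below is only a sketch of the general
pattern, not the importer's actual code:

    import java.io.OutputStream;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.nio.charset.Charset;

    class Utf8Body {
        private static final Charset UTF8 = Charset.forName("UTF-8");

        // Wrong: json.getBytes() uses the platform default charset
        // (windows-1252 on most Windows boxes), so "tâches" goes over
        // the wire as latin1 bytes and CouchDB rejects the doc as
        // invalid JSON.
        static byte[] encode(String json) {
            return json.getBytes(UTF8);   // right: always UTF-8
        }

        // The same rule applies to any Writer wrapped around the HTTP
        // connection's output stream.
        static Writer writerFor(OutputStream out) {
            return new OutputStreamWriter(out, UTF8);
        }
    }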

A quick suggestion: download an editor that explicitly supports
encodings, such as TextPad or Komodo, create your JSON file in that, and
save it as UTF-8.

You'll find that works just fine. There are sample files in
https://www.dropbox.com/sh/jeifcxpbtpo78ak/--8BGo8bb3/tmp/utf8wtf.zip:
one created on a Mac and transferred over, the other created on Windows.

C:\tmp>curl -HContent-Type:application/json http://localhost:5984/testy/utf8mac -XPUT -d@utf8mac.json
{"ok":true,"id":"utf8mac","rev":"1-b46df9f1f811323a133af7faf36d1a89"}

C:\tmp>curl -HContent-Type:application/json http://localhost:5984/testy/utf8windows -XPUT -d@utf8windows.json
{"ok":true,"id":"utf8windows","rev":"1-b46df9f1f811323a133af7faf36d1a89"}

Without having tested it, something like

    recode latin1..UTF-8 *.json

would probably do the trick; I assume the http://unxutils.sourceforge.net/
version is suitable.
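
If you can't find a recode build that behaves on Windows, the same
latin1-to-UTF-8 conversion is only a few lines of Java; the class and
file names here are purely illustrative:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.io.Reader;
    import java.io.Writer;

    // Re-encode a latin1 (ISO-8859-1) file as UTF-8 -- roughly what
    // "recode latin1..UTF-8" does for a single file.
    // Usage: java Latin1ToUtf8 in.json out.json
    public class Latin1ToUtf8 {
        public static void main(String[] args) throws IOException {
            Reader in = new InputStreamReader(
                    new FileInputStream(args[0]), "ISO-8859-1");
            Writer out = new OutputStreamWriter(
                    new FileOutputStream(args[1]), "UTF-8");
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            out.close();
            in.close();
        }
    }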

A+
Dave
