couchdb-user mailing list archives

From Paulo Carvalho <pjcarva...@gmail.com>
Subject Re: CouchDB Invalid JSON UTF-8
Date Wed, 25 Apr 2012 14:52:27 GMT
Thanks for your suggestion.

I found another way to solve the problem: I read the current value of the
file.encoding system property, set it to utf-8, send the request, and then
restore the encoding to what it was:

        // These lines were added to convert from ISO-8859-1 to UTF-8 because of
        // the error (... lexical error: invalid bytes in UTF8 string ...) that was
        // happening on the couchDB database server
        final String currentEnc = System.getProperties().getProperty("file.encoding");
        System.getProperties().setProperty("file.encoding", "utf-8");
        final byte[] b = content.getBytes("utf-8");
        final String s = new String(b);
        httpclient = new DefaultHttpClient();
        HttpEntity entity = new StringEntity(s, ContentType.APPLICATION_JSON);

        // Original code line
        //HttpEntity entity = new StringEntity(content, ContentType.APPLICATION_JSON);
        post.setEntity(entity);
        post.setHeader(new BasicHeader("Content-Type", "application/json"));
        HttpResponse response = httpclient.execute(post);
        if (response.getStatusLine().getStatusCode() != HttpStatus.SC_CREATED)
          throw new HttpException(response.getStatusLine().toString());

        // To put the file encoding as it was
        System.getProperties().setProperty("file.encoding", currentEnc);
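
Editorial note: the snippet above depends on the platform default charset, because new String(b) decodes with whatever the JVM considers the default, and changing file.encoding at runtime is not guaranteed to alter that default once the JVM has started. The following is a minimal sketch, using only the standard library, of encoding a JSON string with an explicit charset and checking that the resulting bytes are well-formed UTF-8. The class name Utf8Body and its two helper methods are hypothetical and are not part of SQLToNoSQLImporter or HttpClient.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library named in this thread.
final class Utf8Body {

    /** Encode the JSON text as UTF-8, independent of the platform default charset. */
    static byte[] toUtf8(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    /** Return true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00e2 is the character "â"; the escape keeps this source file pure ASCII.
        String json = "{\"docs\":[{\"_id\":\"0\",\"label\":\"Pas de t\u00e2ches\"}]}";

        byte[] utf8 = toUtf8(json);
        byte[] latin1 = json.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes valid:      " + isValidUtf8(utf8));
        System.out.println("ISO-8859-1 bytes valid: " + isValidUtf8(latin1));
    }
}
```

With bytes produced this way, one option (an assumption about how the importer could be changed, not something confirmed in this thread) is to hand them to HttpClient's ByteArrayEntity, or to pass the charset name explicitly to StringEntity, so that no system property needs to be modified and restored.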

Regards

On Wed, Apr 25, 2012 at 4:23 PM, Dave Cottlehuber <dave@muse.net.nz> wrote:

> On 25 April 2012 14:59, Robert Newson <rnewson@apache.org> wrote:
> > It sounds like SQLToNoSQLImporter is not converting your data
> > correctly. As it's Java, I would take a wild guess and assume the
> > characters to bytes translation is being done with the platform
> > default rather than "UTF-8". Since UTF-8 is the default encoding for
> > JSON strings, that would be a pretty big oversight.
> >
> > B.
> >
> > On 25 April 2012 11:59, Paulo Carvalho <pjcarvalho@gmail.com> wrote:
> >> Hello,
> >>
> >> I am trying to use SQLToNoSQLImporter to import data into a CouchDB
> >> database from a PostgreSQL database.
> >>
> >> I configured the import.properties and db-data-config files correctly.
> >>
> >> When I execute the run.bat command (I am using Windows), I get the
> >> following result:
> >>
> >> 07:50:14,568  INFO DataImporter:134 - Data Configuration loaded successfully
> >> 07:50:18,477 ERROR DataImporter:178 - *****  Data import failed. **********
> >>  Reason is :
> >> org.apache.http.HttpException: HTTP/1.1 400 Bad Request
> >>        at net.sathis.export.sql.couch.CouchWriter.post(CouchWriter.java:68)
> >>        at net.sathis.export.sql.couch.CouchWriter.writeToNoSQL(CouchWriter.java:52)
> >>        at net.sathis.export.sql.DocBuilder.execute(DocBuilder.java:142)
> >>        at net.sathis.export.sql.DataImporter.doFullImport(DataImporter.java:174)
> >>        at net.sathis.export.sql.DataImporter.doDataImport(DataImporter.java:93)
> >>        at net.sathis.export.sql.SQLToNoSQLImporter.main(SQLToNoSQLImporter.java:19)
> >>
> >> As you can see, the configuration file is loaded correctly. In the
> >> couchDB database log file, I get the following error:
> >>
> >> [debug] [<0.147.0>] Invalid JSON: {{error,
> >>                                       {126,
> >>                                        "lexical error: invalid bytes in UTF8 string.\n"}},
> >>                                   <<"{\"docs\":[{\"_id\":\"0\",\"label\":\"Pas de taches\"},{\"_id\":\"1\",\"description\":\"Le pourcentage de recouvrement est < 2 %\",\"label\":\"Très peu nombreuses\"},{\"_id\":\"2\",\"description\":\"Le p.......
> >>
> >> I think the problem happens because the text contained in the table
> >> has special characters ("è", etc.).
> >>
> >> The PostgreSQL database is encoded in UTF-8.
> >>
> >>
> >> Trying to solve the problem, I wrote a small JSON file and tried
> >> to insert it into my database. My JSON file content was the following:
> >> {"docs":[{"_id":"0","label ":"Pas de taches"}]}
> >>
> >> The result of inserting it into my database was:
> >> {"ok":true,"id":"doc_id","rev":"1-ffaec7bc2aa548ca8e5a9c697ea3eb64"}
> >>
> >> Next, I changed my JSON file slightly by adding a special character
> >> (â):
> >> {"docs":[{"_id":"0","label ":"Pas de tâches"}]}
> >>
> >> The result of inserting this JSON file on the database was:
> >> {"error":"bad_request","reason":"invalid_json"}
> >>
> >>
> >>
> >> Can anyone help me with this issue?
> >>
> >> Thank you
> >>
> >> Best regards.
>
> A quick suggestion: download an editor that explicitly supports
> encodings, like TextPad or Komodo, create your JSON file in that, and
> save it as UTF-8.
>
> You'll find that works just fine. Sample files are in
> https://www.dropbox.com/sh/jeifcxpbtpo78ak/--8BGo8bb3/tmp/utf8wtf.zip:
> one created on a Mac and transferred, the other created on Windows.
>
> C:\tmp>curl -HContent-Type:application/json http://localhost:5984/testy/utf8mac -XPUT -d@utf8mac.json
> {"ok":true,"id":"utf8mac","rev":"1-b46df9f1f811323a133af7faf36d1a89"}
>
> C:\tmp>curl -HContent-Type:application/json http://localhost:5984/testy/utf8windows -XPUT -d@utf8windows.json
> {"ok":true,"id":"utf8windows","rev":"1-b46df9f1f811323a133af7faf36d1a89"}
>
> Without having tested it, something like
>
>    recode latin1..UTF-8 *.json
>
> would probably do the trick; I assume the version from
> http://unxutils.sourceforge.net/ is suitable.
>
> A+
> Dave
>
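
Editorial note: the recode suggestion quoted above amounts to reading each file as ISO-8859-1 and writing it back as UTF-8. Below is a minimal Java sketch of that conversion using only the standard library. The class name Latin1ToUtf8 and the recode method are hypothetical, and the sketch assumes the input files really are ISO-8859-1; it was not part of the original thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; assumes every input file is ISO-8859-1.
final class Latin1ToUtf8 {

    /** Re-encode a file in place from ISO-8859-1 to UTF-8. */
    static void recode(Path file) throws IOException {
        byte[] latin1 = Files.readAllBytes(file);
        // Every byte value is a valid ISO-8859-1 character, so decoding cannot fail.
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Latin1ToUtf8 file1.json file2.json ...
        for (String name : args) {
            recode(Paths.get(name));
        }
    }
}
```

Running the conversion twice on the same file would double-encode the accented characters, so it should be applied only once, and only to files known to be in ISO-8859-1.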



-- 
Paulo Carvalho
1 rue du Chateau
57710 Aumetz
France
http://forum-informatico.forumeiros.com/index.htm
http://ummundoecologico.blogspot.com
