hadoop-hdfs-issues mailing list archives

From "Jiqiu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4678) libhdfs casts Japanese character incorrectly to Java API
Date Thu, 20 Feb 2014 03:27:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906557#comment-13906557 ]

Jiqiu commented on HDFS-4678:
-----------------------------

The JNI specification says:

 "There are two differences between this format and the "standard" UTF-8
 format. First, the null byte (byte)0 is encoded using the two-byte
 format rather than the one-byte format. This means that Java VM UTF-8
 strings never have embedded nulls. Second, only the one-byte, two-byte,
 and three-byte formats are used. The Java VM does not recognize the
 longer UTF-8 formats."

That is why some Japanese characters cannot be converted, for example 𠀋 (U+2000B), which standard UTF-8 encodes in 4 bytes: \xF0\xA0\x80\x8B.
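
To make the difference concrete, here is a minimal sketch of how such a 4-byte sequence could be re-encoded before it is handed to JNI. It assumes the behaviour described in later revisions of the JNI specification, where a supplementary character is represented as a UTF-16 surrogate pair and each surrogate is written in the three-byte format (six bytes in total). The function name utf8_4byte_to_modified is hypothetical; it is not part of libhdfs or JNI, and this is not the fix applied in libhdfs.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libhdfs or JNI): re-encode one 4-byte
 * standard UTF-8 sequence as the 6-byte surrogate-pair form assumed for
 * the JVM's modified UTF-8.
 * Returns the number of bytes written to out (always 6), or 0 if in is
 * not a well-formed 4-byte sequence.
 */
size_t utf8_4byte_to_modified(const unsigned char in[4], unsigned char out[6])
{
    /* Lead byte must be 11110xxx, the other three must be 10xxxxxx. */
    if ((in[0] & 0xF8) != 0xF0 || (in[1] & 0xC0) != 0x80 ||
        (in[2] & 0xC0) != 0x80 || (in[3] & 0xC0) != 0x80)
        return 0;

    /* Decode the code point from the 3 + 6 + 6 + 6 payload bits. */
    uint32_t cp = ((uint32_t)(in[0] & 0x07) << 18) |
                  ((uint32_t)(in[1] & 0x3F) << 12) |
                  ((uint32_t)(in[2] & 0x3F) << 6) |
                  (uint32_t)(in[3] & 0x3F);
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0; /* overlong or out of the Unicode range */

    /* Split into a UTF-16 surrogate pair. */
    uint32_t v = cp - 0x10000;
    uint32_t hi = 0xD800 | (v >> 10);
    uint32_t lo = 0xDC00 | (v & 0x3FF);

    /* Write each surrogate in the three-byte format 1110xxxx 10xxxxxx 10xxxxxx. */
    out[0] = (unsigned char)(0xE0 | (hi >> 12));
    out[1] = (unsigned char)(0x80 | ((hi >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (hi & 0x3F));
    out[3] = (unsigned char)(0xE0 | (lo >> 12));
    out[4] = (unsigned char)(0x80 | ((lo >> 6) & 0x3F));
    out[5] = (unsigned char)(0x80 | (lo & 0x3F));
    return 6;
}
```

For \xF0\xA0\x80\x8B this yields \xED\xA1\x80\xED\xB0\x8B, which uses only three-byte forms. A complete conversion would also have to copy the shorter sequences through unchanged and map the null byte to \xC0\x80.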


>  libhdfs casts Japanese character incorrectly to Java API 
> ----------------------------------------------------------
>
>                 Key: HDFS-4678
>                 URL: https://issues.apache.org/jira/browse/HDFS-4678
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 1.1.2
>         Environment: Platform:    Linux64
> Locale:    Japanese (ja_JP.UTF-8)
>            Reporter: Jiqiu
>            Priority: Minor
>
> Put a local file with Japanese characters into HDFS.
> When browsing it in HDFS, the characters cannot be recognized.
> Here is test.c:
> #include "hdfs.h"
> #include <fcntl.h>   /* O_WRONLY, O_CREAT */
> #include <locale.h>
> #include <stdio.h>
> #include <stdlib.h>  /* exit */
> #include <string.h>  /* strlen */
> int main(int argc, char **argv) {
>     if (!setlocale(LC_CTYPE, "ja_JP")) {
>         printf("Can not set locale type\n");
>     }
>     printf("0\n");
>     hdfsFS fs = hdfsConnect("localhost", 9000);
>     printf("1\n");
>     /* The file name contains U+2000B, a 4-byte sequence in standard UTF-8. */
>     const char* writePath = "/tmp/\xF0\xA0\x80\x8B.txt";
>     printf("2\n");
>     hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
>     if (!writeFile) {
>         fprintf(stderr, "Failed to open %s for writing!\n", writePath);
>         exit(-1);
>     }
>     char* buffer = "Hello, World! \xF0\xA0\x80\x8B";
>     tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
>     if (hdfsFlush(fs, writeFile)) {
>         fprintf(stderr, "Failed to 'flush' %s\n", writePath);
>         exit(-1);
>     }
>     printf("3\n");
>     hdfsCloseFile(fs, writeFile);
> }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
