hadoop-pig-dev mailing list archives

From paradisehit <paradise...@163.com>
Subject Why the default LOAD and STORE use UTF-8? Why not use byte?
Date Tue, 26 Aug 2008 08:51:54 GMT
    I have run into a character encoding problem. I use Hadoop to store log data, and
my log data is not encoded in UTF-8; some of it is in GBK, for example, which is common in
China. If I use PigStorage() to process this data, it is treated as UTF-8. When my own
program then processes the resulting UTF-8 data, it still runs, but the results are wrong.
    Can Pig's LOAD and STORE behave like Hadoop and leave the original charset unchanged,
storing the data exactly as it was? Can anyone help me, or explain why UTF-8 is the default?
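
    To illustrate the problem, here is a minimal sketch in Python. It is not Pig's actual
code. It assumes that a UTF-8-only loader decodes the raw bytes as UTF-8, substitutes
U+FFFD for invalid sequences, and encodes the text as UTF-8 again on store. The sample
string and all variable names are hypothetical.

```python
# Hypothetical sample log field; any non-ASCII GBK text shows the same effect.
original_text = "中文日志"
gbk_bytes = original_text.encode("gbk")

# Assumed behaviour of a UTF-8-only load/store path: decode the raw bytes as
# UTF-8 (invalid sequences become U+FFFD), then encode as UTF-8 when storing.
as_utf8_text = gbk_bytes.decode("utf-8", errors="replace")
round_tripped = as_utf8_text.encode("utf-8")

# Byte passthrough: the bytes are copied without any charset interpretation.
passthrough = bytes(gbk_bytes)

lossy = round_tripped != gbk_bytes        # True if the UTF-8 path changed the bytes
preserved = passthrough == gbk_bytes      # True if passthrough kept them intact
recovered = passthrough.decode("gbk")     # downstream code can still decode as GBK

print("UTF-8 round trip changed the bytes:", lossy)
print("Byte passthrough preserved the bytes:", preserved)
print("Text recovered from passthrough:", recovered)
```

    Under these assumptions, the original GBK bytes cannot be recovered after the UTF-8
round trip, while the byte passthrough keeps them unchanged for a later program to decode.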