hadoop-pig-dev mailing list archives

From "Olga Natkovich" <ol...@yahoo-inc.com>
Subject RE: Why the default LOAD and STORE use UTF-8? Why not use byte?
Date Tue, 26 Aug 2008 15:19:05 GMT
PigStorage is written to work with UTF-8 data. You will need to write
your own load/store functions to get different semantics.

Olga 

> -----Original Message-----
> From: paradisehit [mailto:paradisehit@163.com] 
> Sent: Tuesday, August 26, 2008 1:52 AM
> To: pig-user@incubator.apache.org; pig-dev@incubator.apache.org
> Subject: Why the default LOAD and STORE use UTF-8? Why not use byte?
> 
> Hello!
>     I have run into a character-encoding problem. I use 
> Hadoop to store log data, and my log data is not encoded in 
> UTF-8; in China, for example, it is GBK. If I use PigStorage() to 
> process my data, the data is treated as UTF-8. My program 
> can still process the resulting UTF-8 data and it runs, 
> but the results are not correct.
>     Can Pig's LOAD and STORE work like Hadoop, leaving the 
> original charset of the data unchanged and storing it as it is? Can 
> anyone help me, or tell me why UTF-8 is the default?
>  
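To make the problem in this thread concrete, here is a small, self-contained Java sketch using only the JDK (not Pig's API). It shows why GBK bytes do not survive a UTF-8 decode/encode cycle, which is roughly what happens when a loader and storer assume UTF-8. It assumes the JDK's extended `GBK` charset is available; the class and method names (`CharsetRoundTrip`, `roundTripAsUtf8`, `roundTripAs`) are hypothetical illustrations, not part of PigStorage.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Pig.
public class CharsetRoundTrip {
    // GBK is an extended JDK charset (not in StandardCharsets); assumed to be installed.
    static final Charset GBK = Charset.forName("GBK");

    // Mimics a UTF-8-assuming load/store: decode raw bytes as UTF-8, write back as UTF-8.
    // Malformed UTF-8 sequences are replaced with U+FFFD, so the original bytes are lost.
    static byte[] roundTripAsUtf8(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Decode and re-encode with a caller-chosen charset.
    static byte[] roundTripAs(byte[] raw, Charset cs) {
        return new String(raw, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // Four Chinese characters written as Unicode escapes to avoid source-encoding issues.
        byte[] gbkBytes = "\u4e2d\u6587\u65e5\u5fd7".getBytes(GBK);

        // prints false: GBK bytes are not valid UTF-8
        System.out.println("UTF-8 round trip preserves bytes:      "
                + Arrays.equals(gbkBytes, roundTripAsUtf8(gbkBytes)));
        // prints true: decoding with the data's real charset is lossless here
        System.out.println("GBK round trip preserves bytes:        "
                + Arrays.equals(gbkBytes, roundTripAs(gbkBytes, GBK)));
        // prints true: ISO-8859-1 maps every byte to exactly one char
        System.out.println("ISO-8859-1 round trip preserves bytes: "
                + Arrays.equals(gbkBytes, roundTripAs(gbkBytes, StandardCharsets.ISO_8859_1)));
    }
}
```

A custom load/store function could, in principle, apply the same idea: decode fields with the charset the data was actually written in, or, if the bytes only need to pass through untouched, use a one-byte-per-character mapping such as ISO-8859-1, which round-trips any byte sequence.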
> 
