hive-user mailing list archives

From Zheng Shao <zsh...@gmail.com>
Subject Re: Hive support for latin1
Date Mon, 02 Aug 2010 06:14:54 GMT
Just change the fetch method in FetchTask.java, public boolean fetch(ArrayList<String> res), at this line:

        res.add(((Text) mSerde.serialize(io.o, io.oi)).toString());

Instead of using Text.toString(), use your own method to convert from
raw bytes to unicode String.
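
A minimal sketch of such a conversion, assuming the data is ISO-8859-1 (latin1). The class and method names (Latin1Decode, latin1ToString) are hypothetical and not part of Hive. The helper takes a byte array plus a valid length because Hadoop's Text.getBytes() returns a backing array that is only valid up to Text.getLength().

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Hive or Hadoop.
class Latin1Decode {

    // Decode only the first `length` bytes as ISO-8859-1 (latin1).
    // Every byte value 0-255 maps directly to the code point of the
    // same value, so no byte is turned into the replacement character.
    static String latin1ToString(byte[] bytes, int length) {
        return new String(bytes, 0, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 'c', 'a', 'f', then 0xE9 (e with acute accent in latin1).
        byte[] raw = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9};

        String asLatin1 = latin1ToString(raw, raw.length);
        String asUtf8 = new String(raw, StandardCharsets.UTF_8);

        // Decoding as UTF-8 treats the lone 0xE9 as malformed input and
        // substitutes U+FFFD; decoding as latin1 keeps U+00E9.
        System.out.println(asLatin1.indexOf('\u00E9') >= 0);
        System.out.println(asUtf8.indexOf('\uFFFD') >= 0);
    }
}
```

In fetch(), the idea would be to keep the serialized Text in a local variable and pass its getBytes() and getLength() to a helper like this instead of calling toString(). That wiring is an assumption and has not been tested against the Hive source.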


Zheng

On Sun, Aug 1, 2010 at 8:31 PM, bc Wong <bcwalrus@cloudera.com> wrote:
> Hi all,
>
> I'm trying to figure out how to query Hive on latin1 encoded data.
>
> I created a file with 256 characters, with unicode value 0-255,
> encoded in latin1. I made a table out of it. But when I do a "select
> *", Hive returns the upper ascii rows as '\xef\xbf\xbd', which is the
> replacement character '\ufffd' encoded in UTF-8.
>
> Does anyone know how to work with non-UTF8 data?
>
> Cheers,
> --
> bc Wong
> Cloudera Software Engineer
>



-- 
Yours,
Zheng
http://www.linkedin.com/in/zshao
