hadoop-common-user mailing list archives

From Owen O'Malley <o...@yahoo-inc.com>
Subject Re: error about character set(ASCII, UTF-8, Unicode) using TextInputFormat
Date Mon, 09 Oct 2006 16:37:54 GMT

On Oct 9, 2006, at 2:54 AM, 张茂森 wrote:

> Hi all:
>
> I’m trying to use Hadoop to process logs. I’ve written a routine to
> count the number of logins from each IP address. However, because my
> logs mix several character encodings (ASCII, Unicode, UTF-8, etc.),
> the TextInputFormat class in Hadoop throws an error. Is there a good
> way to solve this problem?

In Hadoop 0.7.0, we disabled the exception that was thrown when invalid
UTF-8 is given to the Text object. In the longer term we will re-enable
validation but add support for newline-separated binary data, which is
what you have. *smile*
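
To make the difference between validating and non-validating decoding concrete, here is a minimal sketch in plain Java. It is only an illustration using the standard java.nio.charset API, not Hadoop's Text implementation. The class name LenientUtf8, both method names, and the sample log lines are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class; not part of Hadoop.
class LenientUtf8 {

    // Validating decode: throws CharacterCodingException on invalid UTF-8.
    static String decodeStrict(byte[] line) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(line)).toString();
    }

    // Non-validating decode: each invalid sequence becomes U+FFFD.
    static String decodeLenient(byte[] line) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(line)).toString();
        } catch (CharacterCodingException e) {
            // Not expected when both error actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up log line with one byte (0xFF) that can never occur in UTF-8.
        byte[] prefix = "10.0.0.2 login ".getBytes(StandardCharsets.US_ASCII);
        byte[] bad = Arrays.copyOf(prefix, prefix.length + 1);
        bad[bad.length - 1] = (byte) 0xFF;

        System.out.println(decodeLenient(bad));
        try {
            decodeStrict(bad);
        } catch (CharacterCodingException e) {
            System.out.println("strict decode rejected the line: " + e);
        }
    }
}
```

With the lenient variant, the ASCII part of each line (such as the IP address) survives intact, so a job that only counts logins per IP can keep going even when the rest of the line is in some other encoding.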

-- Owen

