hive-user mailing list archives

From Kiwon Lee <kiwoni....@gmail.com>
Subject Hadoop does not split gzip-compressed files, but mine seems to be split (^D^H)
Date Mon, 20 Aug 2012 15:33:03 GMT
Hi,

I have a 20 GB gzip-compressed log file on HDFS.
Because the log format is complex, I wrote a custom SerDe to parse it.
However, a parsing exception occurs while the log file is being parsed.
At some point the parser reads *^D^H* instead of the rest of a line:

127.0.0.1 [2012-08-20] "ABCDEFG" "JSKEJFKDJKFD"
127.0.0.1 [2012-08-20] "ABCDEFG" "JSKEJFKDJKFD"
127.0.0.1 [2012-08-20] "ABCDEFG" "JSKEJFKDJKFD"
127.0.0.1 [2012-08-20] "ABCDEFG" "JSKEJFKDJKFD"
127.0.0.1 [2012-08-20] "ABCDE *^D^H*

A small file (about 40 MB) does not cause the parsing error.
I have read that Hadoop does not split gzip-compressed files, but this one
seems to be split.
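
One way to narrow this down is to check whether the ^D^H bytes are already present in the decompressed data, independent of Hive and the SerDe. The following is a minimal Python sketch under that assumption. The function names `find_control_lines` and `scan_gzip` are hypothetical, and the sketch assumes that ^D^H corresponds to the raw bytes 0x04 0x08.

```python
import gzip
import io


def find_control_lines(stream, limit=10):
    """Return (line_number, raw_line) pairs for lines containing control bytes.

    Tab, LF and CR are allowed. Any other byte below 0x20 (for example
    0x04 = ^D and 0x08 = ^H) counts as a control byte.
    """
    allowed = {0x09, 0x0A, 0x0D}
    hits = []
    for number, line in enumerate(stream, start=1):
        # Iterating over a bytes object yields integers in Python 3.
        if any(b < 0x20 and b not in allowed for b in line):
            hits.append((number, line))
            if len(hits) >= limit:
                break
    return hits


def scan_gzip(path, limit=10):
    """Stream-decompress a local gzip file and report suspicious lines."""
    with gzip.open(path, "rb") as stream:
        return find_control_lines(stream, limit)


# Self-contained demo that mimics the sample lines from this message.
good = b'127.0.0.1 [2012-08-20] "ABCDEFG" "JSKEJFKDJKFD"\n'
bad = b'127.0.0.1 [2012-08-20] "ABCDE \x04\x08\n'
compressed = gzip.compress(good * 4 + bad)
demo_hits = find_control_lines(gzip.GzipFile(fileobj=io.BytesIO(compressed)))
print(demo_hits)
```

To apply this to the real file, it would first have to be copied out of HDFS (for example with `hadoop fs -get`) and then passed to `scan_gzip`. If the control bytes show up here too, they are in the data itself or the archive is damaged, and splitting is probably not the cause.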

Am I doing anything wrong?
Please help.
