hadoop-common-user mailing list archives

From edward choi <mp2...@gmail.com>
Subject Re: How to read LZO compressed files?
Date Mon, 02 Jan 2012 07:22:40 GMT

The first solution is my last resort: there are so many LZO files that
manual decompression would take quite a while.

As you suggested, I have used LzoTextInputFormat, but I get the following
error:

2012-01-02 16:15:16,668 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2012-01-02 16:15:16,765 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2012-01-02 16:15:16,858 INFO
com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl
2012-01-02 16:15:16,860 INFO com.hadoop.compression.lzo.LzoCodec:
Successfully loaded & initialized native-lzo library [hadoop-lzo rev
2012-01-02 16:15:16,906 INFO
org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-01-02 16:15:16,908 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Codec for file
not found, cannot run
	at com.hadoop.mapreduce.LzoLineRecordReader.initialize(LzoLineRecordReader.java:97)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:451)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-01-02 16:15:16,910 INFO org.apache.hadoop.mapred.Task: Runnning
cleanup for the task

which I don't understand, because I do have the LZO codec.
Could you tell me what I am doing wrong here?
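
For reference, as far as I understand it, this exception comes from the codec lookup for the input path returning nothing, which is a separate matter from whether the native LZO library loaded. The sketch below is a simplified, self-contained model of a suffix-based lookup, not the actual Hadoop or hadoop-lzo code. The class `CodecLookupSketch`, the method `findCodec`, and the file paths are hypothetical; the `registeredCodecs` string stands in for the comma-separated `io.compression.codecs` setting, and the extensions listed are the defaults I believe each codec claims.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of suffix-based codec lookup. This is NOT the real
// Hadoop implementation; names and paths here are illustrative only.
class CodecLookupSketch {

    // File extension each codec is assumed to claim by default.
    static final Map<String, String> DEFAULT_EXTENSIONS = new LinkedHashMap<>();
    static {
        DEFAULT_EXTENSIONS.put("org.apache.hadoop.io.compress.GzipCodec", ".gz");
        DEFAULT_EXTENSIONS.put("com.hadoop.compression.lzo.LzoCodec", ".lzo_deflate");
        DEFAULT_EXTENSIONS.put("com.hadoop.compression.lzo.LzopCodec", ".lzo");
    }

    // registeredCodecs stands in for the comma-separated codec list in the
    // job configuration. Returns the first registered codec whose extension
    // matches the file name, or null when nothing matches.
    static String findCodec(String registeredCodecs, String fileName) {
        for (String name : registeredCodecs.split(",")) {
            String codec = name.trim();
            String ext = DEFAULT_EXTENSIONS.get(codec);
            if (ext != null && fileName.endsWith(ext)) {
                return codec;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String withLzop = "org.apache.hadoop.io.compress.GzipCodec,"
                + "com.hadoop.compression.lzo.LzopCodec";
        String withoutLzop = "org.apache.hadoop.io.compress.GzipCodec";

        // Codec registered and the file carries the matching suffix.
        System.out.println(findCodec(withLzop, "/data/part-00000.lzo"));
        // Codec registered, but the file name has no recognised suffix.
        System.out.println(findCodec(withLzop, "/data/part-00000"));
        // Suffix is fine, but the codec is missing from the registered list.
        System.out.println(findCodec(withoutLzop, "/data/part-00000.lzo"));
    }
}
```

In this model a lookup can come back empty for two independent reasons: the file name does not end in an extension any registered codec claims, or the codec class is not in the registered list that the task sees. Either would be worth checking against the job configuration and the input file names.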


2012/1/2 Shi Yu <shiyu@uchicago.edu>

> You could decompress the LZO file manually into plain text and then
> use TextInputFormat.
> Alternatively, you don't need to index the LZO compressed file;
> just use LZOInputFormat on the non-indexed files, and the LZO
> file will then not be split.
