hadoop-common-user mailing list archives

From Adam Kawa <kawa.a...@gmail.com>
Subject Re: hadoop fs -text OutOfMemoryError
Date Sat, 14 Dec 2013 10:24:01 GMT
Since a Snappy file is not splittable (to decompress it, you need to read it
from the beginning to the end), does the *append* operation handle it well on
a plain text file? I guess that it might be problematic.

Snappy is recommended for use with a container format, like Sequence Files
or Avro, rather than directly on plain text, because a plain text file
compressed with Snappy cannot be processed in parallel.


2013/12/14 Tao Xiao <xiaotao.cs.nju@gmail.com>

> Hi Xiao Li,
>    You said "Basically, what I need is a Storm HDFS Bolt to be able to
> write output to an HDFS file; in order to get fewer small files, I use HDFS
> append". Did you configure the "append" property in your configuration file?
> You could search for "append"-related issues first.
>
>
> 2013/12/14 xiao li <xelllee@outlook.com>
>
>> export HADOOP_CLIENT_OPTS="-Xms268435456 -Xmx268435456
>> $HADOOP_CLIENT_OPTS"
>>
>>
>>
>> I guess it is not a memory issue, but rather the way I write the Snappy-
>> compressed file to HDFS.
>> Basically, what I need is a Storm HDFS Bolt to be able to write output to an
>> HDFS file; in order to get fewer small files, I use HDFS append.
>>
>> Well, I just can't get Snappy working or write compressed files to HDFS
>> through Java.
>>
>> I am looking at the Flume HDFS sink to get better code. ; )
>>
>>
>> https://github.com/cloudera/flume-ng/blob/cdh4-1.1.0_4.0.0/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSCompressedDataStream.java
>>
>> ------------------------------
>> Date: Fri, 13 Dec 2013 22:24:21 +0100
>>
>> Subject: Re: hadoop fs -text OutOfMemoryError
>> From: kawa.adam@gmail.com
>> To: user@hadoop.apache.org
>>
>>
>> Hi,
>>
>> What is the value of HADOOP_CLIENT_OPTS in your hadoop-env.sh file?
>>
>> We had similar problems with the hadoop fs command running OOM (I do not
>> remember if they were exactly related to -text + snappy) when we decreased
>> the heap to some small value. With a higher value, e.g. 1 or 2 GB, we were
>> fine:
>>
>> # The following applies to multiple commands (fs, dfs, fsck, distcp etc)
>> export HADOOP_CLIENT_OPTS="-Xmx2048m ${HADOOP_CLIENT_OPTS}"
>>
>>
>> 2013/12/13 xiao li <xelllee@outlook.com>
>>
>> Hi Tao
>>
>> Thanks for your reply,
>>
>> This is the code; it is pretty simple:
>>
>>     fsDataOutputStream.write(Snappy.compress(NEWLINE));
>>     fsDataOutputStream.write(Snappy.compress(json.getBytes("UTF-8")));
>>
>>
>> But the FSDataOutputStream is actually opened for appending; I guess I
>> can't simply append to a Snappy file (I know nothing about it).
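>>
>> (For reference, a rough sketch of writing through Hadoop's SnappyCodec
>> instead of calling Snappy.compress() directly, so the output uses the block
>> framing that hadoop fs -text expects; the class name, path and record below
>> are made up, and it assumes the native snappy library is installed:)
>>
>>     import org.apache.hadoop.conf.Configuration;
>>     import org.apache.hadoop.fs.FSDataOutputStream;
>>     import org.apache.hadoop.fs.FileSystem;
>>     import org.apache.hadoop.fs.Path;
>>     import org.apache.hadoop.io.compress.CompressionOutputStream;
>>     import org.apache.hadoop.io.compress.SnappyCodec;
>>
>>     public class CodecStreamWrite {
>>         public static void main(String[] args) throws Exception {
>>             Configuration conf = new Configuration();
>>             FileSystem fs = FileSystem.get(conf);
>>             // Hypothetical path; .snappy matches SnappyCodec's default extension.
>>             Path out = new Path("/tmp/events.snappy");
>>
>>             // SnappyCodec needs the native snappy library at runtime.
>>             SnappyCodec codec = new SnappyCodec();
>>             codec.setConf(conf);
>>
>>             FSDataOutputStream raw = fs.create(out);
>>             // The codec writes Hadoop's block-snappy framing, which is what
>>             // BlockDecompressorStream (used by "hadoop fs -text") reads back.
>>             try (CompressionOutputStream cos = codec.createOutputStream(raw)) {
>>                 cos.write("{\"example\":\"record\"}\n".getBytes("UTF-8"));
>>                 cos.finish();
>>             }
>>         }
>>     }
>>
>> (Note that this sketch creates a new file rather than appending; mixing a
>> codec stream with append is exactly the part that looks problematic.)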
>>
>>
>>
>> ------------------------------
>> Date: Fri, 13 Dec 2013 21:42:38 +0800
>> Subject: Re: hadoop fs -text OutOfMemoryError
>> From: xiaotao.cs.nju@gmail.com
>> To: user@hadoop.apache.org
>>
>>
>> Can you describe your problem in more detail? For example, was the Snappy
>> library installed correctly on your cluster, how did you encode your files
>> with Snappy, and was the file correctly encoded with Snappy?
>>
>>
>> 2013/12/13 xiao li <xelllee@outlook.com>
>>
>> I can view the Snappy file with hadoop fs -cat, but when I issue -text it
>> gives me this error, even though the file size is really tiny. What have I
>> done wrong? Thanks
>>
>> hadoop fs -text /test/SinkToHDFS-ip-.us-west-2.compute.internal-6703-22-20131212-0.snappy
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>   at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:115)
>>   at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:95)
>>   at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
>>   at java.io.InputStream.read(InputStream.java:82)
>>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
>>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
>>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
>>   at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:86)
>>   at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:81)
>>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:306)
>>   at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
>>   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
>>   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
>>   at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
>>   at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
>>   at org.apache.hadoop.fs.FsShell.run(FsShell.java:254)
>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>   at org.apache.hadoop.fs.FsShell.main(FsShell.java:304)
>>
>>
>>
>>
>
