flink-dev mailing list archives

From Felix Neutatz <neut...@googlemail.com>
Subject Read 727 gz files ()
Date Mon, 06 Jul 2015 13:31:25 GMT
Hi,

I want to do some simple aggregations on 727 gz files (68 GB total) from
HDFS. See code here:

https://github.com/FelixNeutatz/wikiTrends/blob/master/extraction/src/main/scala/io/sanfran/wikiTrends/extraction/flink/Stats.scala

We are using Flink 0.9-SNAPSHOT.

I get the following error:

Caused by: java.lang.Exception: The data preparation for task 'Reduce(Reduce at org.apache.flink.api.scala.GroupedDataSet.reduce(GroupedDataSet.scala:293))', caused an error: Error obtaining the sorted input: Thread 'SortMerger spilling thread' terminated due to an exception: Channel to path '/data/4/hadoop/tmp/flink-io-0e2460bf-964b-4883-8eee-12869b9476ab/995a38a2c92536383d0057e3482999a9.000329.channel' could not be opened.
        at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:471)
        at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
        at java.lang.Thread.run(Thread.java:853)
Caused by: java.lang.RuntimeException: Error obtaining the sorted input: Thread 'SortMerger spilling thread' terminated due to an exception: Channel to path '/data/4/hadoop/tmp/flink-io-0e2460bf-964b-4883-8eee-12869b9476ab/995a38a2c92536383d0057e3482999a9.000329.channel' could not be opened.
        at org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:607)
        at org.apache.flink.runtime.operators.RegularPactTask.getInput(RegularPactTask.java:1145)
        at org.apache.flink.runtime.operators.ReduceDriver.prepare(ReduceDriver.java:93)
        at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:466)
        ... 3 more
Caused by: java.io.IOException: Thread 'SortMerger spilling thread' terminated due to an exception: Channel to path '/data/4/hadoop/tmp/flink-io-0e2460bf-964b-4883-8eee-12869b9476ab/995a38a2c92536383d0057e3482999a9.000329.channel' could not be opened.
        at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:784)
Caused by: java.io.IOException: Channel to path '/data/4/hadoop/tmp/flink-io-0e2460bf-964b-4883-8eee-12869b9476ab/995a38a2c92536383d0057e3482999a9.000329.channel' could not be opened.
        at org.apache.flink.runtime.io.disk.iomanager.AbstractFileIOChannel.<init>(AbstractFileIOChannel.java:61)
        at org.apache.flink.runtime.io.disk.iomanager.AsynchronousFileIOChannel.<init>(AsynchronousFileIOChannel.java:86)
        at org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockWriterWithCallback.<init>(AsynchronousBlockWriterWithCallback.java:42)
        at org.apache.flink.runtime.io.disk.iomanager.AsynchronousBlockWriter.<init>(AsynchronousBlockWriter.java:44)
        at org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync.createBlockChannelWriter(IOManagerAsync.java:195)
        at org.apache.flink.runtime.io.disk.iomanager.IOManager.createBlockChannelWriter(IOManager.java:218)
        at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1318)
        at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:781)
Caused by: java.io.FileNotFoundException: /data/4/hadoop/tmp/flink-io-0e2460bf-964b-4883-8eee-12869b9476ab/995a38a2c92536383d0057e3482999a9.000329.channel (Too many open files in system)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:252)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:133)
        at org.apache.flink.runtime.io.disk.iomanager.AbstractFileIOChannel.<init>(AbstractFileIOChannel.java:57)
        ... 7 more
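
The innermost cause is a FileNotFoundException with "Too many open files in system". On Linux this is the message for ENFILE, meaning the kernel's system-wide file-handle table (fs.file-max) is exhausted; it differs from "Too many open files" (EMFILE), which is the per-process ulimit. A minimal diagnostic sketch, assuming a Linux host where the kernel reports allocated, unused and maximum file handles in /proc/sys/fs/file-nr. The class FileNrCheck is hypothetical and not part of Flink:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical diagnostic helper (not part of Flink). Reports how close a
 * Linux host is to its system-wide open-file limit, the limit behind the
 * ENFILE "Too many open files in system" error.
 */
public class FileNrCheck {

    /** Parses /proc/sys/fs/file-nr: [allocated, allocated-but-unused, maximum]. */
    static long[] parse(String fileNr) {
        String[] fields = fileNr.trim().split("\\s+");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 fields, got " + fields.length);
        }
        long[] values = new long[3];
        for (int i = 0; i < 3; i++) {
            values[i] = Long.parseLong(fields[i]);
        }
        return values;
    }

    /** Fraction of the system-wide maximum that is currently in use. */
    static double usage(long[] values) {
        // In use = allocated minus allocated-but-unused (the latter is 0 on modern kernels).
        return (double) (values[0] - values[1]) / values[2];
    }

    public static void main(String[] args) throws IOException {
        Path fileNr = Paths.get("/proc/sys/fs/file-nr");
        if (!Files.exists(fileNr)) {
            System.out.println("No " + fileNr + " (not a Linux host?)");
            return;
        }
        long[] v = parse(new String(Files.readAllBytes(fileNr), StandardCharsets.US_ASCII));
        System.out.printf("allocated=%d max=%d usage=%.1f%%%n", v[0], v[2], 100 * usage(v));
    }
}
```

Running it on a TaskManager host while the job is spilling would show whether the system-wide limit is actually being approached.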

Best regards,
Felix
