spark-user mailing list archives

From Eugen Cepoi <cepoi.eu...@gmail.com>
Subject spark streaming failing to replicate blocks
Date Mon, 19 Oct 2015 12:51:11 GMT
Hi,

I am running Spark Streaming 1.4.1 on EMR (AMI 3.9) over YARN.
The job reads data from Kinesis with a batch interval of 30s (I used
the same value for the Kinesis checkpoint interval).
Every 5 seconds the executor logs show a sequence of stack traces
indicating that block replication failed. I am using the default
storage level, MEMORY_AND_DISK_SER_2.
Neither the WAL nor checkpointing is enabled (the checkpoint dir is
configured for the Spark context but not for the streaming context).

Here is an example of those logs from ip-10-63-160-18. The same errors
occur on every executor when it tries to replicate to any other executor.


15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to
[ip-10-63-160-18.ec2.internal/10.63.160.18:50929]
15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing
connection to ip-10-63-160-18.ec2.internal/10.63.160.18:50929
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending message.
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying
ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection
error on connection to
ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate
input-1-1445242310000 to BlockManagerId(3,
ip-10-159-151-22.ec2.internal, 50929), failure #0
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
15/10/19 03:11:55 INFO nio.ConnectionManager: Removing
SendingConnection to
ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to
[ip-10-63-160-18.ec2.internal/10.63.160.18:39506]
15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing
connection to ip-10-63-160-18.ec2.internal/10.63.160.18:39506
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending message.
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying
ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506)
15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection
error on connection to
ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506)
15/10/19 03:11:55 INFO nio.ConnectionManager: Removing
SendingConnection to
ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506)
15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate
input-1-1445242310000 to BlockManagerId(2,
ip-10-141-12-91.ec2.internal, 39506), failure #1
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
15/10/19 03:11:55 WARN storage.BlockManager: Block
input-1-1445242310000 replicated to only 0 peer(s) instead of 1 peers
15/10/19 03:11:55 INFO receiver.BlockGenerator: Pushed block
input-1-1445242310000
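
One detail in the excerpt above may be worth checking: each connection
attempt goes to ip-10-63-160-18, while the BlockManagerId in the matching
"Failed to replicate" line names a different host with the same port.
The sketch below is a small, hypothetical helper (not part of Spark) that
pulls both kinds of lines out of an abridged copy of the log and lists the
cases where the two hosts differ. Pairing the lines by port number is an
assumption made for illustration, and the script only describes the log;
it does not establish the cause of the failures.

```python
import re

# Abridged copy of the executor log above (stack traces omitted);
# wrapped lines are kept as they appear in the mail.
LOG = """\
15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to
[ip-10-63-160-18.ec2.internal/10.63.160.18:50929]
15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate
input-1-1445242310000 to BlockManagerId(3,
ip-10-159-151-22.ec2.internal, 50929), failure #0
15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to
[ip-10-63-160-18.ec2.internal/10.63.160.18:39506]
15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate
input-1-1445242310000 to BlockManagerId(2,
ip-10-141-12-91.ec2.internal, 39506), failure #1
"""

# "Initiating connection to [host/ip:port]" (may be wrapped across lines).
CONNECT = re.compile(r"Initiating connection to\s+\[([^/\]]+)/[^:\]]+:(\d+)\]")

# "Failed to replicate <block> to BlockManagerId(<id>, <host>, <port>)".
FAILED = re.compile(
    r"Failed to replicate\s+(\S+)\s+to\s+"
    r"BlockManagerId\((\d+),\s*([^,\s]+),\s*(\d+)\)"
)


def dialed_hosts(log):
    """Map each port to the host name the connection was opened to."""
    return {int(port): host for host, port in CONNECT.findall(log)}


def failed_replications(log):
    """Return (block, executor id, registered host, port) per failure."""
    return [(b, int(e), h, int(p)) for b, e, h, p in FAILED.findall(log)]


def mismatches(log):
    """List failures whose registered host differs from the dialed host.

    Assumption: a connection attempt and a failure belong together
    when they share the same port number.
    """
    dialed = dialed_hosts(log)
    return [
        (block, executor, host, dialed[port], port)
        for block, executor, host, port in failed_replications(log)
        if port in dialed and dialed[port] != host
    ]


if __name__ == "__main__":
    for block, executor, registered, dialed, port in mismatches(LOG):
        print(f"{block}: executor {executor} registered as {registered}, "
              f"but port {port} was dialed on {dialed}")
```

Run against the full executor log instead of the abridged string, the same
functions would show whether every failed replication follows this pattern.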



Thanks,
Eugen
