predictionio-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Error when importing data
Date Wed, 02 Aug 2017 16:35:42 GMT
Something is not configured correctly. `pio import` should work with a file of any size, but this may be an undersized instance for that much data.

Spark needs memory: it keeps all the data it needs for a particular calculation spread across
all cluster machines in memory. That includes derived data, so a total of 32 GB may not be enough.
But that is not your current problem.
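If memory does turn out to be the limit, you can hand Spark settings through the pio CLI; anything after `--` is passed to spark-submit. The flags below are a sketch only and the sizes are illustrative for a 32 GB machine, not a recommendation:

```shell
# Everything after the bare `--` is forwarded to spark-submit.
# 8g/16g are example values; tune them to what the instance can spare
# after HBase, HDFS, and Elasticsearch take their share.
pio import --appid 4 --input my_events.json -- \
  --driver-memory 8g \
  --executor-memory 16g
```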

I would start by verifying that all components are working properly: first HDFS, then
HBase, then Spark, then Elasticsearch. I see several storage backend errors below.
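A quick way to sanity-check each service from the shell. This assumes default ports and that the Hadoop and HBase binaries are on your PATH; adjust hosts and ports to your setup:

```shell
# List running JVM daemons; you should see NameNode, DataNode,
# HMaster, and HRegionServer among them.
jps

# HDFS: report live datanodes and remaining capacity.
hdfs dfsadmin -report

# HBase: ask the shell for cluster status non-interactively.
echo "status" | hbase shell

# Elasticsearch: REST health check on the default port.
curl -s http://localhost:9200/_cluster/health?pretty
```

If any of these hangs or errors, restart that service before retrying the import; the "Connection refused" lines below suggest the HBase region server has already gone down.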



On Aug 2, 2017, at 4:52 AM, Carlos Vidal <carlos.vidal@beeva.com> wrote:

Hello,

I have installed the pio + ur AMI in AWS, on an m4.2xlarge instance with 32 GB of RAM and 8
vCPUs.

When I try to import a 20 GB events file for my application, the system crashes. The command
I have used is:


pio import --appid 4 --input my_events.json

This command launches a Spark job that needs to perform 800 tasks. When the process reaches
task 211, it crashes. This is what I can see in my pio.log file:

2017-08-02 11:16:17,101 WARN  org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
[htable-pool230-t1] - Encountered problems when prefetch hbase:meta table: 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Wed Aug 02 11:07:06 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
This server is in the failed servers list: localhost/127.0.0.1:44866
Wed Aug 02 11:07:07 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
This server is in the failed servers list: localhost/127.0.0.1:44866
Wed Aug 02 11:07:07 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
This server is in the failed servers list: localhost/127.0.0.1:44866
Wed Aug 02 11:07:08 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
This server is in the failed servers list: localhost/127.0.0.1:44866
Wed Aug 02 11:07:10 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:07:14 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:07:24 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:07:34 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:07:44 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:07:54 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:08:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:08:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:08:55 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:09:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:09:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:09:55 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:10:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:10:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:10:55 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:11:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:11:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:11:55 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:12:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:12:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:12:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:13:16 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:13:36 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:13:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:14:16 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:14:36 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:14:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:15:16 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:15:36 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:15:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
Wed Aug 02 11:16:17 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused

	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:129)
	at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:714)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:144)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1153)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1217)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1105)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1062)
	at org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:365)
	at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:507)
	at org.apache.hadoop.hbase.client.AsyncProcess.logAndResubmit(AsyncProcess.java:717)
	at org.apache.hadoop.hbase.client.AsyncProcess.receiveGlobalFailure(AsyncProcess.java:664)
	at org.apache.hadoop.hbase.client.AsyncProcess.access$100(AsyncProcess.java:93)
	at org.apache.hadoop.hbase.client.AsyncProcess$1.run(AsyncProcess.java:547)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:578)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:868)
	at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:29966)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1508)
	at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:710)
	at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:708)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
	... 17 more
2017-08-02 11:21:04,430 ERROR org.apache.spark.scheduler.LiveListenerBus [Thread-3] - SparkListenerBus
has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@66c4a5d2)
2017-08-02 11:21:04,431 ERROR org.apache.spark.scheduler.LiveListenerBus [Thread-3] - SparkListenerBus
has already stopped! Dropping event SparkListenerJobEnd(0,1501672864431,JobFailed(org.apache.spark.SparkException:
Job 0 cancelled because SparkContext was shut down))
2017-08-02 11:28:47,129 INFO  org.apache.predictionio.tools.commands.Management$ [main] -
Inspecting PredictionIO...
2017-08-02 11:28:47,132 INFO  org.apache.predictionio.tools.commands.Management$ [main] -
PredictionIO 0.11.0-incubating is installed at /opt/data/PredictionIO-0.11.0-incubating
2017-08-02 11:28:47,132 INFO  org.apache.predictionio.tools.commands.Management$ [main] -
Inspecting Apache Spark...
2017-08-02 11:28:47,142 INFO  org.apache.predictionio.tools.commands.Management$ [main] -
Apache Spark is installed at /usr/local/spark
2017-08-02 11:28:47,175 INFO  org.apache.predictionio.tools.commands.Management$ [main] -
Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
2017-08-02 11:28:47,175 INFO  org.apache.predictionio.tools.commands.Management$ [main] -
Inspecting storage backend connections...
2017-08-02 11:28:47,195 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying
Meta Data Backend (Source: ELASTICSEARCH)...
2017-08-02 11:28:48,225 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying
Model Data Backend (Source: HDFS)...
2017-08-02 11:28:48,447 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying
Event Data Backend (Source: HBASE)...
2017-08-02 11:28:48,979 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Test
writing to Event Store (App Id 0)...
2017-08-02 11:29:49,026 ERROR org.apache.predictionio.tools.commands.Management$ [main] -
Unable to connect to all storage backends successfully.






On the other hand, once this happens, if I run `pio status` this is what I obtain:

aml@ip-10-41-11-227:~$ pio status
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/data/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/data/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.11.0-incubating is installed at /opt/data/PredictionIO-0.11.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /usr/local/spark
[INFO] [Management$] Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[ERROR] [Management$] Unable to connect to all storage backends successfully.
The following shows the error message from the storage backend.

Failed after attempts=1, exceptions:
Wed Aug 02 11:45:04 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@43045f9f, java.net.SocketTimeoutException: Call to localhost/127.0.0.1:39562
failed because java.net.SocketTimeoutException: 60000 millis timeout
while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
local=/127.0.0.1:51462 remote=localhost/127.0.0.1:39562]
 (org.apache.hadoop.hbase.client.RetriesExhaustedException)

Dumping configuration of initialized storage backend sources.
Please make sure they are correct.

Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOSTS -> 127.0.0.1, TYPE
-> elasticsearch, CLUSTERNAME -> elasticsearch
Source Name: HBASE; Type: hbase; Configuration: TYPE -> hbase
Source Name: HDFS; Type: hdfs; Configuration: TYPE -> hdfs, PATH -> /models

Do you know what the problem is? How can I restart the services once the system fails?

Thanks.

Carlos Vidal.

