predictionio-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Error when importing data
Date Thu, 03 Aug 2017 15:32:02 GMT
It should be easy to try a smaller batch of data first, since we are just guessing.
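Something like this would do it. This is a sketch, assuming the export is newline-delimited JSON (one event per line), which is what `pio export` writes; the file name and app id are the ones from your earlier message, and the stand-in file is only there to make the sketch self-contained:

```shell
# Stand-in for the real 20 GB export, only so the sketch runs end to end.
[ -f my_events.json ] || seq 1 200000 | sed 's/.*/{"event":"view","entityId":"&"}/' > my_events.json

# Events files are newline-delimited JSON, so head yields a valid smaller input.
head -n 100000 my_events.json > my_events_sample.json
wc -l my_events_sample.json

# Then import only the sample to see whether a small batch gets through at all:
# pio import --appid 4 --input my_events_sample.json
```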


On Aug 2, 2017, at 11:22 PM, Carlos Vidal <carlos.vidal@beeva.com> wrote:

Hello Mahesh, Pat

Thanks for your answers. I will try with a bigger EC2 instance.

Carlos.

2017-08-02 18:42 GMT+02:00 Pat Ferrel <pat@occamsmachete.com>:
Actually memory may be your problem. Mahesh Hegde may be right about trying smaller sets.
Since it sounds like you have all services running on one machine, they may be in contention
for resources.


On Aug 2, 2017, at 9:35 AM, Pat Ferrel <pat@occamsmachete.com> wrote:

Something is not configured correctly. `pio import` should work with any size of file, but this
may be an undersized instance for that much data.

Spark needs memory: it keeps all the data it needs for a particular calculation spread across
all cluster machines in memory. That includes derived data, so a total of 32 GB may not be enough.
But that is not your current problem.
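If memory does turn out to be the bottleneck, the Spark memory settings can be raised per job. This is a sketch, assuming `pio import` forwards everything after `--` to spark-submit the way `pio train` does; the 4g/8g figures are guesses for a 32 GB machine that is also running HBase, Elasticsearch, and HDFS, not tested values:

```shell
# Give the import job more headroom; arguments after -- go to spark-submit.
# Tune the sizes to what is actually free on the box.
pio import --appid 4 --input my_events.json -- --driver-memory 4g --executor-memory 8g
```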

I would start by verifying that all components are working properly, starting with HDFS, then
HBase, then Spark, then Elasticsearch. I see several storage backend errors below.
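A quick sketch of that check order, assuming the stock single-machine layout from the AMI (the `hdfs` and `hbase` binaries on the PATH, Elasticsearch on its default port 9200):

```shell
# HDFS: is the namenode up and are datanodes reporting?
hdfs dfsadmin -report | head -n 20

# HBase: are the master and region servers alive?
echo "status" | hbase shell

# Elasticsearch: cluster health should be green or yellow.
curl -s "http://localhost:9200/_cluster/health?pretty"

# Which JVM daemons are actually running right now?
jps
```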



On Aug 2, 2017, at 4:52 AM, Carlos Vidal <carlos.vidal@beeva.com> wrote:

Hello,

I have installed the pio + ur AMI in AWS, on an m4.2xlarge instance with 32 GB of RAM and 8
vCPUs.

When I try to import a 20 GB events file for my application, the system crashes. The command
I have used is:


pio import --appid 4 --input my_events.json

This command launches a Spark job that needs to perform 800 tasks. When the process reaches
task 211, it crashes. This is what I can see in my pio.log file:

2017-08-02 11:16:17,101 WARN  org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
[htable-pool230-t1] - Encountered problems when prefetch hbase:meta table: 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Wed Aug 02 11:07:06 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
This server is in the failed servers list: localhost/127.0.0.1:44866
[the same FailedServerException is logged three more times through 11:07:08]
Wed Aug 02 11:07:10 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused
[the same ConnectException retry repeats roughly every 10-20 seconds]
Wed Aug 02 11:16:17 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException:
Connection refused

	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:129)
	at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:714)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:144)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1153)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1217)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1105)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1062)
	at org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:365)
	at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:507)
	at org.apache.hadoop.hbase.client.AsyncProcess.logAndResubmit(AsyncProcess.java:717)
	at org.apache.hadoop.hbase.client.AsyncProcess.receiveGlobalFailure(AsyncProcess.java:664)
	at org.apache.hadoop.hbase.client.AsyncProcess.access$100(AsyncProcess.java:93)
	at org.apache.hadoop.hbase.client.AsyncProcess$1.run(AsyncProcess.java:547)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:578)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:868)
	at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:29966)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1508)
	at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:710)
	at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:708)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
	... 17 more
2017-08-02 11:21:04,430 ERROR org.apache.spark.scheduler.LiveListenerBus [Thread-3] - SparkListenerBus
has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@66c4a5d2)
2017-08-02 11:21:04,431 ERROR org.apache.spark.scheduler.LiveListenerBus [Thread-3] - SparkListenerBus
has already stopped! Dropping event SparkListenerJobEnd(0,1501672864431,JobFailed(org.apache.spark.SparkException:
Job 0 cancelled because SparkContext was shut down))
2017-08-02 11:28:47,129 INFO  org.apache.predictionio.tools.commands.Management$ [main] -
Inspecting PredictionIO...
2017-08-02 11:28:47,132 INFO  org.apache.predictionio.tools.commands.Management$ [main] -
PredictionIO 0.11.0-incubating is installed at /opt/data/PredictionIO-0.11.0-incubating
2017-08-02 11:28:47,132 INFO  org.apache.predictionio.tools.commands.Management$ [main] -
Inspecting Apache Spark...
2017-08-02 11:28:47,142 INFO  org.apache.predictionio.tools.commands.Management$ [main] -
Apache Spark is installed at /usr/local/spark
2017-08-02 11:28:47,175 INFO  org.apache.predictionio.tools.commands.Management$ [main] -
Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
2017-08-02 11:28:47,175 INFO  org.apache.predictionio.tools.commands.Management$ [main] -
Inspecting storage backend connections...
2017-08-02 11:28:47,195 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying
Meta Data Backend (Source: ELASTICSEARCH)...
2017-08-02 11:28:48,225 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying
Model Data Backend (Source: HDFS)...
2017-08-02 11:28:48,447 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying
Event Data Backend (Source: HBASE)...
2017-08-02 11:28:48,979 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Test
writing to Event Store (App Id 0)...
2017-08-02 11:29:49,026 ERROR org.apache.predictionio.tools.commands.Management$ [main] -
Unable to connect to all storage backends successfully.






On the other hand, once this happens, if I run pio status this is what I obtain:

aml@ip-10-41-11-227:~$ pio status
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/data/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/data/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.11.0-incubating is installed at /opt/data/PredictionIO-0.11.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /usr/local/spark
[INFO] [Management$] Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[ERROR] [Management$] Unable to connect to all storage backends successfully.
The following shows the error message from the storage backend.

Failed after attempts=1, exceptions:
Wed Aug 02 11:45:04 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@43045f9f, java.net.SocketTimeoutException:
Call to localhost/127.0.0.1:39562 failed because java.net.SocketTimeoutException: 60000 millis timeout
while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
local=/127.0.0.1:51462 remote=localhost/127.0.0.1:39562]
 (org.apache.hadoop.hbase.client.RetriesExhaustedException)

Dumping configuration of initialized storage backend sources.
Please make sure they are correct.

Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOSTS -> 127.0.0.1, TYPE
-> elasticsearch, CLUSTERNAME -> elasticsearch
Source Name: HBASE; Type: hbase; Configuration: TYPE -> hbase
Source Name: HDFS; Type: hdfs; Configuration: TYPE -> hdfs, PATH -> /models

Do you know what the problem is? How can I restart the services once the system fails?

Thanks.

Carlos Vidal.




