predictionio-user mailing list archives

From Seshachalam Malisetti <abb...@gmail.com>
Subject Re: Not able to train data
Date Thu, 26 Oct 2017 07:10:08 GMT
How do I unsubscribe from this list? Please help.<br><br>
          <div class="gmail_quote nylas-quote nylas-quote-id-d11d6eed0b4d266faab9d8eacb812156fc67d21ed63f25883f90fb69794488b7">
            <br>
            On Oct 26 2017, at 12:39 pm, Vaghawan Ojha &lt;vaghawan781@gmail.com&gt;
wrote:
            <br>
            <blockquote class="gmail_quote"
              style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
              <div dir="ltr">Hi Abhimanyu, <br /><br /><div>I don't
think this template works with version 0.11.0. As per the template: <br /><br /><span
style="color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,&quot;Segoe UI&quot;,Helvetica,Arial,sans-serif,&quot;Apple
Color Emoji&quot;,&quot;Segoe UI Emoji&quot;,&quot;Segoe UI Symbol&quot;;font-size:16px">update
for PredictionIO 0.9.2, including:<br /></span><br />I don't think it supports
the latest pio. You would be better off switching to 0.9.2 if you want to experiment with it. </div></div><div><br
/><div>On Thu, Oct 26, 2017 at 12:52 PM, Abhimanyu Nagrath <span dir="ltr">&lt;<a
href="mailto:abhimanyunagrath@gmail.com" target="_blank">abhimanyunagrath@gmail.com</a>&gt;</span>
wrote:<br /><blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div
dir="ltr">Hi Vaghawan, <div><br /></div><div>I am using v0.11.0-incubating
with ES 5.2.1, HBase 1.2.6, and Spark 2.1.0.</div><div><br /></div><div>Regards,</div><div>Abhimanyu</div></div><div><div><div><br
/><div>On Thu, Oct 26, 2017 at 12:31 PM, Vaghawan Ojha <span dir="ltr">&lt;<a
href="mailto:vaghawan781@gmail.com" target="_blank">vaghawan781@gmail.com</a>&gt;</span>
wrote:<br /><blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div
dir="ltr">Hi Abhimanyu, <br /><br />Ok, which version of pio is this? Because
the template looks old to me. </div><div><div><div><br /><div>On
Thu, Oct 26, 2017 at 12:44 PM, Abhimanyu Nagrath <span dir="ltr">&lt;<a href="mailto:abhimanyunagrath@gmail.com"
target="_blank">abhimanyunagrath@gmail.com</a>&gt;</span> wrote:<br
/><blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div
dir="ltr">Hi Vaghawan,<div><br /></div><div>Yes, the Spark master
connection string is correct. The executor only fails to connect to the Spark master after
4-5 hours.</div><div><br /></div><div><br /></div><div>Regards,</div><div>Abhimanyu</div></div><div><div><div><br
/><div>On Thu, Oct 26, 2017 at 12:17 PM, Sachin Kamkar <span dir="ltr">&lt;<a
href="mailto:sachinkamkar@gmail.com" target="_blank">sachinkamkar@gmail.com</a>&gt;</span>
wrote:<br /><blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div
dir="ltr">It should be correct, since the user got the exception 3-4 hours after starting.
So it looks like something else broke. OOM?<div><br /><div><div><div
dir="ltr"><div>With Regards,</div><div><br /></div><div> 
   Sachin</div><div>⚜KTBFFH⚜</div></div></div></div><div><div>
<br /><div>On Thu, Oct 26, 2017 at 12:15 PM, Vaghawan Ojha <span dir="ltr">&lt;<a
href="mailto:vaghawan781@gmail.com" target="_blank">vaghawan781@gmail.com</a>&gt;</span>
wrote:<br /><blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div
dir="ltr"><span style="font-size:12.8px">&quot;Executor failed to connect with
master&quot;: are you sure the </span><span style="font-size:12.8px">--master
spark://*.*.*.*:7077 is correct? </span><br /><div><span style="font-size:12.8px"><br
/></span></div><div><span style="font-size:12.8px">Is it the one
you copied from the Spark master's web UI? A wrong URL there can prevent executors from connecting to
the Spark master. </span></div><div><span style="font-size:12.8px"><br
/></span></div><div><span style="font-size:12.8px">Thanks</span></div></div><div><div><div><br
/><div>On Thu, Oct 26, 2017 at 12:02 PM, Abhimanyu Nagrath <span dir="ltr">&lt;<a
href="mailto:abhimanyunagrath@gmail.com" target="_blank">abhimanyunagrath@gmail.com</a>&gt;</span>
wrote:<br /><blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div
dir="ltr"><div>I am new to PredictionIO. I am using the template <a href="https://github.com/EmergentOrder/template-scala-probabilistic-classifier-batch-lbfgs"
target="_blank">https://github.com/EmergentOrder/template-scala-probabilistic-classifier-batch-lbfgs</a>. </div><div><br
/></div><div>My training dataset has 1,184,603 records with approximately 6,500 features.
I am using an EC2 r4.8xlarge instance (240 GB RAM, 32 cores, 200 GB swap). </div><div><br
/></div><div><br /></div><div>I tried two ways of training: </div><div><br
/></div><div> 1. Command '</div><div><br /></div><div>&gt;
pio train -- --driver-memory 120G --executor-memory 100G -- conf</div><div>&gt;
spark.network.timeout=10000000</div><div><br /></div><div>'</div><div> 
It throws an exception after 3-4 hours.</div><div>  </div><div><br
/></div><div>    Exception in thread &quot;main&quot; org.apache.spark.SparkExceptio<wbr></wbr>n:
Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure:
Lost task 0.0 in stage 1.0 (TID 15, localhost, executor driver): ExecutorLostFailure (executor
driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after
181529 ms</div><div>    Driver stacktrace:</div><div>       
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)</div><div> 
          at org.apache.spark.scheduler.DAG<wbr></wbr>Scheduler$$anonfun$abortStage$<wbr></wbr>1.apply(DAGScheduler.scala:142<wbr></wbr>3)</div><div> 
          at org.apache.spark.scheduler.DAG<wbr></wbr>Scheduler$$anonfun$abortStage$<wbr></wbr>1.apply(DAGScheduler.scala:142<wbr></wbr>2)</div><div> 
          at scala.collection.mutable.Resiz<wbr></wbr>ableArray$class.foreach(Resiza<wbr></wbr>bleArray.scala:59)</div><div> 
          at scala.collection.mutable.Array<wbr></wbr>Buffer.foreach(ArrayBuffer.sca<wbr></wbr>la:48)</div><div> 
          at org.apache.spark.scheduler.DAG<wbr></wbr>Scheduler.abortStage(DAGSchedu<wbr></wbr>ler.scala:1422)</div><div> 
          at org.apache.spark.scheduler.DAG<wbr></wbr>Scheduler$$anonfun$handleTaskS<wbr></wbr>etFailed$1.apply(DAGScheduler.<wbr></wbr>scala:802)</div><div> 
          at org.apache.spark.scheduler.DAG<wbr></wbr>Scheduler$$anonfun$handleTaskS<wbr></wbr>etFailed$1.apply(DAGScheduler.<wbr></wbr>scala:802)</div><div> 
          at scala.Option.foreach(Option.sc<wbr></wbr>ala:257)</div><div> 
          at org.apache.spark.scheduler.DAG<wbr></wbr>Scheduler.handleTaskSetFailed(<wbr></wbr>DAGScheduler.scala:802)</div><div> 
          at org.apache.spark.scheduler.DAG<wbr></wbr>SchedulerEventProcessLoop.doOn<wbr></wbr>Receive(DAGScheduler.scala:165<wbr></wbr>0)</div><div> 
          at org.apache.spark.scheduler.DAG<wbr></wbr>SchedulerEventProcessLoop.onRe<wbr></wbr>ceive(DAGScheduler.scala:1605)</div><div> 
          at org.apache.spark.scheduler.DAG<wbr></wbr>SchedulerEventProcessLoop.onRe<wbr></wbr>ceive(DAGScheduler.scala:1594)</div><div> 
          at org.apache.spark.util.EventLoo<wbr></wbr>p$$anon$1.run(EventLoop.scala:<wbr></wbr>48)</div><div> 
          at org.apache.spark.scheduler.DAG<wbr></wbr>Scheduler.runJob(DAGScheduler.<wbr></wbr>scala:628)</div><div> 
          at org.apache.spark.SparkContext.<wbr></wbr>runJob(SparkContext.scala:1918<wbr></wbr>)</div><div> 
          at org.apache.spark.SparkContext.<wbr></wbr>runJob(SparkContext.scala:1931<wbr></wbr>)</div><div> 
          at org.apache.spark.SparkContext.<wbr></wbr>runJob(SparkContext.scala:1944<wbr></wbr>)</div><div> 
          at org.apache.spark.rdd.RDD$$anon<wbr></wbr>fun$take$1.apply(RDD.scala:135<wbr></wbr>3)</div><div> 
          at org.apache.spark.rdd.RDDOperat<wbr></wbr>ionScope$.withScope(RDDOperati<wbr></wbr>onScope.scala:151)</div><div> 
          at org.apache.spark.rdd.RDDOperat<wbr></wbr>ionScope$.withScope(RDDOperati<wbr></wbr>onScope.scala:112)</div><div> 
          at org.apache.spark.rdd.RDD.withS<wbr></wbr>cope(RDD.scala:362)</div><div> 
          at org.apache.spark.rdd.RDD.take(<wbr></wbr>RDD.scala:1326)</div><div> 
          at org.example.classification.Log<wbr></wbr>isticRegressionWithLBFGSAlgori<wbr></wbr>thm.train(LogisticRegressionWi<wbr></wbr>thLBFGSAlgorithm.scala:28)</div><div> 
          at org.example.classification.Log<wbr></wbr>isticRegressionWithLBFGSAlgori<wbr></wbr>thm.train(LogisticRegressionWi<wbr></wbr>thLBFGSAlgorithm.scala:21)</div><div> 
          at org.apache.predictionio.contro<wbr></wbr>ller.P2LAlgorithm.trainBase(P2<wbr></wbr>LAlgorithm.scala:49)</div><div> 
          at org.apache.predictionio.contro<wbr></wbr>ller.Engine$$anonfun$18.apply(<wbr></wbr>Engine.scala:692)</div><div> 
          at org.apache.predictionio.contro<wbr></wbr>ller.Engine$$anonfun$18.apply(<wbr></wbr>Engine.scala:692)</div><div> 
          at scala.collection.TraversableLi<wbr></wbr>ke$$anonfun$map$1.apply(Traver<wbr></wbr>sableLike.scala:234)</div><div> 
          at scala.collection.TraversableLi<wbr></wbr>ke$$anonfun$map$1.apply(Traver<wbr></wbr>sableLike.scala:234)</div><div> 
          at scala.collection.immutable.Lis<wbr></wbr>t.foreach(List.scala:381)</div><div> 
          at scala.collection.TraversableLi<wbr></wbr>ke$class.map(TraversableLike.s<wbr></wbr>cala:234)</div><div> 
          at scala.collection.immutable.Lis<wbr></wbr>t.map(List.scala:285)</div><div> 
          at org.apache.predictionio.contro<wbr></wbr>ller.Engine$.train(Engine.scal<wbr></wbr>a:692)</div><div> 
          at org.apache.predictionio.contro<wbr></wbr>ller.Engine.train(Engine.scala<wbr></wbr>:177)</div><div> 
          at org.apache.predictionio.workfl<wbr></wbr>ow.CoreWorkflow$.runTrain(Core<wbr></wbr>Workflow.scala:67)</div><div> 
          at org.apache.predictionio.workfl<wbr></wbr>ow.CreateWorkflow$.main(Create<wbr></wbr>Workflow.scala:250)</div><div> 
          at org.apache.predictionio.workfl<wbr></wbr>ow.CreateWorkflow.main(CreateW<wbr></wbr>orkflow.scala)</div><div> 
          at sun.reflect.NativeMethodAccess<wbr></wbr>orImpl.invoke0(Native
Method)</div><div>            at sun.reflect.NativeMethodAccess<wbr></wbr>orImpl.invoke(NativeMethodAcce<wbr></wbr>ssorImpl.java:62)</div><div> 
          at sun.reflect.DelegatingMethodAc<wbr></wbr>cessorImpl.invoke(DelegatingMe<wbr></wbr>thodAccessorImpl.java:43)</div><div> 
          at java.lang.reflect.Method.invok<wbr></wbr>e(Method.java:498)</div><div> 
          at org.apache.spark.deploy.SparkS<wbr></wbr>ubmit$.org$apache$spark$deploy<wbr></wbr>$SparkSubmit$$runMain(SparkSub<wbr></wbr>mit.scala:738)</div><div> 
          at org.apache.spark.deploy.SparkS<wbr></wbr>ubmit$.doRunMain$1(SparkSubmit<wbr></wbr>.scala:187)</div><div> 
          at org.apache.spark.deploy.SparkS<wbr></wbr>ubmit$.submit(SparkSubmit.scal<wbr></wbr>a:212)</div><div> 
          at org.apache.spark.deploy.SparkS<wbr></wbr>ubmit$.main(SparkSubmit.scala:<wbr></wbr>126)</div><div> 
          at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)</div><div><br
/></div><div>2. I started a Spark standalone cluster with 1 master and 3 workers
and executed the command: </div><div><br /></div><div>&gt;
pio train -- --master spark://*.*.*.*:7077 --driver-memory 50G</div><div>&gt;
--executor-memory 50G</div><div><br /></div><div>After some
time I get the error &quot;Executor failed to connect with master&quot; and training stops. </div><div><br
/></div><div>I reduced the feature count from 6500 to 500 and
the result is the same. Can anyone suggest what I am missing? </div><div><br
/></div><div>Also, during training I continuously get warnings like: </div><div><br
/></div><div>&gt; [WARN] [ScannerCallable] Ignore, probably already closed</div><div><br
/></div><div><br /></div><div>Regards,</div><div>Abhimanyu</div><div><br
/></div></div>
</blockquote></div><br /></div>
</div></div></blockquote></div><br /></div></div></div></div>
</blockquote></div><br /></div>
</div></div></blockquote></div><br /></div>
</div></div></blockquote></div><br /></div>
</div></div></blockquote></div><br /></div>

            </blockquote>
          </div>
