hive-user mailing list archives

From yuemeng1 <yueme...@huawei.com>
Subject Re: Job aborted due to stage failure
Date Thu, 04 Dec 2014 01:42:33 GMT
Hi, thanks a lot for your help; with your help my Hive on Spark setup
now works well.
It took me a long time to install and deploy, so here is some advice: I
think we need to improve the installation documentation so that users
can compile and install in the least amount of time.
1) Document which Spark version users should pick from the Spark GitHub
repository if they build Spark themselves instead of downloading a
pre-built one, and give them the right build command (without -Pyarn or
-Phive); see the sketch at the end of this message.
2) If they get an error during the build, such as:
[ERROR] /hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:[22,24] cannot find symbol
[ERROR] symbol: class JobExecutionStatus
tell them what they can do.
Our users have to be able to use it first before they can judge whether
it is good or bad.
If needed, I can add something to the getting-started document.
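
For example, a build command along the lines of Xuefu's advice quoted
below (the command from this thread with -Pyarn and -Phive dropped; the
Hadoop profile and version are assumptions that should match your
cluster):

mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package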


thanks
yuemeng





On 2014/12/3 11:03, Xuefu Zhang wrote:
> When you build Spark, remove -Phive as well as -Pyarn. When you run 
> hive queries, you may need to run "set spark.home=/path/to/spark/dir";
>
> Thanks,
> Xuefu
>
>     On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yuemeng1@huawei.com> wrote:
>
>     Hi Xuefu, thanks a lot for your help. Here is more detail
>     to reproduce this issue:
>     1) I checked out the spark branch of Hive from GitHub
>     (https://github.com/apache/hive/tree/spark, on Nov 29, because
>     the current revision fails with: Caused by:
>     java.lang.RuntimeException: Unable to instantiate
>     org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient),
>     and the build command was: mvn clean package -DskipTests -Phadoop-2 -Pdist
>     After the build I took the package from
>     /home/ym/hive-on-spark/hive1129/hive/packaging/target (apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
>     2) I checked out Spark from
>     https://github.com/apache/spark/tree/v1.2.0-snapshot0. Because
>     Spark branch-1.2 has the parent version 1.2.1-SNAPSHOT, I chose
>     v1.2.0-snapshot0. I compared this Spark's pom.xml with
>     spark-parent-1.2.0-SNAPSHOT.pom (from
>     http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/);
>     the only difference is the spark-parent version. The build
>     command was:
>
>     mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
>
>     3) Commands I executed in the hive shell (the assembly jar had
>     already been copied to Hive's lib directory):
>     ./hive --auxpath
>     /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar
>     create table student(sno int,sname string,sage int,ssex string)
>     row format delimited FIELDS TERMINATED BY ',';
>     create table score(sno int,cno int,sage int) row format delimited
>     FIELDS TERMINATED BY ',';
>     load data local inpath
>     '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
>     into table student;
>     load data local inpath
>     '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
>     into table score;
>     set hive.execution.engine=spark;
>     set spark.master=spark://10.175.xxx.xxx:7077;
>     set spark.eventLog.enabled=true;
>     set spark.executor.memory=9086m;
>     set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>     select distinct st.sno,sname from student st join score sc
>     on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>     (this query works with the MR engine)
>     4)
>     student.txt file
>     1,rsh,27,female
>     2,kupo,28,male
>     3,astin,29,female
>     4,beike,30,male
>     5,aili,31,famle
>
>     score.txt file
>     1,10,80
>     2,11,85
>     3,12,90
>     4,13,95
>     5,14,100
>
>     On 2014/12/2 23:28, Xuefu Zhang wrote:
>>     Could you provide details on how to reproduce the issue, such
>>     as the exact Spark branch, the command to build Spark, how you
>>     built Hive, and what queries/commands you ran?
>>
>>     We are running Hive on Spark all the time. Our pre-commit test
>>     runs without any issue.
>>
>>     Thanks,
>>     Xuefu
>>
>>     On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yuemeng1@huawei.com> wrote:
>>
>>         Hi Xuefu,
>>         I checked out a Spark branch from the Spark GitHub
>>         repository (tag v1.2.0-snapshot0) and compared its pom.xml
>>         with spark-parent-1.2.0-SNAPSHOT.pom (from
>>         http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/).
>>         The only difference is the following:
>>         in spark-parent-1.2.0-SNAPSHOT.pom
>>         <artifactId>spark-parent</artifactId>
>>         <version>1.2.0-SNAPSHOT</version>
>>         and in v1.2.0-snapshot0
>>         <artifactId>spark-parent</artifactId>
>>           <version>1.2.0</version>
>>         I think there is no essential difference, so I built
>>         v1.2.0-snapshot0 and deployed it as my Spark cluster.
>>         When I run a query joining two tables, it still gives the
>>         error I showed you earlier:
>>
>>         Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>         	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>         	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>         	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>         	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>         	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>         	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>         	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>         	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>         	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>         	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>         	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>         	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>         	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>         	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>         	at java.lang.Thread.run(Thread.java:722)
>>
>>         Driver stacktrace:
>>
>>
>>
>>         I think my Spark cluster doesn't have any problem, so why
>>         does it always give me this error?
>>
>>         On 2014/12/2 13:39, Xuefu Zhang wrote:
>>>         You need to build your Spark assembly from the Spark 1.2
>>>         branch. This should give you both a Spark build and the
>>>         spark-assembly jar, which you need to copy to Hive's lib
>>>         directory. A snapshot is fine; Spark 1.2 hasn't been
>>>         released yet.
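>>>
>>>         For example, the copy step could look like this (a sketch;
>>>         the source and destination paths are illustrative, based on
>>>         the jar name and directories mentioned elsewhere in this
>>>         thread):
>>>
>>>         cp assembly/target/scala-2.10/spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar /path/to/hive/lib/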
>>>
>>>         --Xuefu
>>>
>>>         On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yuemeng1@huawei.com> wrote:
>>>
>>>
>>>
>>>             Hi Xuefu,
>>>             Thanks a lot for the information, but as far as I know
>>>             the latest Spark version on GitHub is 1.3-SNAPSHOT;
>>>             there is no spark-1.2, only a branch-1.2 at
>>>             1.2-SNAPSHOT. Can you tell me which Spark version I
>>>             should build? For now,
>>>             spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produces
>>>             the error shown below.
>>>
>>>
>>>             On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>>             It seems that the wrong class, HiveInputFormat, is
>>>>             loaded; the stack trace is way off the current Hive
>>>>             code. You need to build Spark 1.2 and copy the
>>>>             spark-assembly jar to Hive's lib directory, and that's
>>>>             it.
>>>>
>>>>             --Xuefu
>>>>
>>>>             On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1
>>>>             <yuemeng1@huawei.com> wrote:
>>>>
>>>>                 Hi, I built a Hive-on-Spark package; my Spark
>>>>                 assembly jar is
>>>>                 spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar.
>>>>                 Before running a query in the hive shell, I set
>>>>                 all the configuration Hive needs for Spark, and
>>>>                 then executed a join query:
>>>>                 select distinct st.sno,sname from student st join
>>>>                 score sc on(st.sno=sc.sno) where sc.cno
>>>>                 IN(11,12,13) and st.sage > 28;
>>>>                 but it failed with the following error in the
>>>>                 Spark web UI:
>>>>                 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>>>                 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>>                 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>>                 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>>                 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>>                 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>>                 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>>                 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>>                 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>                 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>                 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>                 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>                 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>                 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>>                 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>                 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>>                 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>>                 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>                 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>                 	at java.lang.Thread.run(Thread.java:722)
>>>>
>>>>                 Driver stacktrace:
>>>>
>>>>                 Can you give me some help with this problem? I
>>>>                 think my build was successful.
>>>>
>>>>
>>>
>>>
>>
>>
>
>

