Subject: Re: Job aborted due to stage failure
From: yuemeng1 <yuemeng1@huawei.com>
To: user@hive.apache.org
Date: Tue, 2 Dec 2014 16:55:57 +0800

Hi, I checked out the branch-1.2 branch from the Spark GitHub repository, built it, and copied the Spark assembly jar into Hive's lib directory, but when I run the query it still gives me the same error.
I am very confused. How can I get Hive on Spark to work?
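(A hedged aside: if an older spark-assembly jar is still sitting in Hive's lib directory, Hive can keep loading stale classes even after the new jar is copied in. A minimal shell sketch of replacing the jar cleanly; HIVE_HOME and SPARK_SRC are placeholder variables, not names from the thread:)

    # Remove any stale Spark assembly jars first, so Hive cannot pick up
    # classes from an older snapshot build.
    rm -f "$HIVE_HOME"/lib/spark-assembly-*.jar

    # Copy the freshly built assembly; this is the output layout of the
    # Spark 1.2-era Maven build (adjust the Scala/Hadoop suffixes to yours).
    cp "$SPARK_SRC"/assembly/target/scala-2.10/spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar \
       "$HIVE_HOME"/lib/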

On 2014/12/2 13:39, Xuefu Zhang wrote:
You need to build your Spark assembly from the Spark branch-1.2 branch. This should give you both a Spark build and the spark-assembly jar, which you need to copy into Hive's lib directory. A snapshot is fine; Spark 1.2 hasn't been released yet.
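(A minimal sketch of those build steps, assuming the Maven-based Spark build of that era with the hadoop-2.4 profile; the exact profiles and flags are assumptions, not something stated in the thread:)

    # Check out the 1.2 maintenance branch of Spark.
    git clone https://github.com/apache/spark.git
    cd spark
    git checkout branch-1.2

    # Build Spark and its assembly jar against Hadoop 2.4; -DskipTests
    # keeps the build short. The assembly jar lands under
    # assembly/target/scala-2.10/.
    mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package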

--Xuefu

On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yuemeng1@huawei.com> wrote:


Hi Xuefu,
Thanks a lot for the information, but as far as I know, the latest Spark version on GitHub is the 1.3 snapshot on master; there is no spark-1.2 release tag, only a branch-1.2 carrying the 1.2 snapshot. Can you tell me which Spark version I should build? For now, the spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar I built produces the error quoted below.


On 2014/12/2 11:03, Xuefu Zhang wrote:
It seems that the wrong class, HiveInputFormat, is being loaded: the stacktrace is way off from the current Hive code. You need to build Spark 1.2 and copy the spark-assembly jar into Hive's lib directory, and that's it.
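(One hedged way to confirm the wrong-class diagnosis is to list every jar in Hive's lib directory that bundles HiveInputFormat; seeing it in an old spark-assembly jar as well as in hive-exec would explain a stacktrace whose line numbers don't match the current code. The loop below is illustrative and assumes HIVE_HOME is set:)

    # Report every jar that contains HiveInputFormat.class.
    for j in "$HIVE_HOME"/lib/*.jar; do
      if unzip -l "$j" 2>/dev/null | grep -q 'ql/io/HiveInputFormat.class'; then
        echo "$j"
      fi
    done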

--Xuefu

On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yuemeng1@huawei.com> wrote:
Hi, I built a Hive on Spark package, and my Spark assembly jar is spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar. Before running a query in the Hive shell, I set everything Hive on Spark requires (a sketch of typical settings follows the query below), and then I executed a join query:
select distinct st.sno,sname from student st join score sc on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
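(The thread never shows which settings were used; as a hedged sketch, these are the kind of session settings the Hive on Spark work of that period documented, entered at the hive> prompt before the query. The master URL and memory value are illustrative placeholders:)

    set hive.execution.engine=spark;
    set spark.master=local;
    set spark.executor.memory=512m;
    set spark.serializer=org.apache.spark.serializer.KryoSerializer;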
but it failed, and I got the following error in the Spark web UI:
Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)

Driver stacktrace:

Can you give me some help with this problem? I believe my build succeeded!



