From: Amol Talap
Date: Mon, 10 Jul 2017 20:14:48 +0200
Subject: Databricks Spark XML parsing exception while iterating
To: user@spark.apache.org

Hi All,

Does anyone know a fix for the exception below?
The XML parsing function works fine in a unit test, as you can see in the code below, but it fails when applied inside an RDD transformation.

new_xml: org.apache.spark.rdd.RDD[List[(String, String)]] = MapPartitionsRDD[119] at map at <console>:57
17/07/10 08:29:54 ERROR Executor: Exception in task 0.0 in stage 31.0 (TID 50)
java.lang.NullPointerException
        at $line103.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.xmlParse(<console>:52)
        at $line109.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:57)
        at $line109.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:57)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:875)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:875)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
        at org.apache.spark.scheduler.Task.run(Task.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
17/07/10 08:29:54 ERROR Executor: Exception in task 1.0 in stage 31.0 (TID 51)
java.lang.NullPointerException

cat /home/spark/XML_Project/XML_Prog.scala

println(">>>>>>>START UnitTest for xmlParse")
import com.databricks.spark.xml.XmlReader

def xmlParse(xml: String) = {
  var xRDD = sc.parallelize(Seq(xml))
  var df = new XmlReader().xmlRdd(spark.sqlContext, xRDD)
  var out_rdd = df.withColumn("comment", explode(df("Comments.Comment")))
    .select($"comment.Description", $"comment.Title").rdd
  out_rdd.collect.map(x => (x(0).toString, x(1).toString)).toList
}

val xml1 = "<books><Comments><Comment><Title>Title1.1</Title><Description>Descript1.1</Description></Comment><Comment><Title>Title1.2</Title><Description>Descript1.2</Description></Comment><Comment><Title>Title1.3</Title><Description>Descript1.3</Description></Comment></Comments></books>"
val xml_parse = xmlParse(xml1)
println("<<<<<<<END UnitTest for xmlParse")

val xml_pRDDs = sc.textFile("/home/spark/XML_Project/data.txt").map(x => (x.split(',')(0).toInt, x.split(',')(3)))
val new_xml = xml_pRDDs.map({ case (key, value) => xmlParse(value.toString) })
new_xml.foreach(println)

cat /home/spark/XML_Project/data.txt

1,Amol,Kolhapur,<books><Comments><Comment><Title>Title1.1</Title><Description>Descript1.1</Description></Comment><Comment><Title>Title1.2</Title><Description>Descript1.2</Description></Comment><Comment><Title>Title1.3</Title><Description>Descript1.3</Description></Comment></Comments></books>
2,Ameet,Bangalore,<books><Comments><Comment><Title>Title2.1</Title><Description>Descript2.1</Description></Comment><Comment><Title>Title2.2</Title><Description>Descript2.2</Description></Comment></Comments></books>
3,Rajesh,Jaipur,<books><Comments><Comment><Title>Title3.1</Title><Description>Descript3.1</Description></Comment><Comment><Title>Title3.2</Title><Description>Descript3.2</Description></Comment><Comment><Title>Title3.3</Title><Description>Descript3.3</Description></Comment><Comment><Title>Title3.4</Title><Description>Descript3.4</Description></Comment></Comments></books>

Regards,
Amol

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
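For reference, a small standalone sketch (Python standard library only, not the author's code) of the per-row result xmlParse is meant to produce: a list of (Description, Title) pairs per <Comment>. The tag names are inferred from the Spark code, which selects Comments.Comment, comment.Description and comment.Title; the function and variable names here are illustrative.

```python
import xml.etree.ElementTree as ET

# One row's XML payload, matching the structure of xml1 / data.txt above.
xml1 = ("<books><Comments>"
        "<Comment><Title>Title1.1</Title><Description>Descript1.1</Description></Comment>"
        "<Comment><Title>Title1.2</Title><Description>Descript1.2</Description></Comment>"
        "<Comment><Title>Title1.3</Title><Description>Descript1.3</Description></Comment>"
        "</Comments></books>")

def xml_parse(xml):
    """Return a list of (Description, Title) pairs, one per <Comment>."""
    root = ET.fromstring(xml)
    return [(c.findtext("Description"), c.findtext("Title"))
            for c in root.iter("Comment")]

print(xml_parse(xml1))
# prints [('Descript1.1', 'Title1.1'), ('Descript1.2', 'Title1.2'), ('Descript1.3', 'Title1.3')]
```

This just pins down the expected output shape per row; it says nothing about why the Spark version throws an NPE on the executors.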