Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Date: Wed, 24 Feb 2016 14:38:11 +0800
Subject: Fwd: [Vote] : Spark-csv 1.3 + Spark 1.5.2 - Error parsing null values except String data type
From: Divya Gehlot
To: user@pig.apache.org, user@hadoop.apache.org, user@hbase.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a1136af704c7217052c7e4f04

--001a1136af704c7217052c7e4f04
Content-Type: text/plain; charset=UTF-8

Hi,

Please vote if you have faced this issue.

I am getting an error when parsing null values with Spark-csv.

Data file:

    name   age
    alice  35
    bob    null
    peter  24

Code:

    spark-shell --packages com.databricks:spark-csv_2.10:1.3.0 --master yarn-client -i /TestDivya/Spark/Testnull.scala

Testnull.scala:

    import org.apache.spark.sql.types.{StructType, StructField, NullType, DateType, IntegerType, LongType, DoubleType, FloatType, StringType}
    import java.util.Properties
    import org.apache.spark._
    import org.apache.spark.sql._

    // name is a required string; age is a nullable integer
    val testnullSchema = StructType(List(
      StructField("name", StringType, false),
      StructField("age", IntegerType, true)))

    // trailing dots keep the chained call as one statement when loaded with -i
    val dfreadnull = sqlContext.read.
      format("com.databricks.spark.csv").
      option("header", "true").
      option("nullValue", "").
      option("treatEmptyValuesAsNulls", "true").
      schema(testnullSchema).
      load("hdfs://xxx.xxx.xxx.xxx:8020/TestDivya/Spark/nulltest1.csv")

Has anybody faced a similar issue when reading a CSV file that has null values in fields of any data type other than String?

P.S. I Googled it and found that the issue is open in the Spark-csv GitHub repo.

Thanks,
Divya

--001a1136af704c7217052c7e4f04
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a1136af704c7217052c7e4f04--
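
The failure described in the message can be illustrated in plain Scala, without Spark. This is only a sketch under an assumption: that a field declared as an integer ends up being converted with something like `toInt`, which cannot handle the literal token `null`. It is not a claim about how spark-csv is implemented internally. The helper `parseNullableInt` is hypothetical and not part of any library.

```scala
import scala.util.Try

// Hypothetical helper (not part of spark-csv): convert one raw CSV field to a
// nullable integer. The empty string and the literal "null" are treated as
// missing; any other non-numeric value also becomes None.
def parseNullableInt(raw: String): Option[Int] = {
  val trimmed = raw.trim
  if (trimmed.isEmpty || trimmed.equalsIgnoreCase("null")) None
  else Try(trimmed.toInt).toOption
}

// The three rows from the data file in the message.
val rows = Seq("alice" -> "35", "bob" -> "null", "peter" -> "24")

// A direct toInt conversion throws NumberFormatException on "null";
// here we only record whether each conversion succeeded.
val direct = rows.map { case (name, age) => name -> Try(age.toInt).isSuccess }

// The tolerant helper maps the "null" token to None instead of failing.
val parsed = rows.map { case (name, age) => name -> parseNullableInt(age) }
```

One possible way to apply the same idea to the schema above would be to declare `age` as `StringType` when loading and convert it afterwards, though whether that suits a given job depends on the data and the Spark version in use.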