spark-user mailing list archives

From shobhit gupta <>
Subject Re: SQLcontext changing String field to Long
Date Sun, 11 Oct 2015 01:52:50 GMT
Here is what df.schema.toString() prints:

DF Schema is ::StructType(StructField(batch_id,StringType,true))

I think you nailed the problem: this field is part of our HDFS file
path. We have partitioned our data into folders by batch_id.
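
To make the suspected cause concrete, here is a minimal sketch in plain Java (no Spark dependency) of how a key=value path segment could end up typed as a number. It is a simplified model for illustration, not Spark's actual partition discovery code. The class name, the method names, and the sample path are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of path-based partition type inference (NOT Spark's implementation).
class PartitionInferenceSketch {

    // Assumption: a value that parses as a number is typed as numeric,
    // and anything else falls back to a string.
    static String inferType(String rawValue) {
        try {
            Integer.parseInt(rawValue);
            return "IntegerType";
        } catch (NumberFormatException ignored) { }
        try {
            Long.parseLong(rawValue);
            return "LongType";
        } catch (NumberFormatException ignored) { }
        try {
            Double.parseDouble(rawValue);
            return "DoubleType";
        } catch (NumberFormatException ignored) { }
        return "StringType";
    }

    // Collects every key=value directory segment of the path with its inferred type.
    static Map<String, String> inferPartitionColumns(String path) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                columns.put(segment.substring(0, eq),
                            inferType(segment.substring(eq + 1)));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Hypothetical layout: a timestamp-like batch_id used as a folder name.
        String path = "/data/batch_id=1444528370000/part-00000.parquet";
        System.out.println(inferPartitionColumns(path)); // prints {batch_id=LongType}
    }
}
```

If this is indeed what is happening, one thing that may be worth trying is the setting spark.sql.sources.partitionColumnTypeInference.enabled, which the Spark SQL documentation describes for turning this inference off, e.g. sqlContext.setConf("spark.sql.sources.partitionColumnTypeInference.enabled", "false"). I have not verified that it resolves this particular exception.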

How did you get around it?

Thanks for the help. :)

On Sat, Oct 10, 2015 at 7:55 AM, Yana Kadiyska wrote:

> Can you show the output of df.printSchema? Just a guess, but I think I ran
> into something similar with a column that was part of a path in Parquet.
> E.g. we had an account_id in the Parquet file data itself, which was of type
> string, but we also named the files in the following manner:
> /somepath/account_id=.../file.parquet. Since Spark uses the paths for
> partition discovery, it was actually inferring that account_id is a numeric
> type, and upon reading the data we ran into the exception you're describing
> (this was in Spark 1.4).
> On Fri, Oct 9, 2015 at 7:55 PM, Abhisheks wrote:
>> Hi there,
>> I have saved my records in Parquet format and am using Spark 1.5. But when
>> I try to fetch the columns, it throws the exception
>> *java.lang.ClassCastException: java.lang.Long cannot be cast to
>> org.apache.spark.unsafe.types.UTF8String*.
>> This field is saved as a String when writing the Parquet file. Here is the
>> sample code and its output:
>>"troubling thing is ::" +
>> sqlContext.sql(fileSelectQuery).schema().toString());
>> DataFrame df= sqlContext.sql(fileSelectQuery);
>> JavaRDD<Row> rdd2 = df.toJavaRDD();
>> The first line in the code (the logger call) prints this:
>> troubling thing is ::StructType(StructField(batch_id,StringType,true))
>> But immediately after that, the exception comes up.
>> Any idea why it is treating the field as a Long? (One unusual thing about
>> the column is that it is always a number, e.g. a timestamp.)
>> Any help is appreciated.


*Regards , Shobhit Gupta.*
*"If you salute your job, you have to salute nobody. But if you pollute
your job, you have to salute everybody..!!"*
