spark-user mailing list archives

From shobhit gupta <smartsho...@gmail.com>
Subject Re: SQLcontext changing String field to Long
Date Sun, 11 Oct 2015 01:52:50 GMT
Here is what df.schema.toString() prints:

DF Schema is ::StructType(StructField(batch_id,StringType,true))

I think you nailed the problem: this field is part of our HDFS file path.
We have partitioned our data into folders by batch_id.
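The mechanism described in the quoted reply below (Spark inferring a column's type from the `column=value` folder names during partition discovery) can be modeled with a small, self-contained Java sketch. This is a simplified, hypothetical illustration, not Spark's actual implementation. The class `PartitionTypeSketch`, its methods, the type names returned as plain strings, the `inferenceEnabled` flag and the sample path `/somepath/batch_id=1444521170123/part-00000.parquet` are all assumptions made for illustration. A 13-digit millisecond timestamp does not fit in an `int`, so the sketch infers `LongType` for it, which is consistent with the `java.lang.Long` in the exception.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of partition-column type inference.
 * It is NOT Spark's source code; it only shows why a numeric-looking
 * folder value such as "batch_id=1444521170123" ends up typed as a number.
 */
class PartitionTypeSketch {

    // Try narrow numeric types first and fall back to a string.
    // inferenceEnabled is an assumed toggle standing in for "inference off".
    static String inferType(String rawValue, boolean inferenceEnabled) {
        if (!inferenceEnabled) {
            return "StringType"; // inference disabled: keep the raw text as a string
        }
        try {
            Integer.parseInt(rawValue);
            return "IntegerType";
        } catch (NumberFormatException ignored) {
            // does not fit in an int; try a wider type
        }
        try {
            Long.parseLong(rawValue);
            return "LongType";
        } catch (NumberFormatException ignored) {
            // not an integral number; try a floating-point value
        }
        try {
            Double.parseDouble(rawValue);
            return "DoubleType";
        } catch (NumberFormatException ignored) {
            // not numeric at all
        }
        return "StringType";
    }

    // Collect "column=value" path segments and infer a type for each column.
    static Map<String, String> partitionColumns(String path, boolean inferenceEnabled) {
        Map<String, String> types = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                String column = segment.substring(0, eq);
                String value = segment.substring(eq + 1);
                types.put(column, inferType(value, inferenceEnabled));
            }
        }
        return types;
    }

    public static void main(String[] args) {
        // Hypothetical layout: data partitioned into batch_id=<timestamp> folders.
        String path = "/somepath/batch_id=1444521170123/part-00000.parquet";
        System.out.println(partitionColumns(path, true));  // inference on
        System.out.println(partitionColumns(path, false)); // inference off
    }
}
```

Running `main` prints `{batch_id=LongType}` followed by `{batch_id=StringType}`. For the real system, the Spark SQL programming guide documents a setting, `spark.sql.sources.partitionColumnTypeInference.enabled` (default `true`); when it is `false`, partition columns are read as strings. Whether it is available in a given release, and how it interacts with a `batch_id` column that is also stored inside the Parquet files, should be checked against the documentation for the Spark version in use.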

How did you get around it?

Thanks for the help. :)

On Sat, Oct 10, 2015 at 7:55 AM, Yana Kadiyska <yana.kadiyska@gmail.com>
wrote:

> Can you show the output of df.printSchema? Just a guess, but I think I ran
> into something similar with a column that was part of a path in Parquet.
> E.g. we had an account_id in the Parquet file data itself, which was of type
> string, but we also named the files in the following manner:
> /somepath/account_id=.../file.parquet. Since Spark uses the paths for
> partition discovery, it was actually inferring that account_id is a numeric
> type, and upon reading the data we ran into the exception you're describing
> (this was in Spark 1.4).
>
> On Fri, Oct 9, 2015 at 7:55 PM, Abhisheks <smartshobhu@gmail.com> wrote:
>
>> Hi there,
>>
>> I have saved my records in Parquet format and am using Spark 1.5. But when
>> I try to fetch the columns, it throws the exception
>> *java.lang.ClassCastException: java.lang.Long cannot be cast to
>> org.apache.spark.unsafe.types.UTF8String*.
>>
>> This field is saved as a String when writing the Parquet files. Here is
>> the sample code and its output:
>>
>> logger.info("troubling thing is ::" +
>>     sqlContext.sql(fileSelectQuery).schema().toString());
>> DataFrame df = sqlContext.sql(fileSelectQuery);
>> JavaRDD<Row> rdd2 = df.toJavaRDD();
>>
>> The first line of the code (the logger call) prints this:
>> troubling thing is ::StructType(StructField(batch_id,StringType,true))
>>
>> But immediately after that, the exception is thrown.
>>
>> Any idea why it is treating the field as a Long? (One unusual thing about
>> this column is that it always holds a number, e.g. a timestamp.)
>>
>> Any help is appreciated.
>


-- 
*Regards, Shobhit Gupta.*
*"If you salute your job, you have to salute nobody. But if you pollute
your job, you have to salute everybody..!!"*
