spark-dev mailing list archives

From: Alessandro Baretta <alexbare...@gmail.com>
Subject: Re: Unsupported Catalyst types in Parquet
Date: Wed, 31 Dec 2014 01:21:58 GMT

Sorry! My bad. I had stale spark jars sitting on the slave nodes...

Alex

On Tue, Dec 30, 2014 at 4:39 PM, Alessandro Baretta <alexbaretta@gmail.com>
wrote:

> Gents,
>
> I tried #3820. It doesn't work. I'm still getting the following exceptions:
>
> Exception in thread "Thread-45" java.lang.RuntimeException: Unsupported datatype DateType
>         at scala.sys.package$.error(package.scala:27)
>         at org.apache.spark.sql.parquet.ParquetTypesConverter$anonfun$fromDataType$2.apply(ParquetTypes.scala:343)
>         at org.apache.spark.sql.parquet.ParquetTypesConverter$anonfun$fromDataType$2.apply(ParquetTypes.scala:292)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:291)
>         at org.apache.spark.sql.parquet.ParquetTypesConverter$anonfun$4.apply(ParquetTypes.scala:363)
>         at org.apache.spark.sql.parquet.ParquetTypesConverter$anonfun$4.apply(ParquetTypes.scala:362)
>
> I would be more than happy to fix this myself, but I would need some help
> wading through the code. Could anyone explain to me what exactly is needed
> to support a new data type in SparkSQL's Parquet storage engine?
>
> Thanks.
>
> Alex
>
> On Mon, Dec 29, 2014 at 10:20 PM, Wang, Daoyuan <daoyuan.wang@intel.com>
> wrote:
>
>> By adding a flag in SQLContext, I have modified #3822 so that it now
>> includes nanoseconds. Since passing too many flags is ugly, I now need the
>> whole SQLContext, so that we can put more flags there.
>>
>>
>>
>> Thanks,
>>
>> Daoyuan
>>
>>
>>
>> *From:* Michael Armbrust [mailto:michael@databricks.com]
>> *Sent:* Tuesday, December 30, 2014 10:43 AM
>> *To:* Alessandro Baretta
>> *Cc:* Wang, Daoyuan; dev@spark.apache.org
>> *Subject:* Re: Unsupported Catalyst types in Parquet
>>
>>
>>
>> Yeah, I saw those.  The problem is that #3822 truncates timestamps that
>> include nanoseconds.
>>
>>
>>
>> On Mon, Dec 29, 2014 at 5:14 PM, Alessandro Baretta <
>> alexbaretta@gmail.com> wrote:
>>
>> Michael,
>>
>>
>>
>> Actually, Adrian Wang already created pull requests for these issues.
>>
>>
>>
>> https://github.com/apache/spark/pull/3820
>>
>> https://github.com/apache/spark/pull/3822
>>
>>
>>
>> What do you think?
>>
>>
>>
>> Alex
>>
>>
>>
>> On Mon, Dec 29, 2014 at 3:07 PM, Michael Armbrust <michael@databricks.com>
>> wrote:
>>
>> I'd love to get both of these in.  There is some trickiness that I talk
>> about on the JIRA for timestamps, since the SQL timestamp class can support
>> nanoseconds and I don't think Parquet has a type for this.  Other systems
>> (Impala) seem to use INT96.  It would be great to ask on the Parquet
>> mailing list what the plan is there, to make sure that whatever we do is
>> going to be compatible long term.
>>
>>
>>
>> Michael
>>
>>
>>
>> On Mon, Dec 29, 2014 at 8:13 AM, Alessandro Baretta <
>> alexbaretta@gmail.com> wrote:
>>
>> Daoyuan,
>>
>> Thanks for creating the JIRAs. I need these features by... last week, so
>> I'd be happy to take care of this myself, if only you or someone more
>> experienced than me in the SparkSQL codebase could provide some guidance.
>>
>> Alex
>>
>> On Dec 29, 2014 12:06 AM, "Wang, Daoyuan" <daoyuan.wang@intel.com> wrote:
>>
>> Hi Alex,
>>
>> I'll create JIRA SPARK-4985 for date type support in Parquet, and
>> SPARK-4987 for timestamp type support. For decimal type, I think we only
>> support decimals that fit in a long.
>>
>> Thanks,
>> Daoyuan
>>
>> -----Original Message-----
>> From: Alessandro Baretta [mailto:alexbaretta@gmail.com]
>> Sent: Saturday, December 27, 2014 2:47 PM
>> To: dev@spark.apache.org; Michael Armbrust
>> Subject: Unsupported Catalyst types in Parquet
>>
>> Michael,
>>
>> I'm having trouble storing my SchemaRDDs in Parquet format with SparkSQL,
>> due to my RDDs having DateType and DecimalType fields. What would it
>> take to add Parquet support for these Catalyst types? Are there any other
>> Catalyst types for which there is no Parquet support?
>>
>> Alex
>>
>>
>>
>>
>>
>>
>>
>
>
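
A note on the INT96 layout mentioned in the thread: the Python sketch below is illustrative only and is not Spark or Parquet code. It assumes the convention commonly attributed to Impala, in which the 12 bytes hold the nanoseconds elapsed within the day (8 bytes, little-endian) followed by the Julian day number (4 bytes, little-endian). The function names to_int96 and from_int96 are hypothetical.

```python
import struct

# Julian day number of 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_EPOCH = 2440588
NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_DAY = 86_400 * NANOS_PER_SECOND


def to_int96(epoch_seconds, nanos):
    """Pack (seconds since epoch, nanosecond fraction) into 12 bytes.

    Assumed layout: int64 nanoseconds-of-day, then int32 Julian day,
    both little-endian with no padding.
    """
    total_nanos = epoch_seconds * NANOS_PER_SECOND + nanos
    # Floor division keeps nanos_of_day non-negative for pre-epoch values.
    days, nanos_of_day = divmod(total_nanos, NANOS_PER_DAY)
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_EPOCH)


def from_int96(raw):
    """Unpack 12 bytes back into (seconds since epoch, nanosecond fraction)."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    total_nanos = (julian_day - JULIAN_DAY_OF_EPOCH) * NANOS_PER_DAY + nanos_of_day
    return divmod(total_nanos, NANOS_PER_SECOND)
```

Because the time of day is kept in nanoseconds, from_int96(to_int96(s, n)) returns (s, n) for any nanosecond fraction n, which is the property a truncating encoding gives up.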

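On the remark that only decimals fitting in a long are supported: one way to read this, assuming the unscaled digits of the decimal are what gets stored in a signed 64-bit integer, is sketched below in Python. Under that assumption any decimal with up to 18 digits of precision always passes, since 10^18 - 1 is below 2^63 - 1. The helper names are hypothetical and this is not the Spark implementation.

```python
from decimal import Decimal

# Range of a signed 64-bit integer (a JVM long).
LONG_MIN = -2**63
LONG_MAX = 2**63 - 1


def unscaled_value(d):
    """Return the digits of a finite Decimal as a signed integer, ignoring the scale."""
    sign, digits, _exponent = d.as_tuple()
    magnitude = int("".join(str(digit) for digit in digits))
    return -magnitude if sign else magnitude


def fits_in_long(d):
    """True if the unscaled value of a finite Decimal fits in 64 signed bits."""
    return LONG_MIN <= unscaled_value(d) <= LONG_MAX
```

For example, Decimal("-12.34") has the unscaled value -1234 and fits easily, while a decimal made of nineteen 9s does not.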