phoenix-dev mailing list archives

From "zhongyuhai (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PHOENIX-5035) phoenix-spark dataframe filtes date or timestamp type with error
Date Wed, 21 Nov 2018 05:10:00 GMT
zhongyuhai created PHOENIX-5035:
-----------------------------------

             Summary: phoenix-spark dataframe filtes date or timestamp type with error
                 Key: PHOENIX-5035
                 URL: https://issues.apache.org/jira/browse/PHOENIX-5035
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.14.1, 5.0.0, 4.13.1, 4.14.0, 4.13.0
         Environment: HBase: Apache 1.2
                      Phoenix: 4.13.1-HBase-1.2
                      Hadoop: CDH 2.6
                      Spark: 2.3.1
            Reporter: zhongyuhai
         Attachments: table desc.png

*Table description:*

See the attachment "table desc.png".

 

*Code:*

val df = SparkUtil.getActiveSession().read.format( "org.apache.phoenix.spark").options(options).load()

df.filter("INCREATEDDATE = date'2018-07-14'")

 

*Exception:*

java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005):
Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997
 at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:201)
 at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:87)
 at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:127)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)

 

*Analysis:*

The problem is in org.apache.phoenix.spark.PhoenixRelation.compileValue(value: Any): Any:

{code:java}
private def compileValue(value: Any): Any = {
  value match {
    case stringValue: String => s"'${escapeStringConstant(stringValue)}'"

    // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions
    // Spark 1.4
    case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
    // Spark 1.5
    case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"

    // Pass through anything else
    case _ => value
  }
}
{code}
 

It quotes only String values; every other type is passed through unchanged and later
rendered with toString. As a result, the Spark filter condition "INCREATEDDATE = date'2018-07-14'"
is translated to the Phoenix filter condition "INCREATEDDATE = 2018-07-14". Phoenix parses the
unquoted 2018-07-14 as integer subtraction (2018 - 7 - 14 = 1997), so it compares a DATE column
with a BIGINT and throws ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997.
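
The effect described above can be reproduced without Spark or Phoenix. The following is a minimal sketch, not the actual phoenix-spark code: `compileValueOld` and `buildFilter` are hypothetical stand-ins that only mimic the pass-through behaviour and the way a value ends up interpolated into the WHERE clause.

```scala
import java.sql.Date

object PassThroughDemo {
  // Hypothetical stand-in for the current compileValue: only strings are quoted.
  def compileValueOld(value: Any): Any = value match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case _         => value // pass through anything else
  }

  // Hypothetical helper: renders an equality filter the way the issue describes,
  // relying on toString for every non-string value.
  def buildFilter(column: String, value: Any): String =
    "\"" + column + "\" = " + compileValueOld(value)

  def main(args: Array[String]): Unit = {
    // java.sql.Date.toString yields yyyy-mm-dd, with no quotes and no date keyword.
    println(buildFilter("INCREATEDDATE", Date.valueOf("2018-07-14")))
    // Read as arithmetic, the unquoted text is plain integer subtraction.
    println(2018 - 7 - 14)
  }
}
```

The first line printed is the clause with the unquoted date, and the second is the integer 1997 that appears in the exception message.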

*Solution:*

Add handling for other types, such as Date and Timestamp:
{code:java}
private def compileValue(value: Any): Any = {
  value match {
    case stringValue: String => s"'${escapeStringConstant(stringValue)}'"

    // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions
    // Spark 1.4
    case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
    // Spark 1.5
    case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"

    // Render dates as Phoenix date literals, using the configured date format
    case d if (isClass(d, "java.util.Date") || isClass(d, "java.sql.Date")) => {
      val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration
      val dateFormat = config.get(QueryServices.DATE_FORMAT_ATTRIB, DateUtil.DEFAULT_DATE_FORMAT)
      val df = new SimpleDateFormat(dateFormat)
      s"date'${df.format(d)}'"
    }

    // Render timestamps as Phoenix timestamp literals, using the configured timestamp format
    case dt if (isClass(dt, "java.sql.Timestamp")) => {
      val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration
      val dateTimeFormat = config.get(QueryServices.TIMESTAMP_FORMAT_ATTRIB, DateUtil.DEFAULT_TIMESTAMP_FORMAT)
      val df = new SimpleDateFormat(dateTimeFormat)
      s"timestamp'${df.format(dt)}'"
    }

    // Pass through anything else
    case _ => value
  }
}
{code}
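
For experimenting with the idea outside of Phoenix, here is a self-contained sketch of the same approach. It is an illustration under assumptions, not the proposed patch: the two format patterns are hard-coded stand-ins for the values the proposal reads from the Phoenix configuration, `escape` is a hypothetical replacement for `escapeStringConstant`, and type patterns are used in place of `isClass`. The `date'...'` and `timestamp'...'` literal forms are taken from the proposal above.

```scala
import java.sql.{Date, Timestamp}
import java.text.SimpleDateFormat

object CompileValueSketch {
  // Assumed patterns for illustration only; the proposal looks them up in the configuration.
  private val DatePattern = "yyyy-MM-dd"
  private val TimestampPattern = "yyyy-MM-dd HH:mm:ss.SSS"

  // Hypothetical stand-in for escapeStringConstant: doubles single quotes.
  private def escape(s: String): String = s.replace("'", "''")

  def compileValue(value: Any): Any = value match {
    case s: String => s"'${escape(s)}'"
    // Timestamp is a subclass of java.util.Date, so it has to be matched first.
    case ts: Timestamp =>
      s"timestamp'${new SimpleDateFormat(TimestampPattern).format(ts)}'"
    // Covers java.util.Date and java.sql.Date.
    case d: java.util.Date =>
      s"date'${new SimpleDateFormat(DatePattern).format(d)}'"
    // Pass through anything else
    case _ => value
  }

  def main(args: Array[String]): Unit = {
    println(compileValue(Date.valueOf("2018-07-14")))
    println(compileValue(Timestamp.valueOf("2018-07-14 10:20:30")))
  }
}
```

A new SimpleDateFormat is created per call because the class is not thread-safe. Both the formatter and `Date.valueOf` use the JVM default time zone here; a real fix would need to agree with the time zone Phoenix uses when it parses the literal.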
 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
