phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lokesh Kumar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PHOENIX-4347) Spark Dataset loaded using Phoenix Spark Datasource - Timestamp filter issue
Date Thu, 02 Nov 2017 10:04:00 GMT
Lokesh Kumar created PHOENIX-4347:
-------------------------------------

             Summary: Spark Dataset loaded using Phoenix Spark Datasource - Timestamp filter
issue
                 Key: PHOENIX-4347
                 URL: https://issues.apache.org/jira/browse/PHOENIX-4347
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.11.0
         Environment: CentOS 6.5, Fedora 25
            Reporter: Lokesh Kumar
            Priority: Major


Created a Phoenix table with below schema:

{code:java}
CREATE TABLE IF NOT EXISTS sample_table (
  id VARCHAR NOT NULL, 
  metricid VARCHAR NOT NULL,
  timestamp TIMESTAMP NOT NULL,
  metricvalue DOUBLE, 
  CONSTRAINT st_pk PRIMARY KEY(id,metricid,timestamp)) SALT_BUCKETS = 20;
{code}

Inserted some data into this and loaded as Spark Dataset using the Phoenix spark datasource
 ('org.apache.phoenix.spark') options.

The Spark Dataset's schema is as given below:

root
 |-- ID: string (nullable = true)
 |-- METRICID: string (nullable = true)
 |-- TIMESTAMP: timestamp (nullable = true)
 |-- METRICVALUE: double (nullable = true)

I apply the Dataset's filter operation on Timestamp column as given below:

{code:java}
Dataset<Row> ds = <Derived from Phoenix>
ds = ds.filter("TIMESTAMP >= CAST('2017-10-31 00:00:00.0' AS TIMESTAMP)")
{code}

This operation throws me an exception as:

 testPhoenixTimestamp(DatasetTest): org.apache.phoenix.exception.PhoenixParserException: ERROR
604 (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "00" at line 1, column
145.

The generated query looks like this:

{code:java}
2017-11-02 15:29:31,722 INFO  [main] org.apache.phoenix.mapreduce.PhoenixInputFormat
Select Statement: SELECT "ID","METRICID","TIMESTAMP","0"."METRICVALUE" FROM METRIC_TBR_DATA
WHERE ( "TIMESTAMP" IS NOT NULL AND "TIMESTAMP" >= *2017-10-31 00:00:00.0*)
{code}

The issue is highlighted in bold above, where the timestamp value is not wrapped in to_timestamp()
function.

I have fixed this locally in org.apache.phoenix.spark.PhoenixRelation class compileValue()
function, by checking the value's class. If it is java.sql.Timestamp then I am wrapping the
value with to_timestamp() function.

Please let me know if there is another way of correctly querying Timestamp values in Phoenix
through Spark's Dataset API.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message