phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pablo Castilla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix
Date Thu, 16 Feb 2017 09:56:41 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869655#comment-15869655
] 

Pablo Castilla commented on PHOENIX-3664:
-----------------------------------------

At the end we have switched from python to scala and all seems to work at very good speed.
We haven't found the phoenixTableAsRDD in python.

We would prefer python as we use it in machine learning implementations, but they are very
very similar so moving to scala is not a big deal.

> Pyspark: pushing filter by date against apache phoenix
> ------------------------------------------------------
>
>                 Key: PHOENIX-3664
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3664
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>         Environment: Azure HDIndight (HDI 3.5) - pyspark using phoenix client. (Spark
1.6.3 - HBase 1.1.2 under HDP 2.5)
>            Reporter: Pablo Castilla
>
> I am trying to filter by date in apache phoenix from pyspark. The column in phoenix is
created as Date and the filter is a datetime. When I use explain I see spark doesn't push
the filter to phoenix. I have tried a lot of combinations without luck.
> Any way to do it?
> df = sqlContext.read \
>    .format("org.apache.phoenix.spark") \
>   .option("table", "TABLENAME") \
>   .option("zkUrl",zookepperServer +":2181:/hbase-unsecure" ) \
>   .load()
> print(df.printSchema())
> startValidation = datetime.datetime.now()
> print(df.filter(df['FH'] >startValidation).explain(True))
> Results:
> root
>  |-- METER_ID: string (nullable = true)
>  |-- FH: date (nullable = true)
> None
>    == Parsed Logical Plan ==
> 'Filter (FH#53 > 1486726683446150)
> +- Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Analyzed Logical Plan ==
> METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, ENTRY_DATETIME:
date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: int, T_VAL_R2: int, T_VAL_R3: int,
T_VAL_R4: int
> Filter (cast(FH#53 as string) > cast(1486726683446150 as string))
> +- Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Optimized Logical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Physical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- Scan PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
> None
> if I set the FH column as timestamp it pushes the filter but throws an exception:
> Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax
error. Mismatched input. Expecting "RPAREN", got "12" at line 1, column 219.
>     at org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
>     at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
>     at org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280)
>     at org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363)
>     at org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373)
>     at org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
>     at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122)
>     ... 102 more
> Caused by: MismatchedTokenException(106!=129)
>     at org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:360)
>     at org.apache.phoenix.shaded.org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
>     at org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:6862)
>     at org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:6677)
>     at org.apache.phoenix.parse.PhoenixSQLParser.or_expression(PhoenixSQLParser.java:6614)
>     at org.apache.phoenix.parse.PhoenixSQLParser.expression(PhoenixSQLParser.java:6579)
>     at org.apache.phoenix.parse.PhoenixSQLParser.single_select(PhoenixSQLParser.java:4615)
>     at org.apache.phoenix.parse.PhoenixSQLParser.unioned_selects(PhoenixSQLParser.java:4697)
>     at org.apache.phoenix.parse.PhoenixSQLParser.select_node(PhoenixSQLParser.java:4763)
>     at org.apache.phoenix.parse.PhoenixSQLParser.oneStatement(PhoenixSQLParser.java:789)
>     at org.apache.phoenix.parse.PhoenixSQLParser.statement(PhoenixSQLParser.java:508)
>     at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:108)
>     ... 107 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message