phoenix-dev mailing list archives

From "Suhas Nalapure (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2336) Queries with small case column-names return empty result-set when working with Spark Datasource Plugin
Date Tue, 29 Dec 2015 10:02:49 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073742#comment-15073742 ]

Suhas Nalapure commented on PHOENIX-2336:
-----------------------------------------

Another pointer: the root cause of this behavior seems to be the same as PHOENIX-2547. For columns
with lower-case names, the SELECT query generated by PhoenixInputFormat doesn't enclose
the column names in double quotes (""), so Phoenix actually looks for a column
with the same letter sequence in upper case, and this results in the error. This also
applies to tables created directly through Phoenix, not just to views that map to existing HBase
tables.
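To illustrate the rule at play: like standard SQL, Phoenix folds unquoted identifiers to upper case and preserves the exact case only when the identifier is double-quoted. A minimal sketch of that normalization (illustrative only; the `normalize` helper below is hypothetical, not Phoenix's actual implementation):

```java
// Sketch of SQL identifier normalization as Phoenix applies it
// (hypothetical helper for illustration, not Phoenix's API).
public class IdentifierNormalizer {

    // Unquoted identifiers are folded to upper case; quoted ones
    // keep their exact case, with the surrounding quotes stripped.
    public static String normalize(String identifier) {
        if (identifier.length() >= 2
                && identifier.startsWith("\"")
                && identifier.endsWith("\"")) {
            return identifier.substring(1, identifier.length() - 1);
        }
        return identifier.toUpperCase();
    }

    public static void main(String[] args) {
        // An unquoted lower-case name no longer matches a column created as "col1":
        System.out.println(normalize("col1"));     // COL1
        // Quoting preserves the lower-case name:
        System.out.println(normalize("\"col1\"")); // col1
    }
}
```

This is why a generated SELECT that omits the quotes ends up resolving `col1` as `COL1` and failing the lookup.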

> Queries with small case column-names return empty result-set when working with Spark Datasource Plugin
> -------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2336
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2336
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.6.0
>            Reporter: Suhas Nalapure
>
> Hi,
> The Spark DataFrame filter operation returns an empty result-set when the column name is in
lower case. Example below:
> DataFrame df = sqlContext.read().format("org.apache.phoenix.spark").options(params).load();
> df.filter("\"col1\" = '5.0'").show(); 
> Result:
> +---+----+---+---+---+---+
> | ID|col1| c1| d2| d3| d4|
> +---+----+---+---+---+---+
> +---+----+---+---+---+---+
> Whereas the table actually has rows matching the filter condition. And if the double
quotes are removed from around the column name, i.e. df.filter("col1 = '5.0'").show();, a
ColumnNotFoundException is thrown:
> Exception in thread "main" java.lang.RuntimeException: org.apache.phoenix.schema.ColumnNotFoundException:
ERROR 504 (42703): Undefined column. columnName=D1
>         at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:125)
>         at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:80)
>         at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:95)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
