drill-issues mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4070) Metadata Caching : min/max values are null for varchar columns in auto partitioned data
Date Wed, 11 Nov 2015 20:14:11 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001002#comment-15001002 ]

Aman Sinha commented on DRILL-4070:

If I create another table by reading the parquet data that [~rkins] provided and do a PARTITION
BY (varchar_col), the queries work correctly, both with and without the metadata cache. Did
anything change in the parquet reader or writer for 1.3? I thought the underlying parquet
library was changed/upgraded but not the reader/writer. BTW, when I compare a file from the
original data with the corresponding newly created file, parquet-meta and parquet-cat show
they are equivalent, but the file sizes are different:
original file supplied by Rahul: 
-rw-r--r--@ 1 asinha  staff  1268 Nov 10 17:18 0_0_1.parquet

New file : 
-rw-r--r--   1 asinha  wheel   1340 Nov 11 11:41 0_0_1.parquet

> Metadata Caching : min/max values are null for varchar columns in auto partitioned data
> ---------------------------------------------------------------------------------------
>                 Key: DRILL-4070
>                 URL: https://issues.apache.org/jira/browse/DRILL-4070
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.3.0
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: cache.txt, fewtypes_varcharpartition.tar.tgz
> git.commit.id.abbrev=e78e286
> The metadata cache file created contains incorrect values for the min/max fields of varchar
> columns. The data is also partitioned on the varchar column.
> {code}
> refresh table metadata fewtypes_varcharpartition;
> {code}
> As a result, partition pruning is not happening. This was working after DRILL-3937 was
> fixed (d331330efd27dbb8922024c4a18c11e76a00016b).
> I attached the data set and the cache file.
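
To illustrate why null min/max values in the cache would stop pruning, here is a minimal sketch of the general idea. It is a toy model, not Drill's planner code: the `FileStats` record, the `can_prune` and `files_to_scan` helpers, the second file name, and the sample values are all hypothetical. It assumes that each file in auto-partitioned data holds a single value of the partition column, so min equals max, and that a file can be skipped only when the cached range proves it cannot match the filter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileStats:
    """Hypothetical per-file entry, loosely modeled on a metadata cache record."""
    path: str
    min_value: Optional[str]  # None models a null min in the cache
    max_value: Optional[str]  # None models a null max in the cache


def can_prune(stats: FileStats, wanted: str) -> bool:
    """Return True only when the statistics prove the file cannot match."""
    if stats.min_value is None or stats.max_value is None:
        # Unknown range: nothing can be proven, so the file must be scanned.
        return False
    return wanted < stats.min_value or wanted > stats.max_value


def files_to_scan(files: List[FileStats], wanted: str) -> List[str]:
    """Keep every file that the statistics cannot rule out."""
    return [f.path for f in files if not can_prune(f, wanted)]


# Auto-partitioned data: one partition value per file, so min == max.
good_cache = [
    FileStats("0_0_1.parquet", "apple", "apple"),
    FileStats("0_0_2.parquet", "banana", "banana"),
]

# Same files, but with the varchar min/max recorded as null.
bad_cache = [
    FileStats("0_0_1.parquet", None, None),
    FileStats("0_0_2.parquet", None, None),
]

print(files_to_scan(good_cache, "banana"))  # ['0_0_2.parquet']
print(files_to_scan(bad_cache, "banana"))   # ['0_0_1.parquet', '0_0_2.parquet']
```

With populated statistics, only the matching file survives. With null statistics, every file has to be read, which is consistent with the missing pruning reported above.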

This message was sent by Atlassian JIRA
