drill-issues mailing list archives

From "Jason Altekruse (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4070) Metadata Caching : min/max values are null for varchar columns in auto partitioned data
Date Thu, 12 Nov 2015 17:14:11 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002438#comment-15002438 ]

Jason Altekruse commented on DRILL-4070:
----------------------------------------

The fix I am planning to make is on the Drill side. I'm pretty sure what is happening is that
the statistics are being written, but the deserialization of the column chunk metadata now
requires a recent version number before it will trust the statistics that are written.
I plan to check what we write in that field and compare it to a file produced by one of
the default object models that use the standard write path.
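
To illustrate the suspected behaviour, here is a minimal Python sketch of a reader that
returns min/max statistics only when the writer's version string is recognised and recent
enough. The shape of the `created_by` string, the `MIN_TRUSTED_VERSION` threshold, and the
function names are all assumptions made for illustration; this is not Drill's or the Parquet
library's actual code.

```python
import re
from typing import Optional, Tuple

# Hypothetical threshold: writers at or above this version are trusted.
MIN_TRUSTED_VERSION = (1, 8, 0)

# Assumed shape of the writer string, e.g. "parquet-mr version 1.8.1 (build abc)".
_CREATED_BY = re.compile(r"^(?P<app>\S+) version (?P<ver>\d+\.\d+\.\d+)")


def parse_version(created_by: Optional[str]) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch), or None if the string is missing or unparseable."""
    if not created_by:
        return None
    match = _CREATED_BY.match(created_by)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.group("ver").split("."))
    return (major, minor, patch)


def read_min_max(created_by: Optional[str], stats: dict) -> tuple:
    """Return (min, max) only when the writer version is trusted, else (None, None)."""
    version = parse_version(created_by)
    if version is None or version < MIN_TRUSTED_VERSION:
        # The statistics may be present in the file, but the reader discards them.
        return (None, None)
    return (stats.get("min"), stats.get("max"))
```

Under these assumptions, a file whose writer string carries no parseable version would show
null min/max values even though the statistics were written, which matches the symptom
reported in this issue.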

I don't know how much of the community is bothering to append to parquet files, but I have
confirmed with [~julienledem] in an earlier discussion that writing a new footer should work
fine. This was just a workaround for anyone with a lot of files already written with Drill's
auto-partitioning.

> Metadata Caching : min/max values are null for varchar columns in auto partitioned data
> ---------------------------------------------------------------------------------------
>
>                 Key: DRILL-4070
>                 URL: https://issues.apache.org/jira/browse/DRILL-4070
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.3.0
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: cache.txt, fewtypes_varcharpartition.tar.tgz
>
>
> git.commit.id.abbrev=e78e286
> The metadata cache file created contains incorrect values for min/max fields for varchar
> columns. The data is also partitioned on the varchar column.
> {code}
> refresh table metadata fewtypes_varcharpartition;
> {code}
> As a result, partition pruning is not happening. This was working after DRILL-3937 was
> fixed (d331330efd27dbb8922024c4a18c11e76a00016b).
> I attached the data set and the cache file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
