drill-dev mailing list archives

From "Uwe L. Korn (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4978) Parquet metadata cache on S3 is always renewed
Date Fri, 28 Oct 2016 08:23:58 GMT
Uwe L. Korn created DRILL-4978:

             Summary: Parquet metadata cache on S3 is always renewed
                 Key: DRILL-4978
                 URL: https://issues.apache.org/jira/browse/DRILL-4978
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.8.0
         Environment: Hadoop s3a storage
            Reporter: Uwe L. Korn

As directory modification times are not tracked by S3 (see https://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-aws/tools/hadoop-aws/index.html#Warning_2:_Because_Object_stores_dont_track_modification_times_of_directories ),
the Parquet metadata cache is always regenerated during query planning.

This could be addressed in either of two ways:
 * for the case of s3a, check the modification times of all Parquet files in this directory
 * deactivate the metadata cache for s3a
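
The first option could be sketched roughly as follows. This is only an illustration in Python against a local directory standing in for an s3a path, not Drill's actual implementation; the function names `newest_parquet_mtime` and `cache_is_stale` and the `cache_file` parameter are hypothetical. The idea is to compare the cache file's modification time with the newest modification time among the Parquet files, rather than with the directory's modification time.

```python
from pathlib import Path


def newest_parquet_mtime(directory):
    """Return the newest modification time among *.parquet files under
    `directory` (searched recursively), or None if there are none."""
    mtimes = [
        p.stat().st_mtime
        for p in Path(directory).rglob("*.parquet")
        if p.is_file()
    ]
    return max(mtimes) if mtimes else None


def cache_is_stale(directory, cache_file):
    """Decide whether a metadata cache file (hypothetical path) should be
    rebuilt, using file modification times only, never the directory's."""
    cache_path = Path(cache_file)
    if not cache_path.exists():
        # No cache yet: it has to be created.
        return True
    newest = newest_parquet_mtime(directory)
    if newest is None:
        # No Parquet files found, so nothing is newer than the cache.
        return False
    # Stale only if some Parquet file was written after the cache.
    return newest > cache_path.stat().st_mtime
```

Note that this check alone would not notice deleted files, and on an object store it costs one listing of the directory per planning run, so it is a trade-off rather than a free fix.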
