drill-issues mailing list archives

From "Parth Chandra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4154) Metadata Caching : Upgrading cache to v2 from v1 corrupts the cache in some scenarios
Date Thu, 03 Dec 2015 08:32:11 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037480#comment-15037480 ]

Parth Chandra commented on DRILL-4154:
--------------------------------------

[~rkins] After many hours of trying to reproduce this, the only way I am able to get the metadata
cache file to look like the one in 'broken-cache.txt' is if the cache file gets created without
the migration tool having been run on the parquet files. The data files you attached do not
have the appropriate version number, and in that case the parquet code prevents us from reading
the stats for binary columns.
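
For reference, this is roughly the shape of the check that kicks in here: the writer version recorded in the file's created_by string decides whether binary-column min/max statistics are trusted. A minimal sketch with a hypothetical helper and an assumed 1.8.0 cutoff, not Drill's or parquet-mr's actual code:

{code}
// Hypothetical illustration of why binary-column stats are dropped for old writer versions.
// The created_by format and the 1.8.0 cutoff are assumptions for this sketch only.
public class BinaryStatsCheck {

  /** Returns true if min/max statistics for a binary column should be ignored. */
  static boolean ignoreBinaryStats(String createdBy, boolean isBinaryColumn) {
    if (!isBinaryColumn) {
      return false;                                   // stats for non-binary columns are usable
    }
    // created_by looks like e.g. "parquet-mr version 1.6.0 (build abc123)"
    java.util.regex.Matcher m = java.util.regex.Pattern
        .compile("parquet-mr version (\\d+)\\.(\\d+)").matcher(createdBy);
    if (!m.find()) {
      return true;                                    // unknown writer: drop the stats to be safe
    }
    int major = Integer.parseInt(m.group(1));
    int minor = Integer.parseInt(m.group(2));
    return major < 1 || (major == 1 && minor < 8);    // assumed cutoff: writers before 1.8.0
  }

  public static void main(String[] args) {
    System.out.println(ignoreBinaryStats("parquet-mr version 1.6.0 (build abc)", true));  // true
    System.out.println(ignoreBinaryStats("parquet-mr version 1.8.1 (build def)", true));  // false
  }
}
{code}
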
There is an issue with the migration tool in that, at least on a local file system, the timestamp
of the directory does not get updated after the parquet files are rewritten. This should be
fixed. (Note that I have yet to try this on a DFS.)
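
A minimal sketch of the kind of fix meant here, using java.nio.file to bump a directory's modification time after the files inside it have been rewritten; the path is hypothetical and this is not the migration tool's actual code:

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;

public class TouchDirectory {

  // After rewriting the parquet files in a table directory, also bump the directory's
  // own mtime so that a previously written metadata cache file is detected as stale.
  static void touch(Path dir) throws IOException {
    Files.setLastModifiedTime(dir, FileTime.fromMillis(System.currentTimeMillis()));
  }

  public static void main(String[] args) throws IOException {
    touch(Paths.get("/tmp/fewtypes_varcharpartition"));   // hypothetical table directory
  }
}
{code}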

For the second issue, it is likely that when you copied the cache file, the directory timestamp
was also updated. I have seen that, in such a case, the timestamp of the directory may be a
few microseconds newer than the timestamp of the copied cache file, in which case we consider
the cache file stale and recreate it. This behaviour is safe, and the situation is unlikely to
occur in practice since metadata cache files are not normally copied around.
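
For context, that staleness decision boils down to a timestamp comparison along these lines; a simplified sketch using java.nio.file on a local path rather than Drill's DFS-based code, with hypothetical paths:

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CacheStalenessCheck {

  // The cache file is considered stale if the directory was modified after the cache
  // file was written, even by a few microseconds (as with a copied cache file).
  static boolean isCacheStale(Path tableDir, Path cacheFile) throws IOException {
    return Files.getLastModifiedTime(tableDir)
        .compareTo(Files.getLastModifiedTime(cacheFile)) > 0;   // newer dir => recreate cache
  }

  public static void main(String[] args) throws IOException {
    Path dir = Paths.get("/tmp/fewtypes_varcharpartition");     // hypothetical table directory
    Path cache = dir.resolve(".drill.parquet_metadata");        // Drill's metadata cache file
    System.out.println(isCacheStale(dir, cache) ? "stale: recreate" : "cache is current");
  }
}
{code}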

> Metadata Caching : Upgrading cache to v2 from v1 corrupts the cache in some scenarios
> -------------------------------------------------------------------------------------
>
>                 Key: DRILL-4154
>                 URL: https://issues.apache.org/jira/browse/DRILL-4154
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: broken-cache.txt, fewtypes_varcharpartition.tar.tgz, old-cache.txt
>
>
> git.commit.id.abbrev=46c47a2
> I copied the data along with the cache file onto maprfs. Then I ran the upgrade tool (https://github.com/parthchandra/drill-upgrade),
> followed by the metadata_caching suite from the functional tests (concurrency 10) without the
> datagen phase. I see 3 test failures, and when I looked at the cache file it seems to contain
> wrong information for the varchar column.
> Sample from the cache:
> {code}
>       {
>         "name" : [ "varchar_col" ]
>       }, {
>         "name" : [ "float_col" ],
>         "mxValue" : 68797.22,
>         "nulls" : 0
>       }
> {code}
> When I followed the same steps but, instead of running the suites, executed the "REFRESH
> TABLE METADATA" command or any query on that folder, the cache file was created properly.
> I attached the data and cache files required. Let me know if you need anything.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
