drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3867) Store relative paths in metadata file
Date Fri, 16 Jun 2017 22:55:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052473#comment-16052473 ]

ASF GitHub Bot commented on DRILL-3867:
---------------------------------------

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/824#discussion_r122515967
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -680,7 +731,7 @@ private boolean tableModified(List<String> directories, Path metaFilePath,
       }
     
       public static abstract class ParquetFileMetadata {
    -    @JsonIgnore public abstract String getPath();
    +    @JsonIgnore public abstract ParquetPath getParquetPath();
    --- End diff --
    
    The structure of the metadata cache file is unchanged, and deserialization works correctly for both new relative paths and old absolute ones (via `new Path(parent, child)` in the `deserialize()` method).
    
    In the new approach, the list of paths is checked after deserialization and any relative paths are converted to absolute ones.
    Leaving relative paths in the metadata would mean converting them repeatedly and checking the kind of path in many places.
    If an old metadata cache file with absolute paths is deserialized, nothing is done to those paths and the old mechanism still works.
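
    The conversion step described above could look roughly like the sketch below. It is not the code from `Metadata.java`: the class and method names are hypothetical, `part-0.parquet` is a made-up file name, and it uses `java.nio.file.Path.resolve` as a stand-in for Hadoop's `new Path(parent, child)` so that it runs without Hadoop on the classpath. The relevant behaviour is the same in this respect: an already absolute child is returned unchanged. The output assumes a Unix-style file system.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names are illustrative and not taken from Drill.
final class MetadataPathSketch {

  // Resolves each cached path against the table directory that holds the
  // cache file. Path.resolve returns its argument as-is when it is already
  // absolute, so entries from an old cache file pass through untouched.
  static List<String> toAbsolute(String tableDir, List<String> cachedPaths) {
    Path parent = Paths.get(tableDir);
    List<String> result = new ArrayList<>(cachedPaths.size());
    for (String cached : cachedPaths) {
      result.add(parent.resolve(cached).toString());
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> cached = Arrays.asList(
        // Relative entry, as a new-style cache file would store it.
        "2006/1/part-0.parquet",
        // Absolute entry, as an old-style cache file would store it.
        "/drill/testdata/metadata_caching/lineitem/2006/1/part-0.parquet");
    for (String p : toAbsolute("/drill/lineitem", cached)) {
      System.out.println(p);
    }
  }
}
```

    Doing this once, right after deserialization, means the rest of the code only ever sees absolute paths.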


> Store relative paths in metadata file
> -------------------------------------
>
>                 Key: DRILL-3867
>                 URL: https://issues.apache.org/jira/browse/DRILL-3867
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.2.0
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>             Fix For: Future
>
>
> git.commit.id.abbrev=cf4f745
> git.commit.time=29.09.2015 @ 23\:19\:52 UTC
> The below sequence of steps reproduces the issue
> 1. Create the cache file
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem`;
> +-------+-------------------------------------------------------------------------------------+
> |  ok   |                                       summary                                       |
> +-------+-------------------------------------------------------------------------------------+
> | true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem.  |
> +-------+-------------------------------------------------------------------------------------+
> 1 row selected (1.558 seconds)
> {code}
> 2. Move the directory
> {code}
> hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
> {code}
> 3. Now run a query on top of it
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 1;
> Error: SYSTEM ERROR: FileNotFoundException: Requested file maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.
> [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] (state=,code=0)
> {code}
> This is expected, since we store absolute file paths in the cache file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
