drill-issues mailing list archives

From "Rahul Challapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3821) refresh table metadata command is updating the cache every single time
Date Wed, 23 Sep 2015 00:06:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903695#comment-14903695 ]

Rahul Challapalli commented on DRILL-3821:
------------------------------------------

When there is no update to the actual data, what would be the reason to re-construct the cache?


When we have directory-based partitions and a new partition gets added, I would expect the
"refresh table metadata" command to scan only the new partition (directory) and update the
cache file accordingly. With lots of historical data, this command can take a few minutes to
run.
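
To make the expectation above concrete, here is a minimal sketch of an incremental refresh. It is not Drill's implementation: the `snapshot` and `refresh` helpers and the in-memory dict of directory modification times are hypothetical stand-ins for the real cache file, and it assumes that comparing directory modification times is enough to detect added or removed files.

```python
import os


def snapshot(table_root):
    """Map every directory under table_root to its current modification time."""
    return {
        dirpath: os.stat(dirpath).st_mtime_ns
        for dirpath, _dirnames, _filenames in os.walk(table_root)
    }


def refresh(table_root, cached):
    """Return (cache, rescanned_dirs) for a hypothetical incremental refresh.

    cached maps directory path -> modification time recorded at the last refresh.
    """
    current = snapshot(table_root)
    # Directories that are new, or whose modification time has changed.
    stale = sorted(d for d, mtime in current.items() if cached.get(d) != mtime)
    # Directories that were cached but no longer exist.
    removed = [d for d in cached if d not in current]
    if not stale and not removed:
        # Nothing changed: hand back the existing cache without rewriting it.
        return cached, []
    # A real refresh would re-read file metadata only for the stale directories.
    return current, stale
```

With this approach a second refresh on unchanged data returns immediately with nothing to rescan, and adding a partition directory flags that directory rather than the whole table. One caveat: a directory's modification time changes when entries are added or removed, not when an existing file is rewritten in place, so a real implementation would probably need file-level checks as well.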

> refresh table metadata command is updating the cache every single time
> ----------------------------------------------------------------------
>
>                 Key: DRILL-3821
>                 URL: https://issues.apache.org/jira/browse/DRILL-3821
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>            Reporter: Rahul Challapalli
>            Assignee: Mehant Baid
>             Fix For: 1.2.0
>
>
> git.commit.id.abbrev=3c89b30
> The lineitem folder used below contains 50K parquet files. I ran the refresh table metadata
> command multiple times. After the first run, I expected all subsequent runs to come back very
> fast, since there is nothing to update. But the times below suggest that Drill might actually
> be updating the cache file every single time.
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> refresh table metadata dfs.`/drill/testdata/tpch100_50000files/lineitem`;
> +-------+---------------------------------------------------------------------------------------+
> |  ok   |                                        summary                                        |
> +-------+---------------------------------------------------------------------------------------+
> | true  | Successfully updated metadata for table /drill/testdata/tpch100_50000files/lineitem.  |
> +-------+---------------------------------------------------------------------------------------+
> 1 row selected (14.108 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> refresh table metadata dfs.`/drill/testdata/tpch100_50000files/lineitem`;
> +-------+---------------------------------------------------------------------------------------+
> |  ok   |                                        summary                                        |
> +-------+---------------------------------------------------------------------------------------+
> | true  | Successfully updated metadata for table /drill/testdata/tpch100_50000files/lineitem.  |
> +-------+---------------------------------------------------------------------------------------+
> 1 row selected (11.372 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> refresh table metadata dfs.`/drill/testdata/tpch100_50000files/lineitem`;
> +-------+---------------------------------------------------------------------------------------+
> |  ok   |                                        summary                                        |
> +-------+---------------------------------------------------------------------------------------+
> | true  | Successfully updated metadata for table /drill/testdata/tpch100_50000files/lineitem.  |
> +-------+---------------------------------------------------------------------------------------+
> 1 row selected (11.177 seconds)
> {code}
> When I checked the last modified time of the cache file on maprfs, it indicated that
> the cache is touched every time the "refresh table metadata" command is run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
