drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4152) Add additional logging and metrics to the Parquet reader
Date Mon, 14 Dec 2015 18:19:46 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056418#comment-15056418 ]

ASF GitHub Bot commented on DRILL-4152:
---------------------------------------

Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/298#discussion_r47534819
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java ---
    @@ -122,9 +124,14 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS
           These fields will be added to the constructor below
           */
           try {
    +        Stopwatch timer = new Stopwatch();
             if ( ! footers.containsKey(e.getPath())){
    -          footers.put(e.getPath(),
    -              ParquetFileReader.readFooter(conf, new Path(e.getPath())));
    +          timer.start();
    +          ParquetMetadata footer = ParquetFileReader.readFooter(conf, new Path(e.getPath()));
    +          long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
    +          timer.reset();
    --- End diff --
    
    We don't really need to call `timer.reset()` unless we move the timer creation outside the for loop.
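
To illustrate the comment above, here is a minimal sketch of timing a footer read with a timer that is local to each loop iteration, so there is nothing to reset. It is not the Drill code: `FooterTimingSketch`, `readFooterStub`, and `timeFooterReads` are hypothetical names, the stub stands in for `ParquetFileReader.readFooter`, and `System.nanoTime()` is used in place of Guava's `Stopwatch` to keep the example dependency-free.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class FooterTimingSketch {

    // Hypothetical stand-in for ParquetFileReader.readFooter; not Drill or Parquet code.
    static String readFooterStub(String path) {
        return "footer:" + path;
    }

    // Reads each distinct path once and records how long each read took, in microseconds.
    static Map<String, Long> timeFooterReads(List<String> paths, Map<String, String> footers) {
        Map<String, Long> micros = new LinkedHashMap<>();
        for (String path : paths) {
            if (!footers.containsKey(path)) {
                // The start time is captured inside the loop body, so every
                // iteration gets a fresh measurement and no reset is needed.
                long start = System.nanoTime();
                String footer = readFooterStub(path);
                long elapsed = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
                footers.put(path, footer);
                micros.put(path, elapsed);
            }
        }
        return micros;
    }

    public static void main(String[] args) {
        Map<String, String> footers = new HashMap<>();
        Map<String, Long> timings = timeFooterReads(Arrays.asList("a", "b", "a"), footers);
        // The repeated path "a" is read (and timed) only once.
        for (Map.Entry<String, Long> entry : timings.entrySet()) {
            System.out.println(entry.getKey() + " read in " + entry.getValue() + " us");
        }
    }
}
```

If the timer were instead created once before the loop and reused, it would have to be reset between iterations, which is the case the comment singles out.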


> Add additional logging and metrics to the Parquet reader
> --------------------------------------------------------
>
>                 Key: DRILL-4152
>                 URL: https://issues.apache.org/jira/browse/DRILL-4152
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Parth Chandra
>            Assignee: Deneche A. Hakim
>
> In some cases, we see the Parquet reader as the bottleneck in reading from the file system. RWSpeedTest is able to read 10x faster than the Parquet reader, so reading from disk is not the issue. This issue is to add more instrumentation to the Parquet reader so speed bottlenecks can be better diagnosed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
