drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4152) Add additional logging and metrics to the Parquet reader
Date Mon, 14 Dec 2015 18:23:46 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056424#comment-15056424 ]

ASF GitHub Bot commented on DRILL-4152:
---------------------------------------

Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/298#discussion_r47535407
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java ---
    @@ -124,9 +131,16 @@
     
       private void loadDictionaryIfExists(final ColumnReader<?> parentStatus,
           final ColumnChunkMetaData columnChunkMetaData, final FSDataInputStream f) throws IOException {
    +    Stopwatch timer = new Stopwatch();
         if (columnChunkMetaData.getDictionaryPageOffset() > 0) {
           f.seek(columnChunkMetaData.getDictionaryPageOffset());
    +      long start=f.getPos();
    +      timer.start();
           final PageHeader pageHeader = Util.readPageHeader(f);
    +      long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
    +      timer.reset();
    --- End diff --
    
    no need to call `timer.reset()`


> Add additional logging and metrics to the Parquet reader
> --------------------------------------------------------
>
>                 Key: DRILL-4152
>                 URL: https://issues.apache.org/jira/browse/DRILL-4152
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Parth Chandra
>            Assignee: Deneche A. Hakim
>
> In some cases, we see the Parquet reader as the bottleneck in reading from the file system.
> RWSpeedTest is able to read 10x faster than the Parquet reader so reading from disk is not
> the issue. This issue is to add more instrumentation to the Parquet reader so speed bottlenecks
> can be better diagnosed.
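
The kind of instrumentation the issue asks for can be sketched roughly as follows. This is a minimal illustration using only the JDK, not the code from the pull request: the class `ReadTimingSketch`, the holder `ReadStats`, and the method `timedRead` are hypothetical names, and a plain `InputStream` stands in for the `FSDataInputStream` used by the Parquet reader. It records how many bytes a read returned and how long the read took, in microseconds, so that a slow read can be told apart from slow decoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;

public class ReadTimingSketch {

  /** Hypothetical holder for one measurement; not part of Drill. */
  static final class ReadStats {
    final long bytesRead;
    final long elapsedMicros;

    ReadStats(long bytesRead, long elapsedMicros) {
      this.bytesRead = bytesRead;
      this.elapsedMicros = elapsedMicros;
    }
  }

  /**
   * Reads up to {@code length} bytes from {@code in} and reports how many
   * bytes were read and how long the read loop took.
   */
  static ReadStats timedRead(InputStream in, int length) throws IOException {
    byte[] buf = new byte[length];
    long startNanos = System.nanoTime();
    int total = 0;
    while (total < length) {
      int n = in.read(buf, total, length - total);
      if (n < 0) {
        break; // end of stream reached before 'length' bytes were available
      }
      total += n;
    }
    long elapsedMicros =
        TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - startNanos);
    return new ReadStats(total, elapsedMicros);
  }

  public static void main(String[] args) throws IOException {
    // An in-memory stream is used here only to keep the sketch self-contained.
    InputStream in = new ByteArrayInputStream(new byte[4096]);
    ReadStats stats = timedRead(in, 1024);
    System.out.println(
        "read " + stats.bytesRead + " bytes in " + stats.elapsedMicros + " us");
  }
}
```

Because each measurement takes its own start time from `System.nanoTime()`, there is no shared timer state to clear between reads. With a reusable stopwatch object, whether a reset is needed depends on whether the same instance is started again later.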



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
