drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4152) Add additional logging and metrics to the Parquet reader
Date Mon, 14 Dec 2015 22:59:46 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056901#comment-15056901 ]

ASF GitHub Bot commented on DRILL-4152:
---------------------------------------

Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/298#discussion_r47572637
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java ---
    @@ -196,7 +220,15 @@ public boolean next() throws IOException {
         // TODO - figure out if we need multiple dictionary pages, I believe it may be limited to one
         // I think we are clobbering parts of the dictionary if there can be multiple pages of dictionary
         do {
    +      long start=inputStream.getPos();
    +      timer.start();
           pageHeader = dataReader.readPageHeader();
    +      long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
    +      this.updateStats(pageHeader, "Page Header Read", start, timeToRead, 0,0);
    +      logger.trace("ParquetTrace,{},{},{},{},{},{},{},{}","Page Header Read","",
    +          this.parentColumnReader.parentReader.hadoopPath,
    +          this.parentColumnReader.columnDescriptor.toString(), start, 0, 0, timeToRead);
    +      timer.reset();
    --- End diff --
    
    same here


> Add additional logging and metrics to the Parquet reader
> --------------------------------------------------------
>
>                 Key: DRILL-4152
>                 URL: https://issues.apache.org/jira/browse/DRILL-4152
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Parth Chandra
>            Assignee: Deneche A. Hakim
>
> In some cases, we see the Parquet reader as the bottleneck in reading from the file system.
> RWSpeedTest is able to read 10x faster than the Parquet reader so reading from disk is not
> the issue. This issue is to add more instrumentation to the Parquet reader so speed bottlenecks
> can be better diagnosed.
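
The instrumentation pattern in the diff above (note the stream position, time
the read, then emit one comma-separated trace record) can be sketched in
isolation as follows. This is only an illustration under stated assumptions,
not the code from the pull request: the class TimedRead and its methods are
hypothetical names, System.nanoTime() stands in for the timer object used in
the diff, and the record layout loosely mirrors the nine fields of the
"ParquetTrace,..." log line, with the two fields the diff passes as 0 left
as constants.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Drill): times one read and formats a trace record. */
final class TimedRead {
    final String label;        // e.g. "Page Header Read"
    final long startPos;       // stream offset captured before the read
    final long elapsedMicros;  // wall-clock duration of the read

    TimedRead(String label, long startPos, long elapsedMicros) {
        this.label = label;
        this.startPos = startPos;
        this.elapsedMicros = elapsedMicros;
    }

    /** Runs the read and measures its duration with System.nanoTime(). */
    static TimedRead measure(String label, long startPos, Runnable read) {
        long t0 = System.nanoTime();
        read.run();
        long micros = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - t0);
        return new TimedRead(label, startPos, micros);
    }

    /** Builds a nine-field record: tag, label, empty field, path, column, start, 0, 0, time. */
    String toTraceLine(String path, String column) {
        return String.join(",",
            "ParquetTrace", label, "", path, column,
            Long.toString(startPos), "0", "0", Long.toString(elapsedMicros));
    }
}
```

A caller would wrap the actual read in the Runnable and pass the resulting
line to its logger at trace level, so the string is only worth building when
tracing is enabled.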



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
