drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5351) Excessive bounds checking in the Parquet reader
Date Mon, 03 Apr 2017 05:12:41 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953007#comment-15953007 ]

ASF GitHub Bot commented on DRILL-5351:
---------------------------------------

GitHub user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/781


> Excessive bounds checking in the Parquet reader 
> ------------------------------------------------
>
>                 Key: DRILL-5351
>                 URL: https://issues.apache.org/jira/browse/DRILL-5351
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Parth Chandra
>            Assignee: Parth Chandra
>              Labels: ready-to-commit
>
> In profiling the Parquet reader, the variable-length decoding appears to be a major bottleneck, making the reader CPU-bound rather than disk-bound.
> A YourKit profile identifies the following methods as severe bottlenecks:
> VarLenBinaryReader.determineSizeSerial(long)
>   NullableVarBinaryVector$Mutator.setSafe(int, int, int, int, DrillBuf)
>   DrillBuf.chk(int, int)
>   NullableVarBinaryVector$Mutator.fillEmpties()
> The problem is that each of these methods does some form of bounds checking, and, of course, the actual write to the ByteBuf is also bounds checked.
> DrillBuf.chk can be disabled by a configuration setting. Disabling it does improve the performance of TPC-H queries, and all regression, unit, and TPCH-SF100 tests still pass.
>
> I would recommend that we allow users to turn this check off for performance-critical queries.
> Removing the bounds checking at every level is going to be a fair amount of work. In the meantime, a few simple changes to the variable-length vectors appear to improve query performance by about 10% across the board.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
