drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From parthchandra <...@git.apache.org>
Subject [GitHub] drill pull request #781: DRILL-5351: Minimize bounds checking in var len vec...
Date Sat, 18 Mar 2017 22:33:42 GMT
Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/781#discussion_r106792968
  
    --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java ---
    @@ -502,11 +502,15 @@ public void setSafe(int index, byte[] bytes) {
           assert index >= 0;
     
           final int currentOffset = offsetVector.getAccessor().get(index);
    -      while (data.capacity() < currentOffset + bytes.length) {
    -        reAlloc();
    -      }
           offsetVector.getMutator().setSafe(index + 1, currentOffset + bytes.length);
    -      data.setBytes(currentOffset, bytes, 0, bytes.length);
    +      try {
    +        data.setBytes(currentOffset, bytes, 0, bytes.length);
    +      } catch (IndexOutOfBoundsException e) {
    --- End diff --
    
    The check for a DrillBuf exceeding bounds is already being done once in Netty (which also
throws the Exception). What we want to do  is to avoid having to do more than one bounds check
for every vector write operation. By adding the suggested check, we simply move the check
from every vector setSafe method to every Drillbuf write. This would possibly impact performance
in other parts of the code. 
    I particularly changed only var len vectors to address a hotspot in the Parquet reader
performance, because as you are suggesting, the right way to fix is to address the low density
batch problem at the same time as the vector overflow problem. Perhaps a longer discussion
may be required here.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message