drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types
Date Mon, 29 Jan 2018 23:44:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344206#comment-16344206 ]

ASF GitHub Bot commented on DRILL-5846:
---------------------------------------

Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1060#discussion_r164600674
  
    --- Diff: exec/memory/base/src/main/java/org/apache/drill/exec/util/MemoryUtils.java ---
    @@ -0,0 +1,186 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.util;
    +
    +import java.lang.reflect.Field;
    +import java.nio.ByteOrder;
    +
    +import sun.misc.Unsafe;
    +
    +/** Exposes advanced Memory Access APIs for Little-Endian / Unaligned platforms */
    +@SuppressWarnings("restriction")
    +public final class MemoryUtils {
    +
    +  /** Ensure this is little-endian hardware */
    +  static {
    +    if (ByteOrder.nativeOrder() != ByteOrder.LITTLE_ENDIAN) {
    +      throw new IllegalStateException("Drill only runs on LittleEndian systems.");
    +    }
    +  }
    +
    +  /** Java's unsafe object */
    +  private static Unsafe UNSAFE;
    +
    +  static {
    +    try {
    +      Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
    +      theUnsafe.setAccessible(true);
    +      UNSAFE = (Unsafe) theUnsafe.get(null);
    +    }
    +    catch (Exception e) {
    +      throw new RuntimeException(e);
    +    }
    +  }
    +
    +  /** Byte arrays offset */
    +  private static final long BYTE_ARRAY_OFFSET = UNSAFE.arrayBaseOffset(byte[].class);
    +
    +  /** Number of bytes in a long */
    +  public static final int LONG_NUM_BYTES  = 8;
    +  /** Number of bytes in an int */
    +  public static final int INT_NUM_BYTES   = 4;
    +  /** Number of bytes in a short */
    +  public static final int SHORT_NUM_BYTES = 2;
    +
    +//----------------------------------------------------------------------------
    +// APIs
    +//----------------------------------------------------------------------------
    +
    +  /**
    +   * @param data source byte array
    +   * @param index index within the byte array
    +   * @return short value starting at data+index
    +   */
    +  public static short getShort(byte[] data, int index) {
    --- End diff --
    
    A sanity check that can be turned on and off is required; otherwise, there is no difference
between Netty's `PlatformDependent` and `MemoryUtils`. It would also be better to have a unified
way, independent of Java assertions, to turn bounds checking on or off.
    
    My assumption is that Drill either reads from DrillBuf or writes to DrillBuf, since data is
not supposed to be on the heap, so DrillBuf looks like the natural place for this functionality.
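
    To make the request concrete, here is a minimal sketch of a bounds check that is toggled
independently of Java assertions. It is not the code from the pull request: the class name
`CheckedBytes`, the system property `example.unsafe_memory_access`, and the helper
`checkBounds` are hypothetical, and plain array indexing stands in for the `Unsafe` read so
that the example runs on any JVM.

```java
/** Hypothetical sketch: little-endian reads with a bounds check that a flag can disable. */
public final class CheckedBytes {

  /**
   * Read once at class load. Checks stay on unless the (hypothetical) system
   * property -Dexample.unsafe_memory_access=true is set; -ea/-da has no effect.
   */
  static final boolean BOUNDS_CHECKING_ENABLED =
      !Boolean.getBoolean("example.unsafe_memory_access");

  private CheckedBytes() {
  }

  /** Throws if the range [index, index + length) does not fit inside data. */
  static void checkBounds(byte[] data, int index, int length) {
    if (index < 0 || length < 0 || index > data.length - length) {
      throw new IndexOutOfBoundsException(
          "index=" + index + ", length=" + length + ", capacity=" + data.length);
    }
  }

  /** Returns the little-endian short starting at data[index]. */
  public static short getShort(byte[] data, int index) {
    if (BOUNDS_CHECKING_ENABLED) {
      checkBounds(data, index, 2);
    }
    // Stand-in for an Unsafe read. A real Unsafe access performs no implicit
    // range check, which is why the explicit check above matters there.
    return (short) ((data[index] & 0xFF) | ((data[index + 1] & 0xFF) << 8));
  }
}
```

    Because the flag is a `static final` constant, the JIT can usually drop the check when it
is disabled, which is the usual reason for preferring this over a per-call configuration lookup.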
    



> Improve Parquet Reader Performance for Flat Data types 
> -------------------------------------------------------
>
>                 Key: DRILL-5846
>                 URL: https://issues.apache.org/jira/browse/DRILL-5846
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.11.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>              Labels: performance
>             Fix For: 1.13.0
>
>
> The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to further improve
Parquet Reader performance, as several users reported that Parquet parsing represents the
lion's share of overall query execution time. It tracks flat data types only, as nested data types
might involve functional and processing enhancements (e.g., a nested column can be seen as a document;
a user might want to perform operations scoped at the document level that do not need to span
all rows). Another JIRA will be created to handle the nested columns use-case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
