From: GitBox
To: issues@tez.apache.org
Subject: [GitHub] [tez] tprelle commented on pull request #130: TEZ-4295: Could not decompress data. Buffer length is too small.
Date: Wed, 09 Jun 2021 15:21:25 -0000
Message-ID: <162325208536.15571.7287397673771831039.asfpy@gitbox.apache.org>

tprelle commented on pull request #130:
URL: https://github.com/apache/tez/pull/130#issuecomment-857795873

Hi @abstractdog, thanks for looking into it. I had the issue in the IFile reader.

    org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
           at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:312)
           at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:277)
           at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
           at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
           at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
           at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.InternalError: Could not decompress data. Buffer length is too small.
           at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
           at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:235)
           at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
           at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
           at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:92)
           at java.io.DataInputStream.readByte(DataInputStream.java:265)
           at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
           at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
           at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readKeyValueLength(IFile.java:935)
           at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.positionToNextRecord(IFile.java:965)
           at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readRawKey(IFile.java:1006)
           at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.nextRawKey(IFile.java:987)
           at org.apache.tez.runtime.library.common.sort.impl.TezMerger$Segment.nextRawKey(TezMerger.java:317)
           at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.merge(TezMerger.java:777)
           at org.apache.tez.runtime.library.common.sort.impl.TezMerger.merge(TezMerger.java:206)
           at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:1298)
           at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:666)
           at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:308)
           ... 8 more
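
For readers following the trace: the decompression error surfaces while `IFile$Reader.readKeyValueLength` calls `WritableUtils.readVInt`, which pulls a single byte through the decompressing stream. The sketch below re-implements that variable-length integer decoding with the JDK only, as I understand the Hadoop layout, to show why the first byte read of a record is enough to trigger decompression of a new block. The class name `VIntReadSketch` and the helper `decode` are hypothetical, and this is not the Tez or Hadoop source.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, JDK-only sketch of Hadoop-style variable-length integer decoding.
class VIntReadSketch {

    // Reads one variable-length long from the stream.
    public static long readVLong(DataInputStream in) throws IOException {
        // Corresponds to the DataInputStream.readByte frame in the trace: when the
        // stream wraps a decompressor, this call is what forces a block to be decompressed.
        byte first = in.readByte();
        if (first >= -112) {
            return first; // values in [-112, 127] are stored in a single byte
        }
        boolean negative = first < -120;
        // The marker byte encodes how many payload bytes follow.
        int extraBytes = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF); // big-endian payload
        }
        // Negative numbers are stored as their bitwise complement.
        return negative ? ~value : value;
    }

    // Convenience wrapper for in-memory bytes; converts IOException to unchecked.
    public static long decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return readVLong(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {5}));
        System.out.println(decode(new byte[] {-114, 0x01, 0x2C}));
        System.out.println(decode(new byte[] {-122, 0x01, 0x2B}));
    }
}
```

The point of the sketch is only that the record-length read is the first consumer of each decompressed block, so a failure inside the decompressor is reported there. It says nothing about why the buffer is too small.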
            
I was not able to reproduce it in a unit test, but it occurs when I run this type of query on a large dataset.

   WITH cte_setting AS (
     SELECT
       id,
       ARRAY(
         NAMED_STRUCT(
           "b",
           "c",
           "d",
           MAX(
             STRUCT(
               date,
               CASE WHEN e IS NOT NULL
               AND e <> '' THEN e END
             )
           ).col2
         ),
         NAMED_STRUCT(
           "b",
           "c",
           "d",
           MAX(
             STRUCT(
               date,
               CASE WHEN "f" IS NOT NULL
               AND "f" <> '' THEN "f" END
             )
           ).col2
         )
       ) AS arrayOption
     FROM
       table
     GROUP BY
       id
   )
   SELECT
     id,
     t.col.b AS b,
     t.col.d AS d
   FROM
     cte_setting LATERAL VIEW explode(arrayOption) t
   LIMIT
     1000
   