tez-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [tez] tprelle commented on pull request #130: TEZ-4295: Could not decompress data. Buffer length is too small.
Date Wed, 09 Jun 2021 15:21:25 GMT

tprelle commented on pull request #130:
URL: https://github.com/apache/tez/pull/130#issuecomment-857795873


   Hi @abstractdog, thanks for looking into it.
   Here is the issue as it shows up on the reader side of IFile:
   <pre><code>
     org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
           at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:312)
           at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:277)
           at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
           at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
           at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
           at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.InternalError: Could not decompress data. Buffer length is too small.
           at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
           at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:235)
           at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
           at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
           at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:92)
           at java.io.DataInputStream.readByte(DataInputStream.java:265)
           at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
           at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
           at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readKeyValueLength(IFile.java:935)
           at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.positionToNextRecord(IFile.java:965)
           at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readRawKey(IFile.java:1006)
           at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.nextRawKey(IFile.java:987)
           at org.apache.tez.runtime.library.common.sort.impl.TezMerger$Segment.nextRawKey(TezMerger.java:317)
           at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.merge(TezMerger.java:777)
           at org.apache.tez.runtime.library.common.sort.impl.TezMerger.merge(TezMerger.java:206)
           at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:1298)
           at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:666)
           at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:308)
           ... 8 more
            </code></pre>
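   My reading of the trace (an assumption on my side, not a conclusion) is a mismatch between the Snappy buffer size used when the IFile was written and the one used when it is read back during the final merge: snappy reads the uncompressed chunk length from the stream, sees that it is larger than the decompressor's direct buffer, and the native code turns that into exactly this InternalError. The standalone sketch below (not part of this PR; buffer sizes and data are arbitrary) reproduces the same error with plain Hadoop codec classes. It needs the native Hadoop snappy library, which is what the decompressBytesDirect(Native Method) frame points to; Hadoop versions that switched to snappy-java may fail differently.
   <pre><code>
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.InputStream;
   import java.io.OutputStream;
   import java.util.Random;

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.io.compress.SnappyCodec;
   import org.apache.hadoop.util.ReflectionUtils;

   public class SnappyBufferMismatchRepro {
     public static void main(String[] args) throws Exception {
       // Poorly compressible payload so each compressed chunk stays large.
       byte[] data = new byte[256 * 1024];
       new Random(42).nextBytes(data);

       // Writer side: 256 KB Snappy buffer, so uncompressed chunks are ~200+ KB.
       Configuration writeConf = new Configuration();
       writeConf.setInt("io.compression.codec.snappy.buffersize", 256 * 1024);
       SnappyCodec writeCodec = ReflectionUtils.newInstance(SnappyCodec.class, writeConf);

       ByteArrayOutputStream compressed = new ByteArrayOutputStream();
       try (OutputStream out = writeCodec.createOutputStream(compressed)) {
         out.write(data);
       }

       // Reader side: only a 4 KB Snappy buffer, too small for the chunks above.
       Configuration readConf = new Configuration();
       readConf.setInt("io.compression.codec.snappy.buffersize", 4 * 1024);
       SnappyCodec readCodec = ReflectionUtils.newInstance(SnappyCodec.class, readConf);

       byte[] scratch = new byte[64 * 1024];
       try (InputStream in = readCodec.createInputStream(
           new ByteArrayInputStream(compressed.toByteArray()))) {
         // Throws java.lang.InternalError:
         // "Could not decompress data. Buffer length is too small."
         while (in.read(scratch) != -1) {
           // drain
         }
       }
     }
   }
   </code></pre>
   If the writer and the reader of the same IFile can end up with different codec buffer sizes, that would line up with the trace above.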
   I was not able to reproduce it in a unit test, but I hit it when running this type of query on a large dataset:
   <pre><code>
   WITH cte_setting AS (
     SELECT
       a,
       ARRAY(
         NAMED_STRUCT(
           "b",
           "c",
           "d",
           MAX(
             STRUCT(
               date,
               CASE WHEN e IS NOT NULL
               AND e <> '' THEN e END
             )
           ).col2
         ),
         NAMED_STRUCT(
           "b",
           "c",
           "d",
           MAX(
             STRUCT(
               date,
               CASE WHEN "f" IS NOT NULL
               AND "f" <> '' THEN "f" END
             )
           ).col2
         )
       ) AS arrayOption
     FROM
       table
     GROUP BY
       id
   )
   SELECT
     id,
     t.col.b AS b,
     t.col.b AS b
   FROM
     cte_setting LATERAL VIEW explode(arrayOption) t
   LIMIT
     1000
   </code></pre>
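   For context, the failing stage reads Snappy-compressed intermediate data (that is what the SnappyDecompressor frames mean), so the job runs with intermediate compression enabled on the Tez runtime, roughly as in the snippet below. The exact property values are my assumption rather than something taken from this report; they are only meant to show which settings put the query onto that code path.
   <pre><code>
   import org.apache.hadoop.conf.Configuration;
   import org.apache.tez.runtime.library.api.TezRuntimeConfiguration;

   public class ShuffleCompressionConf {
     public static void main(String[] args) {
       // Intermediate (shuffle) compression for the Tez runtime; without it the
       // merge in the trace would not go through SnappyDecompressor at all.
       Configuration conf = new Configuration();
       conf.setBoolean(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, true);
       conf.set(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS_CODEC,
           "org.apache.hadoop.io.compress.SnappyCodec");
       System.out.println(conf.get(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS_CODEC));
     }
   }
   </code></pre>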


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


