hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <>
Subject [jira] [Commented] (HIVE-10068) LLAP: adjust allocation after decompression
Date Wed, 03 Jun 2015 20:03:38 GMT


Sergey Shelukhin commented on HIVE-10068:

Update from some test runs on TPCDS and TPCH queries: we waste around 15% of allocated memory
due to buddy allocator granularity:
$ sed -E "s/.*ALLOCATED_BYTES=([0-9]+).*/\1/" lrfu1.log | awk '{s+=$1}END{print s}'
$ sed -E "s/.*ALLOCATED_USED_BYTES=([0-9]+).*/\1/" lrfu1.log | awk '{s+=$1}END{print s}'
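For reference, the two sums can be combined into a single awk pass that prints the waste fraction directly. This is a sketch, not the command actually run: it assumes the counters appear as `KEY=value` fields, and the piped input below is a synthetic stand-in for lrfu1.log.

```shell
# One-pass version of the measurement above: sum both counters and print
# the fraction of allocated bytes that went unused. Synthetic sample input.
printf 'ALLOCATED_BYTES=100\nALLOCATED_USED_BYTES=85\n' |
awk -F= '/ALLOCATED_BYTES=/      { alloc += $2 }
         /ALLOCATED_USED_BYTES=/ { used  += $2 }
         END { printf "wasted: %.1f%%\n", 100 * (alloc - used) / alloc }'
# prints: wasted: 15.0%
```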

Some of that is obviously unavoidable, but some could be avoided by implementing this. However,
it's not as bad as I expected (the worst results can be seen on very small datasets, where stripes/RGs
are routinely smaller than the compression block size).

> LLAP: adjust allocation after decompression
> -------------------------------------------
>                 Key: HIVE-10068
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
> We don't know the decompressed size of a compression buffer in ORC; all we know is the file-level
compression buffer size. For many files, compression buffers can be smaller than that because
of compact encoding, or because compression block ends for other reasons (different streams,
etc. - "present" streams for example are very small).
> BuddyAllocator should be able to accept back parts of the allocated memory (e.g. allocate
256Kb with minimum allocation of 32Kb, decompress 45Kb, return the last 192Kb as 64+128Kb).
For generality (this depends on the implementation), we can make an API like "offer", and the
allocator can decide to take back however much it can.
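The arithmetic behind the example in the description can be sketched as follows. This is illustrative only, not Hive's BuddyAllocator code: it rounds the used prefix up to the smallest power-of-two block that covers it, then decomposes the tail into buddy-sized chunks that could be offered back.

```shell
# Issue's example: 256Kb allocated, 32Kb minimum unit, 45Kb actually used.
alloc=$((256 * 1024))
min=$((32 * 1024))
used=$((45 * 1024))

# Smallest power-of-two block (>= min) that still covers the used bytes.
keep=$min
while [ "$keep" -lt "$used" ]; do keep=$((keep * 2)); done

# Decompose the tail into power-of-two chunks, smallest first; each is a
# valid buddy block that the allocator could take back.
rest=$((alloc - keep))
chunks=""
size=$min
while [ "$rest" -gt 0 ]; do
  if [ $((rest & size)) -ne 0 ]; then
    chunks="$chunks $((size / 1024))Kb"
    rest=$((rest - size))
  fi
  size=$((size * 2))
done
echo "keep $((keep / 1024))Kb, return:$chunks"
# prints: keep 64Kb, return: 64Kb 128Kb
```

This reproduces the numbers in the description: of the 256Kb allocation, a 64Kb block covers the 45Kb of decompressed data, and the remaining 192Kb comes back as a 64Kb and a 128Kb block.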

This message was sent by Atlassian JIRA
