impala-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (JIRA)" <>
Subject [jira] [Created] (IMPALA-5417) Consider limiting I/O buffer queue size to 2 buffers
Date Thu, 01 Jun 2017 16:20:04 GMT
Tim Armstrong created IMPALA-5417:

             Summary: Consider limiting I/O buffer queue size to 2 buffers
                 Key: IMPALA-5417
             Project: IMPALA
          Issue Type: Sub-task
          Components: Backend
    Affects Versions: Impala 2.9.0
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong

Currently the I/O buffer queue size can dynamically vary from 2 buffers up to 128 buffers.
Retaining this behaviour while adding a memory constraint to the HDFS scans presents some
challenges. The memory constraint is much simpler to implement if the scan node simply hands
the I/O mgr a fixed amount of reservation to work with.

Having 2 buffers is clearly beneficial because it allows overlapping I/O and compute. 

Having > 2 buffers is not clearly beneficial. There are a few cases:

1. I/O is faster than compute and the queue fills up. In that case there is no benefit to
having > 2 buffers because the consumer never sees an empty queue.
2. Compute is faster than I/O and the queue is empty. In that case additional buffering does
not help.
3. Compute and I/O are approximately matched, but there is some variability - I/O may get
a bit ahead of compute, but compute could speed up or I/O slow down temporarily. With 2 buffers,
I/O can be 8MB-16MB ahead of compute and therefore absorb bursts of up to 8MB. That seems
like it should be enough in most cases. In this case it's also not clear to me that the dynamic
sizing algorithm is particularly beneficial - it shrinks the queue aggressively when the scan
gets ahead of the consumer.

This message was sent by Atlassian JIRA

View raw message