impala-issues mailing list archives

From "Tim Armstrong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IMPALA-5705) Parallelise read I/O by prefetching pages when iterating over unpinned BufferedTupleStream
Date Mon, 24 Jul 2017 18:01:00 GMT
Tim Armstrong created IMPALA-5705:
-------------------------------------

             Summary: Parallelise read I/O by prefetching pages when iterating over unpinned BufferedTupleStream
                 Key: IMPALA-5705
                 URL: https://issues.apache.org/jira/browse/IMPALA-5705
             Project: IMPALA
          Issue Type: Sub-task
          Components: Backend
    Affects Versions: Impala 2.10.0
            Reporter: Tim Armstrong


We could improve read I/O performance when iterating over unpinned streams in the hash join
and hash aggregation by using additional memory to prefetch pages ahead of the current read
position. Currently, iterating over an unpinned stream uses only a single buffer, and a read
I/O is issued only after the previous page has been fully processed.

Because reads never overlap with processing, this slows down processing of spilled probe
rows in the hash join and spilled unaggregated rows in the hash aggregation.

We'd need to figure out how to expose this in the BufferedTupleStream interface. Probably,
when preparing to read a stream, the client could specify a number of bytes to read ahead
in the stream, trading additional memory for better read performance.
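
To make the idea concrete, here is a rough C++ sketch of an iterator that takes a
read-ahead budget in bytes. It is not the BufferedTupleStream API: the class
ReadAheadIterator, its start_read/wait_read callbacks and peak_buffers() are all
hypothetical names invented for illustration, and the callbacks stand in for whatever
asynchronous I/O mechanism the real implementation would use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch only; none of these names exist in Impala.
class ReadAheadIterator {
 public:
  // start_read(i) is assumed to kick off an asynchronous read of page i and return
  // immediately. wait_read(i) is assumed to block until that read is done and
  // return the page's bytes.
  using StartFn = std::function<void(std::size_t)>;
  using WaitFn = std::function<std::string(std::size_t)>;

  ReadAheadIterator(std::vector<std::size_t> page_lens, std::size_t read_ahead_bytes,
                    StartFn start_read, WaitFn wait_read)
      : page_lens_(std::move(page_lens)),
        read_ahead_bytes_(read_ahead_bytes),
        start_read_(std::move(start_read)),
        wait_read_(std::move(wait_read)) {}

  // Fills *page with the next page and returns true, or returns false at the end.
  bool GetNext(std::string* page) {
    assert(page != nullptr);
    if (in_flight_.empty()) {
      if (next_to_start_ == page_lens_.size()) return false;
      Start();  // The page being returned always gets a buffer.
    }
    // Before blocking, start further reads while the pages queued behind the
    // front one still fit within the read-ahead budget.
    while (next_to_start_ < page_lens_.size() &&
           ahead_bytes_ + page_lens_[next_to_start_] <= read_ahead_bytes_) {
      ahead_bytes_ += page_lens_[next_to_start_];
      Start();
    }
    const std::size_t current = in_flight_.front();
    in_flight_.pop_front();
    // The new front becomes the "current" page, so it stops counting as read-ahead.
    if (!in_flight_.empty()) ahead_bytes_ -= page_lens_[in_flight_.front()];
    *page = wait_read_(current);
    return true;
  }

  // Largest number of page buffers that were in use at the same time.
  std::size_t peak_buffers() const { return peak_buffers_; }

 private:
  void Start() {
    start_read_(next_to_start_);
    in_flight_.push_back(next_to_start_++);
    peak_buffers_ = std::max(peak_buffers_, in_flight_.size());
  }

  std::vector<std::size_t> page_lens_;
  std::size_t read_ahead_bytes_;
  StartFn start_read_;
  WaitFn wait_read_;
  std::deque<std::size_t> in_flight_;  // Pages whose reads have been started.
  std::size_t next_to_start_ = 0;
  std::size_t ahead_bytes_ = 0;  // Bytes in flight, excluding the front page.
  std::size_t peak_buffers_ = 0;
};
```

With read_ahead_bytes set to 0 the sketch degenerates to the current behaviour: one
buffer, and each read starts only after the previous page has been returned. With four
4-byte pages and a budget of 8 bytes, reads for pages 0, 1 and 2 are all started before
the iterator blocks on page 0, at the cost of three buffers instead of one.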



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
