impala-dev mailing list archives

From "Bharath Vissapragada (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
Date Wed, 29 Jun 2016 09:38:04 GMT
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3313

to look at the new patch set (#8).

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................

IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Currently we don't reset the file read offset if a zero-copy read
(ZCR) fails. As a result, when we switch to the normal read path, we
hit the eosr of the scan range before reading the expected data
length. If both the ReadFromCache() and ReadRange() calls fail
without reading any data, we end up creating a long list of scan
ranges, each 1KB in size (DEFAULT_READ_PAST_SIZE), on the assumption
that we are reading past the scan range. This causes a large
performance hit. This patch calls ScanRange::Close() after a failed
cache read to clean up the file system state so that re-reads start
from the beginning of the scan range.
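
The failure mode described above can be illustrated with a toy model.
This is only a sketch under stated assumptions, not the code in
disk-io-mgr-scan-range.cc: ToyScanRange, FailedCacheRead(), ReadAll()
and BytesDeliveredAfterFailedCacheRead() are hypothetical names, and
the model assumes the failed cache read leaves the file offset at the
end of the range. It shows why a fallback read sees eosr immediately
unless the state is reset first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy stand-in for a scan range; hypothetical, NOT Impala's ScanRange.
struct ToyScanRange {
  int64_t start;       // first byte of the range within the file
  int64_t len;         // number of bytes the caller expects to read
  int64_t file_pos;    // current file read offset (file handle state)
  int64_t bytes_read;  // bytes actually delivered to the caller

  ToyScanRange(int64_t s, int64_t l)
      : start(s), len(l), file_pos(s), bytes_read(0) {}

  // Assumption: a failed cache read moves the file offset forward by
  // `advanced` bytes but delivers no data.
  bool FailedCacheRead(int64_t advanced) {
    file_pos += advanced;
    return false;
  }

  // Models the cleanup: drop file state so reads restart at `start`.
  void Close() {
    file_pos = start;
    bytes_read = 0;
  }

  // Normal read path: reads up to `chunk` bytes from the current
  // offset, never past the end of the range.
  int64_t Read(int64_t chunk) {
    int64_t remaining = start + len - file_pos;
    int64_t n = std::max<int64_t>(0, std::min(chunk, remaining));
    file_pos += n;
    bytes_read += n;
    return n;
  }

  // End of scan range, judged from the file offset alone.
  bool eosr() const { return file_pos >= start + len; }
};

// Reads until eosr and returns how many bytes were delivered.
int64_t ReadAll(ToyScanRange& range, int64_t chunk) {
  assert(chunk > 0);
  while (!range.eosr()) range.Read(chunk);
  return range.bytes_read;
}

// Runs a failed cache read followed by the normal read path, with or
// without the cleanup in between.
int64_t BytesDeliveredAfterFailedCacheRead(bool close_after_failure) {
  ToyScanRange range(0, 8192);
  range.FailedCacheRead(8192);  // offset now sits at the range end
  if (close_after_failure) range.Close();
  return ReadAll(range, 1024);
}
```

In this model, skipping the cleanup makes the fallback path report
eosr without delivering any bytes, while calling Close() first lets it
read the full range from the start.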

This was found while debugging IMPALA-3679, where queries on 1 GB of
cached data ran ~20x slower than non-cached runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M tests/query_test/test_hdfs_caching.py
3 files changed, 64 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/8
-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: David Knupp <dknupp@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
