impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5250: Unify decompressor output length semantics
Date Tue, 12 Sep 2017 04:42:09 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-5250: Unify decompressor output_length semantics

Patch Set 1:

Commit Message:

Line 9: This patch makes the semantics output_length parameter in
the semantics of the output_length

Line 10: Codec::ProcessBlock to be the same. In existing code different
Codec::ProcessBlock() to be the same across all codecs. In the existing code different decompressors
treat output_length differently.

Line 13:    to actual decompressed length, but it does not set it to actual
to the actual decompressed length

Line 16:    be exactly the same as actual decompressed length, otherwise
as the actual decompressed length

Line 19:    actual decompressed length and will set it to actual decompressed
the actual decompressed length

Line 21: This inconsistency leads to a bug where the error message is
This inconsistency lead to a bug where ....

Line 24: decompressors can handle oversized output buffer correctly.
can handle an oversized output buffer correctly.

Line 25: Lz4Decompressor will use the "safe" version of decompression function
will use the "safe" instead of the "fast" decompression function

Line 26: instead of the "fast" version, for the latter is insecure with corrupted
We use LZ4 for compressing exchange data, so we check the perf impact of these changes. I
can help you with the details.

Line 27: data and requires decompressed length to be known.
and requires the decompressed length to be known
File be/src/util/

Line 153:   // Test the behavior when the decompressor is given too little / too much space
"." at end of sentence

Line 155:   // correct output size when the space is enough, and does not write beyond the
the correct output size

Line 157:   void DecompressOverUnderSizedOutputBuffer(Codec *compressor, Codec *decompressor,
Out style is "Codec* compressor"

Line 196:     }
I think we should test more directly the scenario mentioned in the JIRA where a corrupt file
can lead to the output_len not being set to 0 properly.
File be/src/util/

Line 426: // If size_only is false, output buffer size must be at least output_len. *output_len
Mention what output_len is set to if a non-OK status is returned.

Line 427: // also updated with actual output size.
the actual output size

Line 575:   // LZ4_decompress_fast will cause a segmentation fault if passed a nullptr output.
Comment still relevant?

Line 579:   if (ret < 0) {
single line

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-HasComments: Yes

View raw message