impala-dev mailing list archives

From "Juan Yu (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) IMPALA-3038: Add multistream gzip/bzip2 test coverage
Date Thu, 24 Mar 2016 17:22:50 GMT
Juan Yu has posted comments on this change.

Change subject: IMPALA-3038: Add multistream gzip/bzip2 test coverage

Patch Set 7:

File be/src/util/

Line 255:     // Repeatedly pick random-size input data(~1MB), compress it, then concatenate
> What does the ~1MB mean? I think this is why I got confused about L270 earl
I am trying to simulate pbzip2, which splits a large input into smaller chunks, compresses
them in parallel, and then concatenates the results.
I take raw_input (which is 1MB), shorten it by a random amount to get a variable length, and
then compress it. I repeat this to get multiple streams:
int len = RAW_INPUT_SIZE - (rand() % 1024);
compressor->ProcessBlock(false, len, raw_input, &compressed_length, &compressed_stream);

The total compressed output is capped at 16MB (a cap chosen so the output is still larger
than the 8MB IO buffer). For the raw input I generated, the compression ratio is about 2:1, so I
limit the total uncompressed input to no more than 32MB.
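
To make the sizing concrete, here is a minimal sketch of just the length-picking loop, with no
real compressor involved. It assumes the loop stops on the 32MB uncompressed cap. The names
PlanStreamLengths, MAX_INPUT_BYTES and MAX_LEN_JITTER are hypothetical and are not taken from
the patch, and std::mt19937 stands in for rand() so the result is reproducible.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Values assumed from the description above, not copied from the Impala source.
constexpr int64_t RAW_INPUT_SIZE = 1024 * 1024;          // raw input buffer, 1MB
constexpr int64_t MAX_INPUT_BYTES = 32LL * 1024 * 1024;  // cap on total uncompressed input
constexpr int64_t MAX_LEN_JITTER = 1024;                 // each chunk is shortened by 0..1023 bytes

// Hypothetical helper: returns the uncompressed length of each stream that
// would be compressed and appended to the concatenated output.
std::vector<int64_t> PlanStreamLengths(uint32_t seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int64_t> jitter(0, MAX_LEN_JITTER - 1);
  std::vector<int64_t> lengths;
  int64_t total = 0;
  while (true) {
    // Same shape as: int len = RAW_INPUT_SIZE - (rand() % 1024);
    int64_t len = RAW_INPUT_SIZE - jitter(rng);
    // Stop before the uncompressed total would exceed the cap.
    if (total + len > MAX_INPUT_BYTES) break;
    lengths.push_back(len);
    total += len;
  }
  assert(!lengths.empty());
  return lengths;
}
```

With these numbers each chunk is just under 1MB, so the plan always contains 32 streams. At
the roughly 2:1 ratio mentioned above, that is about 16MB of compressed data, which is larger
than the 8MB IO buffer.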

Line 266:     EXPECT_OK(Codec::CreateCompressor(&mem_pool_, true, format, &compressor));
> Move created compressor above comment

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b0e1971145dd457e71fc9c00ce7c06fff8dea88
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Juan Yu <>
Gerrit-Reviewer: Juan Yu <>
Gerrit-Reviewer: Skye Wanderman-Milne <>
Gerrit-HasComments: Yes
