impala-dev mailing list archives

From "Juan Yu (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3038: Add multistream gzip/bzip2 test coverage
Date Thu, 24 Mar 2016 17:22:50 GMT
Juan Yu has posted comments on this change.

Change subject: IMPALA-3038: Add multistream gzip/bzip2 test coverage
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2543/7/be/src/util/decompress-test.cc
File be/src/util/decompress-test.cc:

Line 255:     // Repeatedly pick random-size input data(~1MB), compress it, then concatenate
> What does the ~1MB mean? I think this is why I got confused about L270 earl
I am trying to simulate pbzip2, which splits a large input into smaller chunks, compresses
them in parallel, and then concatenates the results.
I take raw_input (which is 1 MB), shorten it by a random amount to get a variable length, then
compress it, and repeat this to produce multiple streams.
int len = RAW_INPUT_SIZE - (rand() % 1024);
compressor->ProcessBlock(false, len, raw_input, &compressed_length, &compressed_stream);

The total compressed output will be no more than 16 MB (this is to make sure it is larger
than the 8 MB I/O buffer). For the raw input I generated, the compression ratio is about 2:1,
so I limit the total uncompressed input to no more than 32 MB.
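
The size budgeting described above can be sketched roughly as follows. This is only an illustration, not the code in decompress-test.cc: the names kRawInputSize, kMaxTotalInput and PickStreamLengths are hypothetical, the real compressor call is left out, and the stop condition is an assumption about how the 32 MB limit might be enforced.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical constants mirroring the numbers quoted in the comment.
constexpr int64_t kRawInputSize = 1024 * 1024;           // 1 MB raw input buffer
constexpr int64_t kMaxTotalInput = 32LL * 1024 * 1024;   // 32 MB uncompressed budget

// Picks the uncompressed length of each stream: the raw size minus a random
// trim of 0..1023 bytes, stopping before the running total would exceed the
// budget. In a real test each length would be compressed separately and the
// compressed streams concatenated.
std::vector<int64_t> PickStreamLengths(uint32_t seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int64_t> trim(0, 1023);
  std::vector<int64_t> lengths;
  int64_t total = 0;
  while (true) {
    int64_t len = kRawInputSize - trim(rng);
    assert(len > 0);
    if (total + len > kMaxTotalInput) break;  // adding this stream would exceed 32 MB
    lengths.push_back(len);
    total += len;
  }
  return lengths;
}
```

With these numbers the loop always produces 32 streams, since 32 chunks can never exceed 32 MB while 33 always would. If the 2:1 ratio holds, that is about 16 MB of concatenated compressed data, which is past the 8 MB I/O buffer.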


Line 266:     EXPECT_OK(Codec::CreateCompressor(&mem_pool_, true, format, &compressor));
> Move created compressor above comment
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/2543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b0e1971145dd457e71fc9c00ce7c06fff8dea88
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Juan Yu <jyu@cloudera.com>
Gerrit-Reviewer: Juan Yu <jyu@cloudera.com>
Gerrit-Reviewer: Skye Wanderman-Milne <skye@cloudera.com>
Gerrit-HasComments: Yes
