Return-Path: X-Original-To: apmail-impala-dev-archive@minotaur.apache.org Delivered-To: apmail-impala-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5AE2A18C92 for ; Fri, 26 Feb 2016 02:12:54 +0000 (UTC) Received: (qmail 98701 invoked by uid 500); 26 Feb 2016 02:12:49 -0000 Delivered-To: apmail-impala-dev-archive@impala.apache.org Received: (qmail 98659 invoked by uid 500); 26 Feb 2016 02:12:49 -0000 Mailing-List: contact dev-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@impala.incubator.apache.org Delivered-To: mailing list dev@impala.incubator.apache.org Delivered-To: moderator for dev@impala.incubator.apache.org Received: (qmail 97676 invoked by uid 99); 26 Feb 2016 02:10:47 -0000 X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Message-Id: <201602260210.u1Q2Ahqt008691@ip-10-146-233-104.ec2.internal> Date: Fri, 26 Feb 2016 02:10:43 +0000 From: "Dan Hecht (Code Review)" To: Juan Yu , impala-cr@cloudera.com, dev@impala.incubator.apache.org CC: Skye Wanderman-Milne Reply-To: dhecht@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?[Impala-CR](cdh5-2.5.0=5F5.7.0)_IMPALA-1886/IMPALA-2154:_Add_support_for_multi-stream_bz2/gzip_compressed_files.=0A?= X-Gerrit-Change-Id: Icbe617d03a69953f0bf3aa0f7c30d34bc612f9f8 X-Gerrit-ChangeURL: X-Gerrit-Commit: 5d456621aea58dabc6887b0763deafbce46bfe30 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.10-rc0 Dan Hecht has posted comments on this change. Change subject: IMPALA-1886/IMPALA-2154: Add support for multi-stream bz2/gzip compressed files. ...................................................................... Patch Set 14: (8 comments) http://gerrit.cloudera.org:8080/#/c/2219/14/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: Line 484: } L469-484 should be restructured as: if (stream_->eosr()) { if (stream_end) { *eosr = true; } else { return Status(truncated); } } else if (*decompressed_len == 0) { return Status(NO_PRORESS); } http://gerrit.cloudera.org:8080/#/c/2219/14/be/src/util/codec.h File be/src/util/codec.h: Line 130: /// stream_end: if decompressor consumed all input and reached end of compressed stream. I think we should remove the "if decompressor consumed all input" condition, and this should just signify that the end of the 'output' buffer corresponds to the end of a compression stream (and then see my comments in the code). The caller should (and already does check) that there is no more input via the stream_->eosr() check. http://gerrit.cloudera.org:8080/#/c/2219/14/be/src/util/decompress.cc File be/src/util/decompress.cc: Line 89: *stream_end = false; move this to L91 inside the loop. See comment at line 113 for why. Line 113: if (stream_.avail_in == 0) *stream_end = true; As mentioned elsewhere, let's simplify this and just uncondtionally set *stream_end = true here. The avail_in check isn't interesting, since only the caller knows whether there really is more input or not. Line 342: *stream_end = false; same comments as above Line 352: if (stream_.avail_in == 0) *stream_end = true; and here http://gerrit.cloudera.org:8080/#/c/2219/14/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: Line 226: decompressed compressed http://gerrit.cloudera.org:8080/#/c/2219/14/testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test File testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test: Line 194: decompressed this should say "compressed" file. From the users perspective, it's the compressed file that was malformed. -- To view, visit http://gerrit.cloudera.org:8080/2219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icbe617d03a69953f0bf3aa0f7c30d34bc612f9f8 Gerrit-PatchSet: 14 Gerrit-Project: Impala Gerrit-Branch: cdh5-2.5.0_5.7.0 Gerrit-Owner: Juan Yu Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Skye Wanderman-Milne Gerrit-HasComments: Yes