impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Volker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Date Mon, 04 Dec 2017 19:28:35 GMT
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 )

Change subject: IMPALA-5993: Fix the file offset in value parsing error
......................................................................


Patch Set 1:

(7 comments)

Thank you for working on this. Please see my inline comments, especially regarding the additional
variables to track the file offset.

http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.h@390
PS1, Line 390:   ///  - error_in_row is an out bool.  It is set to true if any field had parse
errors
Can you add comments for the new parameters here?


http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.h@405
PS1, Line 405:       uint8_t* error_in_row, int64_t* error_offset,
nit: the line wrapping looks odd, can you wrap at 90 chars?


http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@251
PS1, Line 251:     error_offsets[i] = *curr_offset;
isn't fields[i].start the same as curr_offset?


http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@264
PS1, Line 264: // define i1 @WriteCompleteTuple(%"class.impala::HdfsScanner"* %this,
             : //                               %"class.impala::MemPool"* %pool,
             : //                               %"struct.impala::FieldLocation"* %fields,
             : //                               %"class.impala::Tuple"* %tuple,
             : //                               %"class.impala::TupleRow"* %tuple_row,
             : //                               %"class.impala::Tuple"* %template,
             : //                               i8* %error_fields, i8* %error_in_row) {
             : // entry:
             : //   %tuple_ptr = bitcast %"class.impala::Tuple"* %tuple
             : //                to <{ %"struct.impala::StringValue", i8 }>*
             : //   %tuple_ptr1 = bitcast %"class.impala::Tuple"* %template
             : //                 to <{ %"struct.impala::StringValue", i8 }>*
             : //   %int8_ptr = bitcast <{ %"struct.impala::StringValue", i8 }>* %tuple_ptr
to i8*
             : //   %null_bytes_ptr = getelementptr i8, i8* %int8_ptr, i32 16
             : //   call void @llvm.memset.p0i8.i64(i8* %null_bytes_ptr, i8 0, i64 1, i32
0, i1 false)
             : //   %0 = bitcast %"class.impala::TupleRow"* %tuple_row
             : //        to <{ %"struct.impala::StringValue", i8 }>**
             : //   %1 = getelementptr <{ %"struct.impala::StringValue", i8 }>*,
             : //                      <{ %"struct.impala::StringValue", i8 }>** %0,
i32 0
             : //   store <{ %"struct.impala::StringValue", i8 }>* %tuple_ptr,
             : //         <{ %"struct.impala::StringValue", i8 }>** %1
             : //   br label %parse
             : //
             : // parse:                                            ; preds = %entry
             : //  %data_ptr = getelementptr %"struct.impala::FieldLocation",
             : //                            %"struct.impala::FieldLocation"* %fields, i32
0, i32 0
             : //  %len_ptr = getelementptr %"struct.impala::FieldLocation",
             : //                           %"struct.impala::FieldLocation"* %fields, i32
0, i32 1
             : //  %slot_error_ptr = getelementptr i8, i8* %error_fields, i32 0
             : //  %data = load i8*, i8** %data_ptr
             : //  %len = load i32, i32* %len_ptr
             : //  %2 = call i1 @WriteSlot(<{ %"struct.impala::StringValue", i8 }>*
%tuple_ptr,
             : //                          i8* %data, i32 %len)
             : //  %slot_parse_error = xor i1 %2, true
             : //  %error_in_row2 = or i1 false, %slot_parse_error
             : //  %3 = zext i1 %slot_parse_error to i8
             : //  store i8 %3, i8* %slot_error_ptr
             : //  %4 = call %"class.impala::ScalarExprEvaluator"* @GetConjunctCtx(
             : //    %"class.impala::HdfsScanner"* %this, i32 0)
             : //  %conjunct_eval = call i16 @"impala::Operators::Eq_StringVal_StringValWrapper"(
             : //    %"class.impala::ScalarExprEvaluator"* %4, %"class.impala::TupleRow"*
%tuple_row)
             : //  %5 = ashr i16 %conjunct_eval, 8
             : //  %6 = trunc i16 %5 to i8
             : //  %val = trunc i8 %6 to i1
             : //  br i1 %val, label %parse3, label %eval_fail
             : //
             : // parse3:                                           ; preds = %parse
             : //   %7 = zext i1 %error_in_row2 to i8
             : //   store i8 %7, i8* %error_in_row
             : //   ret i1 true
             : //
             : // eval_fail:                                        ; preds = %parse
             : //   ret i1 false
             : // }
> Should I maintain this? When I dump IR, this comment has not been updated f
Yes, please update it. We generally try to maintain these as we come across outdated ones.


http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@668
PS1, Line 668: stream_->file_offset()
> The offset can point to EOF because the data is read ahead of time via Scan
Yes, that was the main reason for opening the Jira. Are you going to fix this?


http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@674
PS1, Line 674: Error parsing row: file: $0, (line: $1, pos: $2, offset: $3)
> I think the refined message format is more user friendly. Let me provide an
I agree that it is much easier to understand. One nit: we should print out something like
"file offset" to make it easier to read. Currently it's not clear whether the offset is within
the file or the current row.


http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@683
PS1, Line 683: "Error converting column at the index: "
             :        << desc->col_pos() - scan_node_->num_partition_keys()
             :        << " (type: " << desc->type() << ")";
> I think the refined message format is more user friendly. Let me provide an
I don't think this change adds a big improvement, but I'm also OK with keeping it. As a nit,
please remove the "the" from the message.



-- 
To view, visit http://gerrit.cloudera.org:8080/8747
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f
Gerrit-Change-Number: 8747
Gerrit-PatchSet: 1
Gerrit-Owner: Kim Jin Chul <jinchul@gmail.com>
Gerrit-Reviewer: Kim Jin Chul <jinchul@gmail.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Dec 2017 19:28:35 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message