impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Date Fri, 27 Oct 2017 20:14:41 GMT
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8023 )

Change subject: IMPALA-4856: Port data stream service to KRPC
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.cc@75
PS3, Line 75: g two 
> see comment in row-batch.h about this terminology.
Done


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-sender.cc@178
PS3, Line 178:   // The two OutboundRowBatch which are re-used across multiple RPCs. Each
entry contains
             :   // a RowBatchHeaderPB and the buffers for the serialized tuple offsets and
data. When
             :   // one is being used for an in-flight RPC, the execution thread continues
to run and
             :   // serializes another row batch into the other entry. 'current_batch_idx_'
is the index
             :   // of the entry being used by the in-flight or last completed RPC.
             :   //
             :   // TODO: replace this with an ac
> We need to access these fields from the callback (e.g. due to retry).
req_ removed in PS5.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/row-batch.h@52
PS3, Line 52: class OutboundRowBatch {
> ProtoRowBatch is a conceptual representation of a serialized row batch in b
Removed in PS5.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/row-batch.h@89
PS3, Line 89: 
> Had hard time coming up with a good name to indicate the re-use of the vect
Renamed to OutboundRowBatch in PS5.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/row-batch.h@431
PS3, Line 431:   ///
             :   /// 'uncompressed_size': Updated with the uncompressed size of 'tuple_data'.
             :   ///
             :   /// 'is_compressed': Sets to true if compression is applied on 'tuple_data'.
             :   /// False otherwise.
             :   ///
             :   /// Returns error status if serialization failed. Returns OK otherwise.
             :   /// TODO: clean this up once the thrift RPC implementation is removed.
             :   Status Serialize(bool full_dedup, vector<int32_t>* tuple_offsets, string*
tuple_data,
             :       int64_t* uncompressed_size, bool* is_compressed);
             : 
             :   /// Shared implementation between thrift and protobuf to deserialize a row
batch.
             :   ///
             :   /// 'input_tuple_offsets': an int32_t array of tuples; offsets into 'input_tuple_data'.
             :   /// Used for populating the tuples in the row batch with actual pointers.
             :   ///
             :   /// 'input_tuple_data': contains pointer and size of tuples' data buffer.
             :   /// If 'is_compressed' is true, the data is compressed.
             :   ///
             :   /// 'uncompressed_size': the uncompressed size of 'input_tuple_data' if it's
compressed.
             :   ///
             :   /// 'is_compressed': True if 'input_tuple_data' is compressed.
             :   ///
             :   /// TODO: clean this up once the thrift RPC implementation is removed.
             :   void Deserialize(const kudu::Slice& tuple_offsets, const kudu::Slice&
tuple_data,
             :       int64_t uncompressed_size, bool is_compressed);
             : 
             :   typedef FixedSizeHashTable<Tuple*, int> DedupMap;
             : 
             :   /// The total size of all data represented in this row batch (tuples and
referenced
             :   /// string and collection data). This is the size of the row batch after
removing all
             :   /// gaps in the auxiliary and deduplicated tuples (i.e. the smallest footprint
for the
             :   /// row batch). If the distinct_tuples argument is non-null, full deduplication
is
             :   /// enabled. The distinct_tuples map must be empty.
             :   int64_t TotalByteSize(DedupMap* distinct_tuples);
             : 
             :   void SerializeInternal(int64_t size, DedupMap* distinct_tuples,
             :       vector<int32_t>* tuple_offsets, string* tuple_data);
             : 
             :   /// All members below need to be handled in R
> let's leave a TODO about cleaning this all up once we can remove the thrift
TODO added. Cannot file a JIRA as apache JIRA is being re-indexed.


http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/service/data-stream-service.cc
File be/src/service/data-stream-service.cc:

http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/service/data-stream-service.cc@64
PS3, Line 64: 
            : 
            : 
            : 
            : 
            : 
            : 
            : 
            : 
> see comment in row-batch.h.  I think we should do this later when we actual
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Gerrit-Change-Number: 8023
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Comment-Date: Fri, 27 Oct 2017 20:14:41 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message