hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15945) Patch for Key Value, Bytes and Cell
Date Wed, 22 Jun 2016 03:09:57 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343262#comment-15343262

Enis Soztutar commented on HBASE-15945:

Thanks for the updated patch. Do you mind putting this to review board as well. You can use
the fancy new dev-support/submit-patch.sh utility for convenience if you like. 

- These should be under hbase namespace, no? 
- There seems to be a confusion about how do we represent java byte[]'s in C++. In the patch,
I can see, we are using all three of {{std::string}}, {{std::vector<char>}}, and {{Bytes}}
class itself. 
+using BYTE_ARRAY = std::vector<BYTE_TYPE>;
+using ByteBuffer = std::vector<BYTE_TYPE>;

We should stick with only 1. We have a couple of options to represent these 
- (a) std::string, 
- (b) std::vector<unsigned char>, 
- (c) std::array<unsigned char>, 
- (d) Implement custom class like LevelDB Slice, a char* with length (and maybe offset?).

Object creation in Java is costly since everything is heap allocated. That is partially why
we have Cell.getRowArray() and Cell.getRowOffset(), etc versus CellUtil.cloneRow(cell). If
we want to keep the API super-simple, we can just keep passing std::string's, and have every
string to be cloned from the API. Otherwise, in C++, we should be able to have a {{Buffer}}
class which contains the char*, length and offset, so that we directly return these triplets.

- Cell should not extend KeyValue. Cell should be an abstract class with virtual methods.

+class Cell : public KeyValue{

KeyValue is an implementation that keeps all of the data in an underlying byte[] for historical
reasons. In C++ client side, we are only constructing "Cells", either for {{Puts}}, or for
iterating over the {{Results}}. We actually do not have a reason to keep the data for a Cell
in a single byte[]. So, I think we do not need the KeyValue as in the patch at all. We still
should be able to carry Cells in Puts and also read results from KeyValueCodec into a list-of-cells.
For representing those, we can write a CellImpl class (which is not visible from client API),
and have it implement Cell's methods. The CellImpl can keep pointers to it's underlying rowKey,
column, value, etc which are all byte arrays. This I think should be very similar to the PB
cell message class in cell.proto. We can move all of the KeyValue encoding complexity into
KeyValueEncoder/Decoder classes which will encode / decode given Cells into KeyValue encoded
serialized bytes. 

- Cell should not have SetSequenceId. Only get. 
+void Cell::SetSequenceId(const long &seq_id){

- Why do we need the PB here? 
+#include "if/Cell.pb.h"

> Patch for Key Value, Bytes and Cell
> -----------------------------------
>                 Key: HBASE-15945
>                 URL: https://issues.apache.org/jira/browse/HBASE-15945
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Sudeep Sunthankar
>         Attachments: HBASE-15945-HBASE-14850.v2.patch, HBASE-15945.HBASE-14850.v1.patch
> This patch contains an implementation of Key Value, Bytes and Cell modeled on the lines
of Java implementation.  

This message was sent by Atlassian JIRA

View raw message