kylin-issues mailing list archives

From "Wang, Gang (JIRA)" <>
Subject [jira] [Created] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job
Date Mon, 18 Dec 2017 07:32:00 GMT
Wang, Gang created KYLIN-3115:

             Summary: Incompatible RowKeySplitter initialize between build and merge job
                 Key: KYLIN-3115
             Project: Kylin
          Issue Type: Bug
            Reporter: Wang, Gang

In class NDCuboidBuilder, the constructor initializes the RowKeySplitter with a max split length of 256:
    public NDCuboidBuilder(CubeSegment cubeSegment) {
        this.cubeSegment = cubeSegment;
        this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
        this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
This creates temporary byte arrays of length 256 to hold the rowkey column bytes.

In class MergeCuboidMapper, however, it is initialized with length 255:
    rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);

So if a dimension uses fixed-length encoding with a length of 256, the cube build job
will succeed, while the merge job will always fail. The failure starts in MergeCuboidMapper.doMap:
    public void doMap(Text key, Text value, Context context) throws IOException, InterruptedException {
        long cuboidID = rowKeySplitter.split(key.getBytes());
        Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

Method doMap invokes RowKeySplitter.split(byte[] bytes), which contains:
        // rowkey columns
        for (int i = 0; i < cuboid.getColumns().size(); i++) {
            splitOffsets[i] = offset;
            TblColRef col = cuboid.getColumns().get(i);
            int colLength = colIO.getColumnLength(col);
            SplittedBytes split = this.splitBuffers[this.bufferSize++];
            split.length = colLength;
            System.arraycopy(bytes, offset, split.value, 0, colLength);
            offset += colLength;
        }
System.arraycopy throws an IndexOutOfBoundsException when a column that is 256 bytes long
is copied into a byte array of length 255.
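
The failure can be reproduced outside Kylin with a minimal sketch. The class SplitBufferDemo, the method copyColumn, and the 8-byte header offset are hypothetical and exist only for illustration; the plain byte array stands in for the split buffer (split.value) in the excerpt above.

```java
public class SplitBufferDemo {
    /**
     * Copies colLength bytes from rowkey (starting at offset) into a freshly
     * allocated buffer of bufferLength bytes, mimicking the arraycopy call in
     * the excerpt. Returns true if the copy succeeds, false if it is rejected.
     */
    public static boolean copyColumn(byte[] rowkey, int offset, int colLength, int bufferLength) {
        byte[] buffer = new byte[bufferLength]; // stand-in for split.value
        try {
            System.arraycopy(rowkey, offset, buffer, 0, colLength);
            return true;
        } catch (IndexOutOfBoundsException e) {
            // Thrown when destPos + length exceeds the destination length.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed layout for the demo: an 8-byte header followed by one 256-byte column.
        byte[] rowkey = new byte[8 + 256];
        System.out.println("buffer length 256: " + copyColumn(rowkey, 8, 256, 256));
        System.out.println("buffer length 255: " + copyColumn(rowkey, 8, 256, 255));
    }
}
```

With a 256-byte buffer the copy fits exactly; with a 255-byte buffer the destination is one byte too short and the copy is rejected before any bytes are written.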

The same incompatibility occurs in class FilterRecommendCuboidDataMapper, which initializes
its RowKeySplitter as:
    rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);

I think the better fix is to always set the max split length to 256.
Dimensions encoded with a fixed length of 256 are actually quite common in our production
environment. Since the type varchar(256) is common in Hive, users without much background
knowledge tend to choose fixed-length encoding for such dimensions and set the max length to 256.
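
One possible way to keep the call sites from drifting apart again is to take both constructor arguments from a single shared definition. The sketch below is only an illustration of that idea: the class SplitLengthConfig, its constants, and the fits helper are hypothetical names, not part of Kylin's API.

```java
public class SplitLengthConfig {
    // Hypothetical shared constants; the values mirror the ones used in NDCuboidBuilder.
    public static final int MAX_SPLITS = 65;
    public static final int MAX_SPLIT_LENGTH = 256;

    /** Returns true if a fixed-length dimension of the given width fits in one split buffer. */
    public static boolean fits(int fixedLength) {
        return fixedLength >= 0 && fixedLength <= MAX_SPLIT_LENGTH;
    }

    public static void main(String[] args) {
        // Each call site would then pass the same values, for example:
        //   new RowKeySplitter(segment, SplitLengthConfig.MAX_SPLITS, SplitLengthConfig.MAX_SPLIT_LENGTH)
        System.out.println("length 256 fits: " + fits(256));
        System.out.println("length 257 fits: " + fits(257));
    }
}
```

With one definition, the build, merge, and filter jobs cannot disagree on the buffer size, and a dimension width can be checked against the limit before a job runs.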
