kylin-issues mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job
Date Mon, 11 Jun 2018 05:29:00 GMT

    [ https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507695#comment-16507695 ]

ASF subversion and git services commented on KYLIN-3115:
--------------------------------------------------------

Commit f6b1dfb5ef3239ea252b1498bf4c51235361bbcd in kylin's branch refs/heads/master from shaofengshi
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=f6b1dfb ]

KYLIN-3115 Incompatible RowKeySplitter initialize between build and merge job


> Incompatible RowKeySplitter initialize between build and merge job
> ------------------------------------------------------------------
>
>                 Key: KYLIN-3115
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3115
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>            Reporter: Wang, Gang
>            Assignee: Shaofeng SHI
>            Priority: Minor
>             Fix For: v2.4.0
>
>
> In class NDCuboidBuilder:
>     public NDCuboidBuilder(CubeSegment cubeSegment) {
>         this.cubeSegment = cubeSegment;
>         this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
>         this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
>     } 
> This creates a byte array of length 256 to hold each rowkey column's bytes.
> In class MergeCuboidMapper, however, it is initialized with length 255:
> rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So if a dimension uses fixed-length encoding with a max length of 256, the cube build job
succeeds but the merge job always fails, because of the following code in MergeCuboidMapper.doMap:
>     public void doMap(Text key, Text value, Context context) throws IOException, InterruptedException {
>         long cuboidID = rowKeySplitter.split(key.getBytes());
>         Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> doMap invokes RowKeySplitter.split(byte[] bytes), which runs this loop:
>         for (int i = 0; i < cuboid.getColumns().size(); i++) {
>             splitOffsets[i] = offset;
>             TblColRef col = cuboid.getColumns().get(i);
>             int colLength = colIO.getColumnLength(col);
>             SplittedBytes split = this.splitBuffers[this.bufferSize++];
>             split.length = colLength;
>             System.arraycopy(bytes, offset, split.value, 0, colLength);
>             offset += colLength;
>         }
> System.arraycopy throws an IndexOutOfBoundsException if a column value is 256 bytes long
and is copied into a byte array of length 255.
> The same incompatibility also occurs in class FilterRecommendCuboidDataMapper, which initializes
RowKeySplitter as:
> rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better approach is to always set the max split length to 256.
> In fact, dimensions encoded with fixed length 256 are quite common in our production environment.
Because varchar(256) is a common type in Hive, users without much Kylin knowledge tend to choose
fixed-length encoding for such dimensions and set the max length to 256.
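The failure described above can be reproduced outside Kylin with a minimal sketch. SimpleSplitter, trySplit and SplitterDemo are hypothetical names for illustration only; this is not Kylin's RowKeySplitter. It simply assumes, as the quoted code suggests, that the splitter preallocates fixed-size per-column buffers and copies each column into them with System.arraycopy. It omits the rowkey header (shard and cuboid ID) and models a single 256-byte fixed-length dimension.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified reproduction of the KYLIN-3115 buffer-size mismatch.
 * SimpleSplitter is NOT Kylin's RowKeySplitter; it only mimics its fixed-size
 * split buffers and the System.arraycopy loop quoted in the issue.
 */
public class SplitterDemo {

    /** Hypothetical stand-in for RowKeySplitter: preallocated fixed-size buffers. */
    static final class SimpleSplitter {
        private final byte[][] splitBuffers;

        SimpleSplitter(int splitLen, int bytesLen) {
            // splitLen buffers, each bytesLen bytes long (65 and 255/256 in the issue)
            splitBuffers = new byte[splitLen][bytesLen];
        }

        /** Copies each fixed-length column of the rowkey into its own buffer. */
        int split(byte[] rowKey, int[] columnLengths) {
            int offset = 0;
            for (int i = 0; i < columnLengths.length; i++) {
                int colLength = columnLengths[i];
                // Throws an IndexOutOfBoundsException if colLength exceeds the buffer length.
                System.arraycopy(rowKey, offset, splitBuffers[i], 0, colLength);
                offset += colLength;
            }
            return offset;
        }
    }

    /** Returns true if a 256-byte column splits successfully with the given buffer size. */
    static boolean trySplit(int bufferSize) {
        int[] columnLengths = {256};   // one dimension, fixed-length encoding, length 256
        byte[] rowKey = new byte[256];
        Arrays.fill(rowKey, (byte) 'a');
        try {
            new SimpleSplitter(65, bufferSize).split(rowKey, columnLengths);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("build (buffer 256): " + trySplit(256)); // NDCuboidBuilder setting
        System.out.println("merge (buffer 255): " + trySplit(255)); // MergeCuboidMapper setting
    }
}
```

Running main prints true for the 256-byte buffer and false for the 255-byte buffer, which mirrors the build-succeeds/merge-fails behaviour reported above. Sharing one constant for the max split length across all RowKeySplitter call sites, as the reporter suggests, would keep the two paths consistent.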



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
