kylin-issues mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job
Date Thu, 07 Jun 2018 00:51:00 GMT

    [ https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504103#comment-16504103 ]

ASF subversion and git services commented on KYLIN-3115:
--------------------------------------------------------

Commit 9d2e1a7d2f12c31eaeb10ef2e55a56556d902ce6 in kylin's branch refs/heads/KYLIN-3115 from shaofengshi
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=9d2e1a7 ]

KYLIN-3115 Incompatible RowKeySplitter initialize between build and merge job


> Incompatible RowKeySplitter initialize between build and merge job
> ------------------------------------------------------------------
>
>                 Key: KYLIN-3115
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3115
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>            Reporter: Wang, Gang
>            Assignee: Shaofeng SHI
>            Priority: Minor
>             Fix For: v2.4.0
>
>
> In class NDCuboidBuilder:
>     public NDCuboidBuilder(CubeSegment cubeSegment) {
>         this.cubeSegment = cubeSegment;
>         this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
>         this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
>     } 
> This creates a byte array of length 256 to hold the rowkey column bytes.
> In class MergeCuboidMapper, however, it is initialized with length 255:
> rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So, if a dimension uses fixed-length encoding and the max length is set to 256, the cube building job will succeed, while the merge job will always fail. The failure starts in class MergeCuboidMapper, method doMap:
>     public void doMap(Text key, Text value, Context context) throws IOException, InterruptedException {
>         long cuboidID = rowKeySplitter.split(key.getBytes());
>         Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> doMap invokes method RowKeySplitter.split(byte[] bytes), which contains:
>         for (int i = 0; i < cuboid.getColumns().size(); i++) {
>             splitOffsets[i] = offset;
>             TblColRef col = cuboid.getColumns().get(i);
>             int colLength = colIO.getColumnLength(col);
>             SplittedBytes split = this.splitBuffers[this.bufferSize++];
>             split.length = colLength;
>             System.arraycopy(bytes, offset, split.value, 0, colLength);
>             offset += colLength;
>         }
> System.arraycopy throws an IndexOutOfBoundsException if a column value is 256 bytes long and is copied into a byte array of length 255.
> The same incompatibility occurs in class FilterRecommendCuboidDataMapper, which initializes RowKeySplitter as:
> rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better way is to always set the max split length to 256.
> Dimensions encoded with fixed length 256 are actually quite common in our production. Since varchar(256) is a common type in Hive, users without much Kylin knowledge tend to choose fixed-length encoding for such dimensions and set the max length to 256.
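
The failure described above can be reproduced without Kylin. The following is a minimal sketch, not Kylin code: the class SplitBufferDemo, the method copyColumn, and the constant MAX_SPLIT_LENGTH are hypothetical names chosen for illustration. It only mimics the System.arraycopy call in RowKeySplitter.split, copying a 256-byte column value into a split buffer of length 256 (as in the build job) and of length 255 (as in the merge job).

```java
class SplitBufferDemo {
    // Hypothetical shared constant; the issue proposes always using 256
    // as the max split length in every job that creates a RowKeySplitter.
    public static final int MAX_SPLIT_LENGTH = 256;

    /**
     * Copies one fixed-length column value into a split buffer, the way
     * RowKeySplitter.split does with System.arraycopy.
     * Returns false if the buffer is too small for the column value.
     */
    public static boolean copyColumn(int bufferLength, int columnLength) {
        byte[] rowKey = new byte[columnLength];     // stands in for the rowkey bytes
        byte[] splitValue = new byte[bufferLength]; // stands in for split.value
        try {
            System.arraycopy(rowKey, 0, splitValue, 0, columnLength);
            return true;
        } catch (IndexOutOfBoundsException e) {
            // Thrown when columnLength exceeds the destination buffer length.
            return false;
        }
    }

    public static void main(String[] args) {
        // Build job: buffer of 256 bytes, column of 256 bytes.
        System.out.println("buffer 256: " + copyColumn(MAX_SPLIT_LENGTH, 256));
        // Merge job: buffer of 255 bytes, column of 256 bytes.
        System.out.println("buffer 255: " + copyColumn(255, 256));
    }
}
```

Under these assumptions the first copy succeeds and the second fails, which matches the reported behavior: the build job passes and the merge job does not. Reading the length from a single constant in all three classes would keep them from drifting apart again.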



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
