kylin-issues mailing list archives

From "Wang, Gang (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job
Date Mon, 18 Dec 2017 07:33:01 GMT

     [ https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wang, Gang reassigned KYLIN-3115:
---------------------------------

    Assignee: Wang, Gang

> Incompatible RowKeySplitter initialize between build and merge job
> ------------------------------------------------------------------
>
>                 Key: KYLIN-3115
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3115
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: Wang, Gang
>            Assignee: Wang, Gang
>
> In class NDCuboidBuilder:
>     public NDCuboidBuilder(CubeSegment cubeSegment) {
>         this.cubeSegment = cubeSegment;
>         this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
>         this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
>     }
> This creates temp byte arrays of length 256 to hold the rowkey column bytes.
> However, in class MergeCuboidMapper the splitter is initialized with length 255:
>     rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So if a dimension uses fixed-length encoding with length 256, the cube build job
will succeed, but the merge job will always fail.
>     public void doMap(Text key, Text value, Context context) throws IOException, InterruptedException {
>         long cuboidID = rowKeySplitter.split(key.getBytes());
>         Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> In method doMap, it invokes RowKeySplitter.split(byte[] bytes):
>         // rowkey columns
>         for (int i = 0; i < cuboid.getColumns().size(); i++) {
>             splitOffsets[i] = offset;
>             TblColRef col = cuboid.getColumns().get(i);
>             int colLength = colIO.getColumnLength(col);
>             SplittedBytes split = this.splitBuffers[this.bufferSize++];
>             split.length = colLength;
>             System.arraycopy(bytes, offset, split.value, 0, colLength);
>             offset += colLength;
>         }
> System.arraycopy throws an IndexOutOfBoundsException when a column is 256 bytes
long and is copied into a byte array of length 255.
> The same incompatibility occurs in class FilterRecommendCuboidDataMapper, which
initializes the RowKeySplitter as:
>     rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better fix is to always set the max split length to 256.
> Dimensions encoded with fixed length 256 are actually pretty common in our production.
Since varchar(256) is a common type in Hive, users without much Kylin knowledge tend to
choose fixed-length encoding for such dimensions and set the max length to 256.
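[Editor's note] The failure mode above can be reproduced outside Kylin. The sketch below is not Kylin code; it only mimics the relevant shapes (a pre-allocated split buffer of 255 vs. 256 bytes, and the System.arraycopy call from RowKeySplitter.split) to show why the build succeeds while the merge throws:

```java
// Minimal standalone sketch of the off-by-one: copying a 256-byte column
// into a 255-byte split buffer throws IndexOutOfBoundsException, while a
// 256-byte buffer (as used by the build job) succeeds.
public class RowKeyBufferDemo {

    // Returns true if a column of colLength bytes fits into a split buffer
    // of bufferLength bytes, false if System.arraycopy overruns it.
    static boolean copyInto(int bufferLength, int colLength) {
        byte[] rowkeyColumn = new byte[colLength];   // column bytes from the rowkey
        byte[] splitBuffer = new byte[bufferLength]; // pre-allocated temp buffer
        try {
            System.arraycopy(rowkeyColumn, 0, splitBuffer, 0, colLength);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false; // destination buffer too small
        }
    }

    public static void main(String[] args) {
        System.out.println(copyInto(256, 256)); // build job's buffer size: fits
        System.out.println(copyInto(255, 256)); // merge job's buffer size: overflows
    }
}
```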



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
