kylin-user mailing list archives

From "朱真龙" <>
Subject Re: mismatch hdfs addr error when using kylin2.1 on two hadoopcluster
Date Mon, 25 Dec 2017 02:46:30 GMT
Thank you for your response. I have found a way to resolve my problem. I use MySQL for the
Hive metastore, shared by Kylin across the two Hadoop clusters, like this:
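A shared-metastore setup along those lines might look like the following hive-site.xml fragment (a sketch only; the host name, database name, and credentials are placeholders, not values from this thread):

```xml
<!-- hive-site.xml: both clusters point at the same MySQL-backed metastore -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-db:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>***</value>
</property>
```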

It works well when the model is not too big, but some models with many row keys do not. When
I configured fifteen columns for the row key and set dict encoding like this:

when running the "Build Dimension Dictionary" step, I got the error below (it means that after
the snapshot was written to HDFS, when those HDFS files were submitted to the HBase table, the
files and the HBase table were on different HDFS clusters, one on cluster 1 and one on cluster 2):

Looking into the source code the error refers to, I found that when writing the dimension
dictionary to HBase, Kylin checks whether your keyvalue is larger than the configured maximum
keyvalue size (hbase.client.keyvalue.maxsize, set in hbase-site.xml; if not set, it defaults
to 10485760 bytes, i.e. 10 MB). If your keyvalue size is smaller than hbase.client.keyvalue.maxsize,
Kylin puts the key and value into a Put object and submits it. If your keyvalue is bigger than
hbase.client.keyvalue.maxsize, it writes the snapshot to HDFS first and then submits it via the
HBase table. But I think there is a bug in Kylin 2.1 when using Kylin across two Hadoop clusters
and building a dimension whose keyvalue is bigger than hbase.client.keyvalue.maxsize.
When the keyvalue size is bigger than hbase.client.keyvalue.maxsize, Kylin writes the snapshot
to HDFS using the HDFS path taken from the Hive table instead of from HBase, and those are
different HDFS clusters in my setup. So if you use one Hadoop cluster (Hive and HBase on the
same cluster), it works fine. If you use two Hadoop clusters and your keyvalue size is smaller
than hbase.client.keyvalue.maxsize, it also works fine (the Put object is used). But if you use
two Hadoop clusters and your keyvalue is bigger than hbase.client.keyvalue.maxsize, you will get
the same error as mine (because your Hive and HBase are on different HDFS clusters).
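The branching described above can be sketched as follows (a minimal illustration of the size check, not Kylin's actual code; the function name and return values are made up for clarity):

```python
# Sketch of the branching Kylin applies when persisting a dimension
# dictionary / snapshot entry to HBase. Names are illustrative only.

DEFAULT_MAX_KEYVALUE = 10 * 1024 * 1024  # 10 MB, the client default


def choose_write_path(key: bytes, value: bytes,
                      maxsize: int = DEFAULT_MAX_KEYVALUE) -> str:
    """Return which path is taken for a given key/value size.

    - "put":  small entries go directly into an HBase Put.
    - "hdfs": oversized entries are written to HDFS first; on a
      two-cluster setup the HDFS path comes from the Hive side,
      which is the mismatch described above.
    """
    if len(key) + len(value) <= maxsize:
        return "put"
    return "hdfs"


# A 1 MB maxsize (the Ambari default) pushes a 2 MB dictionary onto the
# HDFS path; the 10 MB default keeps the same entry in a plain Put.
print(choose_write_path(b"rowkey", b"x" * (2 * 1024 * 1024),
                        maxsize=1 * 1024 * 1024))  # hdfs
print(choose_write_path(b"rowkey", b"x" * (2 * 1024 * 1024)))  # put
```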
        I checked hbase.client.keyvalue.maxsize in my hbase-site.xml and found that Ambari sets
1 MB as the default, while Kylin assumes 10 MB. So I changed this value to 10 MB, and now my
model works well. If you find a better way to resolve this problem, please tell me. Thanks.
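For reference, the property I changed looks like this in hbase-site.xml (where exactly the file lives depends on your distribution; under Ambari it is managed through the HBase config UI):

```xml
<!-- hbase-site.xml: align the client-side limit with Kylin's 10 MB default -->
<property>
  <name>hbase.client.keyvalue.maxsize</name>
  <value>10485760</value>
</property>
```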


------------------ Original message ------------------
From: "jxs";<>;
Date: Friday, Dec 22, 2017, 4:10 PM
To: "Kylin Users"<>;

Subject: Re: mismatch hdfs addr error when using kylin2.1 on two hadoopcluster

I also deploy Kylin on two small EMR clusters for resource isolation. I used MySQL for the Hive
metastore and shared the same Hive configuration between the two clusters.
You may try this.

On Dec 22, 2017, at 13:08, "朱真龙"<> wrote:

    Thank you for your attention. I am a Chinese Kylin user who often reads English
documents but writes little, and I know most Kylin developers are Chinese, so if you
don't understand what I mean, I will describe it again in Chinese.

     I am using Kylin 2.1.0 across two Hadoop clusters (both configured with HA) running the
same Hadoop version (2.7.1). It runs well with the sample model so far, but not with my own
model, which has many columns encoded with dict, like this:

When building the cube, I got an error like this:

After looking into the source code the error mentions, I found that the dictionary got its HDFS
path from the Hive table description, and the check performed before loading these HFiles into
HBase found that the HFiles and the HBase table were on different HDFS clusters.


So, could you tell me what I should do in this case?