calcite-dev mailing list archives

From 周千昊 <z.qian...@gmail.com>
Subject Re: Kylin 0.7.1 - Failed to build a cube
Date Tue, 07 Jul 2015 11:05:13 GMT
Hi, gaspare
     Kylin assumes that a dimension table is small enough to fit in
memory, so the corresponding directory should contain only one file.
     As a workaround, you can merge these files into a single file so
that Kylin will be able to read it.
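
     For example, the part files can be concatenated back into one file.
This is only a sketch (the /data/users path is taken from your listing,
and the local demo at the end just illustrates the same concatenation):

```shell
# HDFS commands (illustrative; run against your cluster):
#
#   hdfs dfs -getmerge /data/users users.csv   # concatenate all parts to a local file
#   hdfs dfs -rm -r /data/users                # drop the multi-file directory
#   hdfs dfs -mkdir /data/users
#   hdfs dfs -put users.csv /data/users/       # upload the single merged file
#
# The same concatenation, demonstrated with plain local files:
mkdir -p users
printf '1;alice\n' > users/part-00000
printf '2;bob\n'   > users/part-00001
cat users/part-* > users.csv                   # one merged file
wc -l users.csv                                # 2 rows, as in the two parts
```

     Alternatively, you can have Spark write a single part file in the
first place by calling coalesce(1) on the RDD before saving it.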

On Tue, Jul 7, 2015 at 6:42 PM, <gaspare.maria@gfmintegration.it> wrote:

> Hi,
>
> I am trying to create a cube from a star schema built with Hive
> external tables (example below) stored as TEXTFILE (CSV).
>
> CREATE EXTERNAL TABLE IF NOT EXISTS USERS_TABLE  (
>    uid INT,
>    name STRING
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073' LINES TERMINATED BY '\012'
> STORED AS TEXTFILE
> LOCATION '/data/users';
>
>
> The CSV files are produced by Spark RDDs, so they are saved as part-xxxx
> files. Below is the HDFS listing:
>
> hdfs dfs -ls /data/users
> Found 12 items
> -rw-r--r--   3 hdfs hdfs          0 2015-07-07 12:05 /data/users/_SUCCESS
> -rw-r--r--   3 hdfs hdfs    3699360 2015-07-07 12:05 /data/users/part-00000
> -rw-r--r--   3 hdfs hdfs    3694740 2015-07-07 12:05 /data/users/part-00001
> -rw-r--r--   3 hdfs hdfs    3685374 2015-07-07 12:05 /data/users/part-00002
> -rw-r--r--   3 hdfs hdfs    3719646 2015-07-07 12:05 /data/users/part-00003
> -rw-r--r--   3 hdfs hdfs    3682476 2015-07-07 12:05 /data/users/part-00004
> -rw-r--r--   3 hdfs hdfs    3679956 2015-07-07 12:05 /data/users/part-00005
> -rw-r--r--   3 hdfs hdfs    3700242 2015-07-07 12:05 /data/users/part-00006
> -rw-r--r--   3 hdfs hdfs    3672186 2015-07-07 12:05 /data/users/part-00007
> -rw-r--r--   3 hdfs hdfs    3682350 2015-07-07 12:05 /data/users/part-00008
> -rw-r--r--   3 hdfs hdfs    3680292 2015-07-07 12:05 /data/users/part-00009
> -rw-r--r--   3 hdfs hdfs    3697722 2015-07-07 12:05 /data/users/part-00010
>
> The cube build job fails when it tries to build the Dimension Dictionary,
> with the following exception (it seems that the Hive table data directory
> MUST contain only one file):
>
> java.lang.IllegalStateException: Expect 1 and only 1 non-zero file under
> hdfs://gas.gfmintegration.it:8020/data/cdr/bb/dimensions/users, but find
> 11
>         at
> org.apache.kylin.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:123)
>         at
> org.apache.kylin.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:107)
>         at
> org.apache.kylin.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:83)
>         at
> org.apache.kylin.dict.lookup.HiveTable.getFileTable(HiveTable.java:76)
>         at
> org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:71)
>         at
> org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
>         at
> org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
>         at
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
>         at
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>         at
> org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>         at
> org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>         at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>         at
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>         at
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>         at
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> result code:2
>
>
> Do you have any indications on how to create a proper Hive star schema for
> Kylin?
>
> I would like to use external tables (stored as CSV, Parquet files, or
> HBase) because I need to process the same data from Spark as well.
>
> Thanks in advance.
>
> BR,
>
> -- gas
>
