carbondata-user mailing list archives

From Divya Gupta <di...@knoldus.com>
Subject Re: Issue with quickstart introduction
Date Thu, 27 Jul 2017 04:53:55 GMT
Thanks for your interest in CarbonData.

The /test/carbondata/default/test_carbon/ folder is empty because the data load failed.

Inserting single or multiple rows into a CarbonData table using the VALUES
clause with the INSERT statement is currently not supported in CarbonData.
Please try loading data from a CSV file with the LOAD DATA statement, e.g.
carbon.sql("LOAD DATA INPATH 'sample.csv file path' INTO TABLE carbon_test")

The CSV file can be on either the local disk or HDFS.
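
For reference, here is a minimal sketch of the CSV route in the Spark shell, assuming a file named sample.csv whose columns match your test_carbon schema (id, name, city, age); the sample rows, the HDFS path, and the SHOW SEGMENTS check below are illustrative assumptions, not something taken from your run:

// sample.csv (first line is a header matching the table columns):
//   id,name,city,age
//   1,david,shenzhen,31
//   2,eason,shenzhen,27
// Put the file on HDFS first, e.g.: hdfs dfs -put sample.csv /test/carbondata/sample.csv

// Load the file, then verify that a segment was created and the rows are queryable.
carbon.sql("LOAD DATA INPATH 'hdfs://xxxx/test/carbondata/sample.csv' INTO TABLE test_carbon")
carbon.sql("SHOW SEGMENTS FOR TABLE test_carbon").show()
carbon.sql("SELECT * FROM test_carbon").show()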

Regards
Divya Gupta


On Wed, Jul 26, 2017 at 9:29 PM, Arnaud G <greatpatton@gmail.com> wrote:

> Hi,
>
>
>
> I have compiled the latest version of CarbonData, which is compatible with
> HDP 2.6. I’m following the steps below, but the data never ends up in the
> table.
>
>
>
> Start Spark Shell:
>
> /home/ubuntu/carbondata# spark-shell --jars /home/ubuntu/carbondata/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar
>
>
>
> Welcome to
>
>       ____              __
>
>      / __/__  ___ _____/ /__
>
>     _\ \/ _ \/ _ `/ __/  '_/
>
>    /___/ .__/\_,_/_/ /_/\_\   version 2.1.0.2.6.0.3-8
>
>       /_/
>
>
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
>
> Type in expressions to have them evaluated.
>
> Type :help for more information.
>
>
>
> scala>  import org.apache.spark.sql.SparkSession
>
> import org.apache.spark.sql.SparkSession
>
>
>
> scala> import org.apache.spark.sql.CarbonSession._
>
> import org.apache.spark.sql.CarbonSession._
>
>
>
> scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("/test/carbondata/","/test/carbondata/")
>
> 17/07/26 14:58:42 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
>
> 17/07/26 14:58:42 WARN CarbonProperties: main The enable unsafe sort value "null" is invalid. Using the default value "false
>
> 17/07/26 14:58:42 WARN CarbonProperties: main The custom block distribution value "null" is invalid. Using the default value "false
>
> 17/07/26 14:58:42 WARN CarbonProperties: main The enable vector reader value "null" is invalid. Using the default value "true
>
> 17/07/26 14:58:42 WARN CarbonProperties: main The value "null" configured for key carbon.lock.type" is invalid. Using the default value "HDFSLOCK
>
> carbon: org.apache.spark.sql.SparkSession = org.apache.spark.sql.CarbonSession@5f7bd970
>
>
>
> scala> carbon.sql("CREATE TABLE IF NOT EXISTS test_carbon(id string, name
> string, city string,age Int)  STORED BY 'carbondata'")
>
> 17/07/26 15:04:35 AUDIT CreateTable: [gateway-dc1r04n01][hdfs][Thread-1]Creating Table with Database name [default] and Table name [test_carbon]
>
> 17/07/26 15:04:36 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`test_carbon` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
>
> 17/07/26 15:04:36 AUDIT CreateTable: [gateway-dc1][hdfs][Thread-1]Table created with Database name [default] and Table name [test_carbon]
>
> res7: org.apache.spark.sql.DataFrame = []
>
>
>
> scala> carbon.sql("describe test_carbon").show()
>
> +--------+---------+-------+
>
> |col_name|data_type|comment|
>
> +--------+---------+-------+
>
> |      id|   string|   null|
>
> |    name|   string|   null|
>
> |    city|   string|   null|
>
> |     age|      int|   null|
>
> +--------+---------+-------+
>
>
>
>
>
> scala> carbon.sql("INSERT INTO test_carbon VALUES(1,'x1','x2',34)")
>
> 17/07/26 15:07:25 AUDIT CarbonDataRDDFactory$: [gateway-dc1][hdfs][Thread-1]Data load request has been received for table default.test_carbon
>
> 17/07/26 15:07:25 WARN CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
>
> 17/07/26 15:07:25 WARN CarbonDataProcessorUtil: Executor task launch worker for task 5 sort scope is set to LOCAL_SORT
>
> 17/07/26 15:07:25 WARN CarbonDataProcessorUtil: Executor task launch worker for task 5 batch sort size is set to 0
>
> 17/07/26 15:07:25 WARN CarbonDataProcessorUtil: Executor task launch worker for task 5 sort scope is set to LOCAL_SORT
>
> 17/07/26 15:07:25 WARN CarbonDataProcessorUtil: Executor task launch worker for task 5 sort scope is set to LOCAL_SORT
>
> 17/07/26 15:07:25 AUDIT CarbonDataRDDFactory$: [gateway-dc1r04n01][hdfs][Thread-1]Data load is successful for default.test_carbon
>
> res11: org.apache.spark.sql.DataFrame = []
>
>
>
> scala> carbon.sql("LOAD DATA INPATH 'hdfs://xxxx/test/carbondata/sample.csv'
> INTO TABLE test_carbon")
>
> 17/07/26 14:59:28 AUDIT CarbonDataRDDFactory$: [gateway-dc1][hdfs][Thread-1]Data load request has been received for table default.test_table
>
> 17/07/26 14:59:28 WARN CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
>
> 17/07/26 14:59:28 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:default_test_table_8662d5ff-9392-4e23-b37e-9a4485f71f0e] sort scope is set to LOCAL_SORT
>
> 17/07/26 14:59:28 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:default_test_table_8662d5ff-9392-4e23-b37e-9a4485f71f0e] batch sort size is set to 0
>
> 17/07/26 14:59:28 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:default_test_table_8662d5ff-9392-4e23-b37e-9a4485f71f0e] sort scope is set to LOCAL_SORT
>
> 17/07/26 14:59:28 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:default_test_table_8662d5ff-9392-4e23-b37e-9a4485f71f0e] sort scope is set to LOCAL_SORT
>
> 17/07/26 14:59:29 AUDIT CarbonDataRDDFactory$: [gateway-dc1][hdfs][Thread-1]Data load is successful for default.test_table
>
> res1: org.apache.spark.sql.DataFrame = []
>
>
>
>
>
> scala> carbon.sql("Select * from test_carbon").show()
>
> java.io.FileNotFoundException: File /test/carbondata/default/test_table/Fact/Part0/Segment_0 does not exist.
>
>   at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1081)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1004)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1000)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1000)
>   at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1735)
>   at org.apache.carbondata.hadoop.CarbonInputFormat.getFileStatusInternal(CarbonInputFormat.java:862)
>   at org.apache.carbondata.hadoop.CarbonInputFormat.getFileStatus(CarbonInputFormat.java:845)
>   at org.apache.carbondata.hadoop.CarbonInputFormat.listStatus(CarbonInputFormat.java:802)
>   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
>   at org.apache.carbondata.hadoop.CarbonInputFormat.getSplitsInternal(CarbonInputFormat.java:319)
>   at org.apache.carbondata.hadoop.CarbonInputFormat.getTableBlockInfo(CarbonInputFormat.java:523)
>   at org.apache.carbondata.hadoop.CarbonInputFormat.getSegmentAbstractIndexs(CarbonInputFormat.java:616)
>   at org.apache.carbondata.hadoop.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:441)
>   at org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:379)
>   at org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:302)
>   at org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:81)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
>   at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:311)
>   at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
>   at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2378)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
>   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
>   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
>   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2120)
>   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2119)
>   at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2810)
>   at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
>   at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
>   at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:638)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:597)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:606)
>   ... 50 elided
>
>
>
> I have checked the folder on HDFS: the structure
> /test/carbondata/default/test_carbon/ exists, but the folder is empty.
>
>
> I’m pretty sure I’m missing something silly, but I have not been able to find a
> way to insert data into the table.
>
>
>
> On another subject, I’m also trying to access this through Presto, but there
> the error is always: Query 20170726_145207_00005_ytsnk failed: line
> 1:1: Schema 'default' does not exist
>
>
>
> I’m also a little bit lost: from Spark it seems that the tables are
> created in the Hive metastore, but the Presto plugin doesn’t seem to refer
> to it.
>
>
>
> Thanks for reading!
>
>
> AG
>
