hive-user mailing list archives

From Sanjay Subramanian <>
Subject Pointing SparkSQL to existing Hive Metadata with data file locations in HDFS
Date Thu, 28 May 2015 00:44:25 GMT
hey guys
On our Hive/Hadoop cluster (Cloudera distribution CDH 5.2.x) there are 300+ Hive tables. The data is stored as text (slowly moving to Parquet) on HDFS. I want to use SparkSQL, point it at the existing Hive metadata, and be able to define JOINs etc. using a programming structure like this:
import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc)
val schemaRdd = sqlContext.sql("some complex SQL")
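Fleshed out a bit, a minimal sketch of what I have in mind looks like the following. This assumes Spark is launched with CDH's hive-site.xml on the classpath so HiveContext can reach the existing metastore; the table and column names (customers, orders, cust_id, amount) are placeholders, not real tables from our cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveJoinExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveJoinExample"))

    // HiveContext reads hive-site.xml and connects to the existing metastore,
    // so the 300+ Hive tables are visible by name without re-declaring schemas.
    val sqlContext = new HiveContext(sc)

    // A join across two existing Hive tables, executed as a Spark job.
    val joined = sqlContext.sql(
      """SELECT c.cust_id, SUM(o.amount) AS total
        |FROM customers c
        |JOIN orders o ON c.cust_id = o.cust_id
        |GROUP BY c.cust_id""".stripMargin)

    joined.collect().foreach(println)
    sc.stop()
  }
}
```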

Is that the way to go? Some guidance would be great.
