hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1333: [HUDI-589][DOCS] Fix querying_data page
Date Fri, 14 Feb 2020 02:44:37 GMT
vinothchandar commented on a change in pull request #1333: [HUDI-589][DOCS] Fix querying_data page
URL: https://github.com/apache/incubator-hudi/pull/1333#discussion_r379225479
 
 

 ##########
 File path: docs/_docs/2_3_querying_data.md
 ##########
 @@ -84,55 +102,53 @@ using the hive session property for incremental queries: `set hive.fetch.task.co
 would ensure Map Reduce execution is chosen for a Hive query, which combines partitions (comma
 separated) and calls InputFormat.listStatus() only once with all those partitions.
 
-## Spark
+## Spark datasource
 
-Spark provides much easier deployment & management of Hudi jars and bundles into jobs/notebooks. At a high level, there are two ways to access Hudi tables in Spark.
+Hudi COPY_ON_WRITE tables can be queried via Spark datasource similar to how standard datasources work (e.g: `spark.read.parquet`).
+Both snapshot querying and incremental querying are supported here. Typically spark jobs require adding `--jars <path to jar>/hudi-spark-bundle_2.11:0.5.1-incubating`
+to classpath of drivers and executors. Refer [building Hudi](https://github.com/apache/incubator-hudi#building-apache-hudi-from-source) for build instructions.
 
 Review comment:
   Can we remove this line: `Refer [building Hudi](https://github.com/apache/incubator-hudi#building-apache-hudi-from-source) for build instructions.`? You don't have to build it yourself, per se.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
