beam-commits mailing list archives

From ieme...@apache.org
Subject [1/3] beam-site git commit: Add HadoopInputFormatIO example to read from Hive's HCatalog
Date Tue, 06 Jun 2017 07:38:47 GMT
Repository: beam-site
Updated Branches:
  refs/heads/asf-site 369b331db -> 07a32b382


Add HadoopInputFormatIO example to read from Hive's HCatalog


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/8ff65fe7
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/8ff65fe7
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/8ff65fe7

Branch: refs/heads/asf-site
Commit: 8ff65fe7198f43d3653c2bb539f22192459b83ec
Parents: 369b331
Author: Seshadri Chakkravarthy <sesh.cr@gmail.com>
Authored: Tue May 23 16:15:26 2017 -0700
Committer: Ismaël Mejía <iemejia@apache.org>
Committed: Tue Jun 6 09:32:12 2017 +0200

----------------------------------------------------------------------
 src/documentation/io/built-in-hadoop.md | 31 ++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/8ff65fe7/src/documentation/io/built-in-hadoop.md
----------------------------------------------------------------------
diff --git a/src/documentation/io/built-in-hadoop.md b/src/documentation/io/built-in-hadoop.md
index 5c07717..722facb 100644
--- a/src/documentation/io/built-in-hadoop.md
+++ b/src/documentation/io/built-in-hadoop.md
@@ -195,3 +195,34 @@ PCollection<KV<Text, LinkedMapWritable>> elasticData = p.apply("read",
 ```
 
 The `org.elasticsearch.hadoop.mr.EsInputFormat`'s `EsInputFormat` key class is `org.apache.hadoop.io.Text` `Text`, and its value class is `org.elasticsearch.hadoop.mr.LinkedMapWritable` `LinkedMapWritable`. Both key and value classes have Beam Coders.
+
+### HCatalog - HCatInputFormat
+
+To read data using HCatalog, use `org.apache.hive.hcatalog.mapreduce.HCatInputFormat`, which needs the following properties to be set:
+
+```java
+Configuration hcatConf = new Configuration();
+hcatConf.setClass("mapreduce.job.inputformat.class", HCatInputFormat.class, InputFormat.class);
+hcatConf.setClass("key.class", LongWritable.class, Object.class);
+hcatConf.setClass("value.class", HCatRecord.class, Object.class);
+hcatConf.set("hive.metastore.uris", "thrift://metastore-host:port");
+
+org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(hcatConf, "my_database", "my_table", "my_filter");
+```
+
+```py
+  # The Beam SDK for Python does not support Hadoop InputFormat IO.
+```
+
+Call the Read transform as follows:
+
+```java
+PCollection<KV<Long, HCatRecord>> hcatData =
+  p.apply("read",
+  HadoopInputFormatIO.<Long, HCatRecord>read()
+  .withConfiguration(hcatConf));
+```
+
+```py
+  # The Beam SDK for Python does not support Hadoop InputFormat IO.
+```
\ No newline at end of file

