beam-commits mailing list archives

From ieme...@apache.org
Subject [1/3] beam-site git commit: Add Amazon DynamoDB example using HadoopInputFormatIO
Date Tue, 27 Jun 2017 10:00:10 GMT
Repository: beam-site
Updated Branches:
  refs/heads/asf-site 3ab9c27eb -> 855364b8b


Add Amazon DynamoDB example using HadoopInputFormatIO


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/920a0be8
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/920a0be8
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/920a0be8

Branch: refs/heads/asf-site
Commit: 920a0be82cc96d621435e3d320552d0799804e3d
Parents: 3ab9c27
Author: Seshadri Chakkravarthy <sesh.cr@gmail.com>
Authored: Fri Jun 23 09:37:39 2017 -0700
Committer: Ismaël Mejía <iemejia@gmail.com>
Committed: Tue Jun 27 11:56:25 2017 +0200

----------------------------------------------------------------------
 src/documentation/io/built-in-hadoop.md | 43 ++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/920a0be8/src/documentation/io/built-in-hadoop.md
----------------------------------------------------------------------
diff --git a/src/documentation/io/built-in-hadoop.md b/src/documentation/io/built-in-hadoop.md
index 722facb..240d919 100644
--- a/src/documentation/io/built-in-hadoop.md
+++ b/src/documentation/io/built-in-hadoop.md
@@ -225,4 +225,47 @@ PCollection<KV<Long, HCatRecord>> hcatData =
 
 ```py
   # The Beam SDK for Python does not support Hadoop InputFormat IO.
+```
+
+### Amazon DynamoDB - DynamoDBInputFormat
+
+To read data from Amazon DynamoDB, use `org.apache.hadoop.dynamodb.read.DynamoDBInputFormat`.
+`DynamoDBInputFormat` implements the older `org.apache.hadoop.mapred.InputFormat` interface.
+To make it compatible with HadoopInputFormatIO, which uses the newer abstract class `org.apache.hadoop.mapreduce.InputFormat`,
+a wrapper API is required that acts as an adapter between HadoopInputFormatIO and DynamoDBInputFormat
+(or, in general, any InputFormat implementing `org.apache.hadoop.mapred.InputFormat`).
+The example below uses one such available wrapper API: <https://github.com/twitter/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/MapReduceInputFormatWrapper.java>
+
+
+```java
+Configuration dynamoDBConf = new Configuration();
+Job job = Job.getInstance(dynamoDBConf);
+com.twitter.elephantbird.mapreduce.input.MapReduceInputFormatWrapper.setInputFormat(org.apache.hadoop.dynamodb.read.DynamoDBInputFormat.class, job);
+dynamoDBConf = job.getConfiguration();
+dynamoDBConf.setClass("key.class", Text.class, WritableComparable.class);
+dynamoDBConf.setClass("value.class", org.apache.hadoop.dynamodb.DynamoDBItemWritable.class, Writable.class);
+dynamoDBConf.set("dynamodb.servicename", "dynamodb");
+dynamoDBConf.set("dynamodb.input.tableName", "table_name");
+dynamoDBConf.set("dynamodb.endpoint", "dynamodb.us-west-1.amazonaws.com");
+dynamoDBConf.set("dynamodb.regionid", "us-west-1");
+dynamoDBConf.set("dynamodb.throughput.read", "1");
+dynamoDBConf.set("dynamodb.throughput.read.percent", "1");
+dynamoDBConf.set("dynamodb.version", "2011-12-05");
+dynamoDBConf.set(DynamoDBConstants.DYNAMODB_ACCESS_KEY_CONF, "aws_access_key");
+dynamoDBConf.set(DynamoDBConstants.DYNAMODB_SECRET_KEY_CONF, "aws_secret_key");
+```
+
+```py
+  # The Beam SDK for Python does not support Hadoop InputFormat IO.
+```
+
+Call the Read transform as follows:
+
+```java
+PCollection<KV<Text, DynamoDBItemWritable>> dynamoDBData =
+  p.apply("read",
+  HadoopInputFormatIO.<Text, DynamoDBItemWritable>read()
+  .withConfiguration(dynamoDBConf));
+
+```py
+  # The Beam SDK for Python does not support Hadoop InputFormat IO.
 ```
\ No newline at end of file

