hadoop-common-commits mailing list archives

From zjs...@apache.org
Subject [31/50] [abbrv] hadoop git commit: MAPREDUCE-579. Streaming slowmatch documentation.
Date Fri, 27 Mar 2015 06:34:33 GMT
MAPREDUCE-579. Streaming slowmatch documentation.


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/b51b3662
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/b51b3662
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/b51b3662

Branch: refs/heads/YARN-2928
Commit: b51b36626316b1cba2bf5c991473b2e26c29685c
Parents: 89760de
Author: Harsh J <harsh@cloudera.com>
Authored: Wed Mar 25 14:38:12 2015 +0530
Committer: Zhijie Shen <zjshen@apache.org>
Committed: Thu Mar 26 23:29:47 2015 -0700

----------------------------------------------------------------------
 hadoop-mapreduce-project/CHANGES.txt                          | 2 ++
 .../hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm  | 7 +++++++
 2 files changed, 9 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/b51b3662/hadoop-mapreduce-project/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-mapreduce-project/CHANGES.txt b/hadoop-mapreduce-project/CHANGES.txt
index 2b16c30..f81a13f 100644
--- a/hadoop-mapreduce-project/CHANGES.txt
+++ b/hadoop-mapreduce-project/CHANGES.txt
@@ -256,6 +256,8 @@ Release 2.8.0 - UNRELEASED
 
   IMPROVEMENTS
 
+    MAPREDUCE-579. Streaming "slowmatch" documentation. (harsh)
+
     MAPREDUCE-6287. Deprecated methods in org.apache.hadoop.examples.Sort
     (Chao Zhang via harsh)
 

http://git-wip-us.apache.org/repos/asf/hadoop/blob/b51b3662/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
----------------------------------------------------------------------
diff --git a/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm b/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
index b4c5e38..7f2412e 100644
--- a/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
+++ b/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
@@ -546,6 +546,13 @@ You can use the record reader StreamXmlRecordReader to process XML documents.
 
 Anything found between BEGIN\_STRING and END\_STRING would be treated as one record for map tasks.
 
+The name-value properties that StreamXmlRecordReader understands are:
+
+*   (strings) 'begin' - Characters marking beginning of record, and 'end' - Characters marking end of record.
+*   (boolean) 'slowmatch' - Toggle to look for begin and end characters, but within CDATA instead of regular tags. Defaults to false.
+*   (integer) 'lookahead' - Maximum lookahead bytes to sync CDATA when using 'slowmatch', should be larger than 'maxrec'. Defaults to 2*'maxrec'.
+*   (integer) 'maxrec' - Maximum record size to read between each match during 'slowmatch'. Defaults to 50000 bytes.
+
 $H3 How do I update counters in streaming applications?
 
 A streaming process can use the stderr to emit counter information. `reporter:counter:<group>,<counter>,<amount>` should be sent to stderr to update the counter.
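
As an illustration of the properties documented in this patch, the sketch below composes a value for the streaming `-inputreader` option in the comma-separated `name=value` form that the surrounding documentation shows for `begin` and `end`. The helper name `build_inputreader_spec`, its argument names, and the `<page>` tags are hypothetical and not part of the commit. The check that `lookahead` exceeds `maxrec` follows the wording of the added text; the rejection of commas in values is an assumption based on the option being comma-separated.

```python
def build_inputreader_spec(begin, end, slowmatch=False, maxrec=None, lookahead=None):
    """Compose a -inputreader value for StreamXmlRecordReader (illustrative helper)."""
    for value in (begin, end):
        # Assumption: the spec is comma-separated, so a comma inside a value
        # would be ambiguous; reject it here rather than guess at escaping.
        if "," in value:
            raise ValueError("begin/end must not contain a comma")

    parts = ["StreamXmlRecordReader", f"begin={begin}", f"end={end}"]

    if slowmatch:
        # Defaults as stated in the patch: maxrec=50000, lookahead=2*maxrec.
        effective_maxrec = 50000 if maxrec is None else maxrec
        effective_lookahead = (
            2 * effective_maxrec if lookahead is None else lookahead
        )
        # The patch says lookahead "should be larger than 'maxrec'".
        if effective_lookahead <= effective_maxrec:
            raise ValueError("lookahead should be larger than maxrec")

        parts.append("slowmatch=true")
        # Only emit values the caller set explicitly; otherwise rely on defaults.
        if maxrec is not None:
            parts.append(f"maxrec={maxrec}")
        if lookahead is not None:
            parts.append(f"lookahead={lookahead}")

    return ",".join(parts)


if __name__ == "__main__":
    # The resulting string would be passed as: -inputreader "<spec>"
    print(build_inputreader_spec("<page>", "</page>", slowmatch=True, maxrec=100000))
```

With `slowmatch` left at its default of false, the helper omits `maxrec` and `lookahead`, since the patch describes both only in the context of 'slowmatch'.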

