crunch-user mailing list archives

From unmesha sreeveni <unmeshab...@gmail.com>
Subject part files in wordcount example
Date Thu, 19 Feb 2015 10:22:52 GMT
Hi

I am trying to understand the wordcount example:

import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.lib.Aggregate;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WordCount {
  public static void main(String[] args) throws Exception {
    String source = args[0];
    String dest = args[1];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    if (fs.exists(new Path(dest))) {
      fs.delete(new Path(dest), true);
    }
    Pipeline pipeline = new MRPipeline(WordCount.class);
    PCollection<String> lines = pipeline.readTextFile(source);

    PCollection<String> words = lines.parallelDo("my splitter",
        new DoFn<String, String>() {
          @Override
          public void process(String line, Emitter<String> emitter) {
            for (String word : line.split("\\s+")) {
              emitter.emit(word);
            }
          }
        }, Writables.strings());

    PTable<String, Long> counts = Aggregate.count(words);
    pipeline.writeTextFile(counts, dest);
    pipeline.run();
  }
}

1. When I ran this on a 1.8 GB text file, I got 2 part files as output,
so the job must have run with 2 reducers. Where is that specified? Or is
it chosen automatically?
2. DoFn seems similar to a mapper/reducer/combiner. In the mapper here we
only emit the word, but in MapReduce we emit (word, 1). How is this
aggregation done?
3. Where can I find good tutorials?
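On question 2, my current understanding is that Aggregate.count is shorthand for the classic (word, 1) → sum pattern: pair each word with a 1, group by the word, and add up the 1s. A minimal single-JVM sketch of that pattern (the class name CountSketch and the helper are mine, not Crunch API):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountSketch {
    // The same effect as emitting (word, 1L) for every word and then
    // summing the values per key, done locally with a HashMap.
    static Map<String, Long> count(List<String> words) {
        Map<String, Long> counts = new HashMap<>();
        for (String word : words) {
            // merge: insert 1L if absent, otherwise add 1L to the existing sum
            counts.merge(word, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        System.out.println(count(words));
    }
}
```

Is this roughly what Crunch does behind the scenes, just distributed across mappers and reducers?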

-- 
Thanks & Regards

Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Centre for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/
