hadoop-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Please help on providing correct answers
Date Wed, 07 Nov 2012 18:37:26 GMT
Sorry, I think I had better explain why I am curious... 

First, there are a couple of sites that have study questions to help pass Cloudera's certification.

( I don't know if Hortonworks has cert tests, but both MapR and Cloudera do.) 

It's just that, looking at the questions first... they're not really good questions, and the answer choices aren't much better. Then there's the 'correct' answer.

I can understand if you don't want to reveal your sources publicly, but you have to understand that the misinformation found on these sites makes it harder to teach the right answers.

As Harsh says, you should be able to look at the questions and then go back to Tom White's
book and others to verify why you think your answer is right. 

HTH

-Mike

On Nov 7, 2012, at 11:30 AM, Ramasubramanian Narayanan <ramasubramanian.narayanan@gmail.com> wrote:

> Nothing consolidated... I have been collecting them for the past month... a few as printouts, a few from mails, a few from googling, a few from sites, and a few from some of my friends...
> 
> regards,
> Rams
> 
> On Wed, Nov 7, 2012 at 10:57 PM, Michael Segel <michael_segel@hotmail.com> wrote:
> Ok...
> Where are you pulling these questions from?
> 
> Seriously.
> 
> 
> On Nov 7, 2012, at 11:21 AM, Ramasubramanian Narayanan <ramasubramanian.narayanan@gmail.com> wrote:
> 
> > Hi,
> >
> >    I came across the following questions on some sites, and the answers they provide seem to me to be wrong... I might be wrong... Can someone help confirm the right answers for these 11 questions please? I'd appreciate an explanation if you could provide one...
> >
> > *******************************************************************************
> > You are running a job that will process a single InputSplit on a cluster which has no other jobs currently running. Each node has an equal number of open Map slots. On which node will Hadoop first attempt to run the Map task?
> > A. The node with the most memory
> > B. The node with the lowest system load
> > C. The node on which this InputSplit is stored
> > D. The node with the most free local disk space
> >
> > My Answer            : C
> > Answer Given in site : A
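> >
> > For reference, a minimal sketch (the path and hostnames are made up for illustration) showing that an InputSplit carries the locations of its data, which is what the scheduler uses when it tries to run the map task data-locally:
> >
> > import org.apache.hadoop.fs.Path;
> > import org.apache.hadoop.mapreduce.lib.input.FileSplit;
> >
> > public class SplitLocations {
> >     public static void main(String[] args) throws Exception {
> >         // A FileSplit built by FileInputFormat records the datanode hostnames
> >         // that hold the underlying HDFS block (hosts below are illustrative).
> >         FileSplit split = new FileSplit(new Path("/data/input.txt"), 0, 128L * 1024 * 1024,
> >                 new String[] { "node1", "node2", "node3" });
> >         for (String host : split.getLocations()) {
> >             System.out.println("preferred host: " + host);
> >         }
> >     }
> > }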
> >
> > *******************************************************************************
> > What is a Writable?
> > A. Writable is an interface that all keys and values in MapReduce must implement. Classes implementing this interface must implement methods for serializing and deserializing themselves.
> > B. Writable is an abstract class that all keys and values in MapReduce must extend. Classes extending this abstract base class must implement methods for serializing and deserializing themselves.
> > C. Writable is an interface that all keys, but not values, in MapReduce must implement. Classes implementing this interface must implement methods for serializing and deserializing themselves.
> > D. Writable is an abstract class that all keys, but not values, in MapReduce must extend. Classes extending this abstract base class must implement methods for serializing and deserializing themselves.
> >
> > My Answer            : A
> > Answer Given in site : B
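> >
> > For reference, a minimal sketch of a custom Writable (the class and field names are just illustrative). Note that classes used as keys additionally implement WritableComparable, which extends Writable:
> >
> > import java.io.DataInput;
> > import java.io.DataOutput;
> > import java.io.IOException;
> > import org.apache.hadoop.io.Writable;
> >
> > public class PointWritable implements Writable {
> >     private int x;
> >     private int y;
> >
> >     @Override
> >     public void write(DataOutput out) throws IOException {
> >         // Serialize the fields in a fixed order.
> >         out.writeInt(x);
> >         out.writeInt(y);
> >     }
> >
> >     @Override
> >     public void readFields(DataInput in) throws IOException {
> >         // Deserialize the fields in the same order they were written.
> >         x = in.readInt();
> >         y = in.readInt();
> >     }
> > }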
> >
> > *******************************************************************************
> >
> > You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses TextInputFormat and the IdentityReducer: the mapper applies a regular expression over input values and emits key-value pairs with the key consisting of the matching text, and the value containing the filename and byte offset. Determine the difference between setting the number of reducers to zero and setting the number of reducers to one.
> > A. There is no difference in output between the two settings.
> > B. With zero reducers, no reducer runs and the job throws an exception. With one reducer, instances of matching patterns are stored in a single file on HDFS.
> > C. With zero reducers, all instances of matching patterns are gathered together in one file on HDFS. With one reducer, instances of matching patterns are stored in multiple files on HDFS.
> > D. With zero reducers, instances of matching patterns are stored in multiple files on HDFS. With one reducer, all instances of matching patterns are gathered together in one file on HDFS.
> >
> > My Answer            : D
> > Answer Given in site : C
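> >
> > For reference, a minimal driver sketch (class names are illustrative) showing where the number of reducers is set. With 0, the job is map-only and each map task writes its own part-m-* file to HDFS; with 1, all map output is shuffled to a single reducer and ends up in one part-r-00000 file:
> >
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.fs.Path;
> > import org.apache.hadoop.mapreduce.Job;
> > import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> > import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> > import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
> >
> > public class GrepDriver {
> >     public static void main(String[] args) throws Exception {
> >         Job job = Job.getInstance(new Configuration(), "grep");
> >         job.setJarByClass(GrepDriver.class);
> >         job.setInputFormatClass(TextInputFormat.class);
> >         // job.setMapperClass(RegexMatchMapper.class);  // hypothetical mapper class
> >         job.setNumReduceTasks(0);   // map-only: one output file per map task
> >         // job.setNumReduceTasks(1); // one reducer: a single combined output file
> >         FileInputFormat.addInputPath(job, new Path(args[0]));
> >         FileOutputFormat.setOutputPath(job, new Path(args[1]));
> >         System.exit(job.waitForCompletion(true) ? 0 : 1);
> >     }
> > }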
> >
> > *******************************************************************************
> >
> > During the standard sort and shuffle phase of MapReduce, keys and values are passed to reducers. Which of the following is true?
> > A. Keys are presented to a reducer in sorted order; values for a given key are not sorted.
> > B. Keys are presented to a reducer in sorted order; values for a given key are sorted in ascending order.
> > C. Keys are presented to a reducer in random order; values for a given key are not sorted.
> > D. Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.
> >
> > My Answer            : A
> > Answer Given in site : D
> >
> > *******************************************************************************
> >
> > Which statement best describes the data path of intermediate key-value pairs (i.e., output of the mappers)?
> > A. Intermediate key-value pairs are written to HDFS. Reducers read the intermediate data from HDFS.
> > B. Intermediate key-value pairs are written to HDFS. Reducers copy the intermediate data to the local disks of the machines running the reduce tasks.
> > C. Intermediate key-value pairs are written to the local disks of the machines running the map tasks, and then copied to the machines running the reduce tasks.
> > D. Intermediate key-value pairs are written to the local disks of the machines running the map tasks, and are then copied to HDFS. Reducers read the intermediate data from HDFS.
> >
> > My Answer            : C
> > Answer Given in site : B
> >
> > *******************************************************************************
> >
> > You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement?
> > A. Mapper <Text, IntWritable, Text, IntWritable>
> > B. Reducer <Text, Text, IntWritable, IntWritable>
> > C. Reducer <Text, IntWritable, Text, IntWritable>
> > D. Combiner <Text, IntWritable, Text, IntWritable>
> > E. Combiner <Text, Text, IntWritable, IntWritable>
> >
> > My Answer            : D
> > Answer Given in site : C
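> >
> > For reference, a minimal sketch: the Java MapReduce API has no separate Combiner type; a combiner is registered on the job as a Reducer whose input and output types both match the map output types (the class name below is illustrative):
> >
> > import java.io.IOException;
> > import org.apache.hadoop.io.IntWritable;
> > import org.apache.hadoop.io.Text;
> > import org.apache.hadoop.mapreduce.Reducer;
> >
> > public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
> >     private final IntWritable result = new IntWritable();
> >
> >     @Override
> >     protected void reduce(Text key, Iterable<IntWritable> values, Context context)
> >             throws IOException, InterruptedException {
> >         // Locally aggregate counts from a single map task so that far fewer
> >         // key-value pairs have to be shuffled across the network.
> >         int sum = 0;
> >         for (IntWritable value : values) {
> >             sum += value.get();
> >         }
> >         result.set(sum);
> >         context.write(key, result);
> >     }
> > }
> >
> > // In the driver: job.setCombinerClass(SumCombiner.class);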
> >
> > *******************************************************************************
> >
> > What happens in a MapReduce job when you set the number of reducers to one?
> > A. A single reducer gathers and processes all the output from all the mappers. The output is written in as many separate files as there are mappers.
> > B. A single reducer gathers and processes all the output from all the mappers. The output is written to a single file in HDFS.
> > C. Setting the number of reducers to one creates a processing bottleneck, and since the number of reducers as specified by the programmer is used as a reference value only, the MapReduce runtime provides a default setting for the number of reducers.
> > D. Setting the number of reducers to one is invalid, and an exception is thrown.
> >
> > My Answer            : B
> > Answer Given in site : C
> >
> > *******************************************************************************
> >
> > In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?
> > A. Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster.
> > B. Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run.
> > C. Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk.
> > D. Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.
> >
> > My Answer            : C
> > Answer Given in site : A
> >
> > *******************************************************************************
> >
> > You need to create a GUI application to help your company's sales people add and edit customer information. Would HDFS be appropriate for this customer information file?
> > A. Yes, because HDFS is optimized for random access writes.
> > B. Yes, because HDFS is optimized for fast retrieval of relatively small amounts of data.
> > C. No, because HDFS can only be accessed by MapReduce applications.
> > D. No, because HDFS is optimized for write-once, streaming access for relatively large files.
> >
> > My Answer            : D
> > Answer Given in site : A
> >
> > *******************************************************************************
> >
> > You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into individual characters. For each one of these characters, you will emit the character as a key and an IntWritable as the value. Since this will produce proportionally more intermediate data than input data, which resources could you expect to be likely bottlenecks?
> > A. Processor and RAM
> > B. Processor and disk I/O
> > C. Disk I/O and network I/O
> > D. Processor and network I/O
> >
> > My Answer            : D
> > Answer Given in site : B
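> >
> > For reference, a minimal sketch of the mapper described above (the class name is illustrative). It emits one (character, 1) pair per input character, so the intermediate data is far larger than the input and has to be sorted, spilled, and shuffled to the reducers:
> >
> > import java.io.IOException;
> > import org.apache.hadoop.io.IntWritable;
> > import org.apache.hadoop.io.LongWritable;
> > import org.apache.hadoop.io.Text;
> > import org.apache.hadoop.mapreduce.Mapper;
> >
> > public class CharFrequencyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
> >     private static final IntWritable ONE = new IntWritable(1);
> >     private final Text character = new Text();
> >
> >     @Override
> >     protected void map(LongWritable offset, Text line, Context context)
> >             throws IOException, InterruptedException {
> >         String text = line.toString();
> >         for (int i = 0; i < text.length(); i++) {
> >             character.set(String.valueOf(text.charAt(i)));
> >             context.write(character, ONE);  // one pair per input character
> >         }
> >     }
> > }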
> >
> > *******************************************************************************
> >
> > Which of the following statements best describes how a large (100 GB) file is stored in HDFS?
> > A. The file is divided into variable-size blocks, which are stored on multiple datanodes. Each block is replicated three times by default.
> > B. The file is replicated three times by default. Each copy of the file is stored on a separate datanode.
> > C. The master copy of the file is stored on a single datanode. The replica copies are divided into fixed-size blocks, which are stored on multiple datanodes.
> > D. The file is divided into fixed-size blocks, which are stored on multiple datanodes. Each block is replicated three times by default. Multiple blocks from the same file might reside on the same datanode.
> > E. The file is divided into fixed-size blocks, which are stored on multiple datanodes. Each block is replicated three times by default. HDFS guarantees that different blocks from the same file are never on the same datanode.
> >
> > My Answer            : D
> > Answer Given in site : B
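> >
> > For reference, a minimal sketch (the path is illustrative) that lists a file's blocks through the FileSystem API; it shows the file stored as fixed-size blocks, each reporting the hosts of its (by default three) replicas:
> >
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.fs.BlockLocation;
> > import org.apache.hadoop.fs.FileStatus;
> > import org.apache.hadoop.fs.FileSystem;
> > import org.apache.hadoop.fs.Path;
> >
> > public class ShowBlocks {
> >     public static void main(String[] args) throws Exception {
> >         FileSystem fs = FileSystem.get(new Configuration());
> >         FileStatus status = fs.getFileStatus(new Path("/data/big-file.dat"));
> >         for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
> >             // Each block prints its offset, length, and replica hosts.
> >             System.out.println("offset=" + block.getOffset()
> >                     + " length=" + block.getLength()
> >                     + " hosts=" + java.util.Arrays.toString(block.getHosts()));
> >         }
> >     }
> > }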
> >
> > *******************************************************************************
> >
> > regards,
> > Rams
> 
> 

