hadoop-hdfs-user mailing list archives

From Ramasubramanian Narayanan <ramasubramanian.naraya...@gmail.com>
Subject Re: Please help on providing correct answers
Date Wed, 07 Nov 2012 18:54:21 GMT
Hi,

I have given my reason for each choice, and why I think the answer given on
the site is wrong...

You are running a job that will process a single InputSplit on a cluster
which has no other jobs
currently running. Each node has an equal number of open Map slots. On
which node will Hadoop
first attempt to run the Map task?
A. The node with the most memory
B. The node with the lowest system load
C. The node on which this InputSplit is stored
D. The node with the most free local disk space

My Answer            : C [A mapper runs on a data node that holds its data,
so the map task will run on the node on which the InputSplit is stored.]
Answer Given in site : A [I suppose the idea is that the map task looks for
the node which has the most memory]
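
The locality argument in answer C can be sketched in a few lines. This is a
minimal illustration only, not Hadoop's scheduler code: the class name, the
method `pickNode`, and the node names are all hypothetical, and it assumes
the scheduler knows which nodes hold a replica of the split.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration of data-local task placement; not Hadoop's scheduler.
class LocalityDemo {
    // Prefer a node that stores a replica of the split; otherwise fall back
    // to the first node that has a free map slot.
    static String pickNode(Set<String> splitHosts, List<String> nodesWithFreeSlots) {
        for (String node : nodesWithFreeSlots) {
            if (splitHosts.contains(node)) {
                return node; // data-local: the block does not have to be copied
            }
        }
        return nodesWithFreeSlots.isEmpty() ? null : nodesWithFreeSlots.get(0);
    }

    public static void main(String[] args) {
        Set<String> splitHosts = Set.of("node3", "node7", "node9");
        List<String> free = List.of("node1", "node3", "node5");
        System.out.println(pickNode(splitHosts, free)); // prints node3
    }
}
```

Memory, system load and free disk space do not appear in the sketch at all,
which is the point being made for C.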
*******************************************************************************
What is a Writable?
A. Writable is an interface that all keys and values in MapReduce must
implement. Classes implementing this interface must implement methods
for serializing and deserializing themselves.
B. Writable is an abstract class that all keys and values in MapReduce must
extend. Classes extending this abstract base class must implement methods
for serializing and deserializing themselves.
C. Writable is an interface that all keys, but not values, in MapReduce
must implement. Classes implementing this interface must implement methods
for serializing and deserializing themselves.
D. Writable is an abstract class that all keys, but not values, in
MapReduce must extend. Classes extending this abstract base class must
implement methods for serializing and deserializing themselves.

My Answer            : A [Writable is an interface]
Answer Given in site : B [Writable is not an abstract class]
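
To make "an interface with methods for serializing and deserializing" concrete,
here is a minimal sketch. The `Writable` declared here is a local stand-in with
the same two methods as `org.apache.hadoop.io.Writable`, so the example compiles
without Hadoop on the classpath; `FileOffset` and `roundTrip` are hypothetical
names made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring the two methods of org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type: a file name plus a byte offset.
class FileOffset implements Writable {
    String file = "";
    long offset;

    FileOffset() {}

    FileOffset(String file, long offset) {
        this.file = file;
        this.offset = offset;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(file);
        out.writeLong(offset);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        file = in.readUTF();
        offset = in.readLong();
    }

    // Serialize to a byte array and read the bytes back into a fresh object.
    static FileOffset roundTrip(FileOffset original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            FileOffset copy = new FileOffset();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FileOffset copy = roundTrip(new FileOffset("access.log", 1024L));
        System.out.println(copy.file + " @ " + copy.offset); // prints access.log @ 1024
    }
}
```

In Hadoop itself, key types additionally need to be comparable
(`WritableComparable`) so that they can be sorted; the sketch leaves that out.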
******************************************************************************

You write a MapReduce job to process 100 files in HDFS. Your MapReduce
algorithm uses TextInputFormat and the IdentityReducer: the mapper applies a
regular expression over input values and emits key-value pairs with the key
consisting of the matching text, and the value containing the filename and
byte offset. Determine the difference between setting the number of reducers
to one and setting the number of reducers to zero.
A. There is no difference in output between the two settings.
B. With zero reducers, no reducer runs and the job throws an exception. With
one reducer, instances of matching patterns are stored in a single file on
HDFS.
C. With zero reducers, all instances of matching patterns are gathered
together in one file on HDFS. With one reducer, instances of matching
patterns are stored in multiple files on HDFS.
D. With zero reducers, instances of matching patterns are stored in multiple
files on HDFS. With one reducer, all instances of matching patterns are
gathered together in one file on HDFS.

My Answer            : D [With no reducers, all the output of the mappers is
written directly to HDFS, so multiple files will be created]
Answer Given in site : C [If you have one reducer you will get only one
output file, not many]
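
The file counts behind answer D can be modelled as follows. This is a rough
sketch, not Hadoop code: `outputFiles` is a hypothetical helper, it assumes one
map task per input file, and the `part-m-`/`part-r-` names follow the pattern
of the newer `org.apache.hadoop.mapreduce` API (the older API names files
`part-NNNNN`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of which output files a job leaves behind; not Hadoop code.
class OutputFilesDemo {
    static List<String> outputFiles(int mapTasks, int reduceTasks) {
        List<String> files = new ArrayList<>();
        if (reduceTasks == 0) {
            // Map-only job: every map task writes its own output file.
            for (int i = 0; i < mapTasks; i++) {
                files.add(String.format("part-m-%05d", i));
            }
        } else {
            // Otherwise every reduce task writes exactly one output file.
            for (int i = 0; i < reduceTasks; i++) {
                files.add(String.format("part-r-%05d", i));
            }
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(outputFiles(100, 0).size()); // prints 100
        System.out.println(outputFiles(100, 1));        // prints [part-r-00000]
    }
}
```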

*******************************************************************************

During the standard sort and shuffle phase of MapReduce, keys and values
are passed to
reducers. Which of the following is true?
A. Keys are presented to a reducer in sorted order; values for a given key are
not sorted.
B. Keys are presented to a reducer in sorted order; values for a given key
are sorted in ascending
order.
C. Keys are presented to a reducer in random order; values for a given key
are not sorted.
D. Keys are presented to a reducer in random order; values for a given key
are sorted in
ascending order.

My Answer            : A [Keys are passed to the reducer in sorted order, but
values are not. To get the values in sorted order we need a secondary sort]
Answer Given in site : D [Keys are passed to the reducer only in sorted
order, not in random order]
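
A small simulation of what the reducer sees under answer A. It is only a sketch
with hypothetical names (`ShuffleOrderDemo`, `group`): a `TreeMap` stands in for
the sort on keys, and the values for each key are simply kept in arrival order,
which stands in for "whatever order the shuffle happens to deliver".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of grouping map output by key; not Hadoop code.
class ShuffleOrderDemo {
    static Map<String, List<Integer>> group(String[] keys, int[] values) {
        // TreeMap iterates its keys in sorted order, like keys reaching a reducer.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            // Values are appended as they arrive; nothing sorts them.
            grouped.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        String[] keys = {"pear", "apple", "pear", "apple", "fig"};
        int[] values = {9, 5, 2, 7, 1};
        System.out.println(group(keys, values)); // prints {apple=[5, 7], fig=[1], pear=[9, 2]}
    }
}
```

The keys come out sorted while the values for `pear` stay as `[9, 2]`; sorting
them as well is what a secondary sort is for.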

*******************************************************************************

Which statement best describes the data path of intermediate key-value
pairs (i.e., output of the
mappers)?
A. Intermediate key-value pairs are written to HDFS. Reducers read the
intermediate data from HDFS.
B. Intermediate key-value pairs are written to HDFS. Reducers copy the
intermediate data to the local disks of the machines running the reduce
tasks.
C. Intermediate key-value pairs are written to the local disks of the
machines running the map tasks, and then copied to the machine running the
reduce tasks.
D. Intermediate key-value pairs are written to the local disks of the
machines running the map tasks, and are then copied to HDFS. Reducers read
the intermediate data from HDFS.

My Answer            : C [Intermediate key-value pairs are written to local
disk and transferred over the network to the reducer. Once the job is
completed, the intermediate data is deleted from the data node]
Answer Given in site : B [Intermediate key-value pairs are not written to
HDFS, and the reducer does not read them from HDFS]

*******************************************************************************

You are developing a combiner that takes as input Text keys, IntWritable
values, and emits Text
keys, IntWritable values. Which interface should your class implement?
A. Mapper <Text, IntWritable, Text, IntWritable>
B. Reducer <Text, Text, IntWritable, IntWritable>
C. Reducer <Text, IntWritable, Text, IntWritable>
D. Combiner <Text, IntWritable, Text, IntWritable>
E. Combiner <Text, Text, IntWritable, IntWritable>

My Answer            : D [To develop a combiner we need to use the Combiner
interface]
Answer Given in site : C [To develop a combiner we need to use only the
Combiner interface, not the Reducer interface]
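
For comparison, Hadoop's MapReduce APIs do not define a separate `Combiner`
type: a combiner is written against the reducer contract and registered with
`setCombinerClass`. The sketch below illustrates that shape under stated
assumptions. The `Reducer` interface here is a simplified local stand-in (the
real one also takes an output collector or context), `String` and `Integer`
stand in for `Text` and `IntWritable`, and `SumReducer` and `sumFor` are
hypothetical names.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

// Simplified local stand-in for a reducer contract; not Hadoop's real interface.
interface Reducer<KIN, VIN, KOUT, VOUT> {
    void reduce(KIN key, Iterator<VIN> values, BiConsumer<KOUT, VOUT> output);
}

// Sums the counts for one key. Input and output types are identical, which is
// what allows the same class to act as a combiner and as a reducer.
class SumReducer implements Reducer<String, Integer, String, Integer> {
    @Override
    public void reduce(String key, Iterator<Integer> values, BiConsumer<String, Integer> output) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        output.accept(key, sum);
    }

    // Convenience wrapper for the demo: run reduce() once and return the sum.
    static int sumFor(String key, List<Integer> values) {
        int[] result = new int[1];
        new SumReducer().reduce(key, values.iterator(), (k, v) -> result[0] = v);
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(sumFor("hadoop", List.of(1, 1, 1))); // prints 3
    }
}
```

The type parameters read (key in, value in, key out, value out), which matches
the ordering in option C rather than the ordering in options B and E.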

*******************************************************************************

What happens in a MapReduce job when you set the number of reducers to one?
A. A single reducer gathers and processes all the output from all the
mappers. The output is
written in as many separate files as there are mappers.
B. A single reducer gathers and processes all the output from all the
mappers. The output is
written to a single file in HDFS.
C. Setting the number of reducers to one creates a processing bottleneck,
and since the number of reducers as specified by the programmer is used as a
reference value only, the MapReduce runtime provides a default setting for
the number of reducers.
D. Setting the number of reducers to one is invalid, and an exception is
thrown.

My Answer            : B [With a single reducer, the output will be a
single HDFS file]
Answer Given in site : C [The number of reducers specified by the program is
not for reference only. Also, the MapReduce runtime does not override it
with a default setting for the number of reducers]

*******************************************************************************

In the standard word count MapReduce algorithm, why might using a combiner
reduce the overall job running time?
A. Because combiners perform local aggregation of word counts, thereby
allowing the mappers to process input data faster.
B. Because combiners perform local aggregation of word counts, thereby
reducing the number of mappers that need to run.
C. Because combiners perform local aggregation of word counts, and then
transfer that data to reducers without writing the intermediate data to disk.
D. Because combiners perform local aggregation of word counts, thereby
reducing the number of key-value pairs that need to be shuffled across the
network to the reducers.

My Answer            : C [A combiner is almost like a reducer, but it
operates on the local node, on the output of the mapper]
Answer Given in site : A [A combiner takes the output of the mapper, so it
has nothing to do with making the mapper work faster]
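
What "local aggregation" does to the number of pairs leaving one map task can
be counted directly. The following is a minimal sketch with hypothetical names
(`CombinerVolumeDemo`, `pairsWithoutCombiner`, `combine`); it assumes
whitespace-separated words and a single combiner pass over the whole map
output, whereas Hadoop may run a combiner zero, one or several times.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical count of key-value pairs leaving one map task; not Hadoop code.
class CombinerVolumeDemo {
    // Without a combiner the mapper ships one (word, 1) pair per occurrence.
    static int pairsWithoutCombiner(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // With a combiner the pairs are summed locally first, leaving one
    // (word, count) pair per distinct word.
    static Map<String, Integer> combine(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "to be or not to be";
        System.out.println(pairsWithoutCombiner(text)); // prints 6
        System.out.println(combine(text));              // prints {be=2, not=1, or=1, to=2}
        System.out.println(combine(text).size());       // prints 4
    }
}
```

Six pairs shrink to four here; on real text with many repeated words the
reduction in data crossing the network is much larger.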

*******************************************************************************

You need to create a GUI application to help your company's sales people
add and edit customer
information. Would HDFS be appropriate for this customer information file?
A. Yes, because HDFS is optimized for random access writes.
B. Yes, because HDFS is optimized for fast retrieval of relatively small
amounts of data.
C. No, because HDFS can only be accessed by MapReduce applications.
D. No, because HDFS is optimized for write-once, streaming access for
relatively large files.

My Answer            : D [HDFS is for write-once]
Answer Given in site : A [HDFS is not for random access writes]

*******************************************************************************

You need to create a job that does frequency analysis on input data. You
will do this by writing a Mapper that uses TextInputFormat and splits each
value (a line of text from an input file) into individual characters. For
each one of these characters, you will emit the character as a key and an
IntWritable as the value. Since this will produce proportionally more
intermediate data than input data, which resources could you expect to be
likely bottlenecks?
A. Processor and RAM
B. Processor and disk I/O
C. Disk I/O and network I/O
D. Processor and network I/O

My Answer            : D [This is my guess; I am not sure whether I am right
here. Since the output of the mapper goes over the network, I chose D]
Answer Given in site : B [I am not sure enough to comment on this]

*******************************************************************************

Which of the following statements best describes how a large (100 GB) file
is stored in HDFS?
A. The file is divided into variable size blocks, which are stored on
multiple datanodes. Each block is replicated three times by default.
B. The file is replicated three times by default. Each copy of the file is
stored on a separate datanode.
C. The master copy of the file is stored on a single datanode. The replica
copies are divided into fixed-size blocks, which are stored on multiple
datanodes.
D. The file is divided into fixed-size blocks, which are stored on multiple
datanodes. Each block is replicated three times by default. Multiple blocks
from the same file might reside on the same datanode.
E. The file is divided into fixed-size blocks, which are stored on multiple
datanodes. Each block is replicated three times by default. HDFS guarantees
that different blocks from the same file are never on the same datanode.

My Answer            : D [The block size is fixed (64 MB or 128 MB). Blocks
are stored on multiple nodes, and multiple blocks from the same file might
reside on the same datanode]
Answer Given in site : B [Each copy of the file is not stored whole on a
separate datanode; only the blocks are stored]
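
The arithmetic behind answer D is easy to check. This is a small sketch with
hypothetical names (`BlockMathDemo`, `blocks`); it assumes binary units
(1 GB = 1024 MB), the 64 MB and 128 MB block sizes mentioned above, and the
default replication factor of three from the question.

```java
// Hypothetical block-count arithmetic for a 100 GB file; not Hadoop code.
class BlockMathDemo {
    // Ceiling division: the last block of a file may be only partly filled.
    static long blocks(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024;
        long GB = 1024L * MB;
        int replication = 3;

        long blocks64 = blocks(100 * GB, 64 * MB);
        System.out.println(blocks64);                      // prints 1600
        System.out.println(blocks64 * replication);        // prints 4800
        System.out.println(blocks(100 * GB, 128 * MB));    // prints 800
    }
}
```

With 4800 block replicas to place, any cluster with fewer than 4800 datanodes
must put several blocks of the same file on one datanode, so the guarantee
claimed in option E cannot hold.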

*******************************************************************************

regards,
Rams

On Wed, Nov 7, 2012 at 11:52 PM, Harsh J <harsh@cloudera.com> wrote:

> Hi,
>
> I'd instead like you to explain why you think someone's proposed
> answer (who?) is wrong and why yours is correct. You learn more that
> way than us head nodding/shaking to things you ask.
>
> On Wed, Nov 7, 2012 at 10:51 PM, Ramasubramanian Narayanan
> <ramasubramanian.narayanan@gmail.com> wrote:
> > Hi,
> >
> >    I came across the following questions on some sites, and the answers
> > they provide seem wrong to me... I might be wrong... Can someone help
> > confirm the right answers for these 11 questions, please? I would
> > appreciate an explanation if you are able to provide one...
> >
> > regards,
> > Rams
>
>
>
> --
> Harsh J
>
