From: Michael Segel <michael_segel@hotmail.com>
To: user@hadoop.apache.org
Subject: Re: Please help on providing correct answers
Date: Wed, 7 Nov 2012 12:37:26 -0600

Sorry, I think I had better explain why I am curious...

First, there are a couple of sites that have study questions to help pass Cloudera's certification.
(I don't know if Hortonworks has cert tests, but both MapR and Cloudera do.)

It's just that, looking first at the questions... they are not really good questions, and neither is the selection of answers. Then there's the 'correct' answer.

I can understand if you don't want to reveal your sources publicly, but you have to understand that misinformation found on these sites makes it harder to teach the right answers.

As Harsh says, you should be able to look at the questions and then go back to Tom White's book and others to verify why you think your answer is right.

HTH
-Mike

On Nov 7, 2012, at 11:30 AM, Ramasubramanian Narayanan <ramasubramanian.narayanan@gmail.com> wrote:
Nothing consolidated... I have been collecting them for the past month... a few from printouts, a few from mails, a few from googling, a few from sites, and a few from some of my friends...

regards,
Rams

On Wed, Nov 7, 2012 at 10:57 PM, Michael Segel <michael_segel@hotmail.com> wrote:
Ok...
Where are you pulling these questions from?

Seriously.


On Nov 7, 2012, at 11:21 AM, Ramasubramanian Narayanan <ramasubramanian.narayanan@gmail.com> wrote:

> Hi,
>
>    I came across the following questions on some sites, and the answers they provided seem wrong to me... I might be wrong... Can someone help confirm the right answers for these 11 questions please? I would appreciate an explanation if you could provide one...
>
> *******************************************************************************
> You are running a job that will process a single InputSplit on a cluster which has no other jobs
> currently running. Each node has an equal number of open Map slots. On which node will Hadoop
> first attempt to run the Map task?
> A. The node with the most memory
> B. The node with the lowest system load
> C. The node on which this InputSplit is stored
> D. The node with the most free local disk space
>
> My Answer            : C
> Answer Given in site : A
>
> *******************************************************************************
> What is a Writable?
> A. Writable is an interface that all keys and values in MapReduce must implement. Classes implementing this interface must implement methods for serializing and deserializing themselves.
> B. Writable is an abstract class that all keys and values in MapReduce must extend. Classes extending this abstract base class must implement methods for serializing and deserializing themselves.
> C. Writable is an interface that all keys, but not values, in MapReduce must implement. Classes implementing this interface must implement methods for serializing and deserializing themselves.
> D. Writable is an abstract class that all keys, but not values, in MapReduce must extend. Classes extending this abstract base class must implement methods for serializing and deserializing themselves.
>
> My Answer            : A
> Answer Given in site : B
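The contract described in option (A) can be sketched without Hadoop on the classpath. The interface below is a hypothetical local stand-in mirroring the two methods of org.apache.hadoop.io.Writable; the class name and round-trip helper are mine, not part of any Hadoop API:

```java
import java.io.*;

// Stand-in for org.apache.hadoop.io.Writable (local copy so the example
// compiles without Hadoop): keys and values implement both methods and
// can serialize and deserialize themselves.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

public class WritableDemo implements Writable {
    private int count;

    public WritableDemo() {}                       // frameworks need a no-arg constructor
    public WritableDemo(int count) { this.count = count; }

    public int get() { return count; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(count);                       // serialize self
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        count = in.readInt();                      // deserialize self
    }

    // Round-trip helper: serialize to bytes, then read back into a fresh instance.
    public static int roundTrip(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new WritableDemo(value).write(new DataOutputStream(bytes));
            WritableDemo copy = new WritableDemo();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42)); // prints 42
    }
}
```

The point of the round trip: whatever a key or value writes to the DataOutput, it must be able to read back in the same order from the DataInput.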
>
> *******************************************************************************
>
> You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses
> TextInputFormat and the IdentityReducer: the mapper applies a regular expression over input
> values and emits key-value pairs with the key consisting of the matching text, and the value
> containing the filename and byte offset. Determine the difference between setting the number of
> reducers to zero and setting it to one.
> A. There is no difference in output between the two settings.
> B. With zero reducers, no reducer runs and the job throws an exception. With one reducer,
> instances of matching patterns are stored in a single file on HDFS.
> C. With zero reducers, all instances of matching patterns are gathered together in one file on
> HDFS. With one reducer, instances of matching patterns are stored in multiple files on HDFS.
> D. With zero reducers, instances of matching patterns are stored in multiple files on HDFS. With
> one reducer, all instances of matching patterns are gathered together in one file on HDFS.
>
> My Answer            : D
> Answer Given in site : C
>
> *******************************************************************************
>
> During the standard sort and shuffle phase of MapReduce, keys and values are passed to
> reducers. Which of the following is true?
> A. Keys are presented to a reducer in sorted order; values for a given key are not sorted.
> B. Keys are presented to a reducer in sorted order; values for a given key are sorted in ascending
> order.
> C. Keys are presented to a reducer in random order; values for a given key are not sorted.
> D. Keys are presented to a reducer in random order; values for a given key are sorted in
> ascending order.
>
> My Answer            : A
> Answer Given in site : D
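Option (A) can be illustrated with a toy shuffle, a sketch of the default behavior assuming no secondary sort is configured: grouping intermediate pairs into a sorted map keeps the keys ordered, while each key's value list stays in arrival order. The class and method names are mine, for illustration only:

```java
import java.util.*;

public class ShuffleDemo {
    // Toy shuffle: group (key, value) pairs by key with keys sorted,
    // values kept in the order they arrived (no secondary sort).
    public static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("b", 3), Map.entry("a", 2), Map.entry("b", 1));
        // Keys come out as a, b (sorted); b's values stay [3, 1] (not sorted).
        System.out.println(shuffle(pairs));
    }
}
```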
>
> *******************************************************************************
>
> Which statement best describes the data path of intermediate key-value pairs (i.e., output of the
> mappers)?
> A. Intermediate key-value pairs are written to HDFS. Reducers read the intermediate data from
> HDFS.
> B. Intermediate key-value pairs are written to HDFS. Reducers copy the intermediate data to the
> local disks of the machines running the reduce tasks.
> C. Intermediate key-value pairs are written to the local disks of the machines running the map
> tasks, and then copied to the machines running the reduce tasks.
> D. Intermediate key-value pairs are written to the local disks of the machines running the map
> tasks, and are then copied to HDFS. Reducers read the intermediate data from HDFS.
>
> My Answer            : C
> Answer Given in site : B
>
> *******************************************************************************
>
> You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text
> keys, IntWritable values. Which interface should your class implement?
> A. Mapper <Text, IntWritable, Text, IntWritable>
> B. Reducer <Text, Text, IntWritable, IntWritable>
> C. Reducer <Text, IntWritable, Text, IntWritable>
> D. Combiner <Text, IntWritable, Text, IntWritable>
> E. Combiner <Text, Text, IntWritable, IntWritable>
>
> My Answer            : D
> Answer Given in site : C
>
> *******************************************************************************
>
> What happens in a MapReduce job when you set the number of reducers to one?
> A. A single reducer gathers and processes all the output from all the mappers. The output is
> written in as many separate files as there are mappers.
> B. A single reducer gathers and processes all the output from all the mappers. The output is
> written to a single file in HDFS.
> C. Setting the number of reducers to one creates a processing bottleneck, and since the number
> of reducers as specified by the programmer is used as a reference value only, the MapReduce
> runtime provides a default setting for the number of reducers.
> D. Setting the number of reducers to one is invalid, and an exception is thrown.
>
> My Answer            : B
> Answer Given in site : C
>
> *******************************************************************************
>
> In the standard word count MapReduce algorithm, why might using a combiner reduce the overall
> job running time?
> A. Because combiners perform local aggregation of word counts, thereby allowing the mappers to
> process input data faster.
> B. Because combiners perform local aggregation of word counts, thereby reducing the number of
> mappers that need to run.
> C. Because combiners perform local aggregation of word counts, and then transfer that data to
> reducers without writing the intermediate data to disk.
> D. Because combiners perform local aggregation of word counts, thereby reducing the number of
> key-value pairs that need to be shuffled across the network to the reducers.
>
> My Answer            : C
> Answer Given in site : A
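The saving in (D) is easy to quantify: local aggregation collapses each mapper's repeated keys before the shuffle, so at most one pair per distinct word crosses the network instead of one per occurrence. A toy word-count sketch in plain Java (no Hadoop; names are mine, for illustration):

```java
import java.util.*;

public class CombinerDemo {
    // Mapper output for word count: one ("word", 1) pair per occurrence.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) out.add(Map.entry(word, 1));
        return out;
    }

    // Combiner: sum counts per word locally before the shuffle.
    public static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) sums.merge(p.getKey(), p.getValue(), Integer::sum);
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> raw = map("to be or not to be");
        System.out.println(raw.size());          // 6 pairs before combining
        System.out.println(combine(raw).size()); // 4 pairs after combining
    }
}
```

Note that in Hadoop itself this aggregation step is expressed as a Reducer class registered as the combiner, which is why (C) is the right choice in the combiner-interface question above.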
>
> *******************************************************************************
>
> You need to create a GUI application to help your company's sales people add and edit customer
> information. Would HDFS be appropriate for this customer information file?
> A. Yes, because HDFS is optimized for random access writes.
> B. Yes, because HDFS is optimized for fast retrieval of relatively small amounts of data.
> C. No, because HDFS can only be accessed by MapReduce applications.
> D. No, because HDFS is optimized for write-once, streaming access for relatively large files.
>
> My Answer            : D
> Answer Given in site : A
>
> *******************************************************************************
>
> You need to create a job that does frequency analysis on input data. You will do this by writing a
> Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into
> individual characters. For each one of these characters, you will emit the character as a key and
> an IntWritable as the value. Since this will produce proportionally more intermediate data than
> input data, which resources could you expect to be likely bottlenecks?
> A. Processor and RAM
> B. Processor and disk I/O
> C. Disk I/O and network I/O
> D. Processor and network I/O
>
> My Answer            : D
> Answer Given in site : B
>
> *******************************************************************************
>
> Which of the following statements best describes how a large (100 GB) file is stored in HDFS?
> A. The file is divided into variable-size blocks, which are stored on multiple datanodes. Each block
> is replicated three times by default.
> B. The file is replicated three times by default. Each copy of the file is stored on a separate
> datanode.
> C. The master copy of the file is stored on a single datanode. The replica copies are divided into
> fixed-size blocks, which are stored on multiple datanodes.
> D. The file is divided into fixed-size blocks, which are stored on multiple datanodes. Each block is
> replicated three times by default. Multiple blocks from the same file might reside on the same
> datanode.
> E. The file is divided into fixed-size blocks, which are stored on multiple datanodes. Each block is
> replicated three times by default. HDFS guarantees that different blocks from the same file are
> never on the same datanode.
>
> My Answer            : D
> Answer Given in site : B
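The arithmetic behind (D) is worth spelling out. Assuming the 64 MB default block size of Hadoop 1.x (dfs.block.size; later releases default to 128 MB), a 100 GB file splits into fixed-size blocks, with only the last block possibly shorter:

```java
public class BlockMath {
    // Number of fixed-size HDFS blocks needed for a file (last block may be partial).
    public static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;   // ceiling division
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long file = 100 * gb;          // the 100 GB file from the question
        long block = 64L * 1024 * 1024; // assumed 64 MB default block size
        System.out.println(blockCount(file, block));     // 1600 blocks
        System.out.println(blockCount(file, block) * 3); // 4800 replicas at default replication
    }
}
```

With 1600 blocks spread over a cluster of, say, a few dozen datanodes, many blocks of the same file necessarily land on the same node, which is why (E)'s "never on the same datanode" guarantee cannot hold.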
>
> *******************************************************************************
>
> regards,
> Rams


