From: "ados1984@gmail.com"
Date: Wed, 12 Mar 2014 15:07:46 -0400
Subject: Use Cases for Structured Data
To: user@hadoop.apache.org

Hello Team,

I am starting off with the Hadoop ecosystem and first want to learn, based on my use case, whether Hadoop is the right tool for me.

I have only structured data, and my goal is to save this data into Hadoop and take advantage of the replication factor. I am using Microsoft tools for analysis. They provide good drag-and-drop functionality for creating different kinds of analysis, and they also have Hadoop drivers, so they can use Hadoop as a data source for analysis.

My question is: what benefits does the YARN architecture give me, in terms of analysis, that my Microsoft, Netezza, or Tableau products do not? I am trying to understand the value of introducing Hadoop into my architecture for analysis, apart from data replication. Any insights would be very helpful.

Also, my goal for the POC is efficient data storage and retrieval, so:

1. How does data retrieval work in Hadoop?
2. Do I always need some kind of data source on top of HDFS, such as HBase, Cassandra, or Mongo? Or is there no need for one, so that I can store all my data directly in HDFS and retrieve it when needed using different analytic tools that support HDFS as a data source?
3. Say I have a 3-node cluster with one master and 2 slaves, and I am inserting data into Hadoop. What cycle does the framework go through to store my data in HDFS? Does my process read the metadata from the master node about where my slave nodes are and which data should go to which slave node? Or is all the data sent to the master node, which then uses its metadata to decide which portion of the data goes to which node?
4. If I have a 3-node cluster with 1 master and 2 slaves, my data is equally distributed across the two slave nodes, and replication is set to 2, then where and how will replication take place, given that I do not have a vacant node for the replicas?
5. For the POC, does it make sense to go with the free 3-node Cloudera cluster, the free 3-node Hortonworks cluster, or the open-source Apache Hadoop version? If we go with open-source Hadoop, where do we define which node is the master and which nodes are slaves? Also, can all 3 nodes run on the same machine, or do they need to be on different machines?
6. What are the pros and cons of going with Hortonworks/Cloudera as opposed to Apache Hadoop, from an initial POC point of view?
7. If we go with Hortonworks/Cloudera, which tools come bundled with the Hadoop framework? If we go with Apache Hadoop, do tools like Pig and Hive come bundled with it, or do we have to install them separately?

Since I am just starting my Hadoop journey, I would really appreciate it if the community could point me in the right direction.

Regards,
Andy