Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of ados1984@gmail.com designates
 209.85.219.54 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAEo-6+Rh6O0wYim_axRFUTJqfkwk0szK8ZvttLG5prz6tYTNaw@mail.gmail.com>
References: 
 <CALvqP3Rte7M5v1yx0h5pqZ8dWudxj_kkA7CDJrWEzuvoSZm9WA@mail.gmail.com>
 <CAEo-6+Rh6O0wYim_axRFUTJqfkwk0szK8ZvttLG5prz6tYTNaw@mail.gmail.com>
From: "ados1984@gmail.com" <ados1984@gmail.com>
Date: Wed, 12 Mar 2014 15:37:22 -0400
Message-ID: 
 <CALvqP3RYUc_Kd74LzWzQnhuOEKjCvXqUtnGHpiuNHCBjT0ZCkQ@mail.gmail.com>
Subject: Re: Use Cases for Structured Data
To: user <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=089e013d0dc066513004f46df852

--089e013d0dc066513004f46df852
Content-Type: text/plain; charset=ISO-8859-1

Thank you, Shahab, but it would be really nice if I could get some input on
my initial questions, as it would really help.


On Wed, Mar 12, 2014 at 3:11 PM, Shahab Yunus <shahab.yunus@gmail.com> wrote:

> Given the level of detail you are looking for and the fundamental nature
> of your questions, I would suggest getting hold of books or online
> documentation and doing some reading/research.
>
> The latest edition of
> http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520 is
> highly recommended to begin with.
>
> Regards,
> Shahab
>
>
> On Wed, Mar 12, 2014 at 3:07 PM, ados1984@gmail.com <ados1984@gmail.com> wrote:
>
>> Hello Team,
>>
>> I am starting off with the Hadoop ecosystem and first wanted to learn
>> whether, based on my use case, Hadoop is the right tool for me.
>>
>> I have only structured data, and my goal is to save this data into Hadoop
>> and take advantage of the replication factor. I am using Microsoft tools
>> for analysis; they provide good drag-and-drop functionality for creating
>> different kinds of analysis, and they also have Hadoop drivers, so they
>> can use Hadoop as a data source for analysis.
>>
>> My question here is what benefits the YARN architecture gives me in terms
>> of analysis that my Microsoft, Netezza, or Tableau products are not giving
>> me. I am just trying to understand the value of introducing Hadoop into my
>> architecture in terms of analysis, apart from data replication. Any
>> insights would be very helpful.
>>
>> Also, my goal for the POC is related to efficient data storage/retrieval,
>> and so:
>>
>>    1. How does data retrieval work in Hadoop?
>>    2. Do I always need some kind of data source on top of HDFS, such as
>>    HBase/Cassandra/Mongo, or is there no need for one, so that I can store
>>    all my data in HDFS directly and retrieve it when I need it using
>>    different analytic tools that have HDFS as a data source?
>>    3. Say I have a 3-node cluster, with one master and 2 slaves, and I am
>>    trying to insert data into Hadoop. What cycle does the framework
>>    perform to insert my data into HDFS? Does my process read all the
>>    metadata from the master node about where my slave nodes are and what
>>    data should go to which slave node, or is all the data sent to the
>>    master node, which then reads the metadata and decides what portion of
>>    the data should go to which node?
>>    4. Also, if I have a 3-node cluster with 1 master and 2 slaves, my data
>>    is equally distributed across two nodes, and replication is set to 2,
>>    then where and how will replication take place, as I do not have any
>>    vacant node for replication?
>>    5. Also, for the POC, does it make sense to go with a free Cloudera
>>    3-node cluster or a free Hortonworks 3-node cluster, or with the
>>    open-source Hadoop version? If we go with the open-source version,
>>    where do we define which node is the master and which is a slave, and
>>    can we have all 3 nodes on the same machine, or do they need to be on
>>    different machines?
>>    6. Also, what are the pros and cons of going with
>>    Hortonworks/Cloudera as opposed to Apache Hadoop from an initial POC
>>    point of view?
>>    7. Also, if we go with Hortonworks/Cloudera, which tools come bundled
>>    with the Hadoop framework? And if we go with Apache Hadoop, do we get
>>    tools like Pig and Hive bundled, or do we have to install them
>>    separately?
>>
>> Since I have only recently started my Hadoop journey, I would really
>> appreciate it if the community could point me in the right direction.
>>
>> Regards, Andy.
>>
>
>

--089e013d0dc066513004f46df852--