hadoop-common-user mailing list archives

From "ados1984@gmail.com" <ados1...@gmail.com>
Subject Re: Use Cases for Structured Data
Date Wed, 12 Mar 2014 19:37:22 GMT
Thank you, Shahab, but it would be really nice if I could get some input on
my initial questions, as it would really help.


On Wed, Mar 12, 2014 at 3:11 PM, Shahab Yunus <shahab.yunus@gmail.com> wrote:

> I would suggest that, given the level of detail you are looking for and
> the fundamental nature of your questions, you get hold of books or online
> documentation. Basically, some reading/research.
>
> The latest edition of
> http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520 is
> highly recommended to begin with.
>
> Regards,
> Shahab
>
>
> On Wed, Mar 12, 2014 at 3:07 PM, ados1984@gmail.com <ados1984@gmail.com> wrote:
>
>> Hello Team,
>>
>> I am starting out with the Hadoop ecosystem and first wanted to learn,
>> based on my use case, whether Hadoop is the right tool for me.
>>
>> I have only structured data, and my goal is to save this data in Hadoop
>> and take advantage of the replication factor. I am using Microsoft tools
>> for analysis; they give me good drag-and-drop functionality for creating
>> different kinds of analysis, and they also have Hadoop drivers, so they
>> can use Hadoop as a data source for analysis.
>>
>> My question here is: what benefits does the YARN architecture give me in
>> terms of analysis that my Microsoft, Netezza, or Tableau products are not
>> giving me? I am just trying to understand the value of introducing Hadoop
>> into my architecture in terms of analysis, apart from data replication.
>> Any insights would be very helpful.
>>
>> Also, my goal for the POC is related to efficient data storage/retrieval,
>> and so:
>>
>>    1. How does data retrieval work in Hadoop?
>>    2. Do I always need some kind of data source on top of HDFS, like
>>    HBase/Cassandra/MongoDB, or is there no need for one, so that I can
>>    store all my data in HDFS directly and retrieve it when I need it using
>>    different analytic tools that have HDFS as a data source?
>>    3. Say I have a 3-node cluster, one master and 2 slaves, and I am
>>    trying to insert data into Hadoop. What cycle does the framework go
>>    through to load my data into HDFS? Does my process read all the
>>    metadata from the master node about where my slave nodes are and what
>>    data should go to which slave node, or is all the data sent to the
>>    master node, which then reads the metadata and decides what portion of
>>    the data should go to which node?
>>    4. Also, if I have a 3-node cluster with 1 master and 2 slaves, my
>>    data is equally distributed across the two slave nodes, and I have
>>    replication set to 2, then where and how will replication take place,
>>    given that I do not have any vacant node for replication?
>>    5. Also, for the POC, does it make sense to go with a free 3-node
>>    Cloudera cluster or a free 3-node Hortonworks cluster, or does it make
>>    sense to go with the open source Hadoop version? If we go with the
>>    open source version, where do we define which node is the master and
>>    which are slaves? And can we have all 3 nodes on the same machine, or
>>    do all 3 nodes need to be on different machines?
>>    6. Also, what are the pros and cons of going with
>>    Hortonworks/Cloudera as opposed to Apache Hadoop from an initial POC
>>    point of view?
>>    7. Also, if we go with Hortonworks/Cloudera, which tools come bundled
>>    with the Hadoop framework? And if we go with Apache Hadoop, do we get
>>    tools like Pig and Hive bundled with it, or do we have to install them
>>    separately?
>>
>> Since I have only recently started on my Hadoop journey, I would really
>> appreciate it if the community could point me in the right direction.
>>
>> Regards, Andy.
>>
>
>
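
To make questions 3 and 4 concrete, here is a toy model of the two ideas they
turn on. It is a minimal sketch in plain Python, not Hadoop code: the class
ToyNameNode, the function write_file, and the node names are all hypothetical.
It assumes the general HDFS design in which the master (NameNode) keeps only
metadata and tells the client which slave nodes (DataNodes) should hold each
block, the client sends the block data to those nodes itself, and the
replication factor counts total copies of a block. The round-robin placement
is a simplification and not HDFS's actual placement policy.

```python
class ToyNameNode:
    """Hypothetical stand-in for the master: stores metadata, never file data."""

    def __init__(self, datanodes, replication):
        self.datanodes = list(datanodes)
        # Toy simplification: never ask for more copies than there are nodes.
        self.replication = min(replication, len(self.datanodes))
        self.block_map = {}  # block id -> list of nodes holding a copy
        self._next = 0

    def allocate(self, block_id):
        # Choose distinct target nodes, rotating the starting node per block.
        n = len(self.datanodes)
        targets = [self.datanodes[(self._next + i) % n]
                   for i in range(self.replication)]
        self._next = (self._next + 1) % n
        self.block_map[block_id] = targets
        return targets


def write_file(namenode, storage, data, block_size):
    """Client side: split into blocks, ask the master where each one goes."""
    for offset in range(0, len(data), block_size):
        block_id = f"blk_{offset // block_size}"
        block = data[offset:offset + block_size]
        for node in namenode.allocate(block_id):
            # Block bytes go to the slave nodes, not through the master.
            storage[node][block_id] = block


datanodes = ["slave1", "slave2"]
storage = {name: {} for name in datanodes}
namenode = ToyNameNode(datanodes, replication=2)
write_file(namenode, storage, b"abcdefghij", block_size=4)

for name in datanodes:
    print(name, sorted(storage[name]))
```

In this model, with 2 slave nodes and replication set to 2, every block ends
up on both slaves, so no extra "vacant" node is involved: each node holds its
own copy of all the data rather than half of it.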
