hadoop-mapreduce-user mailing list archives

From "ados1984@gmail.com" <ados1...@gmail.com>
Subject Re: Architecture question on Ingesting Data into Hadoop
Date Tue, 25 Mar 2014 03:32:22 GMT
Thank you Tariq, but using Flume, how is structured data captured into
HDFS? Let's say I do not have HBase or any other data store on top of
Hadoop; in that case, how will structured and unstructured data from
different input streams be captured into HDFS using Flume, and how can I
go in and divide what range of data goes on which node?

I am exploring Kafka as a data ingest mechanism. Does anyone have
experience with using Kafka as a core component in a Hadoop ingest project?

The Kafka-based architecture I am thinking of is to have different Kafka
queues for different data sources (web logs, mobile user activity, portal,
etc.), and then each of these queues would have a consumer that consumes
the data and puts it into HDFS. The thing I am not sure about here is:
what data format should the messages be stored in on HDFS?

Also, how is data partitioned among different nodes in HDFS? Any thoughts or suggestions?
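One common pattern, shown here as a sketch under assumed conventions rather than anything prescribed in this thread: the consumer serializes each message as a newline-delimited JSON row and writes batches under per-topic, date-partitioned directories. Block placement across datanodes is then handled by HDFS itself, not by the application.

```python
import json
from datetime import datetime, timezone

def hdfs_partition_path(topic, event_time, base="/data"):
    """Build a date-partitioned HDFS directory for a Kafka topic.

    `event_time` is a Unix timestamp (seconds). HDFS decides which
    datanodes hold the blocks; the application only controls this
    directory/file layout.
    """
    dt = datetime.fromtimestamp(event_time, tz=timezone.utc)
    return f"{base}/{topic}/dt={dt:%Y-%m-%d}"

def to_jsonl(record):
    """Serialize one consumed message as a newline-delimited JSON row."""
    return json.dumps(record, sort_keys=True) + "\n"
```

A consumer for the `weblogs` topic would then append `to_jsonl(msg)` rows to a file under `hdfs_partition_path("weblogs", msg_time)`; the same layout works with Avro or SequenceFiles in place of JSON lines.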

On Mon, Mar 24, 2014 at 4:20 PM, Mohammad Tariq <dontariq@gmail.com> wrote:

> Hi Apurva,
> I would use a data ingestion tool like Apache Flume to make the task
> easier without much human intervention. Create sources for your different
> systems and the rest will be taken care of by Flume. However, it is not a
> must to use something like Flume, but it will definitely make your life
> easier and help you develop a more sophisticated system, IMHO.
> You need HBase when you need real-time random read/write access to your
> data: basically, when you intend to have low-latency access to small
> amounts of data from within a large data set and you have a flexible schema.
> And for the last part of your question, use Apache Hive. It provides
> warehousing capabilities on top of an existing Hadoop cluster, with a
> SQL-like interface to query the stored data. It will also be of help
> while using Impala.
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
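As a sketch of the Hive suggestion above (table and column names are hypothetical), an external table defined over an HDFS directory is registered once in the metastore and can then be queried from both Hive and Impala, since Impala reads the Hive metastore:

```sql
-- Hypothetical weblog table over files already landed in HDFS.
CREATE EXTERNAL TABLE weblogs (
  ts   BIGINT,
  host STRING,
  url  STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/weblogs';

-- Register a day's worth of ingested files with the metastore.
ALTER TABLE weblogs ADD PARTITION (dt='2014-03-25')
LOCATION '/data/weblogs/dt=2014-03-25';
```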
> On Tue, Mar 25, 2014 at 1:41 AM, Geoffry Roberts <threadedblue@gmail.com> wrote:
>> Based on what you have said, it sounds as if you want to append records
>> to a file (or files) in HDFS.  I was able to do this with WebHDFS and
>> with the hadoop client.  But you asked about architecture.  Would a POST
>> to a URL satisfy you as to architecture?  If so, set up WebHDFS and POST to it.
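The WebHDFS append described above is a two-step REST call (hostnames, ports, and file paths here are placeholders): the namenode answers the first POST with a 307 redirect to a datanode, and the second POST sends the actual bytes.

```
# Step 1: ask the namenode where to append (returns 307 with a Location header).
curl -i -X POST "http://namenode:50070/webhdfs/v1/data/weblogs/part-0.log?op=APPEND&user.name=hadoop"

# Step 2: POST the data to the datanode URL returned in the Location header.
curl -i -X POST -T newrecords.log "<Location URL from step 1>"
```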
>> On Mon, Mar 24, 2014 at 1:00 PM, ados1984@gmail.com <ados1984@gmail.com> wrote:
>>> Hello Team,
>>> I am doing a POC on Hadoop and want to understand the recommended
>>> architecture to ingest data from different data streams (web logs,
>>> portal, mobile, POS systems) into a Hadoop system. Also, what are the
>>> use cases where we need HBase on top of HDFS? Can't we have only HDFS
>>> and no HBase, and if we have only HDFS, can we create tables directly
>>> on HDFS that Impala can query?
>>> Kindly advise!
>>> Regards, Apurva
>> --
>> There are ways and there are ways,
>> Geoffry Roberts
