hadoop-user mailing list archives

From Mallanagouda Patil <mallanagouda.c.pa...@gmail.com>
Subject RE: Kafka or Flume
Date Fri, 30 Jun 2017 05:14:14 GMT
Kafka is capable of processing billions of events per second. You can scale
it horizontally by adding Kafka broker servers.

You can try these steps:

1. Create a topic in Kafka to receive all your data. Use a Kafka producer to
ingest the data into this topic.
2. If you are going to write your own HDFS client to put data into HDFS,
you can read data from the topic in step 1, validate it, and store it into
HDFS (see the sketch below).
3. If you want to use an open-source tool (Gobblin or the Confluent Kafka
HDFS connector) to put data into HDFS, write a tool that reads data from
the topic, validates it, and stores it in another topic for the connector
to consume.
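
A rough Java sketch of steps 1 and 2, for illustration only (the broker
address, topic name, NameNode URI, and validation rule are placeholders; it
assumes the standard kafka-clients and Hadoop FileSystem APIs):

import java.net.URI;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaToHdfsSketch {

  // Step 1: publish a raw record into the ingest topic
  static void produce(String payload) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092");
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      producer.send(new ProducerRecord<>("raw-transactions", payload));
    }
  }

  // Step 2: read from the topic, validate, and append valid records to HDFS
  static void consumeAndStore() throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092");
    props.put("group.id", "hdfs-writer");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"),
        new Configuration());
    Path out = new Path("/data/transactions/valid.txt");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
         FSDataOutputStream stream = fs.exists(out) ? fs.append(out)
                                                    : fs.create(out)) {
      consumer.subscribe(Collections.singletonList("raw-transactions"));
      while (true) {
        ConsumerRecords<String, String> records =
            consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> record : records) {
          if (isValid(record.value())) {   // drop records that fail validation
            stream.writeBytes(record.value() + "\n");
          }
        }
        stream.hflush();                   // make the data visible to readers
      }
    }
  }

  // Placeholder check; real validation would also consult historical data
  // already stored in HDFS.
  static boolean isValid(String payload) {
    return payload != null && !payload.isEmpty();
  }
}

In practice you would roll output files by size or time instead of appending
to a single file, and commit consumer offsets only after a successful flush
so no event is lost.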

We are using a combination of these steps to process over 10 million
events per second.

I hope it helps.

Thanks
Mallan

On Jun 30, 2017 10:31 AM, "Sidharth Kumar" <sidharthkumar2707@gmail.com>
wrote:

> Thanks! What about Kafka with Flume? I would also like to mention that the
> everyday data intake is in the millions and we can't afford to lose even a
> single piece of data, which makes high availability a necessity.
>
> Warm Regards
>
> Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367 |  LinkedIn:
> www.linkedin.com/in/sidharthkumar2792
>
>
>
>
>
>
> On 30-Jun-2017 10:04 AM, "JP gupta" <JP.Gupta@altruistindia.com> wrote:
>
>> The ideal sequence should be:
>>
>> 1. Ingest using Kafka -> Validate and process using Spark -> Write into
>> any NoSQL DB or Hive (a rough sketch follows below).
>>
>> From my recent experience, writing directly to HDFS can be slow depending
>> on the data format.
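>>
>> A minimal sketch of that flow, for illustration only (it assumes Spark
>> Structured Streaming with its Kafka source; the broker, topic, and table
>> names are made up):
>>
>> import org.apache.spark.api.java.function.VoidFunction2;
>> import org.apache.spark.sql.Dataset;
>> import org.apache.spark.sql.Row;
>> import org.apache.spark.sql.SparkSession;
>>
>> public class KafkaSparkHivePipeline {
>>   public static void main(String[] args) throws Exception {
>>     SparkSession spark = SparkSession.builder()
>>         .appName("kafka-validate-hive").enableHiveSupport().getOrCreate();
>>
>>     // Ingest: subscribe to the Kafka topic carrying raw events
>>     Dataset<Row> raw = spark.readStream()
>>         .format("kafka")
>>         .option("kafka.bootstrap.servers", "broker1:9092")
>>         .option("subscribe", "transactions")
>>         .load();
>>
>>     // Validate/process: parse the value and drop records that fail checks
>>     Dataset<Row> valid = raw.selectExpr("CAST(value AS STRING) AS payload")
>>         .filter("payload IS NOT NULL AND length(payload) > 0");
>>
>>     // Write each micro-batch into a Hive table (or swap in a NoSQL sink)
>>     valid.writeStream()
>>         .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (batch, id) ->
>>             batch.write().mode("append").saveAsTable("validated_transactions"))
>>         .option("checkpointLocation", "/tmp/checkpoints/validate")
>>         .start()
>>         .awaitTermination();
>>   }
>> }
>>
>> Whether the final write goes to Hive, a NoSQL store, or plain HDFS files
>> is mostly a matter of swapping the sink inside foreachBatch.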
>>
>>
>>
>> Thanks
>>
>> JP
>>
>>
>>
>> *From:* Sudeep Singh Thakur [mailto:sudeepthakur90@gmail.com]
>> *Sent:* 30 June 2017 09:26
>> *To:* Sidharth Kumar
>> *Cc:* Maggy; common-user@hadoop.apache.org
>> *Subject:* Re: Kafka or Flume
>>
>>
>>
>> In your use case, Kafka would be better because you want to do some
>> transformations and validations.
>>
>> Kind regards,
>> Sudeep Singh Thakur
>>
>>
>>
>> On Jun 30, 2017 8:57 AM, "Sidharth Kumar" <sidharthkumar2707@gmail.com>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> I have a requirement to ingest all transactional data into Hadoop in
>> real time and, before storing it in Hadoop, process it to validate it. If
>> the data fails the validation process, it will not be stored in Hadoop.
>> The validation process also makes use of historical data which is already
>> stored in Hadoop. So, my question is: which ingestion tool will be best
>> for this, Kafka or Flume?
>>
>>
>>
>> Any suggestions would be a great help.
>>
>>
>> Warm Regards
>>
>> Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367 |  LinkedIn:
>> www.linkedin.com/in/sidharthkumar2792
>>
>>
>>
>>
>>
>>
>
