hadoop-mapreduce-user mailing list archives

From: Adam Faris <afa...@linkedin.com>
Subject: Re: Is there any way to use a hdfs file as a Circular buffer?
Date: Thu, 15 Aug 2013 18:16:30 GMT
If every device can send its information as an 'event', you could use a publish-subscribe messaging
system like Apache Kafka (http://kafka.apache.org/). Kafka is designed to self-manage its
storage by keeping only the last 'n' events of data, acting like a circular buffer. Each device
would publish its binary data to Kafka, and Hadoop would act as a subscriber to Kafka by consuming
events. If you need a scheduler to make Hadoop process the Kafka events, look at Azkaban,
as it supports both scheduling and job dependencies (http://azkaban.github.io/azkaban2/).
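
For example, here is a minimal producer sketch against Kafka's Java client; the broker
address, topic name, and retention setting are placeholder assumptions, not details from
this thread. The circular-buffer behavior comes from topic retention limits (e.g. the
retention.bytes topic config), which make the broker discard the oldest log segments once
the cap is reached.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class DeviceEventPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka-broker:9092"); // hypothetical broker
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.ByteArraySerializer");
            try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
                byte[] payload = {0x01, 0x02, 0x03}; // one device reading, as raw bytes
                // Keying by device id keeps each device's events ordered on one partition.
                producer.send(new ProducerRecord<>("device-events", "device-42", payload));
            }
        }
    }

On the Hadoop side, a consumer job would then pull batches of these events into HDFS for
processing.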

Remember, Hadoop is batch processing, so reports won't happen in real time. If you need to
run reports in real time, watch the Samza project, which uses YARN and Kafka to process
real-time streaming data (http://incubator.apache.org/projects/samza.html).

On Aug 7, 2013, at 9:59 AM, Wukang Lin <vboylin1987@gmail.com> wrote:

> Hi Shekhar,
>     Thank you for your replies. As far as I know, Storm is a distributed computing framework,
> but what we need is a storage system; high throughput and concurrency are what matter. We have
> thousands of devices, and each device produces a steady stream of binary data. The space for
> every device is fixed, so each should reuse its space on disk. So, how could Storm or Esper
> achieve that?
> Many Thanks
> Lin Wukang
> 2013/8/8 Shekhar Sharma <shekhar2581@gmail.com>
> Use a CEP tool like Esper or Storm; you will be able to achieve that.
> ...I can give you more inputs if you can provide me more details of what you are trying
> to achieve.
> Regards,
> Som Shekhar Sharma
> +91-8197243810
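
To make the CEP suggestion above concrete, here is a minimal Esper sketch of the kind of
continuous query such an engine runs. The DeviceEvent class, engine setup, and EPL statement
are illustrative assumptions (based on Esper's client API), not details from this thread.

    import com.espertech.esper.client.Configuration;
    import com.espertech.esper.client.EPServiceProvider;
    import com.espertech.esper.client.EPServiceProviderManager;
    import com.espertech.esper.client.EPStatement;

    public class EsperSketch {
        // Hypothetical event type; Esper reads JavaBean getters.
        public static class DeviceEvent {
            private final String deviceId;
            private final int bytes;
            public DeviceEvent(String deviceId, int bytes) {
                this.deviceId = deviceId;
                this.bytes = bytes;
            }
            public String getDeviceId() { return deviceId; }
            public int getBytes() { return bytes; }
        }

        public static void main(String[] args) {
            Configuration config = new Configuration();
            config.addEventType("DeviceEvent", DeviceEvent.class);
            EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider(config);
            // Continuous query: per-device byte totals over a sliding 60-second window.
            EPStatement stmt = engine.getEPAdministrator().createEPL(
                "select deviceId, sum(bytes) as total "
                + "from DeviceEvent.win:time(60 sec) group by deviceId");
            stmt.addListener((newEvents, oldEvents) -> {
                if (newEvents != null) {
                    System.out.println(newEvents[0].get("deviceId")
                        + " -> " + newEvents[0].get("total"));
                }
            });
            engine.getEPRuntime().sendEvent(new DeviceEvent("device-42", 512));
        }
    }

Note this addresses computation over the stream, not storage, which is the distinction Lin
raises above.
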
> On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin <vboylin1987@gmail.com> wrote:
> Hi Niels and Bertrand,
>     Thank you for your great advice.
>     In our scenario, we need to store a steady stream of binary data in a circular
> storage; throughput and concurrency are the most important indicators. The first way seems
> workable, but as HDFS is not friendly to small files, this approach may not be smooth enough.
> HBase is good, but not appropriate for us, in terms of both throughput and storage. MongoDB
> is quite good for web applications, but likewise not suitable for the scenario we face.
>     We need a distributed storage system with high throughput, HA, LB, and security. Maybe
> it would act much like HBase, managing a lot of small files (HFiles) as one large region:
> we would manage a lot of small files as one large one. Perhaps we should develop it ourselves.
> Thank you.
> Lin Wukang
> 2013/7/25 Niels Basjes <Niels@basjes.nl>
> A circular file on HDFS is not possible.
> Some of the ways around this limitation:
> - Create a series of files and delete the oldest file when you have too many (a sketch
>   follows below this message).
> - Put the data into an HBase table and do something similar.
> - Use a completely different technology like MongoDB, which has built-in support for a
>   circular buffer (capped collections).
> Niels
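
To illustrate the first workaround, here is a sketch using Hadoop's FileSystem API that
keeps a bounded series of segment files per device, deleting the oldest once a count budget
is exceeded. The directory layout, file budget, and class name are assumptions for
illustration, not anything prescribed in this thread.

    import java.util.Arrays;
    import java.util.Comparator;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RollingSegments {
        /** Delete the oldest files under dir until at most maxFiles remain. */
        public static void prune(FileSystem fs, Path dir, int maxFiles) throws Exception {
            FileStatus[] segments = fs.listStatus(dir);
            // Sort oldest first by modification time.
            Arrays.sort(segments, Comparator.comparingLong(FileStatus::getModificationTime));
            for (int i = 0; i < segments.length - maxFiles; i++) {
                fs.delete(segments[i].getPath(), false); // non-recursive delete
            }
        }

        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Hypothetical layout: one directory of fixed-size segment files per device.
            prune(fs, new Path("/data/device-42"), 100);
        }
    }

A writer would roll to a new segment once the current file reaches its size budget and then
call prune, so the newest segments together behave like a circular buffer.
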
> Hi all,
>    Is there any way to use an HDFS file as a circular buffer? I mean, suppose I set a quota
> on a directory on HDFS and write data to a file in that directory continuously. Once the
> quota is exceeded, I would redirect the writer and write the data from the beginning of the
> file automatically.
