flink-user mailing list archives

From Steven Le Roux <leroux.ste...@gmail.com>
Subject Re: Flink + Druid example?
Date Mon, 10 Apr 2017 09:02:16 GMT

I'm head of @OvhMetrics, a cloud-scale managed time series
platform targeting IoT and monitoring.

We're also using @warp10io components with some glue and optimisations. The
storage layer is based on Apache HBase, which to me is an ideal compromise
between storage efficiency (bytes per data point, compression, no
indexing) and performance (range scan capacities, custom filters, ...).

This allows us to use two paradigms to produce data: either you use the
HTTP endpoint, or MapReduce (MR) jobs that target HBase directly, since
Warp10 has strong Hadoop integration.
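
For the HTTP path, here is a minimal sketch of how a data point could be
rendered as one line of Warp10's text input format
(TS/LAT:LON/ELEV NAME{LABELS} VALUE, with the geo fields left empty). The
function name format_gts_line, the sample series and the microsecond
timestamps are assumptions made for illustration, and only numeric values
are handled. Sending the lines to the update endpoint with a write token is
not shown.

```python
from urllib.parse import quote


def format_gts_line(ts_us, class_name, labels, value):
    """Render one data point as 'TS// NAME{LABELS} VALUE' (no geo fields).

    Hypothetical helper for illustration; it is not part of any Warp10 client.
    """
    # Only plain numbers are handled in this sketch. bool is rejected
    # explicitly because it is a subclass of int in Python.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("this sketch only handles int and float values")
    # Class names and label names/values are percent-encoded.
    encoded_labels = ",".join(
        f"{quote(k, safe='')}={quote(v, safe='')}"
        for k, v in sorted(labels.items())
    )
    return f"{ts_us}// {quote(class_name, safe='')}{{{encoded_labels}}} {value}"


# Assumed sample point: timestamp in microseconds since the epoch.
line = format_gts_line(
    1491814936000000, "sensor.temperature", {"room": "lab 1"}, 21.5
)
print(line)
```

Several such lines joined with newlines would form the body of a single
ingestion request.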

Advantages of Warp10 vs Influx :
  - Warp10 is fully open source, influx is not (clustering not available as
  - Influx is good at ingestion but it needs your data to come in order.
Real-time use cases show that data points don't arrive in order (some are
retained, buffering makes older points arrive after newer ones, etc.)
  - Warp10 has been measured at 1.8M data points/s per thread! (and not in
an optimised case)
  - The true power of Warp10 is WarpScript: its query language, which adopts
a data flow approach and has been designed for time series from the ground
up. Our customers are doing truly amazing things with WarpScript, which
contains nearly 800 functions... It brings analytics and signal processing
to your time series data
  - Warp10 can be deployed either standalone (in-memory or LevelDB) or
in distributed mode (HBase)
  - Security is mandatory and does not affect performance
  - You can delete massive ranges of data or just a single point
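
To illustrate the out-of-order point in the list above, here is a minimal
sketch of a store keyed by timestamp, where arrival order does not matter
and a range scan still returns points in time order. This is a toy model for
illustration only, not how Warp10, HBase or InfluxDB are implemented; the
names insert_point and range_scan are hypothetical.

```python
import bisect


def insert_point(series, ts, value):
    """Insert (ts, value) so that the list stays sorted by timestamp."""
    bisect.insort(series, (ts, value))


def range_scan(series, start, end):
    """Return all points with start <= ts < end, in time order."""
    # A 1-tuple (t,) sorts before any (t, value), so it marks the first
    # position of timestamp t in the sorted list.
    lo = bisect.bisect_left(series, (start,))
    hi = bisect.bisect_left(series, (end,))
    return series[lo:hi]


series = []
# Points arrive out of order: the newest one shows up first.
for ts, value in [(30, 3.0), (10, 1.0), (20, 2.0)]:
    insert_point(series, ts, value)

print(range_scan(series, 10, 30))  # [(10, 1.0), (20, 2.0)]
```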

Matt, if you want a few metrics on our use of Warp10 inside OVH:
  - 450M unique series
  - nominal load of 1.5M datapoints/s
  - we have a delete rate of 10M data points/s

If you have more interest in Warp10, you can ask there :


On Mon, Apr 10, 2017 at 10:26 AM, Alexis Gendronneau <
a.gendronneau@gmail.com> wrote:

> hi,
> Did you know http://www.warp10.io/ ? It's a geo time series database. As far
> as I know this technology can handle 100k+ points per node ingestion, and its
> query language is powerful. I already tried it to process time series
> correlation. I'm pretty sure you won't be disappointed by it.
> Regards,
> 2017-04-09 17:07 GMT+02:00 Matt <dromitlabs@gmail.com>:
>> I just noticed the first link is wrong, I intended to send [1] instead.
>> On a second look at InfluxDB, the compression is really better than
>> Druid, same for write and read performance. I'll have a deeper look before
>> committing to one.
>> [1] https://cdn2.hubspot.net/hub/528953/hubfs/Screen_Shot_2016-08-27_at_00.32.42.png?t=1491606817725
>> On Sat, Apr 8, 2017 at 9:40 PM, Matt <dromitlabs@gmail.com> wrote:
>>> I compared them some days ago.
>>> I found a useful article about many of the tsdb available out there [1],
>>> check the big table on the article, it's really helpful. The thing that
>>> bothered me the most about InfluxDB was not being able to setup a cluster
>>> using the open source distribution, that may not be a problem in the future
>>> but I preferred to be able to do so now.
>>> Regarding Druid there is also a really interesting talk by one of its
>>> committers [2]. I liked some of the decisions they made regarding the way
>>> queries are executed and the way the data is stored on disk (they have
>>> taken some ideas from the search engine industry).
>>> The other promising alternative is Prometheus, though I haven't had a
>>> look at it yet, I plan to do so in the near future.
>>> If anyone is using a time-series database and wants to tell us about it
>>> that would be helpful!
>>> Best regards,
>>> Matt
>>> [1] https://blog.netsil.com/a-comparison-of-time-series-databases-and-netsils-use-of-druid-db805d471206
>>> [2] https://www.youtube.com/watch?v=vbH8E0nH2Nw
>>> On Sat, Apr 8, 2017 at 8:16 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>> I found this related post:
>>>> https://groups.google.com/forum/#!topic/druid-user/Co5WUZOMnEk
>>>> On Sat, Apr 8, 2017 at 3:56 PM, Traku traku <trakuo@gmail.com> wrote:
>>>>> I'm using Influxdb. I think influxdb is easier as time-series database
>>>>> solution.
>>>>> Did you compare them?
>>>>> Best regards.
>>>>> 2017-04-07 21:01 GMT+02:00 Matt <dromitlabs@gmail.com>:
>>>>>> Hi all,
>>>>>> I'm looking for an example of Tranquility (Druid's lib) as a Flink
>>>>>> sink.
>>>>>> I'm trying to follow the code in [1] but I feel it's incomplete or
>>>>>> maybe outdated, it doesn't mention anything about other method
>>>>>> (tranquilizer) that seems to be part of the BeamFactory interface in the
>>>>>> current version.
>>>>>> If anyone has any code or a working project to use as a reference
>>>>>> that would be awesome for me and for the rest of us looking for a
>>>>>> time-series database solution!
>>>>>> Best regards,
>>>>>> Matt
>>>>>> [1] https://github.com/druid-io/tranquility/blob/master/docs/flink.md
> --
> Alexis Gendronneau
> alexis.gendronneau@corp.ovh.com
> a.gendronneau@gmail.com
