flink-user mailing list archives

From dromitl...@gmail.com
Subject Re: Flink + Druid example?
Date Mon, 10 Apr 2017 15:00:17 GMT
Thank you for the information, I'll have a look.

> On Apr 10, 2017, at 06:02, Steven Le Roux <leroux.steven@gmail.com> wrote:
> 
> Hi,
> 
> I'm head of @OvhMetrics, a cloud-scale managed time series platform targeting IoT and monitoring.
> 
> We're also using @warp10io components with some glue and optimisations. The storage layer is based on Apache HBase, which to me is an ideal compromise between storage efficiency (bytes per data point, compression, no indexing) and performance (range scan capacities, custom filters, ...).
> 
> This allows us to use two paradigms to produce data: either you use the HTTP endpoint, or you run MapReduce jobs that write directly to HBase, since Warp10 has strong Hadoop integration.
> 
> Advantages of Warp10 vs Influx:
>   - Warp10 is fully open source; Influx is not (clustering is not available as OSS).
>   - Influx is good at ingestion, but it needs your data to arrive in order. Real-time use cases show that data points don't arrive in order (some are held back, buffering makes older points arrive after newer ones, etc.).
>   - Warp10 has been measured at 1.8M data points/s per thread (and not in an optimised case)!
>   - The true power of Warp10 is WarpScript, its query language, which adopts a data-flow approach and was designed for time series from the ground up. Our customers are doing truly amazing things with WarpScript, which contains nearly 800 functions. It brings analytics and signal processing to your time series data.
>   - Warp10 can be deployed either in standalone mode (in-memory or LevelDB) or in distributed mode (HBase).
>   - Security is mandatory and does not affect performance.
>   - You can easily delete massive data ranges or just a single point.
> 
> 
> Matt, if you want a few metrics on our use of Warp10 inside OVH:
>   - 450M unique series
>   - nominal load of 1.5M datapoints/s
>   - we have a delete rate of 10M data points/s
> 
> 
> If you are interested in learning more about Warp10, you can ask there: https://groups.google.com/forum/#!forum/warp10-users
> 
> 
> Regards,
> 
> 
> 
>> On Mon, Apr 10, 2017 at 10:26 AM, Alexis Gendronneau <a.gendronneau@gmail.com> wrote:
>> hi,
>> 
>> Do you know http://www.warp10.io/? It's a geo time series database. As far as I know, this technology can handle ingestion of 100k+ points per node, and its query language is powerful. I have already used it to process time series correlations. I'm pretty sure you won't be disappointed by it.
>> 
>> Regards,
>> 
>> 2017-04-09 17:07 GMT+02:00 Matt <dromitlabs@gmail.com>:
>>> I just noticed the first link is wrong; I intended to send [1] instead.
>>> 
>>> On a second look at InfluxDB, its compression is much better than Druid's, and the same goes for write and read performance. I'll take a deeper look before committing to one.
>>> 
>>> [1] https://cdn2.hubspot.net/hub/528953/hubfs/Screen_Shot_2016-08-27_at_00.32.42.png?t=1491606817725
>>> 
>>> On Sat, Apr 8, 2017 at 9:40 PM, Matt <dromitlabs@gmail.com> wrote:
>>>> I compared them a few days ago.
>>>> 
>>>> I found a useful article about many of the TSDBs available out there [1]; check the big table in the article, it's really helpful. The thing that bothered me most about InfluxDB was not being able to set up a cluster using the open source distribution. That may not be a problem in the future, but I'd prefer to be able to do so now.
>>>> 
>>>> Regarding Druid, there is also a really interesting talk by one of its committers [2]. I liked some of the decisions they made regarding the way queries are executed and the way the data is stored on disk (they have taken some ideas from the search engine industry).
>>>> 
>>>> Another promising alternative is Prometheus. I haven't had a look at it yet, but I plan to do so in the near future.
>>>> 
>>>> If anyone is using a time-series database and wants to tell us about it, that would be helpful!
>>>> 
>>>> Best regards,
>>>> Matt
>>>> 
>>>> [1] https://blog.netsil.com/a-comparison-of-time-series-databases-and-netsils-use-of-druid-db805d471206
>>>> [2] https://www.youtube.com/watch?v=vbH8E0nH2Nw
>>>> 
>>>>> On Sat, Apr 8, 2017 at 8:16 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> I found this related post:
>>>>> 
>>>>> https://groups.google.com/forum/#!topic/druid-user/Co5WUZOMnEk
>>>>> 
>>>>>> On Sat, Apr 8, 2017 at 3:56 PM, Traku traku <trakuo@gmail.com> wrote:
>>>>>> I'm using InfluxDB. I think InfluxDB is the easier time-series database solution.
>>>>>> 
>>>>>> Did you compare them?
>>>>>> 
>>>>>> Best regards.
>>>>>> 
>>>>>> 2017-04-07 21:01 GMT+02:00 Matt <dromitlabs@gmail.com>:
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I'm looking for an example of using Tranquility (Druid's library) as a Flink sink.
>>>>>>> 
>>>>>>> I'm trying to follow the code in [1], but I feel it's incomplete or maybe outdated: it doesn't mention anything about another method (tranquilizer) that seems to be part of the BeamFactory interface in the current version.
>>>>>>> 
>>>>>>> If anyone has any code or a working project to use as a reference, that would be awesome for me and for the rest of us looking for a time-series database solution!
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> Matt
>>>>>>> 
>>>>>>> [1] https://github.com/druid-io/tranquility/blob/master/docs/flink.md
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Alexis Gendronneau
>> 
>> alexis.gendronneau@corp.ovh.com
>> a.gendronneau@gmail.com
> 
