hadoop-common-user mailing list archives

From Chris K Wensel <ch...@wensel.net>
Subject Re: Storing/retrieving time series with hadoop
Date Mon, 12 Jan 2009 22:51:33 GMT
Hey Brock

I used Cascading quite extensively with time series data.

Along with the standard function/filter/aggregator operations in the  
Cascading processing model, there is what we call a "buffer".

It's really just a user-friendly Reduce that integrates well with other
operations and offers up a "sliding window" across your grouped data.
It's quite useful for running averages, filling in missing intervals, etc.
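
To make the sliding-window idea concrete, here is a rough sketch in plain
Python. It is not Cascading's Buffer API; the function names, the
(key, timestamp, value) record layout, and the carry-forward gap filling are
all assumptions made for illustration. It groups records by key, fills in
missing intervals, and computes a running average over each group, much as a
reduce-side operation would.

```python
from collections import deque
from itertools import groupby
from operator import itemgetter


def fill_missing(points, step):
    """Yield (ts, value) for every interval, carrying the last value over gaps."""
    prev_ts, prev_val = None, None
    for ts, val in points:
        if prev_ts is not None:
            # Emit a synthetic point for each interval missing between neighbours.
            for gap_ts in range(prev_ts + step, ts, step):
                yield gap_ts, prev_val
        yield ts, val
        prev_ts, prev_val = ts, val


def running_average(points, window):
    """Yield (ts, mean of the last `window` values) for time-ordered points."""
    recent = deque(maxlen=window)  # the "sliding window" over the group
    for ts, val in points:
        recent.append(val)
        yield ts, sum(recent) / len(recent)


def process(records, step, window):
    """Group (key, ts, value) records by key, then window each sorted group."""
    out = {}
    ordered = sorted(records, key=itemgetter(0, 1))  # sort by key, then time
    for key, group in groupby(ordered, key=itemgetter(0)):
        series = ((ts, val) for _, ts, val in group)
        out[key] = list(running_average(fill_missing(series, step), window))
    return out
```

For example, process(records, step=60, window=2) treats timestamps as
seconds, fills any missing 60-second interval with the previous value, and
averages each point with the one before it.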

Plus there are handy operations for switching from text time strings
to long timestamps and back.
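
The conversion itself looks roughly like this in plain Python. Again, this
is a sketch and not the Cascading operations; the helper names, the
"yyyy-MM-dd HH:mm:ss"-style format, and the choice of UTC milliseconds are
assumptions.

```python
import calendar
import time

FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed text layout, interpreted as UTC


def to_millis(text, fmt=FORMAT):
    """Parse a UTC time string into a long timestamp (ms since the epoch)."""
    return calendar.timegm(time.strptime(text, fmt)) * 1000


def to_text(millis, fmt=FORMAT):
    """Format a long timestamp (ms since the epoch) back into a UTC string."""
    return time.strftime(fmt, time.gmtime(millis // 1000))
```

Keeping the timestamps as longs makes sorting and interval arithmetic
cheap; the text form is only needed at the edges, for parsing and display.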

YMMV

cheers,
ckw

On Jan 7, 2009, at 5:03 PM, Brock Judkins wrote:

> Hi list,
> I am researching hadoop as a possible solution for my company's data
> warehousing solution. My question is whether hadoop, possibly in
> combination with Hive or Pig, is a good solution for time-series data?
> We basically have a ton of web analytics to store that we display both
> internally and externally.
>
> For the time being I am storing timestamped data points in a huge MySQL
> table, but I know this will not scale very far (although it's holding up
> ok at almost 90MM rows). I am aware that hadoop can scale insanely large
> (larger than I need), but does anyone have experience using it to draw
> charts based on time series with fairly low latency?
>
> Thanks!
> Brock

--
Chris K Wensel
chris@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/

