hadoop-common-user mailing list archives

From Tom Deutsch <tdeut...@us.ibm.com>
Subject Re: Temporal query
Date Thu, 29 Mar 2012 15:22:26 GMT
Matthieu, you are welcome to contact me off-list for assistance with Jaql.

Sent from my Blackberry so please excuse typing and spelling errors.

----- Original Message -----
From: Robert Evans [evans@yahoo-inc.com]
Sent: 03/29/2012 10:09 AM EST
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>; "core-user@hadoop.apache.org"
Subject: Re: Temporal query

I am not aware of any tool that does this for you directly, but it should not be too difficult
to write what you want using Pig or Hive.  I am not as familiar with Jaql, but I assume that
you can do it there too.  That said, it might be simpler to write it directly in Map/Reduce,
because we can abuse Map/Reduce in ways that the higher-level languages disallow so that they
can do their optimizations.

What I would do is, in the mapper, scan through each entry and look for transitions of $value
across $threshold, and the times at which they occurred.  You can then look for 30+ second
windows where $value > $threshold within that partition and output them to the reducer.  The
trick is that you need to pay special attention to the beginning and end of the partition.
You also need to send the reducer the state at the beginning and end of each partition and
how long it was in that state.  The reducer can then combine these pieces and check whether
they meet the 30+ second criterion.  If so, output them with the rest; otherwise, don't.
The windows already known to last more than 30 seconds can be sent to any reducer, so they can
have any key, but for the boundary transitions to work correctly you need to send them all to
a single reducer, so they should have a very specific key.  You could also try to divide them
up if you have to scale very, very large, but that would be rather difficult to get right.

--Bobby Evans
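The mapper/reducer approach described above can be sketched as a small, single-process Python simulation. This is not Hadoop API code and not the poster's implementation; it is an illustration under stated assumptions: each partition holds time-ordered (timestamp, value) samples, partitions are numbered in time order, a window's duration runs from its first to its last above-threshold sample, and "30+ seconds" means strictly more than 30. The names `mapper`, `boundary_reducer`, `run_job` and `BOUNDARY_KEY` are hypothetical, and a plain dict stands in for the shuffle.

```python
from collections import defaultdict
from typing import Iterator, List, Optional, Tuple

Sample = Tuple[int, float]   # (timestamp in seconds, value)
Run = List[Sample]           # consecutive samples whose value is > threshold

# Hypothetical fixed key: all partition-edge state is routed to one reducer.
BOUNDARY_KEY = "__boundary__"


def duration(run: Run) -> int:
    # Assumption: a window lasts from its first to its last above-threshold sample.
    return run[-1][0] - run[0][0]


def mapper(pid: int, samples: List[Sample], threshold: float,
           min_dur: int) -> Iterator[Tuple[object, object]]:
    """Scan one time-ordered partition and emit (key, value) pairs."""
    if not samples:
        return
    runs: List[Tuple[int, int]] = []   # (first index, last index) of each run
    start: Optional[int] = None
    for i, (_, value) in enumerate(samples):
        if value > threshold and start is None:
            start = i                       # upward transition across threshold
        elif value <= threshold and start is not None:
            runs.append((start, i - 1))     # downward transition across threshold
            start = None
    if start is not None:
        runs.append((start, len(samples) - 1))  # still above threshold at partition end

    last = len(samples) - 1
    head: Optional[Run] = None   # run touching the start of the partition
    tail: Optional[Run] = None   # run touching the end of the partition
    for s, e in runs:
        run = samples[s:e + 1]
        if s == 0:
            head = run
        if e == last:
            tail = run
        if s != 0 and e != last and duration(run) > min_dur:
            # Fully inside this partition: already final, so any key (any reducer) works.
            yield run[0][0], run
    whole = len(runs) == 1 and runs[0] == (0, last)
    # Edge state for the single stitching reducer.
    yield BOUNDARY_KEY, (pid, head, tail, whole)


def boundary_reducer(summaries: List[tuple], min_dur: int) -> Iterator[Run]:
    """Stitch runs that cross partition edges, visiting partitions in time order."""
    current: Optional[Run] = None   # run still open at the end of the previous partition
    for _pid, head, tail, whole in sorted(summaries, key=lambda s: s[0]):
        if whole:
            current = (current or []) + head   # partition is one run: keep extending
            continue
        if head is not None:
            current = (current or []) + head   # head ends here; join any carried-over run
        if current is not None and duration(current) > min_dur:
            yield current
        current = tail
    if current is not None and duration(current) > min_dur:
        yield current


def run_job(partitions: List[List[Sample]], threshold: float, min_dur: int) -> List[Run]:
    shuffle = defaultdict(list)   # stands in for Hadoop's shuffle/sort
    for pid, samples in enumerate(partitions):
        for key, value in mapper(pid, samples, threshold, min_dur):
            shuffle[key].append(value)
    results: List[Run] = []
    for key, values in shuffle.items():
        if key == BOUNDARY_KEY:
            results.extend(boundary_reducer(values, min_dur))
        else:
            results.extend(values)   # complete runs pass straight through
    return sorted(results, key=lambda r: r[0][0])


if __name__ == "__main__":
    # A window that crosses the edge between two partitions.
    parts = [[(0, 1), (10, 9), (20, 9)], [(30, 9), (40, 9), (50, 9), (60, 1)]]
    for run in run_job(parts, threshold=5, min_dur=30):
        print(run[0][0], run[-1][0])   # prints: 10 50
```

Windows that start and end inside one partition are finished by the mapper and keyed by their start time, so they can go to any reducer; only the per-partition head/tail state goes to the single `BOUNDARY_KEY` reducer, which stitches together windows that cross partition edges. Splitting the same samples into one partition or several should therefore yield the same windows.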

On 3/29/12 4:02 AM, "banermatt" <banermatt@hotmail.fr> wrote:


I'm developing a log file anomaly detection system on a Hadoop cluster.
I'm looking for a way to process a query like: "select all values when
value > threshold for a duration > 30 seconds". Do you know a tool which could
help me process such a query?
I read the documentation for the scripting languages Pig, Hive and Jaql, which seem to have
very similar applications. I tried them, but I was not able to do what I wanted.

Thank you in advance,


View this message in context: http://old.nabble.com/Temporal-query-tp33544869p33544869.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
