asterixdb-dev mailing list archives

From Ian Maxon <ima...@uci.edu>
Subject Re: Two query questions on time bin function
Date Sun, 21 Feb 2016 08:35:37 GMT
Yiran and I came up with possible answers for these.
For 1), a function along these lines could be used:

declare function minmax($x){
let $stdv := (avg(for $z in $x return $z*$z) - avg($x) * avg($x))^(0.5)
for $y in $x
where $y < (2*$stdv) + avg($x)
and $y > avg($x) - (2*$stdv)
return $y
}



It can then be applied to a list of values to return a new copy of the
list, with the values that fall outside 2 standard deviations of the mean
removed.
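
As a cross-check outside AsterixDB, the same filter can be sketched in plain
Python. This is only a reference sketch, not AQL: the name within_two_stdev
is made up for illustration, and it assumes the values for one time bin have
already been collected into a list.

```python
from math import sqrt


def within_two_stdev(values):
    """Keep values strictly inside mean +/- 2 population standard deviations."""
    if not values:
        return []
    mean = sum(values) / len(values)
    mean_sq = sum(v * v for v in values) / len(values)
    # Same identity as the AQL function: variance = E[x^2] - E[x]^2.
    # max() guards against a tiny negative result from float rounding.
    stdv = sqrt(max(mean_sq - mean * mean, 0.0))
    low, high = mean - 2 * stdv, mean + 2 * stdv
    # Strict comparisons, mirroring the "<" and ">" in the where clause.
    return [v for v in values if low < v < high]
```

Because the comparisons are strict, a bin whose values are all identical
(stdev of 0) comes back empty; switching to <= and >= would keep them.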

For 2), we also came up with a potential solution, but the query fails
to compile (filed as https://issues.apache.org/jira/browse/ASTERIXDB-1308).
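
Until that is sorted out, the expected MAD values can at least be computed
outside the database for comparison. A minimal Python sketch, assuming the
(already filtered) values for one time bin are available as a list; the
function name mad is just for illustration, and this is not the AQL query
from the JIRA issue:

```python
from statistics import median


def mad(values):
    """Median absolute deviation: median(|x - median(x)|)."""
    center = median(values)
    # statistics.median sorts its input, so a generator is fine here.
    return median(abs(v - center) for v in values)
```

An empty list raises statistics.StatisticsError, so empty bins would need
to be skipped by the caller.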


Any thoughts on these queries would be welcome :) The approach for 1)
especially seems inefficient to do as a function.

- Ian

On Fri, Feb 19, 2016 at 3:37 PM, Yiran Wang <wyr4137@gmail.com> wrote:

> Hi Asterix team,
>
> I have two queries I'm struggling with. I'm hoping you could provide a
> direction for me. Thanks in advance!
>
> Here is what the data structure looks like:
>
> create type HRMType as closed {
>   row_id: int32,
>   sid: int32,
>   date: date,
>   day: int32,
>   time: time,
>   bpm: int32,
>   RR: float
> };
>
> create dataset HRM (HRMType)
> primary key row_id;
>
>
> Previously I have used the time bin function to calculate the standard
> deviation of bpm for each time bin:
>
> for $i in dataset HRM
> group by $sid := $i.sid, $gdate := $i.date, $gday := $i.day,
>   $timebin := interval-bin($i.time, time("00:00:00"), day-time-duration("PT1M"))
>   with $i
> return {
>   "sid": $sid,
>   "gdate": $gdate,
>   "gday": $gday,
>   "timebin": $timebin,
>   "stdv": (avg(for $ii in $i return $ii.RR * $ii.RR)
>     - avg(for $ii in $i return $ii.RR) * avg(for $ii in $i return $ii.RR))^(0.5)};
>
> Now I have two things I am hoping to do but need help with:
>
> 1. For each 1-min time bin, remove the bpm values that are above the top
> 5% or below the bottom 5%. I thought about applying the min/max functions
> a few times to achieve this, but realized that it was not a good idea
> because the number of instances is not always the same in each time bin.
> So for each 1-min time bin, we need to calculate the 5% and 95%
> thresholds and remove instances accordingly, which I don't know how to do.
>
> 2. After removing the outliers of bpm for each 1-min time bin, calculate a
> median absolute deviation (MAD) for each 1-min time bin (as another measure
> of variation besides the standard deviation). MAD =
> median(abs(x - median(x))). I'm not sure how to write a query to compute
> the median in Asterix.
>
> Thank you so much in advance. Let me know if my questions are clear.
>
> Yiran
>
> --
> Best,
> Yiran
