asterixdb-users mailing list archives

From Ildar Absalyamov <ildar.absalya...@gmail.com>
Subject Re: Aggregate function on collection of ordered list
Date Wed, 09 Dec 2015 07:17:17 GMT
I believe we need to do a major refactoring of all user-facing functions.
Created a root issue for that: https://issues.apache.org/jira/browse/ASTERIXDB-1219
> On Dec 8, 2015, at 22:11, Wail Alkowaileet <wael.y.k@gmail.com> wrote:
> 
> Me again ...
> One way to work around Namrata's problem is to enforce the type, either by
> specifying the schema or at runtime:
> let $l := [[1.2, 2.3, 3.4],[6,3,7,2]]
> for $x in $l // for each list in the outer list
> let $k := (for $y in $x
> return abs($y)
> )
> return sql-avg($k)
> 
> This will work only if your list doesn't contain negative numbers. I think
> we need to unify the behavior in all functions on how to deal with type ANY.
> 
> 
> 
> On Tue, Dec 8, 2015 at 11:34 AM, Wail Alkowaileet <wael.y.k@gmail.com>
> wrote:
> 
>> That's one thing I observed in the built-in functions. Some work perfectly
>> fine with open types and some do not.
>> For instance, if I want to call string-length on a string that's not
>> declared in my schema, I have to trick the compiler like this:
>> string-length(string-concat(["",$mystring]))
>> so that it infers the type of $mystring as UNION(NULL, STRING) instead of
>> ANY, which satisfies the check conditions.
>> 
>> I really don't know what the best solution would be. However, I think it
>> would be better for open-type queries to fail at runtime instead of at compile
>> time. But from a user-experience point of view, a runtime failure can be
>> problematic when the function applies cleanly to the first n-1
>> records and fails on the last one.
>> 
>> On Tue, Dec 8, 2015 at 1:04 AM, Ildar Absalyamov <
>> ildar.absalyamov@gmail.com> wrote:
>> 
>>> That’s true, the trick will work only for homogeneous lists.
>>> 
>>> On Dec 7, 2015, at 13:00, Ian Maxon <imaxon@uci.edu> wrote:
>>> 
>>> We still can't declare a list of mixed type though, I don't think. I
>>> was trying that earlier and ran into some cryptic errors about Java
>>> typecasting. Hopefully that isn't necessary though as the NetCDF (or
>>> the json representation thereof) isn't dynamically structured (e.g.
>>> open types aren't necessary)?
>>> 
>>> On Mon, Dec 7, 2015 at 12:48 PM, Ildar Absalyamov
>>> <ildar.absalyamov@gmail.com> wrote:
>>> 
>>> Namrata,
>>> 
>>> I assume the aforementioned query, with the record defined in a let clause, was
>>> only an example.
>>> That query indeed hits a bug, but it happens only because the type of the
>>> list is not statically enforced.
>>> 
>>> Do you load your data into a dataset? If so, what is the type of that dataset?
>>> If you enforce the type of your nested ordered lists upon data ingestion
>>> you can calculate the average:
>>> 
>>> drop dataverse test if exists
>>> create dataverse test
>>> use dataverse test
>>> 
>>> create type testType as {
>>> id: int32,
>>> list: [[double]]
>>> }
>>> 
>>> create dataset testDS(testType) primary key id;
>>> insert into dataset testDS({"id": 1, "list": [[1.2, 2.3, 3.4],[6,3,7,2]]});
>>> 
>>> for $x in dataset testDS
>>> for $y in $x.list
>>> return {"avg": avg($y)}
>>> 
>>> On Dec 7, 2015, at 09:57, Malarout, Namrata (398M-Affiliate) <
>>> Namrata.Malarout@jpl.nasa.gov> wrote:
>>> 
>>> Hi,
>>> 
>>> Wail, thanks for looking into it and explaining the use of for. I will be
>>> following the issue. However, working with my sample data may be a little
>>> trickier. I have a couple hundred records which contain such nested
>>> ordered lists. I would like to perform an aggregation over all the values
>>> across all the records. Any suggestions on how to do it?
>>> 
>>> Mike, thanks for understanding :) Appreciate all the help.
>>> -Namrata
>>> 
>>> From: Michael Carey <mjcarey@ics.uci.edu>
>>> Reply-To: "users@asterixdb.incubator.apache.org" <users@asterixdb.incubator.apache.org>
>>> Date: Monday, December 7, 2015 at 7:28 AM
>>> To: "users@asterixdb.incubator.apache.org" <users@asterixdb.incubator.apache.org>,
>>> "dev@asterixdb.incubator.apache.org" <dev@asterixdb.incubator.apache.org>
>>> Subject: Re: Aggregate function on collection of ordered list
>>> 
>>> + Looping in the dev list to try and get fast attention to the fix, if
>>> it's easy!
>>> (I know that Namrata's under time pressure in a NASA bakeoff exercise.
>>> :-))
>>> 
>>> On 12/7/15 4:59 AM, Wail Alkowaileet wrote:
>>> 
>>> It's an easy fix...
>>> Thanks for reporting that.
>>> 
>>> I reported it in https://issues.apache.org/jira/browse/ASTERIXDB-1216
>>> 
>>> On Mon, Dec 7, 2015 at 3:33 PM, Wail Alkowaileet <wael.y.k@gmail.com> wrote:
>>> Hi Namrata,
>>> 
>>> The best way to think of "for" over lists is that it works like foreach in
>>> Java.
>>> So, in your first query, it should be like:
>>> 
>>> let $l := [[1.2, 2.3, 3.4],[6,3,7,2]]
>>> for $x in $l // for each list in the outer list
>>> return {"avg": avg($x)}
>>> 
>>> However, I tried it, and there seems to be a bug when applying an
>>> aggregation to a nested open field.
>>> 
>>> I'll look into it to see if it's an easy fix
>>> 
>>> 
>>> 
>>> On Mon, Dec 7, 2015 at 2:52 PM, Malarout, Namrata (398M-Affiliate)
>>> <Namrata.Malarout@jpl.nasa.gov> wrote:
>>> Hi,
>>> 
>>> I am trying to perform avg, sum, min and max functions on a collection of
>>> ordered lists. An example is:
>>> let $l := [[1.2, 2.3, 3.4],[6,3,7,2]]
>>> return {"avg": avg($l)}
>>> 
>>> I have tried both avg and sql-avg. But I get the following error:
>>> Cannot compute AVG for values of type ORDEREDLIST
>>> [NotImplementedException].
>>> 
>>> I’ve attached the sample data that I’m working with (sample.adm). My AQL
>>> query to find the average of analysis_error looks like:
>>> 
>>> use dataverse Test;
>>> for $f in dataset sample
>>> where not(is-null($f.analysis_error))
>>> return avg($f.analysis_error);
>>> 
>>> The error seen is as follows:
>>> Type of argument in function-call: asterix:avg, Args:[function-call:
>>> asterix:field-access-by-name, Args:[%0->$$0, AString: {analysis_error}]]
>>> should be a collection type instead of ANY [AlgebricksException]
>>> 
>>> I would like to know what the correct syntax is to find the average.
>>> Appreciate the help.
>>> Thanks,
>>> Namrata
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Regards,
>>> Wail Alkowaileet
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Regards,
>>> Wail Alkowaileet
>>> 
>>> 
>>> 
>>> Best regards,
>>> Ildar
>>> 
>>> 
>>> Best regards,
>>> Ildar
>>> 
>>> 
>> 
>> 
>> --
>> 
>> *Regards,*
>> Wail Alkowaileet
>> 
> 
> 
> 
> -- 
> 
> *Regards,*
> Wail Alkowaileet

Best regards,
Ildar

