asterixdb-dev mailing list archives

From Jianfeng Jia <jianfeng....@gmail.com>
Subject Re: Does the limit clause skew the results to a single NC?
Date Mon, 30 Nov 2015 19:40:38 GMT
It seems you are hitting the BigObject issue; the error message is supposed to say
"255 * DefaultFrameSize" bytes.

On the other hand, I don’t quite understand the final statement:
-------------
//print all authors.
let $res := (for $t in  [$coAuth,$noCoAuth]
limit 100
return $t)
-------------

I think you are expecting a union operation instead.
The list constructor ([]) doesn't unnest the records of the inner lists. For example, I
tried the following query:
-------------
let $x := [ { "a":1},{ "a":2},{ "a":3}]
let $y := [ { "b":1},{ "b":2},{ "b":3}]
let $xy := [$x, $y]
for $tx in $xy
return  $tx
-------------

It returns the following result. 
[ { "a": 1 }, { "a": 2 }, { "a": 3 } ]
[ { "b": 1 }, { "b": 2 }, { "b": 3 } ]
That means $xy holds two large records, $x and $y, not six smaller records.

Similarly, "for $t in [$coAuth,$noCoAuth]" will return only two records. The first one
is the $coAuth list, and the second one is the $noCoAuth list. It will definitely hit the
big object problem or other memory issues if either list is too big.
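
To reach the individual records through the list constructor, a second "for" over each
inner list would be needed, the same pattern your $coAuth step uses to flatten
$coAuthList. A minimal, untested sketch on the small example above ($list is just an
illustrative variable name); it should yield the six small records one by one, but it
still builds both lists inside the constructor, so it does not avoid the frame size
problem for large inputs:
-------------
let $x := [ { "a":1},{ "a":2},{ "a":3}]
let $y := [ { "b":1},{ "b":2},{ "b":3}]
for $list in [$x, $y]
for $t in $list
return $t
-------------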

You can try the union function as follows:

for $t in $coAuth union $noCoAuth 
return $t

> On Nov 30, 2015, at 7:17 AM, Mike Carey <dtabass@gmail.com> wrote:
> 
> I will look into details later, but:
> 
> 1. The answer to your question is yes - ORDER BY and LIMIT will both have the results
> landing (at present) on a single node.  We need to add support for range-partitioned results!
> 
> 2. It would be good to get familiar with reading query plans and also looking for "listify"
> operations that might be in unfortunate places in query plans (which can cause frame size
> issues).
> 
> Cheers,
> Mike
> 
> 
> On 11/30/15 5:55 AM, Wail Alkowaileet wrote:
>> Hi Team,
>> 
>> I noticed a weird behavior when executing an AQL with the limit clause
>> (LIMIT 100000)
>> I get an exception in one NC: java.lang.OutOfMemoryError
>> while the others seem to operate normally.
>> 
>> my -Xmx configurations are the default:
>> nc.java.opts                             :-Xmx1536m
>> cc.java.opts                             :-Xmx1024m
>> 
>> Here is the story:
>> 
>> I have a dataset for publications. The data contains huge nested and
>> heterogeneous records.
>> Therefore, the specified type contains only a unique ID.
>> 
>> create type wosType as open
>> {
>> UID:string
>> }
>> 
>> After loading the data, I want to extract all the authors' names (first and
>> last). However, the author details for each publication are *heterogeneous*:
>> if there is only one author (i.e., no co-authors), the type of field "name"
>> is a JSON object; otherwise it is an ordered list.
>> 
>> So I did the following (excuse the ugliness of my AQL):
>> 
>> -----------------------------
>> use dataverse wosDataverse
>> 
>> *//Get name details for single-authors*
>> let $noCoAuth := (for $x in dataset wos
>> let $summary := $x.static_data.summary
>> let $names := $summary.names
>> where $names.count = "1"
>> return {
>> "firstName":$names.name.first_name,
>> "lastName":$names.name.last_name
>> }
>> )
>> 
>> *//Generate a list of names for all co-authors*
>> let $coAuthList := (for $x in dataset wos
>> let $summary := $x.static_data.summary
>> let $names := $summary.names
>> where $names.count != "1"
>> return $names.name
>> )
>> 
>> *//Flatten the co-authors name list*
>> let $coAuth := (for $x in $coAuthList
>> for $y in $x
>> return {"firstName":$y.first_name,"lastName":$y.last_name})
>> 
>> //print all authors.
>> let $res := (for $t in  [$coAuth,$noCoAuth]
>> limit 100
>> return $t)
>> 
>> return $res
>> -----------------------------
>> 
>> 
>> This query couldn't be executed due to frame size limit:
>> 
>> Unable to allocate frame larger than:255 bytes [HyracksDataException]
>> 
>> So..
>> I limited the number of the results as such:
>> 
>> -----------------------------
>> use dataverse wosDataverse
>> let $noCoAuth := (for $x in dataset wos
>> let $summary := $x.static_data.summary
>> let $names := $summary.names
>> where $names.count = "1"
>> *limit 100000*
>> return {
>> "firstName":$names.name.first_name,
>> "lastName":$names.name.last_name
>> }
>> )
>> 
>> let $coAuthList := (for $x in dataset wos
>> let $summary := $x.static_data.summary
>> let $names := $summary.names
>> where $names.count != "1"
>> return $names.name
>> )
>> 
>> let $coAuth := (for $x in $coAuthList
>> for $y in $x
>> *limit 100000*
>> return {"firstName":$y.first_name,"lastName":$y.last_name})
>> 
>> 
>> let $res := (for $t in [$coAuth, $noCoAuth]
>> limit 100
>> return $t)
>> 
>> return $res
>> -----------------------------
>> 
>> Once I execute the previous AQL, one node (different one in each run)
>> reaches *400%* cpu-load (4-cores) and swallows up all the available memory
>> it can get.
>> 
>> 
>> For smaller result (e.g. limit 10000), it works fine.
>> 
>> 
>> Thanks and sorry for the long email.
> 



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine

