asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Does the limit clause skew the results to a single NC?
Date Mon, 30 Nov 2015 15:17:43 GMT
I will look into the details later, but:

1. The answer to your question is yes - ORDER BY and LIMIT will both 
have the results landing (at present) on a single node.  We need to add 
support for range-partitioned results!

2. It would be good to get familiar with reading query plans and also 
looking for "listify" operations that might be in unfortunate places in 
query plans (which can cause frame size issues).
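
To make point 1 concrete, here is a toy Python model of a global LIMIT over a partitioned dataset. It is only a sketch under stated assumptions, not AsterixDB's actual operators: the names `local_limit` and `global_limit`, the four partitions, and the record counts are all hypothetical. A real engine streams records and may stop early, so this shows a worst case. Each partition can trim its own output, but the final trim has to happen in one place, so a single node may receive up to (number of partitions) x (limit) records.

```python
from itertools import islice


def local_limit(partition, n):
    """Each node can stop after producing its first n records."""
    return list(islice(partition, n))


def global_limit(partitions, n):
    """Toy model: every node trims locally, then ONE node merges and trims again."""
    gathered = []  # assumption: this buffer lives on a single node
    for part in partitions:
        gathered.extend(local_limit(part, n))
    peak = len(gathered)  # records held by the merging node before the final trim
    return gathered[:n], peak


# Hypothetical cluster: 4 partitions of 250,000 records each.
partitions = [range(i * 250_000, (i + 1) * 250_000) for i in range(4)]
result, peak = global_limit(partitions, 100_000)
print(len(result), peak)  # prints: 100000 400000
```

In this model the merging node's load grows with the limit, which is at least consistent with one node running out of memory at LIMIT 100000 while a smaller limit works.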

Cheers,
Mike


On 11/30/15 5:55 AM, Wail Alkowaileet wrote:
> Hi Team,
>
> I noticed weird behavior when executing an AQL query with a limit clause
> (LIMIT 100000):
> one NC throws java.lang.OutOfMemoryError
> while the others seem to operate normally.
>
> My -Xmx settings are the defaults:
> nc.java.opts                             :-Xmx1536m
> cc.java.opts                             :-Xmx1024m
>
> Here is the story:
>
> I have a dataset of publications. The data contains huge nested and
> heterogeneous records.
> Therefore, the specified type contains only a unique ID.
>
> create type wosType as open
> {
> UID:string
> }
>
> After loading the data, I want to extract all the authors' names (first and
> last). However, the author details for each publication are *heterogeneous*:
> if there is only one author (i.e., no co-authors), the type of the field "name"
> is a JSON object; otherwise it is an ordered list.
>
> So I did the following (excuse the ugliness of my AQL):
>
> -----------------------------
> use dataverse wosDataverse
>
> *//Get name details for single-authors*
> let $noCoAuth := (for $x in dataset wos
> let $summary := $x.static_data.summary
> let $names := $summary.names
> where $names.count = "1"
> return {
> "firstName":$names.name.first_name,
> "lastName":$names.name.last_name
> }
> )
>
> *//Generate a list of names for all co-authors*
> let $coAuthList := (for $x in dataset wos
> let $summary := $x.static_data.summary
> let $names := $summary.names
> where $names.count != "1"
> return $names.name
> )
>
> *//Flatten the co-authors name list*
> let $coAuth := (for $x in $coAuthList
> for $y in $x
> return {"firstName":$y.first_name,"lastName":$y.last_name})
>
> //print all authors.
> let $res := (for $t in  [$coAuth,$noCoAuth]
> limit 100
> return $t)
>
> return $res
> -----------------------------
>
>
> This query couldn't be executed due to the frame size limit:
>
> Unable to allocate frame larger than:255 bytes [HyracksDataException]
>
> So I limited the number of results as follows:
>
> -----------------------------
> use dataverse wosDataverse
> let $noCoAuth := (for $x in dataset wos
> let $summary := $x.static_data.summary
> let $names := $summary.names
> where $names.count = "1"
> *limit 100000*
> return {
> "firstName":$names.name.first_name,
> "lastName":$names.name.last_name
> }
> )
>
> let $coAuthList := (for $x in dataset wos
> let $summary := $x.static_data.summary
> let $names := $summary.names
> where $names.count != "1"
> return $names.name
> )
>
> let $coAuth := (for $x in $coAuthList
> for $y in $x
> *limit 100000*
> return {"firstName":$y.first_name,"lastName":$y.last_name})
>
>
> let $res := (for $t in [$coAuth, $noCoAuth]
> limit 100
> return $t)
>
> return $res
> -----------------------------
>
> Once I execute the previous AQL, one node (a different one in each run)
> reaches *400%* CPU load (4 cores) and swallows up all the available memory
> it can get.
>
>
> For a smaller result (e.g. limit 10000), it works fine.
>
>
> Thanks and sorry for the long email.

