asterixdb-dev mailing list archives

From Wail Alkowaileet <wael....@gmail.com>
Subject Re: Does the limit clause skew the results to a single NC?
Date Tue, 01 Dec 2015 17:16:08 GMT
@Jianfeng Actually, I need each pair (first and last name) to stay
together and not be separated.
Besides, I think UNION is a private function, as in AsterixBuiltinFunctions:
-----------
addPrivateFunction(UNION, UnorderedListConstructorResultType.INSTANCE, true);
-----------

And probably you would ask why I'm doing this as well:
----------
let $res := (for $t in [$coAuth, $noCoAuth]
limit 100
return $t)

return $res
---------
For some reason, the limit clause gets skipped, as I filed in 1204
<https://issues.apache.org/jira/browse/ASTERIXDB-1204>, and this is just to
work around it.

@Mike I did this to modularize my query :) so that I can reuse/edit it
easily (if I need to add more information for each coAuth and noCoAuth).
That said, ifThenElse can be very handy in such cases, but I got an NPE (the
inferred type is null in UnaryBooleanOrNullFunctionTypeComputer) when the
query plan consists of two subplans at the same level. I filed that as well
in 1203 <https://issues.apache.org/jira/browse/ASTERIXDB-1203>.

Thanks!



On Tue, Dec 1, 2015 at 2:35 AM, Mike Carey <dtabass@gmail.com> wrote:

> Another approach (sketchily/logically) would be to do the case-handling on
> output, i.e., don't start by segmenting things based on which kind they are
> - process them all and do the different handling in the return clause...?
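>
> A rough sketch of that idea, untested, assuming the wosDataverse dataverse
> is in use and that names.count reliably tells the two shapes apart (as in
> the original query below). The single name record is wrapped in a list so
> that both cases can be unnested the same way:
> -------------
> for $x in dataset wos
> let $names := $x.static_data.summary.names
> // Normalize: one author is a record, several authors are already a list.
> let $authors := if ($names.count = "1") then [$names.name] else $names.name
> for $a in $authors
> limit 100
> return { "firstName": $a.first_name, "lastName": $a.last_name }
> -------------
> Each first/last name pair stays in one record, and the two intermediate
> lists are never built.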
>
>
> On 11/30/15 11:40 AM, Jianfeng Jia wrote:
>
>> It seems to be hitting the BigObject issue; the error message is supposed
>> to say "255 * DefaultFrameSize" bytes.
>>
>> On the other hand, I don’t quite understand the final statement:
>> -------------
>> //print all authors.
>> let $res := (for $t in  [$coAuth,$noCoAuth]
>> limit 100
>> return $t)
>> -------------
>>
>> I think you are expecting a union operation instead.
>> The list constructor ([]) doesn't unnest the records of the inner
>> lists. For example, I tried the following query:
>> -------------
>> let $x := [ { "a":1},{ "a":2},{ "a":3}]
>> let $y := [ { "b":1},{ "b":2},{ "b":3}]
>> let $xy := [$x, $y]
>> for $tx in $xy
>> return  $tx
>> -------------
>>
>> It returns the following result.
>> [ { "a": 1 }, { "a": 2 }, { "a": 3 } ]
>> [ { "b": 1 }, { "b": 2 }, { "b": 3 } ]
>> That means $xy holds two large records, $x and $y, not six smaller
>> records.
>>
>> Similarly, "for $t in [$coAuth,$noCoAuth]" will only return two
>> records. The first one is the $coAuth list, and the second one is the
>> $noCoAuth list. It will definitely hit the big object problem or other
>> memory issues if either list is too big.
>>
>> You can try the union function as follows:
>>
>> for $t in $coAuth union $noCoAuth
>> return $t
>>
>> On Nov 30, 2015, at 7:17 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>
>>> I will look into details later, but:
>>>
>>> 1. The answer to your question is yes - ORDER BY and LIMIT will both
>>> have the results landing (at present) on a single node.  We need to add
>>> support for range-partitioned results!
>>>
>>> 2. It would be good to get familiar with reading query plans and also
>>> looking for "listify" operations that might be in unfortunate places in
>>> query plans (which can cause frame size issues).
>>>
>>> Cheers,
>>> Mike
>>>
>>>
>>> On 11/30/15 5:55 AM, Wail Alkowaileet wrote:
>>>
>>>> Hi Team,
>>>>
>>>> I noticed weird behavior when executing an AQL query with the limit
>>>> clause (LIMIT 100000):
>>>> I get an exception in one NC (java.lang.OutOfMemoryError)
>>>> while the others seem to operate normally.
>>>>
>>>> My -Xmx settings are the defaults:
>>>> nc.java.opts                             :-Xmx1536m
>>>> cc.java.opts                             :-Xmx1024m
>>>>
>>>> Here is the story:
>>>>
>>>> I have a dataset of publications. The data contains huge, nested, and
>>>> heterogeneous records.
>>>> Therefore, the specified type contains only a unique ID.
>>>>
>>>> create type wosType as open
>>>> {
>>>> UID:string
>>>> }
>>>>
>>>> After loading the data, I want to extract all the authors' names (first
>>>> and last). However, the author details for each publication are
>>>> *heterogeneous*: if there is only one author (i.e., no co-authors), the
>>>> type of the field "name" is a JSON object; otherwise it is an ordered
>>>> list.
>>>> So I did the following (excuse the ugliness of my AQL):
>>>>
>>>> -----------------------------
>>>> use dataverse wosDataverse
>>>>
>>>> //Get name details for single authors
>>>> let $noCoAuth := (for $x in dataset wos
>>>> let $summary := $x.static_data.summary
>>>> let $names := $summary.names
>>>> where $names.count = "1"
>>>> return {
>>>> "firstName":$names.name.first_name,
>>>> "lastName":$names.name.last_name
>>>> }
>>>> )
>>>>
>>>> //Generate a list of names for all co-authors
>>>> let $coAuthList := (for $x in dataset wos
>>>> let $summary := $x.static_data.summary
>>>> let $names := $summary.names
>>>> where $names.count != "1"
>>>> return $names.name
>>>> )
>>>>
>>>> //Flatten the co-authors name list
>>>> let $coAuth := (for $x in $coAuthList
>>>> for $y in $x
>>>> return {"firstName":$y.first_name,"lastName":$y.last_name})
>>>>
>>>> //print all authors.
>>>> let $res := (for $t in  [$coAuth,$noCoAuth]
>>>> limit 100
>>>> return $t)
>>>>
>>>> return $res
>>>> -----------------------------
>>>>
>>>>
>>>> This query couldn't be executed due to the frame size limit:
>>>>
>>>> Unable to allocate frame larger than:255 bytes [HyracksDataException]
>>>>
>>>> So I limited the number of results like this:
>>>>
>>>> -----------------------------
>>>> use dataverse wosDataverse
>>>> let $noCoAuth := (for $x in dataset wos
>>>> let $summary := $x.static_data.summary
>>>> let $names := $summary.names
>>>> where $names.count = "1"
>>>> limit 100000 // added
>>>> return {
>>>> "firstName":$names.name.first_name,
>>>> "lastName":$names.name.last_name
>>>> }
>>>> )
>>>>
>>>> let $coAuthList := (for $x in dataset wos
>>>> let $summary := $x.static_data.summary
>>>> let $names := $summary.names
>>>> where $names.count != "1"
>>>> return $names.name
>>>> )
>>>>
>>>> let $coAuth := (for $x in $coAuthList
>>>> for $y in $x
>>>> limit 100000 // added
>>>> return {"firstName":$y.first_name,"lastName":$y.last_name})
>>>>
>>>>
>>>> let $res := (for $t in [$coAuth, $noCoAuth]
>>>> limit 100
>>>> return $t)
>>>>
>>>> return $res
>>>> -----------------------------
>>>>
>>>> Once I execute the previous AQL, one node (a different one in each run)
>>>> reaches *400%* CPU load (4 cores) and swallows up all the available
>>>> memory it can get.
>>>>
>>>>
>>>> For a smaller result (e.g., limit 10000), it works fine.
>>>>
>>>>
>>>> Thanks and sorry for the long email.
>>>>
>>>
>>
>> Best,
>>
>> Jianfeng Jia
>> PhD Candidate of Computer Science
>> University of California, Irvine
>>
>>
>>
>


-- 

*Regards,*
Wail Alkowaileet
