asterixdb-users mailing list archives

From "Malarout, Namrata (398M-Affiliate)" <>
Subject Re: Internal error [NegativeArraySizeException]
Date Tue, 13 Oct 2015 16:41:44 GMT
Hi Michael,
That is a great idea, and one we were actually just discussing here. I have cc'ed my mentors on this
email. Hopefully we can start a conversation and make it happen.
From: Michael Carey <<>>
Reply-To: "<>"
Date: Thursday, October 8, 2015 at 11:27 PM
To: "<>"
Subject: Re: Internal error [NegativeArraySizeException]

One thought might be for a few of us in AsterixDB-land to make a road trip up to JPL and give
an overview talk to any/all interested data folks there - and then get in a conference room
with you and your mentor and take a more top-down and in-person look at what you're wanting
to do (especially for the largeness-inducing array fields)?

On 10/7/15 2:29 PM, Ian Maxon wrote:
Hm, I see, that is interesting. What is the 'mask' field then? It looked like some sort of
array from my first glance, but I bet it's more than that. Is it something that could be split
up in some way? One thought is to have the metadata in one dataset, and the masks that are
split in some way in another.
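To make that suggestion concrete, a rough DDL sketch along those lines might look like the following. (This is only an illustration: the `MaskChunkType` name, the `chunkno` field, and the one-chunk-per-record layout are assumptions for the sake of the sketch, not part of Namrata's actual schema, and the chunk element type would need to match the real mask data.)

```
create type GlobL4MetaType as open {
  fid: string
}

create type MaskChunkType as open {
  fid: string,
  chunkno: int32,
  chunk: [int32]
}

create dataset GlobL4Meta(GlobL4MetaType)
primary key fid;

create dataset GlobL4Masks(MaskChunkType)
primary key fid, chunkno;
```

Queries that only touch metadata would then scan the small GlobL4Meta dataset, while the bulky mask chunks live in GlobL4Masks and are joined in (by fid) only when actually needed.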


On Wed, Oct 7, 2015 at 2:16 PM, Malarout, Namrata (398M-Affiliate) <<><>> wrote:
Hi Ian,

Thanks for getting back about this so quickly. The data I provided was a subset of the records
that we have. Similar to mask, we have about 4 or 5 other fields which are even bigger. Unfortunately,
we can't filter them out. The data that you see after filtering out mask is just the
metadata of the file. I ingested just the metadata while I was familiarizing myself with
AsterixDB and, as you said, it works just fine. But the actual data that we will be querying
is stored in these large objects.

From: Ian Maxon [<>]
Sent: Tuesday, October 06, 2015 7:36 PM
Subject: Re: Internal error [NegativeArraySizeException]

Hi Namrata,
First, I think the behavior you are experiencing is a bug, so we'll look into that. The load
fails because each row is really large, about 3MB, and somehow the sort operator doesn't deal
with this well.
However, it may be good that we ran into this: while huge objects like this should
eventually be handled more gracefully in AsterixDB, they're viewed as exceptional rather
than the norm. Hence performance will not be as good when big objects/fields like these
are accessed mixed in with comparatively tiny data.
The field I see taking up almost all of the space in the object is the "mask" field. Is this
something that is actually needed? Or can it be filtered/projected out?

I've attached a version of the sample data where I cut out the "mask" field; this one seems
to load in just fine using the provided DDL.
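For reference, loading that trimmed file with the DDL from Namrata's earlier message would look roughly like this in AQL (the node name and file path are placeholders to adjust for your instance):

```
use dataverse TestL4;

load dataset GlobL4 using localfs
(("path"="localhost:///path/to/new_nomask.adm"),
 ("format"="adm"));
```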

[attachment: new_nomask.adm]


On Tue, Oct 6, 2015 at 10:37 AM, Ian Maxon <<>> wrote:
> Awesome, thanks Namrata. I'll give this a close look later today.
> -Ian
> On Tue, Oct 6, 2015 at 10:24 AM, Malarout, Namrata (398M-Affiliate)
> <<>> wrote:
>> Hi Ian,
>> I just realized I didn't provide the DDL. Sorry about that. I've kept it
>> really simple:
>> drop dataverse TestL4 if exists;
>> create dataverse TestL4;
>> use dataverse TestL4;
>> create type GlobL4Type as open {
>> fid: string,
>> }
>> create dataset GlobL4(GlobL4Type)
>> primary key fid;
>> Please let me know if you have any questions.
>> Thanks,
>> Namrata
>> On 10/1/15, 5:33 PM, "Ian Maxon" <<>> wrote:
>>>P.S., if you have the data/DDL/so on that caused this error to happen,
>>>I can try to reproduce here locally if the exception/logs may have
>>>gotten lost somewhere.
>>>On Thu, Oct 1, 2015 at 5:19 PM, Ian Maxon <<><>> wrote:
>>>> Hey Namrata,
>>>> Those logs are not logs in the diagnostic sense, but rather
>>>> write-ahead logs, i.e., a log of the transactions that are occurring in
>>>> the instance. If you were using the single-machine package I gave you,
>>>> the error's stack trace should actually be on the console.
>>>> Thanks,
>>>> -Ian
>>>> On Thu, Oct 1, 2015 at 5:13 PM, Malarout, Namrata (398M-Affiliate)
>>>> <<><>> wrote:
>>>>> Hi,
>>>>> I got an error while trying to ingest data.
>>>>> Internal error. Please check instance logs for further details.
>>>>> [NegativeArraySizeException]
>>>>> I've attached the logs. When I open them, they're unreadable. The logs of
>>>>> the ClusterControllerService are empty (screenshot attached).
>>>>> I had errors due to the size of the data when I was ingesting it with
>>>>> version 0.8.6. Has anyone encountered this error before?
>>>>> Thanks in advance for the help.
>>>>> Regards,
>>>>> Namrata
