asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Creating RTree: no space left
Date Fri, 26 Aug 2016 14:16:57 GMT
(Seems like one that could very well pop up at SDSC in the future, too.)


On 8/24/16 10:02 PM, Wail Alkowaileet wrote:
> Hi Ian and Pouria,
>
> The names of the files along with their sizes (there were 625 of those
> before crashing):
>
> size        name
> 96MB     ExternalSortRunGenerator8917133039835449370.waf
> 128MB   ExternalSortRunGenerator8948724728025392343.waf
>
> No files were generated beyond the runs.
> compiler.sortmemory = 64MB
>
> Here are the full logs
> <https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>
>
> On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh <pouria.pirzadeh@gmail.com>
> wrote:
>
>> We previously had issues with huge spilled sort temp files when creating
>> inverted index for fuzzy queries, but NOT R-Trees.
>> I also recall that Yingyi fixed the issue of delaying clean-up for
>> intermediate temp files until the end of the query execution.
>> If you can share the names of a couple of temp files (and their sizes, along
>> with the sort memory setting you have in asterix-configuration.xml), we may
>> be able to make a better guess as to whether the sort is really going into a
>> two-level merge or not.
>>
>> Pouria
>>
>> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <imaxon@uci.edu> wrote:
>>
>>> I think that exception ("No space left on device") is just cast from the
>>> native IOException. Therefore I would be inclined to believe it's genuinely
>>> out of space. I suppose the question is why the external sort is so huge.
>>> What is the query plan? Maybe that will shed light on a possible cause.
>>>
>>> On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <wael.y.k@gmail.com>
>>> wrote:
>>>
>>>> I was monitoring Inodes ... it didn't go beyond 1%.
>>>>
>>>> On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <wael.y.k@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Chris and Mike,
>>>>>
>>>>> Actually I was monitoring it to see what's going on:
>>>>>
>>>>>     - The size of each partition is about 40GB (80GB in total per
>>>>>     iodevice).
>>>>>     - The runs took 157GB per iodevice (about 2x of the dataset size).
>>>>>     Each run takes either 128MB or 96MB of storage.
>>>>>     - At a certain time, there were 522 runs.
>>>>>
>>>>> I even tried to create a BTree index to see if that happens as well. I
>>>>> created two BTree indexes, one for the *location* and one for the
>>>>> *caller*, and they were created successfully. The sizes of the runs
>>>>> didn't come anywhere near that.
>>>>>
>>>>> Logs are attached.
>>>>>
>>>>> On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <dtabass@gmail.com>
>>>>> wrote:
>>>>>> I think we might have "file GC issues" - I vaguely remember that we don't
>>>>>> (or at least didn't once upon a time) proactively remove unnecessary run
>>>>>> files - removing all of them at end-of-job instead of at the end of the
>>>>>> execution phase that uses their contents.  We may also have an "Amdahl
>>>>>> problem" right now with our sort since we serialize phase two of parallel
>>>>>> sorts - though this is not a query, it's index build, so that shouldn't be
>>>>>> it.  It would be interesting to put a df/sleep script on each of the nodes
>>>>>> when this is happening - actually a script that monitors the temp file
>>>>>> directory - and watch the lifecycle happen and the sizes change....
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 8/23/16 2:06 AM, Chris Hillery wrote:
>>>>>>
>>>>>>> When you get the "disk full" warning, do a quick "df -i" on the device -
>>>>>>> possibly you've run out of inodes even if the space isn't all used up.
>>>>>>> It's unlikely because I don't think AsterixDB creates a bunch of small
>>>>>>> files, but worth checking.
>>>>>>>
>>>>>>> If that's not it, then can you share the full exception and stack trace?
>>>>>>> Ceej
>>>>>>> aka Chris Hillery
>>>>>>>
>>>>>>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <wael.y.k@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I just cleared the hard drives to get 80% free space. I still get the
>>>>>>>> same issue.
>>>>>>>>
>>>>>>>> The data contains:
>>>>>>>> 1- 2887453794 records.
>>>>>>>> 2- Schema:
>>>>>>>>
>>>>>>>> create type CDRType as {
>>>>>>>>     id:uuid,
>>>>>>>>     'date':string,
>>>>>>>>     'time':string,
>>>>>>>>     'duration':int64,
>>>>>>>>     'caller':int64,
>>>>>>>>     'callee':int64,
>>>>>>>>     location:point?
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <wael.y.k@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Dears,
>>>>>>>>>
>>>>>>>>> I have a dataset of size 290GB loaded in 3 NCs, each of which has
>>>>>>>>> 2x500GB SSDs.
>>>>>>>>>
>>>>>>>>> Each NC has two IODevices (partitions) on each hard drive (i.e. the
>>>>>>>>> total is 4 iodevices per NC). After loading the data, each Asterix
>>>>>>>>> partition occupied 31GB.
>>>>>>>>>
>>>>>>>>> The cluster has about 50% free space on each hard drive (approximately
>>>>>>>>> 250GB free space on each hard drive). However, when I tried to create
>>>>>>>>> an index of type RTree, I got an exception that no space was left on
>>>>>>>>> the hard drive during the External Sort phase.
>>>>>>>>>
>>>>>>>>> Is that normal?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> *Regards,*
>>>>>>>>> Wail Alkowaileet
>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> *Regards,*
>>>>>>>> Wail Alkowaileet
>>>>>>>>
>>>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> *Regards,*
>>>>> Wail Alkowaileet
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Regards,*
>>>> Wail Alkowaileet
>>>>
>
>

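A quick back-of-the-envelope check on the run files reported in the thread (625 runs of 96MB or 128MB each, with compiler.sortmemory = 64MB) may help when reading the numbers. The sketch below only does arithmetic on those quoted figures. It assumes "MB" means 2^20 bytes, and it does not know whether the count of 625 is per iodevice, per NC, or cluster-wide, since the thread does not say. The function names are made up for illustration and are not part of AsterixDB.

```python
MB = 1024 * 1024
GB = 1024 * MB

# Figures quoted in the thread (units assumed to be binary megabytes).
SORT_MEMORY = 64 * MB
RUN_SIZES = (96 * MB, 128 * MB)
RUN_COUNT = 625


def run_volume_bounds(run_count, run_sizes):
    """Return (min_bytes, max_bytes) that run_count run files can occupy,
    assuming every run has one of the sizes in run_sizes."""
    return run_count * min(run_sizes), run_count * max(run_sizes)


def size_vs_sort_memory(run_sizes, sort_memory):
    """Return each run size as a multiple of the configured sort memory."""
    return [size / sort_memory for size in run_sizes]


low, high = run_volume_bounds(RUN_COUNT, RUN_SIZES)
print(f"{RUN_COUNT} runs occupy between {low / GB:.1f} and {high / GB:.1f} GB")
print("run size / sort memory:", size_vs_sort_memory(RUN_SIZES, SORT_MEMORY))
```

Under these assumptions the 625 runs account for roughly 59 to 78 GB, and each run is 1.5x or 2x the sort memory setting. This says nothing about why the runs are that large; it only makes the reported sizes easier to compare with the free space per drive.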

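For the "df/sleep script" idea and the "df -i" inode check mentioned in the thread, a minimal sketch in Python could look like the following. It is not an AsterixDB tool. The ".waf" suffix is taken from the file names reported above, the temp directory path is whatever the node actually uses (not stated in the thread), and the function names are hypothetical. It relies on os.statvfs, so it assumes a Unix-like system.

```python
import os
import time
from pathlib import Path


def summarize_runs(temp_dir, suffix=".waf"):
    """Return (file_count, total_bytes) for files directly under temp_dir
    whose names end with suffix."""
    count = 0
    total = 0
    for entry in Path(temp_dir).iterdir():
        try:
            if entry.is_file() and entry.name.endswith(suffix):
                total += entry.stat().st_size
                count += 1
        except FileNotFoundError:
            # The file was deleted between listing and stat; skip it.
            continue
    return count, total


def filesystem_usage(path):
    """Return (free_bytes, inodes_used_percent) for the filesystem
    that holds path, similar to what df and df -i report."""
    st = os.statvfs(path)
    free_bytes = st.f_bavail * st.f_frsize
    if st.f_files == 0:
        # Some filesystems do not report inode counts.
        inodes_used_pct = 0.0
    else:
        inodes_used_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    return free_bytes, inodes_used_pct


def watch(temp_dir, interval_s=10, iterations=None):
    """Print one line per interval; runs forever when iterations is None."""
    n = 0
    while iterations is None or n < iterations:
        count, total = summarize_runs(temp_dir)
        free_bytes, inodes_pct = filesystem_usage(temp_dir)
        print(
            f"{time.strftime('%H:%M:%S')} runs={count} run_bytes={total} "
            f"free_bytes={free_bytes} inodes_used={inodes_pct:.1f}%"
        )
        time.sleep(interval_s)
        n += 1
```

Calling watch() with the node's temp directory on each NC while the index build runs would show whether the run files keep accumulating until the job ends or are removed earlier, which is the lifecycle question raised in the thread.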