asterixdb-dev mailing list archives

From Taewoo Kim <wangs...@gmail.com>
Subject Re: Creating RTree: no space left
Date Wed, 14 Sep 2016 04:58:13 GMT
@Mike: You filed an issue -
https://issues.apache.org/jira/browse/ASTERIXDB-1639. :-)

Best,
Taewoo

On Tue, Sep 13, 2016 at 9:28 PM, Mike Carey <dtabass@gmail.com> wrote:

> I can't remember (slight jetlag? :-)) if I shared back to this list one
> theory that came up in India when Wail and I talked F2F - his data has a
> lot of duplicate points, so maybe something goes awry in that case.  I
> wonder if we've sufficiently tested that case?  (E.g., what if there are
> gazillions of records originating from a small handful of points?)
>
>
> On 8/26/16 9:55 AM, Taewoo Kim wrote:
>
>> Based on a rough calculation, per partition, each point field takes 3.6GB
>> (16 bytes * 2887453794 records / 12 partitions). To sort 3.6GB, we are
>> generating 625 files (96MB or 128MB each) = 157GB. Since Wail mentioned
>> that there was no issue when creating a B+ tree index, we need to check
>> what SORT process is required by the R-Tree index.
>>
>> Best,
>> Taewoo
>>
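The rough calculation above can be re-run as a short Python sketch. The record count, the 16-byte point size, the 12 partitions, and the 96MB/128MB run sizes come from the thread; the function names are hypothetical, and reading "GB"/"MB" as binary units (GiB/MiB) is an assumption. The sketch only bounds what 625 run files of those sizes add up to; it does not model how the R-Tree build actually sorts.

```python
# Back-of-the-envelope check of the per-partition sort sizing.
# Assumption: "GB" and "MB" in the thread are treated as GiB and MiB.
GIB = 1024 ** 3
MIB = 1024 ** 2

RECORDS = 2_887_453_794   # record count reported in the thread
POINT_BYTES = 16          # one point = two 8-byte doubles
PARTITIONS = 12           # 3 NCs x 4 iodevices


def point_gib_per_partition(records, point_bytes, partitions):
    """Size of the point field alone, per partition, in GiB."""
    return records * point_bytes / partitions / GIB


def run_total_bounds_gib(num_runs, small_mib=96, large_mib=128):
    """Lower/upper bound on total run-file size if every run is 96 or 128 MiB."""
    low = num_runs * small_mib * MIB / GIB
    high = num_runs * large_mib * MIB / GIB
    return low, high


per_partition = point_gib_per_partition(RECORDS, POINT_BYTES, PARTITIONS)
low, high = run_total_bounds_gib(625)
print(f"point data per partition: {per_partition:.2f} GiB")
print(f"625 run files: between {low:.1f} and {high:.1f} GiB")
print(f"runs vs. point data: {low / per_partition:.0f}x to {high / per_partition:.0f}x")
```

Under these assumptions the 625 files total somewhere between about 58.6 and 78.1 GiB, so the 157GB figure would be consistent with Wail's per-iodevice measurement covering more than one partition. Either way the runs are an order of magnitude larger than the point data alone, which is the discrepancy the thread is trying to explain.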
>> On Fri, Aug 26, 2016 at 7:52 AM, Jianfeng Jia <jianfeng.jia@gmail.com>
>> wrote:
>>
>>> If all of the file names start with “ExternalSortRunGenerator”, then they
>>> are the first-round files, which cannot be GCed.
>>> Could you provide the query plan as well?
>>>
>>> On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet <wael.y.k@gmail.com> wrote:
>>>
>>>> Hi Ian and Pouria,
>>>>
>>>> The names of the files along with their sizes (there were 625 of those
>>>> before crashing):
>>>>
>>>> size     name
>>>> 96MB     ExternalSortRunGenerator8917133039835449370.waf
>>>> 128MB    ExternalSortRunGenerator8948724728025392343.waf
>>>>
>>>> No files were generated beyond the runs.
>>>> compiler.sortmemory = 64MB
>>>>
>>>> Here are the full logs:
>>>> <https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>
>>>
>>>> On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh <pouria.pirzadeh@gmail.com> wrote:
>>>>
>>>>> We previously had issues with huge spilled sort temp files when creating
>>>>> an inverted index for fuzzy queries, but NOT R-Trees.
>>>>> I also recall that Yingyi fixed the issue of delaying clean-up for
>>>>> intermediate temp files until the end of the query execution.
>>>>> If you can share names of a couple of temp files (and their sizes along
>>>>> with the sort memory setting you have in asterix-configuration.xml) we may
>>>>> be able to make a better guess as to whether the sort is really going into
>>>>> a two-level merge or not.
>>>>>
>>>>> Pouria
>>>>>
>>>>> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <imaxon@uci.edu> wrote:
>>>>>
>>>>>> I think that exception ("No space left on device") is just cast from the
>>>>>> native IOException. Therefore I would be inclined to believe it's genuinely
>>>>>> out of space. I suppose the question is why the external sort is so huge.
>>>>>> What is the query plan? Maybe that will shed light on a possible cause.
>>>>>>
>>>>>> On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet <wael.y.k@gmail.com> wrote:
>>>>>>
>>>>>>> I was monitoring inodes ... usage didn't go beyond 1%.
>>>>>>>
>>>>>>> On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet <wael.y.k@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Chris and Mike,
>>>>>>>>
>>>>>>>> Actually I was monitoring it to see what's going on:
>>>>>>>>
>>>>>>>>    - The size of each partition is about 40GB (80GB in total per
>>>>>>>>    iodevice).
>>>>>>>>    - The runs took 157GB per iodevice (about 2x the dataset size).
>>>>>>>>    Each run takes either 128MB or 96MB of storage.
>>>>>>>>    - At a certain time, there were 522 runs.
>>>>>>>>
>>>>>>>> I even tried to create a BTree index to see if that happens as well. I
>>>>>>>> created two BTree indexes, one for the *location* and one for the
>>>>>>>> *caller*, and they were created successfully. The sizes of the runs
>>>>>>>> didn't come anywhere near that.
>>>>>>>>
>>>>>>>> Logs are attached.
>>>>>>>>
>>>>>>>> On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I think we might have "file GC issues" - I vaguely remember that we don't
>>>>>>>>> (or at least didn't once upon a time) proactively remove unnecessary run
>>>>>>>>> files - removing all of them at end-of-job instead of at the end of the
>>>>>>>>> execution phase that uses their contents.  We may also have an "Amdahl
>>>>>>>>> problem" right now with our sort since we serialize phase two of parallel
>>>>>>>>> sorts - though this is not a query, it's index build, so that shouldn't be
>>>>>>>>> it.  It would be interesting to put a df/sleep script on each of the nodes
>>>>>>>>> when this is happening - actually a script that monitors the temp file
>>>>>>>>> directory - and watch the lifecycle happen and the sizes change....
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
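A minimal sketch of the kind of monitoring script suggested above, in Python rather than a df/sleep shell loop. It counts the run files in a temp directory, sums their sizes, and reports free space and free inodes (the latter covering the "df -i" check). The `ExternalSortRunGenerator` prefix comes from the file names reported in this thread; the function names and the directory passed in are hypothetical, and `os.statvfs` is available on Unix only.

```python
import os
import shutil
import time


def snapshot(temp_dir, prefix="ExternalSortRunGenerator"):
    """Count run files and their total size; report free space and inodes."""
    run_files = 0
    run_bytes = 0
    with os.scandir(temp_dir) as entries:
        for entry in entries:
            if not entry.name.startswith(prefix):
                continue
            try:
                if not entry.is_file():
                    continue
                size = entry.stat().st_size
            except FileNotFoundError:
                # The file was removed between listing and stat; skip it.
                continue
            run_files += 1
            run_bytes += size
    usage = shutil.disk_usage(temp_dir)
    vfs = os.statvfs(temp_dir)  # Unix only: inode counts for the filesystem
    return {
        "run_files": run_files,
        "run_bytes": run_bytes,
        "free_bytes": usage.free,
        "free_inodes": vfs.f_ffree,
        "total_inodes": vfs.f_files,
    }


def monitor(temp_dir, interval_s=10.0, samples=None):
    """Print one snapshot line every interval_s seconds (forever if samples is None)."""
    taken = 0
    while samples is None or taken < samples:
        s = snapshot(temp_dir)
        print(
            f"{time.strftime('%H:%M:%S')} "
            f"runs={s['run_files']} "
            f"run_MiB={s['run_bytes'] / 2**20:.0f} "
            f"free_GiB={s['free_bytes'] / 2**30:.1f} "
            f"free_inodes={s['free_inodes']}/{s['total_inodes']}",
            flush=True,
        )
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)
```

Running `monitor(path_to_temp_dir, interval_s=5)` on each node during the index build, with the path pointing at that node's sort temp directory, would show whether the run count keeps growing until the disk fills or whether files are being cleaned up along the way.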
>>>>>>>>> On 8/23/16 2:06 AM, Chris Hillery wrote:
>>>>>>>>>
>>>>>>>>>> When you get the "disk full" warning, do a quick "df -i" on the device -
>>>>>>>>>> possibly you've run out of inodes even if the space isn't all used up. It's
>>>>>>>>>> unlikely because I don't think AsterixDB creates a bunch of small files,
>>>>>>>>>> but worth checking.
>>>>>>>>>>
>>>>>>>>>> If that's not it, then can you share the full exception and stack trace?
>>>>>>>>>>
>>>>>>>>>> Ceej
>>>>>>>>>> aka Chris Hillery
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet <wael.y.k@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I just cleared the hard drives to get 80% free space. I still get the
>>>>>>>>>>> same issue.
>>>>>>>>>>>
>>>>>>>>>>> The data contains:
>>>>>>>>>>> 1- 2887453794 records.
>>>>>>>>>>> 2- Schema:
>>>>>>>>>>>
>>>>>>>>>>> create type CDRType as {
>>>>>>>>>>>   id:uuid,
>>>>>>>>>>>   'date':string,
>>>>>>>>>>>   'time':string,
>>>>>>>>>>>   'duration':int64,
>>>>>>>>>>>   'caller':int64,
>>>>>>>>>>>   'callee':int64,
>>>>>>>>>>>   location:point?
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet <wael.y.k@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dears,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a dataset of size 290GB loaded on 3 NCs, each of which has
>>>>>>>>>>>> 2x500GB SSDs.
>>>>>>>>>>>>
>>>>>>>>>>>> Each NC has two IODevices (partitions) on each hard drive (i.e., the
>>>>>>>>>>>> total is 4 iodevices per NC). After loading the data, each Asterix
>>>>>>>>>>>> partition occupied 31GB.
>>>>>>>>>>>>
>>>>>>>>>>>> The cluster has about 50% free space on each hard drive
>>>>>>>>>>>> (approximately 250GB free on each hard drive). However, when I tried
>>>>>>>>>>>> to create an index of type RTree, I got an exception that no space
>>>>>>>>>>>> was left on the hard drive during the External Sort phase.
>>>>>>>>>>>>
>>>>>>>>>>>> Is that normal?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> *Regards,*
>>>>>>>>>>>> Wail Alkowaileet
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> *Regards,*
>>>>>>>>>>> Wail Alkowaileet
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> *Regards,*
>>>>>>>> Wail Alkowaileet
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> *Regards,*
>>>>>>> Wail Alkowaileet
>>>>>>>
>>>>>>>
>>>>
>>>> --
>>>>
>>>> *Regards,*
>>>> Wail Alkowaileet
>>>>
>>>
>>>
>>> Best,
>>>
>>> Jianfeng Jia
>>> PhD Candidate of Computer Science
>>> University of California, Irvine
>>>
>>>
>>>
>
