asterixdb-dev mailing list archives

From Khurram Faraaz <khfaraa...@gmail.com>
Subject Re: Creating RTree: no space left
Date Fri, 16 Sep 2016 05:13:07 GMT
@Pouria here is Uber trip data

https://github.com/fivethirtyeight/uber-tlc-foil-response

On Sep 16, 2016 1:21 AM, "Chen Li" <chenli@gmail.com> wrote:

> @Wail: as a use case related to selectivity, our current Cloudberry
> prototype doesn't benefit from R-tree when the user is analyzing the data
> for the entire US.  But we expect to have R-tree benefits when a user zooms
> into a small region.
>
> On Thu, Sep 15, 2016 at 8:25 AM, Wail Alkowaileet <wael.y.k@gmail.com>
> wrote:
>
> > Hi Ahmed and Mike,
> >
> > @Ahmed
> > I actually did a small experiment where I loaded about 1/5 of the data
> > (so I can index it), and it seems that the R-Tree was really useful for
> > querying small regions or neighborhoods.
> > I also tried the B-Tree and it was slower than a full scan.
> >
> > @Mike
> > Unfortunately, I still cannot, even after anonymization :-)
> >
> >
> > On Wed, Sep 14, 2016 at 11:29 PM, Mike Carey <dtabass@gmail.com> wrote:
> >
> > > Interesting point, so to speak.  @Wail, any chance you could post a
> > > Google maps screenshot showing a visualization of the points in this
> > > dataset on the underlying geographic region?  (If the dataset is
> > > shareable in that anonymized form?)  I would think an R-tree would
> > > still be good for small-region geo queries - possibly shrinking the
> > > candidate object set by a factor of 10,000 - so still useful - and we
> > > also do index-AND-ing now, so we would also combine that shrinkage
> > > with other index-provided shrinkage on any other index-amenable
> > > predicates.  I think the queries are still spatial in nature, and the
> > > only AsterixDB choice for that is the R-tree.  (We did experiments
> > > with things like Hilbert B-trees, but the results led to the
> > > conclusion that the code base only needs R-trees for spatial data for
> > > the foreseeable future - they just work too well and in a
> > > no-tuning-required fashion.... :-))
> > >
> > >
> > >
> > > On 9/14/16 12:49 PM, Ahmed Eldawy wrote:
> > >
> > >> Looks like an interesting case. Just a small question. Are you sure a
> > >> spatial index is the right one to use here? The spatial attribute
> > >> looks more like a categorization and a hash or B-tree index could be
> > >> more suitable. As far as I know, the spatial index in AsterixDB is a
> > >> secondary R-tree index which, like any other secondary index, is only
> > >> good for retrieving a small number of records. For this dataset, it
> > >> seems that any small range would still return a huge number of
> > >> records.
> > >>
> > >> It is still interesting to further investigate and fix the sort issue,
> > >> but I mentioned the usage issue for a different perspective.
> > >>
> > >> Thanks
> > >> Ahmed
> > >>
> > >> On Wed, Sep 14, 2016 at 10:30 AM Mike Carey <dtabass@gmail.com> wrote:
> > >>
> > >>> ☺!
> > >>>
> > >>> On Sep 14, 2016 1:11 AM, "Wail Alkowaileet" <wael.y.k@gmail.com> wrote:
> > >>>
> > >>>> To be exact
> > >>>> I have 2,255,091,590 records and 10,391 points :-)
> > >>>>
> > >>>> On Wed, Sep 14, 2016 at 10:46 AM, Mike Carey <dtabass@gmail.com> wrote:
> > >>>>
> > >>>>> Thx!  I knew I'd meant to "activate" the thought somehow, but couldn't
> > >>>>> remember having done it for sure.  Oops! Scattered from VLDB, I
> > >>>>> guess...!
> > >>>
> > >>>>
> > >>>>>
> > >>>>> On 9/13/16 9:58 PM, Taewoo Kim wrote:
> > >>>>>
> > >>>>>> @Mike: You filed an issue -
> > >>>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1639. :-)
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Taewoo
> > >>>>>>
> > >>>>>> On Tue, Sep 13, 2016 at 9:28 PM, Mike Carey <dtabass@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> I can't remember (slight jetlag? :-)) if I shared back to this list
> > >>>>>>> one theory that came up in India when Wail and I talked F2F - his
> > >>>>>>> data has a lot of duplicate points, so maybe something goes awry in
> > >>>>>>> that case.  I wonder if we've sufficiently tested that case?  (E.g.,
> > >>>>>>> what if there are gazillions of records originating from a small
> > >>>>>>> handful of points?)
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 8/26/16 9:55 AM, Taewoo Kim wrote:
> > >>>>>>>
> > >>>>>>>> Based on a rough calculation, the point field takes about 3.6GB
> > >>>>>>>> per partition (16 bytes * 2887453794 records / 12 partitions). To
> > >>>>>>>> sort 3.6GB, we are generating 625 files (96MB or 128MB each) =
> > >>>>>>>> 157GB. Since Wail mentioned that there was no issue when creating
> > >>>>>>>> a B+ tree index, we need to check what SORT process is required by
> > >>>>>>>> the R-Tree index.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Taewoo
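The rough calculation above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the 16-byte point size, the record count, the 12 partitions, and the 625 run files of 96MB or 128MB are taken from the thread, while the binary unit convention (GiB, MiB) and all names in the code are assumptions made for illustration.

```python
# Sketch of the back-of-the-envelope sort-size arithmetic from the thread.
# All figures come from the thread; treating "GB"/"MB" as binary units
# (GiB/MiB) and the helper names are assumptions made for illustration.

GIB = 1024 ** 3
MIB = 1024 ** 2

POINT_BYTES = 16            # size of one point value, as stated in the thread
RECORDS = 2_887_453_794     # record count reported for the dataset
PARTITIONS = 12             # partitions used in the thread's calculation
RUN_COUNT = 625             # run files seen before the crash
RUN_SIZES_MIB = (96, 128)   # each run file was either 96MB or 128MB


def point_bytes_per_partition(records, point_bytes, partitions):
    """Bytes of point data that each partition has to sort."""
    return records * point_bytes / partitions


def run_files_bytes(run_count, run_size_mib):
    """Total bytes on disk for run_count run files of run_size_mib each."""
    return run_count * run_size_mib * MIB


per_partition = point_bytes_per_partition(RECORDS, POINT_BYTES, PARTITIONS)
low, high = (run_files_bytes(RUN_COUNT, size) for size in RUN_SIZES_MIB)

print(f"point data per partition: {per_partition / GIB:.1f} GiB")
print(f"{RUN_COUNT} run files: {low / GIB:.1f} to {high / GIB:.1f} GiB")
print(f"run files vs. point data: {low / per_partition:.0f}x "
      f"to {high / per_partition:.0f}x")
```

Under these assumptions, 625 run files come to roughly 59 to 78 GiB, around 16 to 22 times the point data being sorted. That would line up with the reported 157GB only if the 157GB covers two partitions sharing one device, which is a reading of the numbers rather than something the thread states outright.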
> > >>>>>>>>
> > >>>>>>>> On Fri, Aug 26, 2016 at 7:52 AM, Jianfeng Jia
> > >>>>>>>> <jianfeng.jia@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> If all of the file names start with “ExternalSortRunGenerator”,
> > >>>>>>>>> then they are the first round files which can not be GCed.
> > >>>>>>>>> Could you provide the query plan as well?
> > >>>>>>>>>
> > >>>>>>>>> On Aug 24, 2016, at 10:02 PM, Wail Alkowaileet
> > >>>>>>>>> <wael.y.k@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi Ian and Pouria,
> > >>>>>>>>>>
> > >>>>>>>>>> The names of the files along with the sizes (there were 625 of
> > >>>>>>>>>> those before crashing):
> > >>>>>>>>>>
> > >>>>>>>>>> size     name
> > >>>>>>>>>> 96MB     ExternalSortRunGenerator8917133039835449370.waf
> > >>>>>>>>>> 128MB    ExternalSortRunGenerator8948724728025392343.waf
> > >>>>>>>>>>
> > >>>>>>>>>> No files were generated beyond runs.
> > >>>>>>>>>> compiler.sortmemory = 64MB
> > >>>>>>>>>>
> > >>>>>>>>>> Here are the full logs:
> > >>>>>>>>>> <https://www.dropbox.com/s/k2qbo3wybc8mnnk/log_Thu_Aug_25_07%3A34%3A52_AST_2016.zip?dl=0>
> > >>>>>>>>>>
> > >>>>>>>>>> On Tue, Aug 23, 2016 at 9:29 PM, Pouria Pirzadeh
> > >>>>>>>>>> <pouria.pirzadeh@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> We previously had issues with huge spilled sort temp files
> > >>>>>>>>>>> when creating inverted indexes for fuzzy queries, but NOT
> > >>>>>>>>>>> R-Trees.
> > >>>>>>>>>>> I also recall that Yingyi fixed the issue of delaying clean-up
> > >>>>>>>>>>> for intermediate temp files until the end of the query
> > >>>>>>>>>>> execution.
> > >>>>>>>>>>> If you can share the names of a couple of temp files (and
> > >>>>>>>>>>> their sizes, along with the sort memory setting you have in
> > >>>>>>>>>>> asterix-configuration.xml), we may be able to make a better
> > >>>>>>>>>>> guess as to whether the sort is really going into a two-level
> > >>>>>>>>>>> merge or not.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Pouria
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Tue, Aug 23, 2016 at 11:09 AM, Ian Maxon <imaxon@uci.edu>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I think that exception ("No space left on device") is just
> > >>>>>>>>>>>> cast from the native IOException. Therefore I would be
> > >>>>>>>>>>>> inclined to believe it's genuinely out of space. I suppose
> > >>>>>>>>>>>> the question is why the external sort is so huge.
> > >>>>>>>>>>>> What is the query plan? Maybe that will shed light on a
> > >>>>>>>>>>>> possible cause.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, Aug 23, 2016 at 9:59 AM, Wail Alkowaileet
> > >>>>>>>>>>>> <wael.y.k@gmail.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> I was monitoring Inodes ... it didn't go beyond 1%.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Tue, Aug 23, 2016 at 7:58 PM, Wail Alkowaileet
> > >>>>>>>>>>>>> <wael.y.k@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi Chris and Mike,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Actually I was monitoring it to see what's going on:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>      - The size of each partition is about 40GB (80GB in
> > >>>>>>>>>>>>>>      total per iodevice).
> > >>>>>>>>>>>>>>      - The runs took 157GB per iodevice (about 2x of the
> > >>>>>>>>>>>>>>      dataset size).
> > >>>>>>>>>>>>>>      Each run takes either 128MB or 96MB of storage.
> > >>>>>>>>>>>>>>      - At a certain time, there were 522 runs.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I even tried to create a BTree index to see if that
> > >>>>>>>>>>>>>> happens as well. I created two BTree indexes, one for the
> > >>>>>>>>>>>>>> *location* and one for the *caller*, and they were created
> > >>>>>>>>>>>>>> successfully. The sizes of the runs didn't get anywhere
> > >>>>>>>>>>>>>> near that.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Logs are attached.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Tue, Aug 23, 2016 at 7:19 PM, Mike Carey
> > >>>>>>>>>>>>>> <dtabass@gmail.com> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I think we might have "file GC issues" - I vaguely
> > >>>>>>>>>>>>>>> remember that we don't (or at least didn't once upon a
> > >>>>>>>>>>>>>>> time) proactively remove unnecessary run files - removing
> > >>>>>>>>>>>>>>> all of them at end-of-job instead of at the end of the
> > >>>>>>>>>>>>>>> execution phase that uses their contents.  We may also
> > >>>>>>>>>>>>>>> have an "Amdahl problem" right now with our sort since we
> > >>>>>>>>>>>>>>> serialize phase two of parallel sorts - though this is
> > >>>>>>>>>>>>>>> not a query, it's index build, so that shouldn't be it.
> > >>>>>>>>>>>>>>> It would be interesting to put a df/sleep script on each
> > >>>>>>>>>>>>>>> of the nodes when this is happening - actually a script
> > >>>>>>>>>>>>>>> that monitors the temp file directory - and watch the
> > >>>>>>>>>>>>>>> lifecycle happen and the sizes change....
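The df/sleep idea described above could be sketched in Python along the following lines. This is a minimal illustration rather than an AsterixDB tool: the `ExternalSortRunGenerator` prefix is the run-file name reported elsewhere in this thread, while the function names, the polling interval, and the directory passed in are hypothetical and would need to match the temp directory of each iodevice.

```python
import os
import shutil
import time

# Run-file prefix as reported in this thread; everything else here is
# a hypothetical sketch, not part of AsterixDB.
RUN_PREFIX = "ExternalSortRunGenerator"


def run_file_usage(temp_dir, prefix=RUN_PREFIX):
    """Return (count, total_bytes) of run files currently in temp_dir."""
    count = 0
    total = 0
    with os.scandir(temp_dir) as entries:
        for entry in entries:
            if not entry.name.startswith(prefix):
                continue
            try:
                if entry.is_file():
                    total += entry.stat().st_size
                    count += 1
            except FileNotFoundError:
                # The file was deleted between listing and stat; skip it.
                continue
    return count, total


def sample(temp_dir):
    """Take one snapshot of run-file usage and free space on the device."""
    count, total = run_file_usage(temp_dir)
    free = shutil.disk_usage(temp_dir).free
    return {"runs": count, "run_bytes": total, "free_bytes": free}


def monitor(temp_dir, interval_s=10.0, samples=None):
    """Print a snapshot every interval_s seconds (forever if samples is None)."""
    taken = 0
    while samples is None or taken < samples:
        snap = sample(temp_dir)
        print(time.strftime("%H:%M:%S"),
              snap["runs"], "runs,",
              snap["run_bytes"] // 2**20, "MiB in runs,",
              snap["free_bytes"] // 2**30, "GiB free",
              flush=True)
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)
```

Running something like `monitor("/path/to/iodevice/temp")` on each node during the index build (the path is a placeholder) would show whether run files keep accumulating until the job ends or are removed as each phase finishes.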
> > >>>>
> > >>>>>>>>>>>>>>> On 8/23/16 2:06 AM, Chris Hillery wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> When you get the "disk full" warning, do a quick
> > >>>>>>>>>>>>>>>> "df -i" on the device - possibly you've run out of
> > >>>>>>>>>>>>>>>> inodes even if the space isn't all used up. It's
> > >>>>>>>>>>>>>>>> unlikely because I don't think AsterixDB creates a bunch
> > >>>>>>>>>>>>>>>> of small files, but worth checking.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> If that's not it, then can you share the full exception
> > >>>>>>>>>>>>>>>> and stack trace?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Ceej
> > >>>>>>>>>>>>>>>> aka Chris Hillery
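The inode check suggested above can also be done programmatically. The sketch below uses Python's `os.statvfs`, which is available on POSIX systems only; the function name and the path `/` are placeholders chosen for illustration, and the path should be replaced by the mount point that holds the temp files.

```python
import os


def inode_usage(path):
    """Return (used, total, percent_used) inodes for the filesystem holding path."""
    st = os.statvfs(path)        # POSIX only
    total = st.f_files           # total inodes on the filesystem
    used = total - st.f_ffree    # f_ffree is the number of free inodes
    # Some filesystems report zero total inodes; avoid dividing by zero.
    percent = 100.0 * used / total if total else 0.0
    return used, total, percent


# "/" is a placeholder; point this at the device that reported "disk full".
used, total, percent = inode_usage("/")
print(f"inodes used: {used} of {total} ({percent:.1f}%)")
```

A percentage close to 100 would point to inode exhaustion rather than a lack of free bytes.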
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Tue, Aug 23, 2016 at 1:59 AM, Wail Alkowaileet
> > >>>>>>>>>>>>>>>> <wael.y.k@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I just cleared the hard drives to get 80% free space.
> > >>>>>>>>>>>>>>>>> I still get the same issue.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> The data contains:
> > >>>>>>>>>>>>>>>>> 1- 2887453794 records.
> > >>>>>>>>>>>>>>>>> 2- Schema:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> create type CDRType as {
> > >>>>>>>>>>>>>>>>>   id:uuid,
> > >>>>>>>>>>>>>>>>>   'date':string,
> > >>>>>>>>>>>>>>>>>   'time':string,
> > >>>>>>>>>>>>>>>>>   'duration':int64,
> > >>>>>>>>>>>>>>>>>   'caller':int64,
> > >>>>>>>>>>>>>>>>>   'callee':int64,
> > >>>>>>>>>>>>>>>>>   location:point?
> > >>>>>>>>>>>>>>>>> }
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Tue, Aug 23, 2016 at 9:06 AM, Wail Alkowaileet
> > >>>>>>>>>>>>>>>>> <wael.y.k@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Dears,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I have a dataset of size 290GB loaded on 3 NCs, each
> > >>>>>>>>>>>>>>>>>> of which has 2x500GB SSDs.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Each NC has two IODevices (partitions) in each hard
> > >>>>>>>>>>>>>>>>>> drive (i.e. the total is 4 iodevices per NC). After
> > >>>>>>>>>>>>>>>>>> loading the data, each Asterix partition occupied
> > >>>>>>>>>>>>>>>>>> 31GB.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> The cluster has about 50% free space in each hard
> > >>>>>>>>>>>>>>>>>> drive (approximately 250GB free space in each hard
> > >>>>>>>>>>>>>>>>>> drive). However, when I tried to create an index of
> > >>>>>>>>>>>>>>>>>> type RTree, I got an exception that there was no space
> > >>>>>>>>>>>>>>>>>> left on the hard drive during the External Sort phase.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Is that normal?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> *Regards,*
> > >>>>>>>>>>>>>>>>>> Wail Alkowaileet
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> *Regards,*
> > >>>>>>>>>>>>>>>>> Wail Alkowaileet
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> *Regards,*
> > >>>>>>>>>>>>>> Wail Alkowaileet
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>> *Regards,*
> > >>>>>>>>>>>>> Wail Alkowaileet
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> --
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> *Regards,*
> > >>>>>>>>>> Wail Alkowaileet
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>
> > >>>>>>>>> Jianfeng Jia
> > >>>>>>>>> PhD Candidate of Computer Science
> > >>>>>>>>> University of California, Irvine
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>> --
> > >>>>
> > >>>> *Regards,*
> > >>>> Wail Alkowaileet
> > >>>>
> > >>>>
> > >
> >
> >
> > --
> >
> > *Regards,*
> > Wail Alkowaileet
> >
>
