Mailing-List: contact dev-help@asterixdb.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@asterixdb.incubator.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CANQf6Dr1nRTWDY4GUPjYYcG94m54OXdZjgEvQx+i1wtf0nsePQ@mail.gmail.com>
References: 
 <0-11857465121765544286-2307673520383560858-asterixdb=googlecode.com@googlecode.com>
	<5559346D.8080206@ics.uci.edu>
	<CANQf6Dr1nRTWDY4GUPjYYcG94m54OXdZjgEvQx+i1wtf0nsePQ@mail.gmail.com>
Date: Sun, 17 May 2015 21:22:25 -0700
Message-ID: 
 <CAF+3TUotLPqFZz4KbRL+fm-CO=fXZK+b3X8rOLE5UAEdcAOfbg@mail.gmail.com>
Subject: Re: Issue 868 in asterixdb: About prefix merge policy behavior
From: Young-Seok Kim <kisskys@gmail.com>
To: "asterixdb-dev@googlegroups.com" <asterixdb-dev@googlegroups.com>
Cc: dev@asterixdb.incubator.apache.org
Content-Type: multipart/alternative; boundary=001a11c345c08b6e4c0516538ad4

--001a11c345c08b6e4c0516538ad4
Content-Type: text/plain; charset=UTF-8

There should be proper flow control policies that consider applications'
needs.

Best,
Young-Seok

On Sun, May 17, 2015 at 8:25 PM, Chen Li <chenli@gmail.com> wrote:

> A general question is: if the speed of incoming records is higher than
> the merge and disk-flush speed, what should we do?  Can we reject the
> insertion requests?
>
> Chen
>
> On Sun, May 17, 2015 at 5:38 PM, Michael Carey <mjcarey@ics.uci.edu>
> wrote:
> > Moving this to the incubator dev list (just looking it over again).
> > Q - if when a merge finishes there's a bigger backlog of components -
> > will it currently consider doing a more-ways merge?  (Instead of 5, if
> > there are 13 sitting there when the 5-merge finishes - will a 13-merge
> > be initiated?)  Just curious.  We do probably need to think about some
> > sort of better flow control here - the storage gate should presumably
> > slow down admissions if it can't keep up - have to ponder what that
> > might mean.  (I have a better idea of what it could mean for feeds than
> > for normal inserts.)  One could argue that an increasing backlog is a
> > sign that we should be scaling out the number of partitions for the
> > dataset (future work but important work :-)).
> >
> > Cheers,
> > Mike
> >
> > On 4/15/15 2:33 PM, asterixdb@googlecode.com wrote:
> >
> > Status: New
> > Owner: ----
> > Labels: Type-Defect Priority-Medium
> >
> > New issue 868 by kiss...@gmail.com: About prefix merge policy behavior
> > https://code.google.com/p/asterixdb/issues/detail?id=868
> >
> > This issue describes how the current prefix merge policy works, based
> > on observations from ingestion experiments.
> > Sattam observed similar behavior as well.
> > The observed behavior seems a bit unexpected, so I am posting it here so
> > that we can consider a better merge policy and/or a better LSM index
> > design regarding merge operations.
> >
> > The AQL statements used for the experiment are shown at the end of this
> > issue.
> >
> > The prefix merge policy decides to merge disk components based on the
> > following conditions:
> > 1.  Look at the candidate components for merging in oldest-first order.
> > If one exists, identify the prefix of the sequence of all such
> > components for which the sum of their sizes exceeds
> > MaxMergableComponentSize.  Schedule a merge of those components into a
> > new component.
> > 2.  If a merge from 1 doesn't happen, see if the set of candidate
> > components for merging exceeds MaxToleranceComponentCnt.  If so,
> > schedule a merge of all of the current candidates into a new single
> > component.
> > Also, the prefix merge policy doesn't allow concurrent merge operations
> > for a single index partition.
> > In other words, if there is a scheduled or ongoing merge operation, no
> > new merge operation is scheduled even if the above conditions are met.
> >
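
The two conditions quoted above can be sketched in Python as follows. This
is only an illustration of the description in this issue, not AsterixDB
code. The function name and parameters are hypothetical; which components
count as "candidates" is left to the caller; "the prefix" is read as the
shortest oldest-first prefix whose total size exceeds the limit; and the
count check uses ">=" because the worked example in the issue schedules a
merge as soon as the 5th component is flushed.

```python
from typing import List, Optional


def choose_merge(sizes: List[int],
                 max_mergable_component_size: int,
                 max_tolerance_component_cnt: int,
                 merge_in_progress: bool) -> Optional[List[int]]:
    """Return the positions (oldest-first) of the components to merge,
    or None if no merge should be scheduled.

    `sizes` holds the sizes of the candidate components, oldest first.
    """
    # No concurrent merges for a single index partition.
    if merge_in_progress or not sizes:
        return None

    # Condition 1 (assumed reading): shortest oldest-first prefix whose
    # cumulative size exceeds the size limit.
    total = 0
    for i, size in enumerate(sizes):
        total += size
        if total > max_mergable_component_size:
            return list(range(i + 1))

    # Condition 2: too many candidates, so merge all of them.
    # ">=" follows the worked example (limit 5, merge on the 5th flush).
    if len(sizes) >= max_tolerance_component_cnt:
        return list(range(len(sizes)))

    return None
```

Under this reading, a backlog of 13 small candidates found when a merge
finishes would yield one 13-way merge; whether the actual implementation
behaves that way is not confirmed by this sketch.
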
> > Based on this merge policy, the following situation can occur.
> > Suppose MaxToleranceComponentCnt = 5 and 5 disk components were flushed
> > to disk.
> > When the 5th disk component is flushed, the prefix merge policy
> > schedules a merge operation to merge the 5 components.
> > While that merge operation is scheduled and running, concurrently
> > ingested records generate more disk components.
> > As long as a merge operation is not fast enough to keep up with the rate
> > at which incoming records generate 5 new disk components,
> > the number of disk components increases over time.
> > So, the slower merge operations are, the more disk components there
> > will be over time.
> >
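
The backlog growth described above can be reproduced with a toy simulation.
This is a rough sketch under simplifying assumptions that are not from the
issue: time advances in ticks, exactly one disk component is flushed per
tick, component sizes are ignored (only the count check with a tolerance of
5 is modeled), and merging k components takes
ceil(k * merge_ticks_per_component) ticks. The function and parameter names
are hypothetical.

```python
import math
from typing import List


def simulate(ticks: int,
             max_tolerance_cnt: int = 5,
             merge_ticks_per_component: float = 2.0) -> List[int]:
    """Return the number of disk components after each tick."""
    components = 0   # disk components currently on disk
    merging = 0      # number of components in the running merge
    remaining = 0    # ticks left until the running merge finishes
    history = []
    for _ in range(ticks):
        components += 1  # assumption: one flush per tick
        if remaining > 0:
            remaining -= 1
            if remaining == 0:
                # The merge replaces `merging` inputs with one output.
                components -= merging - 1
                merging = 0
        # Only one merge at a time; merge everything currently on disk.
        if remaining == 0 and components >= max_tolerance_cnt:
            merging = components
            remaining = math.ceil(merging * merge_ticks_per_component)
        history.append(components)
    return history


slow = simulate(200, merge_ticks_per_component=2.0)
fast = simulate(200, merge_ticks_per_component=0.5)
print("slow merges, final component count:", slow[-1])
print("fast merges, peak component count:", max(fast))
```

In this toy model the component count stays bounded when a merge costs less
than one tick per input component and keeps growing when it costs more,
which matches the qualitative observation in the issue.
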
> > I also attached the output of the command "ls -alR <directory of the
> > asterixdb instance for an ingestion experiment>", which was executed
> > after the ingestion was over.
> > The attached file shows that for the primary index (whose directory is
> > FsqCheckinTweet_idx_FsqCheckinTweet), ingestion generated 20 disk
> > components, where each disk component consists of a btree (the filename
> > has suffix _b) and a bloom filter (the filename has suffix _f), and
> > MaxMergableComponentSize is set to 1GB.
> > It also shows that for the secondary index (whose directory is
> > FsqCheckinTweet_idx_sifCheckinCoordinate), ingestion generated more than
> > 1400 components, where each disk component consists of a dictionary
> > btree (suffix: _b), an inverted list (suffix: _i), a deleted-key btree
> > (suffix: _d), and a bloom filter for the deleted-key btree (suffix: _f).
> > Even after the ingestion is over, since our merge operations happen
> > asynchronously, merging continues and eventually merges all mergeable
> > disk components according to the described merge policy.
> >
> > ------------------------------------------
> > AQLs for the ingestion experiment
> > ------------------------------------------
> > drop dataverse STBench if exists;
> > create dataverse STBench;
> > use dataverse STBench;
> >
> > create type FsqCheckinTweetType as closed {
> >     id: int64,
> >     user_id: int64,
> >     user_followers_count: int64,
> >     text: string,
> >     datetime: datetime,
> >     coordinates: point,
> >     url: string?
> > }
> > create dataset FsqCheckinTweet (FsqCheckinTweetType) primary key id
> >
> > /* This index type is only available in the kisskys/hilbertbtree branch.
> > However, you can easily replace the sif index with an inverted keyword
> > index on the text field and you will see similar behavior. */
> > create index sifCoordinate on FsqCheckinTweet(coordinates)
> > type sif(-180.0, -90.0, 180.0, 90.0);
> >
> > /* create feed */
> > create feed TweetFeed
> > using file_feed
> > (("fs"="localfs"),
> > ("path"="127.0.0.1:////Users/kisskys/Data/SynFsqCheckinTweet.adm"),
> > ("format"="adm"),
> > ("type-name"="FsqCheckinTweetType"),
> > ("tuple-interval"="0"));
> >
> > /* connect feed */
> > use dataverse STBench;
> > set wait-for-completion-feed "true";
> > connect feed TweetFeed to dataset FsqCheckinTweet;
> >
> >
> >
> >
> > Attachments:
> >     storage-layout.txt  574 KB
> >
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "asterixdb-dev" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to asterixdb-dev+unsubscribe@googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
>

--001a11c345c08b6e4c0516538ad4--