apex-dev mailing list archives

From Yogi Devendra <yogideven...@apache.org>
Subject Re: Possibility of saving checkpoints on other distributed filesystems
Date Tue, 02 Feb 2016 16:38:44 GMT
I would prefer to have an additional argument during application launch on
dtcli.

Say, --preserve-kill-state true.

Basically, the platform should be able to do the clean-up activity if the
application is invoked with a certain flag.

Test apps can set this flag to clear the data on kill. Production apps can
set it to keep the data on kill.

Shutdown should always preserve the state. But for kill / forced shutdown,
the user might prefer to clear the state.

~ Yogi
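The --preserve-kill-state flag proposed above does not exist in dtcli today; a minimal sketch of the flag-gated cleanup the platform could run on kill, assuming a simple per-application checkpoint directory layout (the layout and flag name are illustrative assumptions), might look like this:

```python
import shutil
from pathlib import Path

def cleanup_on_kill(checkpoint_root: str, app_id: str,
                    preserve_kill_state: bool) -> bool:
    """Remove an application's checkpoint directory on kill, unless the
    launch flag asked to preserve it. Returns True if it was deleted."""
    if preserve_kill_state:
        return False  # production apps: keep state so a restart can resume
    app_dir = Path(checkpoint_root) / app_id
    if app_dir.is_dir():
        shutil.rmtree(app_dir)  # test apps: clear state on kill
        return True
    return False
```

On a real cluster the checkpoints live on HDFS (or the NAS discussed later in the thread), so the `shutil.rmtree` call would be replaced by the corresponding filesystem delete.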

On 2 February 2016 at 21:53, Amol Kekre <amol@datatorrent.com> wrote:

>
> Can we include a script in our github (util?) that simply deletes these
> files once an application is killed, given an app-id? The admin would need
> to run this script. Auto-deleting would be bad, as a lot of users, including
> those in production today, need to restart using those files. The
> knowledge/desire to restart post-failure lives outside the app, so
> technically the script should be explicitly user-invoked.
>
> Thks,
> Amol
>
>
> On Tue, Feb 2, 2016 at 6:12 AM, Pramod Immaneni <pramod@datatorrent.com>
> wrote:
>
>> Hi Venkat,
>>
>> There are typically a small number of outstanding checkpoint files per
>> operator; as newer checkpoints are created, old ones are automatically
>> deleted by the application once it determines that state is no longer
>> needed. When an application stops or is killed, the last checkpoints
>> remain. There is also a benefit to that: a new application can be
>> restarted to continue from those checkpoints instead of starting all the
>> way from the beginning, which is useful in some cases. But if you are
>> always starting your application from scratch, then yes, you can delete
>> the checkpoints of older applications that are no longer running.
>>
>> Thanks
>>
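The retention behaviour Pramod describes (old checkpoints removed as newer ones land, with the latest kept so a restart can resume) can be sketched roughly as below; the flat per-operator directory and mtime ordering are simplifying assumptions for illustration, not Apex's actual checkpoint format:

```python
from pathlib import Path

def purge_old_checkpoints(operator_dir: str, keep: int = 1) -> list:
    """Delete all but the newest `keep` checkpoint files in one
    operator's checkpoint directory, mimicking the purge that runs as
    new checkpoints are written. Returns the names of deleted files."""
    files = sorted(Path(operator_dir).iterdir(),
                   key=lambda p: p.stat().st_mtime)
    deleted = []
    for stale in (files[:-keep] if keep > 0 else files):
        stale.unlink()
        deleted.append(stale.name)
    return deleted
```

With `keep=1` this matches the behaviour described above: after a kill, exactly one checkpoint per operator is left behind for a possible restart.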
>> On Mon, Feb 1, 2016 at 10:19 PM, Kottapalli, Venkatesh <
>> VKottapalli@directv.com> wrote:
>>
>> > Hi,
>> >
>> >         Now that this has been discussed, will the checkpointed data be
>> > purged when we kill the application forcefully? In our current usage, we
>> > forcefully kill the app after it processes a certain batch of data. I see
>> > these small files created under the (user/datatorrent) directory and not
>> > removed.
>> >
>> >         Another scenario: when some of the containers keep failing, we
>> > have observed a state where data is continuously checkpointed into
>> > small files. When we kill the app, the data is still there.
>> >
>> >         We have received concerns that this is impacting namenode
>> > performance, since these small files are stored in HDFS. So we manually
>> > remove this checkpointed data at regular intervals.
>> >
>> > -Venkatesh
>> >
>> > -----Original Message-----
>> > From: Amol Kekre [mailto:amol@datatorrent.com]
>> > Sent: Monday, February 01, 2016 7:49 AM
>> > To: dev@apex.incubator.apache.org; users@apex.incubator.apache.org
>> > Subject: Re: Possibility of saving checkpoints on other distributed
>> > filesystems
>> >
>> > Aniruddha,
>> > We have not heard this request from users yet. It may be because our
>> > checkpointing has a purge, i.e. the small files are not left over. The
>> > small-files problem has long existed in Hadoop and relates to storing
>> > small files in Hadoop for a long time (more likely forever).
>> >
>> > Thks,
>> > Amol
>> >
>> >
>> > On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
>> > aniruddha@datatorrent.com> wrote:
>> >
>> > > Hi Community,
>> > >
>> > > Or let me say, BigFoots: do you think this feature should be available?
>> > >
>> > > The reason for bringing this up was discussed at the start of this
>> > > thread as:
>> > >
>> > > > This is with the intention to recover the applications faster and
>> > > > do away with HDFS's small files problem as described here:
>> > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
>> > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
>> > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>> > > >
>> > > > If we could save checkpoints in some other distributed file system
>> > > > (or even a HA NAS box) geared for small files, we could achieve -
>> > > >
>> > > >    - Better performance of NN & HDFS for the production usage (read:
>> > > >      production data I/O & not temp files)
>> > > >    - Faster application recovery in case of planned shutdown /
>> > > >      unplanned restarts
>> > >
>> > > If you feel the need for this feature, please cast your opinions and
>> > > ideas so that it can be converted into a JIRA.
>> > >
>> > >
>> > >
>> > > Thanks,
>> > >
>> > >
>> > > Aniruddha
>> > >
>> > > On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta
>> > > <gaurav@datatorrent.com>
>> > > wrote:
>> > >
>> > > > Aniruddha,
>> > > >
>> > > > Currently we don't have any support for that.
>> > > >
>> > > > Thanks
>> > > > -Gaurav
>> > > >
>> > > > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi
>> > > > <tushar@datatorrent.com>
>> > > > wrote:
>> > > >
>> > > > > The default FSStorageAgent can be used, as it can work with the
>> > > > > local filesystem, but as far as I know there is no support for
>> > > > > specifying the directory through the xml file; by default it uses
>> > > > > the application directory on HDFS.
>> > > > >
>> > > > > Not sure if we could specify the storage agent with its properties
>> > > > > through the configuration at the dag level.
>> > > > >
>> > > > > - Tushar.
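If dag-level configuration of the storage agent through dt-site.xml were supported, one could imagine an entry along the lines below. This is purely illustrative: the property name, the class:path value syntax, and the ability to build an agent from a string are all assumptions, and the thread itself notes such support is not known to exist.

```xml
<!-- Hypothetical dt-site.xml fragment: assumes the platform could
     instantiate a storage agent from a string value, which this
     thread says is unverified -->
<property>
  <name>dt.application.MyApp.attr.STORAGE_AGENT</name>
  <value>com.datatorrent.common.util.FSStorageAgent:/mnt/ha-nas/checkpoints</value>
</property>
```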
>> > > > >
>> > > > >
>> > > > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
>> > > > > aniruddha@datatorrent.com> wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > Do we have any storage agent which I can use readily,
>> > > > > > configurable through dt-site.xml?
>> > > > > >
>> > > > > > I am looking for something which would save checkpoints in a
>> > > > > > mounted file system [eg. HA-NAS], which is basically just
>> > > > > > another directory for Apex.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > >
>> > > > > > Aniruddha
>> > > > > >
>> > > > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
>> > > > > > sandesh@datatorrent.com> wrote:
>> > > > > >
>> > > > > > > It is already supported; refer to the following jira for more
>> > > > > > > information:
>> > > > > > >
>> > > > > > > https://issues.apache.org/jira/browse/APEXCORE-283
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
>> > > > > > > aniruddha@datatorrent.com> wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > Is it possible to save checkpoints in any other highly
>> > > > > > > > available distributed file systems (which may be mounted
>> > > > > > > > directories across the cluster) other than HDFS?
>> > > > > > > > If yes, is it configurable?
>> > > > > > > >
>> > > > > > > > AFAIK, there is no configurable option available to achieve
>> > > > > > > > that. If that's the case, can we have that feature?
>> > > > > > > >
>> > > > > > > > This is with the intention to recover the applications
>> > > > > > > > faster and do away with HDFS's small files problem as
>> > > > > > > > described here:
>> > > > > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
>> > > > > > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
>> > > > > > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>> > > > > > > >
>> > > > > > > > If we could save checkpoints in some other distributed file
>> > > > > > > > system (or even a HA NAS box) geared for small files, we
>> > > > > > > > could achieve -
>> > > > > > > >
>> > > > > > > >    - Better performance of NN & HDFS for the production
>> > > > > > > >      usage (read: production data I/O & not temp files)
>> > > > > > > >    - Faster application recovery in case of planned
>> > > > > > > >      shutdown / unplanned restarts
>> > > > > > > >
>> > > > > > > > Please send your comments, suggestions or ideas.
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Aniruddha
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
