apex-dev mailing list archives

From Aniruddha Thombare <anirud...@datatorrent.com>
Subject Possibility of saving checkpoints on other distributed filesystems
Date Wed, 20 Jan 2016 06:43:46 GMT
Hi,

Is it possible to save checkpoints to a highly available distributed
file system other than HDFS (for example, one that may be mounted as a
directory across the cluster)?
If yes, is it configurable?

AFAIK, there is no configurable option available to achieve that.
If that's the case, can we have that feature?

The intention is to recover applications faster and to avoid HDFS's
small files problem, as described here:

http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
http://inquidia.com/news-and-info/working-small-files-hadoop-part-1

If we could save checkpoints in some other distributed file system (or even
an HA NAS box) geared for small files, we could achieve:

   - Better NameNode (NN) and HDFS performance for production usage (read:
   production data I/O, not temp files)
   - Faster application recovery after planned shutdowns or unplanned
   restarts

Please send your comments, suggestions, or ideas.

Thanks,


Aniruddha
