apex-dev mailing list archives

From Timothy Farkas <...@datatorrent.com>
Subject Re: Database Output Operator Improvements
Date Thu, 17 Dec 2015 19:29:31 GMT
Yes, that is true, Chandni, and considering how slow HDFS is, we should
avoid writing to it if we can.

It would be great if someone could pick up the ticket :).
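
For anyone who does pick it up, a rough plain-Java sketch of the proposed
behavior is below. Everything in it (SpoolingDbOutput, DbClient, HdfsWal)
is a hypothetical placeholder, not an existing Malhar class; it just
illustrates "write straight to the database while the connection is up,
spool to an HDFS WAL while it is down, replay the WAL once it is back":

import java.io.IOException;
import java.util.List;

// Hypothetical sketch only, not Apex/Malhar API. Tuples go straight to the
// database while the connection is healthy; on failure they are appended
// to an HDFS-backed WAL instead, and the WAL is drained back into the
// database once the connection recovers.
public class SpoolingDbOutput<T> {

  public interface DbClient<T> {
    boolean isConnected();
    void write(T tuple) throws IOException;   // throws when the DB is unreachable
  }

  public interface HdfsWal<T> {
    void append(T tuple) throws IOException;  // durable append to a WAL file on HDFS
    List<T> readAll() throws IOException;     // spooled tuples, in arrival order
    void truncate() throws IOException;       // discard entries once replayed
    boolean isEmpty() throws IOException;
  }

  private final DbClient<T> db;
  private final HdfsWal<T> wal;

  public SpoolingDbOutput(DbClient<T> db, HdfsWal<T> wal) {
    this.db = db;
    this.wal = wal;
  }

  // Called for every incoming tuple.
  public void process(T tuple) throws IOException {
    try {
      if (db.isConnected()) {
        drainWal();          // catch up on anything spooled while the DB was down
        db.write(tuple);
        return;
      }
    } catch (IOException e) {
      // Connection dropped mid-write: fall through and spool this tuple
      // instead of failing and redeploying the operator.
    }
    wal.append(tuple);       // DB unavailable: spool to HDFS
  }

  // Replays the WAL into the database, truncating it only on success. If a
  // replay fails part way through, the WAL is left intact and retried on a
  // later call, so spooled tuples are delivered at least once.
  private void drainWal() throws IOException {
    if (wal.isEmpty()) {
      return;
    }
    for (T spooled : wal.readAll()) {
      db.write(spooled);
    }
    wal.truncate();
  }
}

In a real Apex operator the same logic would hook into the operator
lifecycle so the WAL can be trimmed per committed window, and the replay
side looks a lot like the Reconciler pattern Pramod mentions below; the
difference is that here the WAL is written only while the connection is
down rather than for every tuple.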

On Thu, Dec 17, 2015 at 11:17 AM, Chandni Singh <chandni@datatorrent.com>
wrote:

> +1 for Tim's suggestion.
>
> Using the reconciler means always writing to HDFS and then reading from
> it. Tim's suggestion is that we only write to HDFS when the database
> connection is down. This is analogous to spooling.
>
> Chandni
>
> On Thu, Dec 17, 2015 at 11:13 AM, Pramod Immaneni <pramod@datatorrent.com>
> wrote:
>
> > Tim, we have a pattern for this called Reconciler, which Gaurav has
> > also mentioned. There are some examples of it in Malhar.
> >
> > On Thu, Dec 17, 2015 at 9:47 AM, Timothy Farkas <tim@datatorrent.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > One of our users is outputting to Cassandra, but they want to handle
> > > a Cassandra failure or Cassandra downtime gracefully from an output
> > > operator. Currently, many of our database operators will just fail and
> > > redeploy continually until the database comes back. This is a bad idea
> > > for a couple of reasons:
> > >
> > > 1 - We rely on buffer server spooling to prevent data loss. If the
> > > database is down for a long time (several hours or a day), we may run
> > > out of space to spool for the buffer server, since it spools to local
> > > disk and data is purged only after a window is committed. Furthermore,
> > > this buffer server problem will exist for all the Streaming Containers
> > > in the DAG, not just the one immediately upstream from the output
> > > operator, since data is spooled to disk for all operators and removed
> > > only once a window is committed.
> > >
> > > 2 - If there is another failure further upstream in the DAG, upstream
> > > operators will be redeployed to a checkpoint less than or equal to the
> > > checkpoint of the database operator in the at-least-once case. This
> > > could mean redoing several hours' or a day's worth of computation.
> > >
> > > We should support a mechanism to detect when the connection to a
> > > database is lost, spool to HDFS using a WAL, and then write the
> > > contents of the WAL into the database once the connection comes back
> > > online. This would save local disk space across all the nodes in the
> > > DAG and limit the spooled data to just what is being output by the
> > > output operator.
> > >
> > > Ticket here if anyone is interested in working on it:
> > >
> > > https://malhar.atlassian.net/browse/MLHR-1951
> > >
> > > Thanks,
> > > Tim
> > >
> >
>
