From: Pramod Immaneni
Date: Thu, 17 Dec 2015 13:06:33 -0800
Message-ID: <7178413472941578920@unknownmsgid>
Subject: Re: Database Output Operator Improvements
To: dev@apex.incubator.apache.org

Why can't the reconciler as it exists today be enhanced to write more optimally?
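One way to read "write more optimally" is to drain committed tuples in batches rather than one database round trip per tuple. A minimal sketch of that idea, not the Malhar API: the class name `BatchingReconciler` and all method names here are hypothetical, and an in-memory list stands in for the actual bulk write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a reconciler whose committed-window backlog is
// drained in batches instead of one tuple at a time.
public class BatchingReconciler {
    private final BlockingQueue<String> committed = new ArrayBlockingQueue<>(1024);
    private final int batchSize;
    private final List<List<String>> batchesWritten = new ArrayList<>();

    public BatchingReconciler(int batchSize) {
        this.batchSize = batchSize;
    }

    // Called when a window is committed; tuples become eligible for output.
    public boolean enqueueCommitted(String tuple) {
        return committed.offer(tuple);
    }

    // Drains up to batchSize tuples and issues a single bulk write,
    // instead of one database round trip per tuple.
    public int drainOnce() {
        List<String> batch = new ArrayList<>(batchSize);
        committed.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            batchesWritten.add(batch); // stand-in for a bulk INSERT / batch mutation
        }
        return batch.size();
    }

    public List<List<String>> getBatchesWritten() {
        return batchesWritten;
    }
}
```

The same batching loop could run on a separate thread so that slow bulk writes never block the operator's processing of incoming windows.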
> On Dec 17, 2015, at 12:07 PM, Chandni Singh wrote:
>
> The question is: with databases like HBase and Cassandra, which are themselves
> backed by a file system like HDFS, why write to HDFS when the database
> connection is healthy?
>
> These are distributed, scalable, and performant databases.
>
> IMO the reconciler approach isn't the best here. It fits the needs when the
> external entity is always slow, which was the original use case. We can spool
> to HDFS when the connection is unhealthy.
>
> If this is properly implemented, it can address all the other points
> mentioned by Ashwin.
>
> Also, I think benchmarking such solutions will help us compare them and
> decide which use case they fit best.
>
> Chandni
>
> On Thu, Dec 17, 2015 at 11:49 AM, Ashwin Chandra Putta <
> ashwinchandrap@gmail.com> wrote:
>
>> Tim,
>>
>> Are you saying HDFS is slower than a database? :)
>>
>> I think the Reconciler is the best approach. The tuples need not be written
>> to HDFS; they can be queued in memory. You can spool them to HDFS only when
>> the queue reaches its limits. The Reconciler solves a few major problems,
>> as you described above:
>>
>> 1. Graceful reconnection. When the external system we are writing to is
>> down, the Reconciler spools the messages to the queue and then to HDFS.
>> The tuples are written to the external system only after it is back up
>> again.
>> 2. Handling surges. There will be cases when throughput surges for some
>> period and the external system is not fast enough to keep up with the
>> writes. In those cases, the Reconciler spools the incoming tuples to the
>> queue/HDFS and then writes at the pace of the external system.
>> 3. DAG slowdown. Again, in case of an external system failure or a slow
>> connection, we do not want to block the windows from moving forward. If
>> the windows are blocked for a long time, STRAM will unnecessarily kill
>> the operator. The Reconciler makes sure that the incoming messages are
>> just queued/spooled to HDFS (the external system does not block the DAG),
>> so the DAG is not slowed down.
>>
>> Regards,
>> Ashwin.
>>
>> On Thu, Dec 17, 2015 at 11:29 AM, Timothy Farkas wrote:
>>
>>> Yes, that is true, Chandni, and considering how slow HDFS is, we should
>>> avoid writing to it if we can.
>>>
>>> It would be great if someone could pick up the ticket :).
>>>
>>> On Thu, Dec 17, 2015 at 11:17 AM, Chandni Singh wrote:
>>>
>>>> +1 for Tim's suggestion.
>>>>
>>>> Using the reconciler means always writing to HDFS and then reading from
>>>> that. Tim's suggestion is that we only write to HDFS when the database
>>>> connection is down. This is analogous to spooling.
>>>>
>>>> Chandni
>>>>
>>>> On Thu, Dec 17, 2015 at 11:13 AM, Pramod Immaneni <
>>>> pramod@datatorrent.com> wrote:
>>>>
>>>>> Tim, we have a pattern for this called Reconciler that Gaurav has also
>>>>> mentioned. There are some examples of it in Malhar.
>>>>>
>>>>> On Thu, Dec 17, 2015 at 9:47 AM, Timothy Farkas wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> One of our users is outputting to Cassandra, but they want to handle
>>>>>> a Cassandra failure or Cassandra downtime gracefully from an output
>>>>>> operator. Currently a lot of our database operators will just fail
>>>>>> and redeploy continually until the database comes back. This is a bad
>>>>>> idea for a couple of reasons:
>>>>>>
>>>>>> 1 - We rely on buffer server spooling to prevent data loss. If the
>>>>>> database is down for a long time (several hours or a day), we may run
>>>>>> out of space for the buffer server to spool to, since it spools to
>>>>>> local disk and data is purged only after a window is committed.
>>>>>> Furthermore, this buffer server problem will exist for all the
>>>>>> streaming containers in the DAG, not just the one immediately
>>>>>> upstream from the output operator, since data is spooled to disk for
>>>>>> all operators and only removed once a window is committed.
>>>>>>
>>>>>> 2 - If there is another failure further upstream in the DAG, upstream
>>>>>> operators will be redeployed to a checkpoint less than or equal to
>>>>>> the checkpoint of the database operator in the at-least-once case.
>>>>>> This could mean redoing several hours' or a day's worth of
>>>>>> computation.
>>>>>>
>>>>>> We should support a mechanism to detect when the connection to a
>>>>>> database is lost, spool to HDFS using a WAL, and then write the
>>>>>> contents of the WAL into the database once it comes back online. This
>>>>>> will save the local disk space of all the nodes used in the DAG and
>>>>>> allow it to be used only for the data being output by the output
>>>>>> operator.
>>>>>>
>>>>>> Ticket here if anyone is interested in working on it:
>>>>>>
>>>>>> https://malhar.atlassian.net/browse/MLHR-1951
>>>>>>
>>>>>> Thanks,
>>>>>> Tim
>>
>> --
>>
>> Regards,
>> Ashwin.
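The spool-on-failure behavior Tim proposes in the ticket can be sketched as below. This is a simplified, hypothetical illustration, not the MLHR-1951 implementation: the `Database` interface and `SpoolOnFailureWriter` names are invented, and an in-memory deque stands in for the HDFS-backed WAL.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: write straight to the database while the connection is
// healthy, spool to a WAL when it is lost, and replay the WAL in order once
// the database comes back, instead of failing and redeploying the operator.
public class SpoolOnFailureWriter {
    public interface Database {
        boolean isUp();
        void insert(String tuple);
    }

    private final Database db;
    private final Deque<String> wal = new ArrayDeque<>(); // stand-in for an HDFS-backed WAL

    public SpoolOnFailureWriter(Database db) {
        this.db = db;
    }

    public void process(String tuple) {
        if (db.isUp()) {
            replayWal();          // drain any backlog first to preserve order
            db.insert(tuple);
        } else {
            wal.addLast(tuple);   // connection lost: spool instead of failing
        }
    }

    // Replays spooled tuples once the connection is healthy again.
    public void replayWal() {
        while (db.isUp() && !wal.isEmpty()) {
            db.insert(wal.pollFirst());
        }
    }

    public int walSize() {
        return wal.size();
    }
}
```

A real implementation would also need to make the WAL fault-tolerant across operator redeploys (e.g. by checkpointing the WAL offsets), which is the part that keeps the at-least-once guarantee intact.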