apex-users mailing list archives

From Thomas Weise <thomas.we...@gmail.com>
Subject Re: Kinesis Operator Help
Date Wed, 17 Feb 2016 04:29:35 GMT
Ram,

The recovery path, when it is under the application directory, will be
automatically copied to the new app directory when the relaunch option is used.
This is how the previous instance's data is made available to the new app.
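To make the mechanism concrete, here is a small local-filesystem model of what the relaunch step is described as doing. It is not Apex code: the real copy happens inside the platform on HDFS, and the directory names used here (`recovery`, `application_new`) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Iterator;
import java.util.stream.Stream;

/**
 * Local-filesystem model (not Apex code): a recovery directory stored under
 * the old application directory is copied to the same relative location
 * under the new application directory.
 */
public class RelaunchCopySketch {
    static void copyRecoveryDir(Path oldAppDir, Path newAppDir, String recoveryPath) throws IOException {
        Path src = oldAppDir.resolve(recoveryPath);
        if (!Files.isDirectory(src)) {
            return; // nothing under the old app directory to carry over
        }
        Path dst = newAppDir.resolve(recoveryPath);
        // Files.walk visits a directory before its contents, so parents exist before files are copied.
        try (Stream<Path> walk = Files.walk(src)) {
            Iterator<Path> it = walk.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                Path target = dst.resolve(src.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("apps");
        Path oldApp = root.resolve("application_1453741656046_0520");
        Files.createDirectories(oldApp.resolve("recovery").resolve("0"));
        Files.write(oldApp.resolve("recovery").resolve("0").resolve("window-state"), new byte[] {1});
        Path newApp = root.resolve("application_new"); // hypothetical id of the relaunched app
        copyRecoveryDir(oldApp, newApp, "recovery");
        System.out.println(Files.exists(newApp.resolve("recovery").resolve("0").resolve("window-state"))); // prints true
    }
}
```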

Thomas

On Tue, Feb 16, 2016 at 5:23 PM, Munagala Ramanath <ram@datatorrent.com>
wrote:

> Ah, I understand now.
>
> The path is set in
> IdempotentStorageManager.FSIdempotentStorageManager.setup() near line 146:
>
>   appPath = new Path(context.getValue(DAG.APPLICATION_PATH) + Path.SEPARATOR + recoveryPath);
>
> You can try creating a new class that extends FSIdempotentStorageManager,
> overriding setup() to use a local property for the appPath and simply
> duplicating the rest of the code.
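A minimal sketch of the change Ram suggests, reduced to the path computation so it runs without Hadoop or Malhar on the classpath. The `fixedBasePath` parameter stands in for the hypothetical "local property"; plain strings replace Hadoop's `Path`, and the example paths and the `idempotentState` directory name are made up for illustration.

```java
/**
 * Hypothetical sketch: resolve the recovery directory from an app-independent
 * base path when one is configured, otherwise fall back to the application
 * path (the default behavior quoted above). Strings stand in for Hadoop Path.
 */
public class RecoveryPathSketch {
    static final String SEPARATOR = "/"; // same value as Hadoop's Path.SEPARATOR

    // fixedBasePath: hypothetical property from the app's configuration; null or empty means "not set".
    static String resolveRecoveryPath(String applicationPath, String fixedBasePath, String recoveryPath) {
        String base = (fixedBasePath == null || fixedBasePath.isEmpty()) ? applicationPath : fixedBasePath;
        return base + SEPARATOR + recoveryPath;
    }

    public static void main(String[] args) {
        String appPath = "/user/jim/datatorrent/apps/application_1453741656046_0520"; // hypothetical
        // Default: state lives under the application directory, which changes on every new launch.
        System.out.println(resolveRecoveryPath(appPath, null, "idempotentState"));
        // Override: state lives under a stable directory reused by every launch.
        System.out.println(resolveRecoveryPath(appPath, "/user/jim/kinesis-edi", "idempotentState"));
    }
}
```

One consequence worth checking: because a fixed base path sits outside the application directory, every launch reads and writes the same state, so presumably only one instance of the application should run against a given base path at a time.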
>
> Ram
>
> On Tue, Feb 16, 2016 at 3:59 PM, Jim <jim@facility.supplies> wrote:
>
>> Ram,
>>
>>
>>
>> I am not 100% fluent in the details of the base Kinesis operator and how
>> it interacts with Hadoop (hence my posting); if it would support that, then
>> yes, you could.
>>
>>
>>
>> My goal is to make it so one can easily pick up where they left off
>> reading the Kinesis stream, regardless of whether you kill the application
>> and relaunch it, etc., without needing to go out to the CLI to run
>> commands (because at some point an operator will forget, and then we will
>> reprocess a bunch of transactions; that would not be good!).
>>
>>
>>
>> Jim
>>
>>
>>
>> *From:* Munagala Ramanath [mailto:ram@datatorrent.com]
>> *Sent:* Tuesday, February 16, 2016 5:21 PM
>> *To:* users@apex.incubator.apache.org
>> *Subject:* Re: Kinesis Operator Help
>>
>>
>>
>> Why use the application id? Could you generate a java.util.UUID, for
>> example, and save it in HDFS?
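One way to read Ram's suggestion is a get-or-create id: generate a UUID on the first launch, persist it, and read it back on every later launch. The sketch below assumes a local file as a stand-in for HDFS, and the file name `consumer.id` is hypothetical. With Hadoop, the same pattern would presumably use the `FileSystem` API (`exists`, `open`, `create`) instead of `java.nio.file`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class StableIdSketch {
    /** Returns the id stored at idFile, creating and saving a new random UUID if the file is absent. */
    static String getOrCreateId(Path idFile) throws IOException {
        if (Files.exists(idFile)) {
            return new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
        }
        String id = UUID.randomUUID().toString();
        Files.createDirectories(idFile.toAbsolutePath().getParent());
        Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
        return id;
    }

    public static void main(String[] args) throws IOException {
        // Local temp file standing in for an HDFS location (assumption).
        Path idFile = Files.createTempDirectory("kinesis-edi").resolve("consumer.id");
        String first = getOrCreateId(idFile);
        String second = getOrCreateId(idFile); // simulates a later launch
        System.out.println(first.equals(second)); // prints true
    }
}
```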
>>
>>
>>
>> Ram
>>
>>
>>
>> On Tue, Feb 16, 2016 at 11:40 AM, Jim <jim@facility.supplies> wrote:
>>
>> Good morning,
>>
>>
>>
>> I am new to Apex, Hadoop and Yarn (nothing like tackling something new,
>> is there?).
>>
>>
>>
>> I have my first Apex apps working. They are EDI processors that read new
>> EDI transactions from an Amazon Kinesis stream, look at the data, and
>> route the EDI data to an appropriate handler for processing (note that
>> operatorEs pushes the data to ElasticSearch for logging). Here is a
>> diagram:
>>
>>
>>
>>
>>
>> Everything launches and works fine, as shown in the diagram above, from
>> the EDI router through the transaction operators.
>>
>>
>>
>> The final challenge I am having, being new to all of this, is that the
>> Kinesis operator, by default, stores its app id in the
>> IdempotentStorageManager (aka WindowDataManager) when it is launched, so if
>> the app is shut down and restarted, the same app id is used by default with
>> the checkpoint, and you don’t reprocess the same records again when the
>> application is restarted.
>>
>>
>>
>> You can see this id immediately to the right of Operations / apps, in
>> gray lettering ‘application_1453741656046_0520’, in the image from the
>> DataTorrent console below:
>>
>>
>>
>> [image: DataTorrent console screenshot]
>>
>>
>>
>> However, if you kill the application and relaunch it, this id changes, and
>> it starts reading the Kinesis stream from the beginning again. The only
>> way to restart it so that it resumes where it left off is to use the CLI
>> as follows:
>>
>>
>>
>> 1.)    Run 'dtcli' from the command line.
>>
>> 2.)    Run 'launch -originalAppId "application_1453741656046_0520" <path to .apa file>'
>>
>>
>>
>> This will launch the application using the same app id identified in the
>> console screen above.
>>
>>
>>
>> I want to make this easier, but need some expert help in tweaking this
>> so it works.
>>
>>
>>
>> I am thinking that there should be a way with Kinesis to:
>>
>>
>>
>> 1.)    Define a Kinesis app id string value in the properties.
>>
>> 2.)    If this value is defined, use it when launching the application to
>> check whether a Hadoop app id has already been assigned to that
>> identifier.
>>
>> 3.)    If that value is not yet stored in the database, launch the app,
>> creating a new app id, and store the app id under the identifier
>> key value.
>>
>> 4.)    Now if I kill the app, or install new software, it will always
>> pick up where it left off by using the identifier key value to retrieve and
>> assign the app id.
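Steps 1-4 amount to a small registry keyed by a user-chosen identifier. A minimal sketch, assuming an in-memory map in place of "the database" and a hypothetical identifier and `.apa` path; it only builds the `launch -originalAppId ...` command string from the CLI steps in this message rather than invoking `dtcli`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical registry: identifier (e.g. a Kinesis app id property) -> last YARN application id. */
public class LaunchRegistrySketch {
    private final Map<String, String> appIdByKey = new HashMap<>(); // stand-in for "the database"

    /** Step 2: look up the app id previously recorded for this identifier. */
    Optional<String> lookup(String key) {
        return Optional.ofNullable(appIdByKey.get(key));
    }

    /** Step 3: record the app id assigned by a fresh launch or a relaunch. */
    void record(String key, String appId) {
        appIdByKey.put(key, appId);
    }

    /** Steps 2-4: relaunch from the stored id if there is one, otherwise do a fresh launch. */
    String launchCommand(String key, String apaPath) {
        return lookup(key)
                .map(appId -> "launch -originalAppId " + appId + " " + apaPath)
                .orElse("launch " + apaPath);
    }

    public static void main(String[] args) {
        LaunchRegistrySketch registry = new LaunchRegistrySketch();
        String key = "edi-kinesis-reader";   // hypothetical identifier from the properties
        String apa = "/path/to/edi-app.apa"; // hypothetical package path
        System.out.println(registry.launchCommand(key, apa)); // first run: fresh launch
        registry.record(key, "application_1453741656046_0520");
        System.out.println(registry.launchCommand(key, apa)); // later runs: resume from stored id
    }
}
```

Since each relaunch is assigned a new application id, the sketch assumes the caller records the newest id after every launch; otherwise a later relaunch would presumably resume from an older instance's state.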
>>
>>
>>
>> Sounds simple, right? :)
>>
>>
>>
>> Can one of the experts out there help me figure this out? I don’t want
>> to reprocess already-processed EDI transactions.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Jim