Mailing-List: contact dev-help@airavata.apache.org; run by ezmlm
Reply-To: dev@airavata.apache.org
Delivered-To: mailing list dev@airavata.apache.org
MIME-Version: 1.0
References: <15d512c81ee-4b5c-b2ba@webprd-m49.mail.aol.com>
From: Shameera Rathnayaka
Date: Tue, 18 Jul 2017 03:33:26 +0000
Subject: Re: Helix + Mailing System
To: Suresh Marru, Airavata Dev
Content-Type: multipart/alternative; boundary="f4030438901456719005548f2e9e"
archived-at:
Tue, 18 Jul 2017 03:33:51 -0000

--f4030438901456719005548f2e9e
Content-Type: text/plain; charset="UTF-8"

Hi Apoorv,

I haven't been in touch with the recent Airavata changes lately, so I need a
few more details to give you exact answers. For example, what is the Helix
tool you are referring to? It would be great if you could provide GitHub code
references for your code-related questions. Anyway, the following are answers
to a couple of your questions.

On Jul 17, 2017, at 11:30 AM, Apoorv Palkar wrote:
>
> Hey Dev,
>
> For the past 3-3.5 weeks, I've been investigating the use of Helix in
> Airavata and working on the email monitoring problem. I went through the
> Curator/ZooKeeper code to test out the internal workings of Helix. A
> particular question I had was: what is the difference between the external
> view and the current state?
>

Code reference?

> I understood that Helix uses the resource model to maintain both the ideal
> state and the current state. Why is it necessary to have an external view?
> In addition, what is the purpose of a spectator node? The documentation
> states that a "spectator" reacts to changes in a distributed system. Why
> give that node limited abilities when you could give it full access? These
> questions may be highly important to consider when writing the Helix paper
> for submission. As for the mailing/monitoring system, I have decided to
> move forward with the JavaMail API + IMAP implementation. I used the
> gw155jobs@scigap.org (Gmail) address as a basis for running my test code.
> For this particular use case, I didn't use the Gmail API because it had
> limited capabilities in terms of function/library uses. I played around
> with the Gmail API; however, I was unsuccessful in getting it to work in a
> clean and efficient manner.
>

Which Gmail API are you talking about? Last time I checked, there was no
Google API for talking to Gmail; they ask you to use the JavaMail API, which
is compliant with Gmail.
> As such, I decided to use the JavaMail API provided via imported
> libraries. IMAP was chosen because it has greater capabilities than POP3;
> POP3 was inefficient when fetching the emails. In terms of first reading
> the emails, the first challenge was to set up the code correctly to read
> from Gmail. Previously, the issue was that the emails were being read every
> time the read() function was called in the Inbox class.
>

Code reference?

> This meant that every message would be pulled even if only one email was
> unread. This proved to be very time-consuming, as the scigap email address
> has 10000+ emails at any given time. I set up boolean flags for emails that
> were read and ones that were unread. As a result, all messages don't have
> to be pulled; only the ones with a "false" flag need to be read.
>

https://github.com/apache/airavata/blob/master/modules/gfac/gfac-impl/src/main/java/org/apache/airavata/gfac/monitor/email/EmailBasedMonitor.java#L194
The existing code already reads only unseen messages.

> These messages were pulled and then put into a Message[] array. This array
> was then sorted using a lambda expression, as JavaMail retrieves the most
> recent message last. After these messages are put into the array and dealt
> with, they are marked as "read" to avoid reading them again.
>

This is already taken care of in the current code.
https://github.com/apache/airavata/blob/master/modules/gfac/gfac-impl/src/main/java/org/apache/airavata/gfac/monitor/email/EmailBasedMonitor.java#L217

> Currently, I'm working on improving the implementations of all four email
> parsers. It is highly important to make sure these parsers run efficiently,
> as many emails will be read. I didn't want to use regex, as it is slightly
> slower than string operations.
>

Have you collected any performance metrics to reach this conclusion?

> For my demo code, I have currently used string operations to parse the
> subject title/content.
> In reality, an array or the StringBuilder class should be used in a
> production implementation to improve speed.
>

It is a trade-off between performance and maintenance overhead; that is why
we use third-party libraries with good community support whenever we can.
Unless we have a strong reason, I would avoid writing our own parser. In my
experience, these emails can change over time (e.g., when resource manager
versions are bumped), so we would have to keep our code in step with those
changes.

~Shameera.

> Currently, I'm refactoring the PBS code to run a bit more optimally and
> running test cases for the other two email types. Below is a link for the
> Gmail implementation + SLURM interpreter. Basically, the idea is to have 4
> classes that each handle one email type and parse the messages from the
> Message[] array. The idea is then to take the COMMON data collected, such
> as job_id, name, status, and time, and put it into a Thrift data model
> file. Using this Thrift model, a Java Thrift object is created and sent
> over an AMQP message queue (RabbitMQ), to then potentially be stored in a
> MySQL/SQL database. As of now, the database part is not clear, but it would
> most likely be a registry that needs to be updated via the Java JPA
> library/SQL queries.
>
>
> https://github.com/chessman179/gmailtestinged
> <<<<<<<<<<<<< code.
>
>
> ** big shout out to Marcus --
>
>
--
Shameera Rathnayaka

--f4030438901456719005548f2e9e
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--f4030438901456719005548f2e9e--
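
On the regex versus string-operation question raised in the thread, here is a
minimal sketch of a regex-based subject parser that uses only java.util.regex.
The subject layout ("SLURM Job_id=... Name=... Ended, ..."), the class name
SlurmSubjectParser, and the JobEvent fields are assumptions made for
illustration; they are not taken from the Airavata code base or from the
linked repository. The pattern is compiled once and reused, so it is not
recompiled for every message.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SlurmSubjectParser {

    /** Common fields pulled from a job-status email (hypothetical type). */
    static final class JobEvent {
        public final String jobId;
        public final String jobName;
        public final String status;

        JobEvent(String jobId, String jobName, String status) {
            this.jobId = jobId;
            this.jobName = jobName;
            this.status = status;
        }
    }

    // ASSUMPTION: subjects look like
    // "SLURM Job_id=12345 Name=my_job Ended, Run time 00:01:02, COMPLETED, ExitCode 0".
    // Pattern instances are immutable, so one shared constant is safe to reuse.
    private static final Pattern SUBJECT = Pattern.compile(
            "Job_id=(?<id>\\d+)\\s+Name=(?<name>\\S+)\\s+(?<status>Began|Ended|Failed)\\b");

    /** Returns the parsed event, or empty if the subject does not match. */
    static Optional<JobEvent> parse(String subject) {
        if (subject == null) {
            return Optional.empty();
        }
        Matcher m = SUBJECT.matcher(subject);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(
                new JobEvent(m.group("id"), m.group("name"), m.group("status")));
    }

    public static void main(String[] args) {
        String subject =
                "SLURM Job_id=12345 Name=my_job Ended, Run time 00:01:02, COMPLETED, ExitCode 0";
        // prints: 12345 my_job Ended
        parse(subject).ifPresent(e ->
                System.out.println(e.jobId + " " + e.jobName + " " + e.status));
    }
}
```

Whether this is measurably slower than hand-written string operations on real
job emails would need to be benchmarked (for example with JMH) before deciding
either way. If the subject format changes, only the one pattern has to be
updated.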