From: Suresh Marru
To: Airavata Dev, Shameera Rathnayaka
Reply-To: dev@airavata.apache.org
Subject: Re: Helix + Mailing System
Date: Mon, 17 Jul 2017 11:48:03 -0400

Hi Apoorv,

For all coding suggestions, I always suggest you fork the Airavata sandbox repository and submit a pull request from your repo. That way you have a provenance of your contributions to a major open source foundation. Moreover, a PR is easier to review and give feedback on than a standalone repo.

This is great work, though. I hope Shameera can review and provide feedback; he has the most experience with this topic and its associated pragmatic issues.

Suresh

> On Jul 17, 2017, at 11:30 AM, Apoorv Palkar <apoorv_palkar@aol.com> wrote:
>
> Hey Dev,
>
> For the past 3-3.5 weeks, I've been investigating the use of Helix in Airavata and working on the email monitoring problem. I went through the Curator/ZooKeeper code to test out the internal workings of Helix. A particular question I had was: what is the difference between the external view and the current state? I understand that Helix uses the resource model to maintain both the ideal state and the current state. Why is it necessary to have an external view? In addition, what is the purpose of a spectator node? The documentation states that a "spectator" reacts to changes in a distributed system. Why give that node limited abilities when you could give it full access? These questions may be important to consider when writing the Helix paper for submission.
>
> As for the mailing/monitoring system, I have decided to move forward with the JavaMail API + IMAP implementation. I used the gw155jobs@scigap.org (Gmail) address as a basis for running my test code. For this particular use case, I didn't use the Gmail API because it had limited capabilities in terms of function/library uses. I played around with the Gmail API; however, I was unsuccessful in getting it to work in a clean and efficient manner. As such, I decided to use the JavaMail API provided via imported libraries. IMAP was chosen because it had greater capabilities than POP3.
> POP3 was inefficient when fetching the emails.
>
> In terms of reading the emails, the first challenge was to set up the code correctly to read from Gmail. Previously the issue was that all emails were being read every time the read() function was called in the Inbox class. This meant that every message would be pulled even if only one email was unread. This proved to be very costly in time, as the scigap email address has 10000+ emails at any given time. I set up boolean flags for emails that were read and ones that were unread. As a result, not all messages have to be pulled; only the ones with a "false" flag need to be read. These messages are pulled and then put into a Message[] array. The array is then ordered using a lambda expression, as JavaMail retrieves the most current message last. After these messages are put into the array and dealt with, they are marked as "read" to avoid reading them again.
>
> Currently, I'm working on improving the implementations of all four email parsers. It is highly important to make sure these parsers run efficiently, as many emails will be read. I didn't want to use regex as it is slightly slower than string operations. For my demo code, I have currently used string operations to parse the subject title/content. In reality, an array or the StringBuilder class should be used in a professional implementation to improve speed. Currently, I'm refactoring the PBS code to run a bit more optimally and running test cases for the other two email types. Below is a link to the Gmail implementation + SLURM interpreter. Basically the idea is to have 4 classes, one handling each email type, which then parse the messages from the Message[] array. The idea is to then take the COMMON data collected, such as job_id, name, status, and time, and put it into a Thrift data model file.
> Using this Thrift definition, a Java Thrift object is then created and sent over an AMQP message queue, RabbitMQ, to then potentially be used in a MySQL/SQL database. As of now, the database part is not clear, but it would most likely be a registry that needs to be updated via the Java JPA library/SQL queries.
>
> https://github.com/chessman179/gmailtestinged <<<<<<<<<<<<< code.
>
>
> ** big shout out to Marcus --
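
For readers of the archive, the parsing step described in the quoted message (string operations on the subject line, a common set of fields, newest message first) can be sketched roughly as below. This is only an illustration, not the code in the linked repository. The class names JobEvent and SlurmSubjectParser are hypothetical, and the subject layout "SLURM Job_id=<id> Name=<name> <state>, ..." is an assumption about what the scheduler sends. In JavaMail itself, the unread messages would typically be fetched with Folder.search and a FlagTerm on Flags.Flag.SEEN; that part is left out here so the sketch needs only the standard library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the fields common to all scheduler emails. */
final class JobEvent {
    final String jobId;
    final String name;
    final String status;
    final long receivedMillis; // when the email was received (epoch millis)

    JobEvent(String jobId, String name, String status, long receivedMillis) {
        this.jobId = jobId;
        this.name = name;
        this.status = status;
        this.receivedMillis = receivedMillis;
    }
}

/** Parses subjects of the assumed form "SLURM Job_id=1 Name=n State, ...". */
final class SlurmSubjectParser {
    private static final String ID_KEY = "Job_id=";
    private static final String NAME_KEY = " Name=";

    /** Returns null when the subject does not match the assumed layout. */
    static JobEvent parse(String subject, long receivedMillis) {
        int idStart = subject.indexOf(ID_KEY);
        if (idStart < 0) {
            return null;
        }
        idStart += ID_KEY.length();

        int nameKey = subject.indexOf(NAME_KEY, idStart);
        if (nameKey < 0) {
            return null;
        }
        String jobId = subject.substring(idStart, nameKey);

        // Assumption: job names contain no spaces.
        int nameStart = nameKey + NAME_KEY.length();
        int nameEnd = subject.indexOf(' ', nameStart);
        if (nameEnd < 0) {
            return null;
        }
        String name = subject.substring(nameStart, nameEnd);

        // The state is the text after the name, up to the first comma (or the end).
        int statusStart = nameEnd + 1;
        int statusEnd = subject.indexOf(',', statusStart);
        if (statusEnd < 0) {
            statusEnd = subject.length();
        }
        String status = subject.substring(statusStart, statusEnd).trim();

        return new JobEvent(jobId, name, status, receivedMillis);
    }

    /** Returns a copy of the events ordered newest first, using a lambda comparator. */
    static List<JobEvent> newestFirst(List<JobEvent> events) {
        List<JobEvent> sorted = new ArrayList<>(events);
        sorted.sort((a, b) -> Long.compare(b.receivedMillis, a.receivedMillis));
        return sorted;
    }
}
```

A PBS parser or any other scheduler-specific parser could return the same JobEvent type, which is the kind of common record the message proposes mapping onto a Thrift struct.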
