airavata-dev mailing list archives

From: Suresh Marru <sma...@apache.org>
Subject: Re: Helix + Mailing System
Date: Mon, 17 Jul 2017 15:48:03 GMT

Hi Apoorv,

For all coding suggestions, I always suggest you fork the Airavata sandbox repository and
submit a pull request from your repo. That way you have a provenance of your contributions
to a major open source foundation. Moreover, a PR is easier to review and give feedback on
than a standalone repo.

This is great work, though. I hope Shamaeera can review and provide feedback; he is the
most experienced on this topic and its associated pragmatic issues.

Suresh

> On Jul 17, 2017, at 11:30 AM, Apoorv Palkar <apoorv_palkar@aol.com> wrote:
> 
> Hey Dev,
> 
> For the past 3-3.5 weeks, I've been investigating the use of Helix in Airavata and
> working on the email monitoring problem. I went through the Curator/Zookeeper code to
> test out the internal workings of Helix. A particular question I had was: what is the
> difference between external view and current state? I understood that Helix uses the
> resource model to maintain both the ideal state and current state. Why is it necessary
> to have an external view? In addition to this, what is the purpose of a spectator node?
> The documentation states that a "spectator" reacts to changes in a distributed system.
> Why give that particular node limited abilities when you could give it full access?
> These questions may be highly important to consider when writing the Helix paper for
> submission.
>
> As for the mailing/monitoring system, I have decided to move forward with the JavaMail
> API + IMAP implementation. I used the gw155jobs@scigap.org (gmail) address as a basis
> for running my test code. For this particular use case, I didn't use the Gmail API
> because it had limited capabilities in terms of function/library uses. I played around
> with the Gmail API, but I was unsuccessful in getting it to work in a clean and
> efficient manner. As such, I decided to use the JavaMail API provided via imported
> libraries. IMAP was chosen because it has greater capabilities than POP3; POP3 was
> inefficient when fetching the emails.
>
> In terms of reading the emails, the first challenge was to set up the code correctly to
> read from Gmail. Previously the issue was that all emails were being read every time the
> read() function was called in the Inbox class. This meant that every message would be
> pulled even if only one email was unread. This proved to be very time-consuming, as the
> scigap email address has 10000+ emails at any given time. I set up boolean flags for
> messages that were read and ones that were unread. As a result, not all messages have
> to be pulled; only the ones with a "false" flag need to be read. These messages were
> pulled and then put into a Message[] array. This array was then sorted with a
> lambda-expression comparator, since JavaMail retrieves the most recent message last.
> After these messages are put into the array and dealt with, they are marked as "read"
> to avoid reading them again.
>
> Currently, I'm working on improving the implementations of all four email parsers. It
> is highly important to make sure these parsers run efficiently, as many emails would be
> read. I didn't want to use regex as it is slightly slower than string operations. For
> my demo code, I have currently used string operations to parse the subject
> title/content. In reality, an array or the StringBuilder class should be used in a
> production implementation to improve speed. Currently, I'm refactoring the PBS code to
> run a bit more optimally and running test cases for the other two email types.
>
> Below is a link for the gmail implementation + SLURM interpreter. Basically the idea is
> to have 4 classes that each handle one email type and parse the messages from the
> Message[] array. The idea is then to take the COMMON data collected, such as job_id,
> name, status and time, and put it into a thrift data model file. From this thrift
> model, a Java thrift object is created and sent over an AMQP message queue (RabbitMQ),
> to then potentially be stored in a MySQL/SQL database. As of now, the database part is
> not clear, but it would most likely be a registry that needs to be updated via the Java
> JPA library or SQL queries.
> 
> https://github.com/chessman179/gmailtestinged   <<<<<<<<<<<<< code.
> 
> 
> ** big shout out to Marcus --
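
For anyone following the thread, here is a minimal sketch of the string-operation subject
parsing described in the quoted message. It is not taken from the linked repository. The
class name SlurmSubjectParser, the map keys, and the sample subject layout
("SLURM Job_id=<id> Name=<name> <status>, ...") are all assumptions made for illustration,
and the sketch further assumes job names contain no spaces. It uses only indexOf and
substring, with no regex.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; names and subject layout are assumptions.
final class SlurmSubjectParser {

    private SlurmSubjectParser() {}

    /**
     * Extracts jobId, name and status from a subject assumed to look like
     * "SLURM Job_id=12345 Name=align_reads Ended, Run time 00:10:02, COMPLETED, ExitCode 0".
     * Returns an empty map when the "Job_id=" or " Name=" markers are missing.
     */
    static Map<String, String> parse(String subject) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (subject == null) {
            return fields;
        }

        final String idKey = "Job_id=";
        final String nameKey = " Name=";

        int idStart = subject.indexOf(idKey);
        if (idStart < 0) {
            return fields;
        }
        idStart += idKey.length();

        int nameAt = subject.indexOf(nameKey, idStart);
        if (nameAt < 0) {
            return fields;
        }

        // Everything between "Job_id=" and " Name=" is treated as the job id.
        String jobId = subject.substring(idStart, nameAt).trim();

        // The job name runs up to the next space (assumption: no spaces in names).
        int nameStart = nameAt + nameKey.length();
        int nameEnd = subject.indexOf(' ', nameStart);
        if (nameEnd < 0) {
            nameEnd = subject.length();
        }
        String name = subject.substring(nameStart, nameEnd);

        // The status is whatever follows the name, up to the first comma or the end.
        int statusEnd = subject.indexOf(',', nameEnd);
        String status = (statusEnd < 0
                ? subject.substring(nameEnd)
                : subject.substring(nameEnd, statusEnd)).trim();

        fields.put("jobId", jobId);
        fields.put("name", name);
        fields.put("status", status);
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse(
                "SLURM Job_id=12345 Name=align_reads Ended, Run time 00:10:02, COMPLETED, ExitCode 0"));
    }
}
```

The returned map holds the common fields that the quoted message proposes copying into the
thrift data model; the time field could come from the message's sent date rather than the
subject line, though that is also an assumption.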

