airavata-dev mailing list archives

From Suresh Marru <sma...@apache.org>
Subject Re: Airavata Registry Considerations
Date Fri, 22 May 2015 17:03:49 GMT
Hi Supun & Supun :)

This is a good discussion. I think we need to balance both aspects here. I am not at all in favor
of shoehorning everything into MongoDB and again spending a few months addressing the unknowns. On the other
hand, GSoC is the right time to explore alternatives.

My expectation from this document was not so much a critique of the current JPA-based implementation.
Back then the focus was to adopt Thrift for the data models (thanks Supun K for the recommendation).
Among other things, Thrift helped us keep the focus on Airavata’s core capabilities and
quickly unify all the legacy interfaces. The current JPA registry was developed from scratch
in a hurry to help with Thrift adoption. I think it did well and exceeded initial expectations.


We have now slowly circled through all the components and made tremendous progress. We reduced
the internal footprint significantly (RabbitMQ in place of WS Messenger, work queues in place
of custom co-ordination in the workflow interpreter, and so forth). I think it's time to step back
and take a fresh look at the metadata management needs.

How about we not worry about the implementation costs for now and focus on which criteria we should
use to evaluate potential solutions and how to profile them? We should also include the full JPA-based
implementation as one of the candidates. As both of you said, it's important to identify the
profiling criteria. Chathuri has early work on this, in both the survey paper and the performance
measurements; we should probably revisit them and build from there.
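
As one possible starting point for the profiling question above, here is a minimal latency harness in Python. It is only a sketch: the `profile` helper, the choice of metrics (median, 95th percentile, maximum), and the placeholder workload are all assumptions for illustration, not something taken from Chathuri's measurements or from the Airavata code base.

```python
import statistics
import time

def profile(operation, repeats=50):
    """Time `operation` repeatedly and summarize latency in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Simple rank-based index for the 95th percentile.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "runs": len(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }

# Placeholder workload (hypothetical); a real run would call a registry
# operation such as "fetch the 20 most recent experiments" instead.
def placeholder_operation():
    sum(range(10_000))

summary = profile(placeholder_operation)
print(summary)
```

Running the same harness against each candidate backend with identical operations would give comparable numbers, which is the main point of agreeing on the criteria first.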

Thanks,
Suresh

> On May 22, 2015, at 12:28 PM, Supun Nakandala <supun.nakandala@gmail.com> wrote:
> 
> Hi Supun,
> 
> On Fri, May 22, 2015 at 9:42 PM, Supun Kamburugamuve <supun06@gmail.com> wrote:
> Hi Supun,
> 
> In normal software development, it is common to run into this kind of slowness. We cannot
foresee everything when we develop. The solution is to improve the performance of the important
operations rather than rewriting everything from the beginning. For example, for these particular
select operations you can use SQL directly rather than going through JPA.
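
To illustrate the "use SQL directly" suggestion, the sketch below fetches the 20 most recent experiments with a single statement. It uses Python's built-in sqlite3 module and an in-memory database only so that it is self-contained; the `experiment` table, its columns, and the `recent_experiments` helper are hypothetical and do not reflect Airavata's actual registry schema or its MySQL setup.

```python
import sqlite3

# Hypothetical schema, for illustration only; not Airavata's registry schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiment ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " created_at INTEGER NOT NULL)"
)
# An index on the sort column lets the database avoid sorting the whole table.
conn.execute("CREATE INDEX idx_experiment_created ON experiment (created_at)")
conn.executemany(
    "INSERT INTO experiment (id, name, created_at) VALUES (?, ?, ?)",
    [(i, f"exp-{i}", 1000 + i) for i in range(1, 101)],
)

def recent_experiments(connection, limit=20):
    """Return the `limit` most recent experiments using one SQL statement."""
    cursor = connection.execute(
        "SELECT id, name, created_at FROM experiment"
        " ORDER BY created_at DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()

rows = recent_experiments(conn)
print(len(rows), rows[0])
```

The same idea carries over to a hand-written native query behind the existing registry interface, which would leave the rest of the JPA code untouched.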
> 
> I'm pretty sure you'll encounter more problems if you implement this in MongoDB than
in the current MySQL. If that happens, do you think abandoning that technology and going for
a new database will be a good solution? You also have more experience with MySQL than with MongoDB
now.
> 
> Rather than abandoning everything you have because of one problem, trying to fix
it may be better for you in the long run.
> 
> Thanks,
> Supun..
> 
> I completely agree with you. Writing things from scratch will need more development effort
and proper testing. It also has the potential to introduce new, unknown issues. It is entirely
possible to fix these issues in the current registry, and I have mentioned that in the doc as well.
> 
> In addition to that, I also checked several other alternatives and found MongoDB interesting.
I am not saying that we should completely rewrite the registry using MongoDB. But I think it is
worth exploring at a POC level.
>  
> On Fri, May 22, 2015 at 11:49 AM, Supun Nakandala <supun.nakandala@gmail.com> wrote:
> Hi Supun,
> 
> I haven't done any profiling of registry-based operations. What I mean by slow
performance here is mainly the slowness of the SELECT operations in the PHP Reference Gateway, e.g.
fetching projects or fetching experiments. Even a simple query to fetch the 20 most recent experiments
is embarrassingly slow in PGA.
> 
> Even though I didn't do a proper profiling of operations, I did a query log analysis for
a SELECT experiment query. This was a simple query to fetch the 20 most recent experiments. I
found that the JPA layer generates an enormous number of queries for this task rather than one
single query (due to the select N+1 issue). The same issue occurs when fetching a single experiment
by specifying the id.
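
For readers unfamiliar with the select N+1 pattern described above, the following Python sketch reproduces it in miniature and counts the statements issued. Everything in it is an assumption made for illustration: the two-table schema, the `run` helper that stands in for a query log, and the use of sqlite3 instead of JPA and MySQL. It is not a reconstruction of the attached query log.

```python
import sqlite3

# Hypothetical two-table schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT, created_at INTEGER);
    CREATE TABLE experiment_status (
        experiment_id INTEGER REFERENCES experiment(id),
        state TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?, ?)",
    [(i, f"exp-{i}", 1000 + i) for i in range(1, 51)],
)
conn.executemany(
    "INSERT INTO experiment_status VALUES (?, ?)",
    [(i, "COMPLETED") for i in range(1, 51)],
)

query_log = []  # stands in for the database query log

def run(sql, params=()):
    """Execute one statement and record it, so statements can be counted."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()

def fetch_n_plus_one(limit=20):
    """One query for the parents, then one more per parent (the N+1 pattern)."""
    experiments = run(
        "SELECT id, name FROM experiment ORDER BY created_at DESC LIMIT ?", (limit,)
    )
    result = []
    for exp_id, name in experiments:
        status = run(
            "SELECT state FROM experiment_status WHERE experiment_id = ?", (exp_id,)
        )
        result.append((exp_id, name, status[0][0]))
    return result

def fetch_with_join(limit=20):
    """The same data fetched with a single JOIN."""
    return run(
        "SELECT e.id, e.name, s.state FROM experiment e"
        " JOIN experiment_status s ON s.experiment_id = e.id"
        " ORDER BY e.created_at DESC LIMIT ?",
        (limit,),
    )

query_log.clear()
lazy_rows = fetch_n_plus_one()
lazy_queries = len(query_log)

query_log.clear()
joined_rows = fetch_with_join()
joined_queries = len(query_log)

print(lazy_queries, joined_queries)  # 21 statements versus 1
```

With only one associated table the gap is 21 statements versus 1; each additional lazily loaded association adds another query per row, so the gap widens as the entity graph grows.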
> 
> I think it is fair to say that the current registry has become a bottleneck for most of the PGA-specific
operations. But I don't have evidence to show that it has become a bottleneck for
the Orchestrator- or GFac-specific operations. For that, as you have mentioned, we need to profile
the operations. But I think the argument is still valid even for GFac- and Orchestrator-based
operations.
> 
> I have attached the query log for the above-mentioned select operation. If
you look at the query log, you can see that every associated entity is fetched separately using
complex join operations.
> 
> 
> 
> On Fri, May 22, 2015 at 8:05 PM, Supun Kamburugamuve <supun06@gmail.com> wrote:
> Hi Supun,
> 
> Your report mentions slow performance. Do you have any data about this slow performance?
For a typical request, what percentage of the overall time it takes to execute that request
is spent in the registry?
> 
> Do you have a use case where the registry is the bottleneck?
> 
> Thanks,
> Supun..
> 
> On Fri, May 22, 2015 at 9:45 AM, Suresh Marru <smarru@apache.org> wrote:
> Hi Supun,
> 
> This is a very good analysis; you have nicely embraced the problem. Before we jump into
the solution, we may want to do small POCs to validate your claims.
> 
> Thank you for getting a head start; this also cuts into the GSoC goals of Douglas’s project,
so let's work on this collaboratively.
> 
> Hi Madhu,
> 
> Can you please provide guidance on how to approach the data
management challenges of Airavata academically? The students might appreciate insights on how to profile
and benchmark any possible solutions.
> 
> Cheers,
> Suresh
> 
>> On May 22, 2015, at 9:18 AM, Supun Nakandala <supun.nakandala@gmail.com> wrote:
>> 
>> Hi Devs,
>> 
>> I have compiled a document based on the analysis I did of the current registry architecture/technology
and possible modifications and alternatives. You can find the document at https://docs.google.com/document/d/1XWAQLtdtCf9nTigAz6r5JINHR99bP0oeYaTgeEIVr4w/edit#
>> 
>> Thanks
>> Supun
> 
> 
> 
> 
> -- 
> Supun Kamburugamuva
> Member, Apache Software Foundation; http://www.apache.org
> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
> Blog: http://supunk.blogspot.com
> 
> 
> 
> 
> -- 
> Thank you
> Supun Nakandala
> Dept. Computer Science and Engineering
> University of Moratuwa
> 

