Subject: Re: Profiling the current Airavata registry
From: Lahiru Gunathilake <glahiru@gmail.com>
To: dev <dev@airavata.apache.org>
Date: Tue, 12 Aug 2014 18:24:34 +0530

Hi Sachith,

How did you test this? What database did you use?

I think 1000 experiments is a very low number. I think the most important question is how expensive the search and a single experiment retrieval are when there are a large number of experiments.
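To illustrate why search and single-experiment retrieval can scale differently, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the registry database. The `experiment` table, its columns and the helper functions are hypothetical, not the actual Airavata schema, and the OpenJPA layer is not modelled. A lookup by primary key can go through an index, while a substring search (`LIKE '%...%'`) has to scan every row, so its cost grows with the number of experiments.

```python
import sqlite3

def build_db(n):
    """Create an in-memory DB with n rows in a hypothetical experiment table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment ("
        " experiment_id TEXT PRIMARY KEY,"
        " name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO experiment VALUES (?, ?)",
        ((f"exp-{i}", f"experiment {i}") for i in range(n)),
    )
    return conn

def query_plan(conn, sql, params=()):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

conn = build_db(10_000)
# Single-experiment retrieval: resolved through the primary-key index.
print(query_plan(conn, "SELECT * FROM experiment WHERE experiment_id = ?", ("exp-42",)))
# Substring search: the leading wildcard rules out the index, so SQLite scans.
print(query_plan(conn, "SELECT * FROM experiment WHERE name LIKE ?", ("%42%",)))
```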

If we support retrieving a defined number of experiments through the API (I think this is the practical scenario: get 100 out of 10k experiments), we have to test the performance of that too.
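One way to serve "100 out of 10k" requests is to page in the database instead of loading everything and slicing in memory. A minimal sketch with SQLite's `LIMIT`/`OFFSET`, assuming a hypothetical `get_experiments_page` function and `creation_time` column (neither is the Airavata API or schema). A stable `ORDER BY` matters; without it, consecutive pages can overlap or skip rows.

```python
import sqlite3

def get_experiments_page(conn, limit, offset):
    """Return one page of experiment IDs, newest first (hypothetical API)."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT experiment_id FROM experiment"
            " ORDER BY creation_time DESC, experiment_id"
            " LIMIT ? OFFSET ?",
            (limit, offset),
        )
    ]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiment (experiment_id TEXT PRIMARY KEY, creation_time INTEGER)"
)
# 10,000 experiments with increasing creation times.
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?)",
    ((f"exp-{i:05d}", i) for i in range(10_000)),
)
first_page = get_experiments_page(conn, limit=100, offset=0)
second_page = get_experiments_page(conn, limit=100, offset=100)
print(len(first_page), first_page[0], second_page[0])
# prints: 100 exp-09999 exp-09899
```

With large offsets the database still walks past all the skipped rows, so keyset pagination (`WHERE creation_time < ?` on the last value seen) is a common alternative worth benchmarking as well.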

Regards
Lahiru


On Tue, Aug 12, 2014 at 4:59 PM, Sachith Withana <swsachith@gmail.com> wrote:
Hi all,

I'm testing the registry with 10, 1000 and 10,000 experiments, and I've tested the database performance by executing the getAllExperiments method.
I'll post the complete analysis.

What other methods should I test?

getExperiment(experiment_id)
searchExperiment

Any pointers?
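A possible shape for such a benchmark, sketched with an in-memory SQLite database and hypothetical stand-ins for getAllExperiments, getExperiment(experiment_id) and searchExperiment. The real methods go through the registry and OpenJPA, which this does not model. It times each operation at several table sizes so you can see which ones grow with the data.

```python
import sqlite3
import time

def make_registry(n):
    """Build a hypothetical experiment table with n rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE experiment (experiment_id TEXT PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO experiment VALUES (?, ?)",
        ((f"exp-{i}", f"experiment {i}") for i in range(n)),
    )
    return conn

# Hypothetical stand-ins for the registry methods under test.
def get_all_experiments(conn):
    return conn.execute("SELECT * FROM experiment").fetchall()

def get_experiment(conn, experiment_id):
    return conn.execute(
        "SELECT * FROM experiment WHERE experiment_id = ?", (experiment_id,)
    ).fetchone()

def search_experiments(conn, text):
    return conn.execute(
        "SELECT * FROM experiment WHERE name LIKE ?", (f"%{text}%",)
    ).fetchall()

def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def benchmark(sizes=(10, 1_000, 10_000)):
    """Time each operation at each table size."""
    results = {}
    for n in sizes:
        conn = make_registry(n)
        results[n] = {
            "getAllExperiments": time_call(get_all_experiments, conn),
            "getExperiment": time_call(get_experiment, conn, "exp-0"),
            "searchExperiment": time_call(search_experiments, conn, "42"),
        }
        conn.close()
    return results

for n, timings in benchmark().items():
    print(n, {name: f"{t * 1e3:.3f} ms" for name, t in timings.items()})
```

Taking the best of several runs reports warm numbers; since the first query behaves differently under OpenJPA caching, the first (cold) call is worth recording separately.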



On Wed, Jul 23, 2014 at 6:07 PM, Marlon Pierce <marpierc@iu.edu> wrote:
Thanks, Sachith. Did you also look at scaling? That is, will the operations below still be the slowest if the DB is 10x, 100x or 1000x bigger?

Marlon


On 7/23/14, 8:22 AM, Sachith Withana wrote:
Hi all,

I'm profiling the current registry from a few different aspects.

I looked into the database operations, and I've listed the operations that take the most time.

1. Getting the status of an experiment (takes around 10% of the overall time spent)
     It has to go through the hierarchy of the data model (nodes, tasks, etc.) to get to the actual experiment status.

2. Dealing with the application inputs
     Strangely, the queries regarding the ApplicationInputs take a long time to complete.
     This is part of the new Application Catalog.

3. Getting all the experiments (using the * wildcard)
     This takes the most time when queried for the first time, but thanks to the OpenJPA caching, it flattens out as we keep querying.

To address the first issue, I would suggest having a separate table for experiment summaries, where the status (both the state and the state update time) would be the only varying entity, and using that table to improve the query time for experiment summaries.

It would also improve the performance of getting all the experiments (experiment summaries).
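A minimal sketch of the proposed summary table, using SQLite in memory. The table and column names, the state strings and the helper functions are hypothetical, not the actual Airavata schema or API. Each status change is written to a history table and to the experiment's single summary row in one transaction, so listing summaries reads one table instead of walking through nodes and tasks.

```python
import sqlite3

# Hypothetical schema: a status history plus one summary row per experiment,
# where state and state_update_time are the only columns that change.
SCHEMA = """
CREATE TABLE experiment_status (
    experiment_id TEXT NOT NULL,
    state TEXT NOT NULL,
    update_time REAL NOT NULL
);
CREATE TABLE experiment_summary (
    experiment_id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    state TEXT NOT NULL,
    state_update_time REAL NOT NULL
);
"""

def create_experiment(conn, experiment_id, name, now):
    """Insert the initial status and the summary row in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO experiment_status VALUES (?, 'CREATED', ?)",
            (experiment_id, now),
        )
        conn.execute(
            "INSERT INTO experiment_summary VALUES (?, ?, 'CREATED', ?)",
            (experiment_id, name, now),
        )

def update_status(conn, experiment_id, state, now):
    """Append to the history and refresh the summary row atomically."""
    with conn:
        conn.execute(
            "INSERT INTO experiment_status VALUES (?, ?, ?)",
            (experiment_id, state, now),
        )
        conn.execute(
            "UPDATE experiment_summary SET state = ?, state_update_time = ?"
            " WHERE experiment_id = ?",
            (state, now, experiment_id),
        )

def get_experiment_summaries(conn):
    """Read all summaries from one table, with no walk through nodes/tasks."""
    return conn.execute(
        "SELECT experiment_id, name, state, state_update_time"
        " FROM experiment_summary ORDER BY experiment_id"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
create_experiment(conn, "exp-1", "first", now=1.0)
create_experiment(conn, "exp-2", "second", now=2.0)
update_status(conn, "exp-1", "EXECUTING", now=3.0)
print(get_experiment_summaries(conn))
# prints: [('exp-1', 'first', 'EXECUTING', 3.0), ('exp-2', 'second', 'CREATED', 2.0)]
```

The cost moves to the write path: every status update touches two tables, which is usually a fair trade when summaries are read far more often than states change.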

WDYT?

ToDos: Look into memory consumption (in terms of memory leakage, etc.)
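For the memory ToDo, the basic check is to call an operation many times and see whether retained memory keeps growing. The registry runs on the JVM, where heap dumps and profilers are the usual tools; this sketch only illustrates the idea in Python with the standard `tracemalloc` module, and `leaky_get_all`, `clean_get_all` and `retained_growth` are hypothetical examples, the first one deliberately leaking through a module-level cache.

```python
import tracemalloc

_cache = []  # grows forever: the kind of retention a leak check should catch

def leaky_get_all():
    rows = [("exp-%d" % i, "CREATED") for i in range(1_000)]
    _cache.append(rows)  # bug: keeps every result alive
    return len(rows)

def clean_get_all():
    rows = [("exp-%d" % i, "CREATED") for i in range(1_000)]
    return len(rows)

def retained_growth(fn, calls=50):
    """Bytes still allocated after `calls` invocations compared with before."""
    tracemalloc.start()
    fn()  # warm-up so one-time allocations are not counted as growth
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(calls):
        fn()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

print("leaky:", retained_growth(leaky_get_all))
print("clean:", retained_growth(clean_get_all))
```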


Any more suggestions?




--
Thanks,
Sachith Withana




--
System Analyst Programmer
PTI Lab
Indiana University