airavata-dev mailing list archives

From Lahiru Gunathilake <glah...@gmail.com>
Subject Re: Profiling the current Airavata registry
Date Tue, 12 Aug 2014 13:25:13 GMT
On Tue, Aug 12, 2014 at 6:42 PM, Marlon Pierce <marpierc@iu.edu> wrote:

> A single user may have O(100) to O(1000) experiments, so 10K is too small
> as an upper bound on the registry for many users.

+1

I agree with Marlon: we currently have only the most basic search method, but
the reality is that we need search criteria like the ones Marlon suggests, and
I am sure content-based search will be pretty slow with a large number of
experiments. So we will have to use a search platform like Solr to improve the
performance.
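
To illustrate why content-based search degrades with registry size, and what a search platform buys us, here is a rough sketch (plain Java with made-up names, nothing from the Airavata API): a linear scan over descriptions versus a one-time inverted index, which is essentially the structure Solr would maintain for us.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch -- not Airavata code. Compares a per-query linear scan
// over experiment descriptions with a one-time inverted index (the kind of
// structure a search platform such as Solr maintains for us).
public class SearchSketch {

    // Naive content search: scan every description on every query.
    public static List<Integer> linearScan(List<String> descriptions, String term) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < descriptions.size(); i++) {
            if (descriptions.get(i).toLowerCase().contains(term)) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Indexed search: build token -> experiment ids once, then answer
    // term queries with a single map lookup.
    public static Map<String, List<Integer>> buildIndex(List<String> descriptions) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < descriptions.size(); i++) {
            for (String token : descriptions.get(i).toLowerCase().split("\\s+")) {
                List<Integer> ids = index.computeIfAbsent(token, k -> new ArrayList<>());
                if (!ids.contains(i)) {   // avoid duplicate ids per document
                    ids.add(i);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> descriptions = Arrays.asList(
                "AMBER job 1 for user 1",
                "LAMMPS job 1 for user 1",
                "AMBER job 2 for user 2");
        System.out.println(linearScan(descriptions, "amber"));      // [0, 2]
        System.out.println(buildIndex(descriptions).get("amber"));  // [0, 2]
    }
}
```

The scan costs O(n) per query, the index O(1) after a one-time build; with 10k+ experiments per user that difference is exactly what a dedicated search platform is for.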

I think you can first run the performance tests without content-based search;
then we can implement that feature and run the performance analysis again. If
the results are too bad (which is likely), we can integrate a search platform
to improve the performance.

Lahiru

> We should really test until things break.  A plot implying infinite
> scaling (by extrapolation) is not useful.  A plot showing OK scaling up to
> a certain point before things decay is useful.
>
> I suggest you post a more carefully worked-out set of experiments, starting
> with Lahiru's suggestion. How many users? How many experiments per user?
> What kinds of searches?  Probably the most common will be "get all my
> experiments that match this string", "get all experiments that have state
> FAILED", and "get all my experiments from the last 30 days".  But the API
> may not have the latter two yet.
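
Those three query shapes could be mocked up in-memory first, just to pin down their semantics before benchmarking the real registry. A rough sketch (ExperimentSummary here is a stand-in, not the real Airavata data model):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the three common query shapes over a toy in-memory record.
// ExperimentSummary is a hypothetical stand-in, not the Airavata model.
public class QueryShapes {

    public static class ExperimentSummary {
        final String user, description, state;
        final Instant created;
        ExperimentSummary(String user, String description, String state, Instant created) {
            this.user = user; this.description = description;
            this.state = state; this.created = created;
        }
    }

    // "Get all my experiments that match this string."
    public static List<ExperimentSummary> matching(List<ExperimentSummary> all,
                                                   String user, String term) {
        return all.stream()
                .filter(e -> e.user.equals(user) && e.description.contains(term))
                .collect(Collectors.toList());
    }

    // "Get all experiments that have state FAILED."
    public static List<ExperimentSummary> inState(List<ExperimentSummary> all, String state) {
        return all.stream().filter(e -> e.state.equals(state)).collect(Collectors.toList());
    }

    // "Get all my experiments from the last N days."
    public static List<ExperimentSummary> lastNDays(List<ExperimentSummary> all,
                                                    String user, int days) {
        Instant cutoff = Instant.now().minus(days, ChronoUnit.DAYS);
        return all.stream()
                .filter(e -> e.user.equals(user) && e.created.isAfter(cutoff))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ExperimentSummary> all = Arrays.asList(
                new ExperimentSummary("user1", "AMBER job 1 for user 1", "COMPLETED",
                        Instant.now()),
                new ExperimentSummary("user1", "LAMMPS job 1 for user 1", "FAILED",
                        Instant.now().minus(60, ChronoUnit.DAYS)));
        System.out.println(matching(all, "user1", "AMBER").size()); // 1
        System.out.println(inState(all, "FAILED").size());          // 1
        System.out.println(lastNDays(all, "user1", 30).size());     // 1
    }
}
```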
>
> So to start, you should specify a prototype user.  For example, each user
> will have 1000 experiments: 100 AMBER jobs, 100 LAMMPS jobs, etc. Each user
> will have a unique but human-readable name (user1, user2, ...). Each
> experiment will have a unique, human-readable description ("AMBER job 1 for
> user 1", "AMBER job 2 for user 1", ...) that is suitable for searching.
>
> Post these details first, and then you can use scripts to create experiment
> registries of any size. Each experiment is different but suitable for
> pattern searching.
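
Marlon's prototype-user setup could be generated by a script along these lines (the counts and application names are illustrative, not a fixed spec):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a registry-seeding script: N users, each with a fixed mix of
// application jobs carrying unique, human-readable, pattern-searchable
// descriptions. Illustrative only -- the real script would call the
// registry API to create each experiment.
public class RegistrySeed {

    static final String[] APPS = {"AMBER", "LAMMPS", "GROMACS", "NAMD"};

    // Build "AMBER job 1 for user 1", "AMBER job 2 for user 1", ... for one user.
    public static List<String> experimentsFor(int userNum, int jobsPerApp) {
        List<String> descriptions = new ArrayList<>();
        for (String app : APPS) {
            for (int j = 1; j <= jobsPerApp; j++) {
                descriptions.add(app + " job " + j + " for user " + userNum);
            }
        }
        return descriptions;
    }

    public static void main(String[] args) {
        int users = 10;
        int jobsPerApp = 250;   // 4 apps * 250 = 1000 experiments per user
        int total = 0;
        for (int u = 1; u <= users; u++) {
            // In the real script, each description would become a created
            // experiment under user "user" + u.
            total += experimentsFor(u, jobsPerApp).size();
        }
        System.out.println(total);   // 10000
    }
}
```

Scaling the registry to any target size is then just a matter of changing `users` and `jobsPerApp`, while every description stays unique and greppable.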
>
> This is 10 minutes worth of thought while waiting for my tea to brew, so
> hopefully this is the right start, but I encourage you to not take this as
> fixed instructions.
>
> Marlon
>
>
> On 8/12/14, 8:54 AM, Lahiru Gunathilake wrote:
>
>> Hi Sachith,
>>
>> How did you test this ? What database did you use ?
>>
>> I think 1000 experiments is a very low number. I think the most important
>> question is, when there are a large number of experiments, how expensive is
>> the search and how expensive is a single experiment retrieval.
>>
>> If we support fetching a defined number of experiments through the API (I
>> think this is the practical scenario: among 10k experiments, get 100), we
>> have to test the performance of that too.
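
The "among 10k, get 100" pattern is essentially a paged fetch; a minimal sketch (the method name is made up, not the real registry API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a limit/offset paged fetch -- the access pattern to benchmark
// if the API supports returning a defined number of experiments.
// Hypothetical helper, not part of the Airavata registry API.
public class PagedFetch {

    // Return at most `limit` items starting at `offset`, clamped to the list.
    public static <T> List<T> page(List<T> all, int offset, int limit) {
        int from = Math.min(offset, all.size());
        int to = Math.min(offset + limit, all.size());
        return all.subList(from, to);
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) ids.add(i);
        System.out.println(page(ids, 0, 100).size());     // 100
        System.out.println(page(ids, 9_950, 100).size()); // 50 (last partial page)
    }
}
```

In the database this would map to a LIMIT/OFFSET (or keyset) query, and its cost as the offset grows is exactly the thing worth measuring.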
>>
>> Regards
>> Lahiru
>>
>>
>> On Tue, Aug 12, 2014 at 4:59 PM, Sachith Withana <swsachith@gmail.com>
>> wrote:
>>
>>  Hi all,
>>>
>>> I'm testing the registry with 10, 1,000, and 10,000 experiments, and I've
>>> tested the database performance by executing the getAllExperiments method.
>>> I'll post the complete analysis.
>>>
>>> What are the other methods that I should test?
>>>
>>> getExperiment(experiment_id)
>>> searchExperiment
>>>
>>> Any pointers?
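
For timing these calls, a minimal harness like the following could work; the registry call is stubbed here, but in a real run it would be getExperiment(experimentId), searchExperiment, etc. against the live registry:

```java
// Hypothetical micro-benchmark harness -- the registry call is stubbed,
// since a real run would need a live Airavata registry to hit.
public class RegistryBench {

    // Average wall-clock milliseconds per call over `reps` repetitions.
    public static double timeMs(Runnable call, int reps) {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            call.run();
        }
        return (System.nanoTime() - start) / 1e6 / reps;
    }

    public static void main(String[] args) {
        // Stand-in for e.g. client.getExperiment(experimentId).
        Runnable stubbedGetExperiment = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100; i++) sb.append(i);
        };
        System.out.printf("getExperiment (stub): %.4f ms/call%n",
                timeMs(stubbedGetExperiment, 1_000));
    }
}
```

Running the same harness at 10, 1,000, and 10,000 experiments, and warming up first (to separate OpenJPA cache-cold from cache-warm numbers), would give comparable per-call figures across registry sizes.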
>>>
>>>
>>>
>>> On Wed, Jul 23, 2014 at 6:07 PM, Marlon Pierce <marpierc@iu.edu> wrote:
>>>
>>>  Thanks, Sachith. Did you look at scaling also?  That is, will the
>>>> operations below still be the slowest if the DB is 10x, 100x, 1000x
>>>> bigger?
>>>>
>>>> Marlon
>>>>
>>>>
>>>> On 7/23/14, 8:22 AM, Sachith Withana wrote:
>>>>
>>>>  Hi all,
>>>>>
>>>>> I'm profiling the current registry in a few different aspects.
>>>>>
>>>>> I looked into the database operations and listed the ones that take the
>>>>> most time.
>>>>>
>>>>> 1. Getting the status of an experiment (takes around 10% of the overall
>>>>> time spent): it has to go through the hierarchy of the data model (node,
>>>>> tasks, etc.) to get to the actual experiment status.
>>>>>
>>>>> 2. Dealing with the application inputs: strangely, the queries regarding
>>>>> the ApplicationInputs take a long time to complete. This is part of the
>>>>> new Application Catalog.
>>>>>
>>>>> 3. Getting all the experiments (using the * wildcard): this takes the
>>>>> most time when first queried, but thanks to the OpenJPA caching, it
>>>>> flattens out as we keep querying.
>>>>>
>>>>> To reduce the first issue, I would suggest having a separate table for
>>>>> experiment summaries, where the status (both the state and the state
>>>>> update time) would be the only varying entity, and using it to improve
>>>>> the query time for experiment summaries.
>>>>>
>>>>> It would also help improve the performance of getting all the
>>>>> experiments (experiment summaries).
>>>>>
>>>>> WDYT?
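
The summary-table idea can be sketched in-memory to show the access pattern: status keyed directly by experiment id and updated on every state change, so a status read is one lookup instead of a walk through the experiment/node/task hierarchy (class and method names are illustrative only):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed experiment-summary table: the status
// (state + last update time) keyed directly by experiment id. In the real
// registry this would be a database table, not an in-memory map.
public class SummaryCache {

    public static class Status {
        final String state;
        final long updatedAt;
        Status(String state, long updatedAt) {
            this.state = state;
            this.updatedAt = updatedAt;
        }
    }

    private final Map<String, Status> byExperimentId = new HashMap<>();

    // Writers update the summary row whenever any child task changes state.
    public void onStateChange(String experimentId, String state) {
        byExperimentId.put(experimentId, new Status(state, System.currentTimeMillis()));
    }

    // Readers (experiment summaries, getAllExperiments) hit only this table:
    // one lookup, no traversal of nodes and tasks.
    public String stateOf(String experimentId) {
        Status s = byExperimentId.get(experimentId);
        return s == null ? "UNKNOWN" : s.state;
    }

    public static void main(String[] args) {
        SummaryCache cache = new SummaryCache();
        cache.onStateChange("exp1", "LAUNCHED");
        cache.onStateChange("exp1", "COMPLETED");
        System.out.println(cache.stateOf("exp1")); // COMPLETED
        System.out.println(cache.stateOf("exp2")); // UNKNOWN
    }
}
```

The trade-off is the usual denormalization one: every task state change costs an extra write, in exchange for status reads that no longer depend on the depth of the experiment hierarchy.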
>>>>>
>>>>> To do: look into memory consumption (memory leaks, etc.).
>>>>>
>>>>>
>>>>> Any more suggestions?
>>>>>
>>>>>
>>>>
>>> --
>>> Thanks,
>>> Sachith Withana
>>>
>>>
>>>
>>
>


-- 
System Analyst Programmer
PTI Lab
Indiana University
