airavata-dev mailing list archives

From "Christie, Marcus Aaron" <machr...@iu.edu>
Subject Re: Airavata Output Processing Doubt
Date Tue, 17 Jan 2017 13:52:54 GMT
Ajinkya,

I think an example of what you are trying to accomplish would be helpful for me. I don't
quite understand the problem you are trying to solve or the proposed solution. Are you
trying to turn the STDOUT of an application into an INTEGER value somehow?

Thanks,

Marcus

On Jan 16, 2017, at 12:58 PM, Ajinkya Dhamnaskar <adhamnas@umail.iu.edu> wrote:

Shameera,

Now I understand your concern. This is an inherent problem. I was actually referring to the
STDOUT and STDERR that the job generates after execution.
I was wondering if we could write this output as key-value pairs in a text file under a specific
directory for the particular job.

So, basically, every experiment will have an output staging task, but for the other output types
(INTEGER, STRING, etc.) the staged file would hold the outputs as key-value pairs.
For example, consider the case you mentioned with two string outputs: we could probably
generate a file on the remote server containing str1-value1 \n str2-value2 and stage it.
But I understand this needs to be implemented in the job scripts and requires much more than
this.
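
A minimal sketch of the key-value idea above, in Python for brevity. It assumes one `name=value` pair per line. The `=` separator, the `parse_outputs` helper, and the converter table are hypothetical and not part of Airavata. A hyphen separator, as in the example above, would be ambiguous for negative integers, which is why this sketch uses `=` instead.

```python
from typing import Dict, Union

OutputValue = Union[int, str]

# Hypothetical converters, keyed by the output data type names from the thread.
CONVERTERS = {"INTEGER": int, "STRING": str}


def parse_outputs(text: str, declared_types: Dict[str, str],
                  sep: str = "=") -> Dict[str, OutputValue]:
    """Parse 'name<sep>value' lines and convert each value to its declared type."""
    outputs: Dict[str, OutputValue] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, found, raw = line.partition(sep)
        if not found:
            raise ValueError(f"malformed line: {line!r}")
        name = name.strip()
        if name not in declared_types:
            continue  # ignore keys the application did not declare
        outputs[name] = CONVERTERS[declared_types[name]](raw.strip())
    missing = set(declared_types) - set(outputs)
    if missing:
        raise KeyError(f"missing outputs: {sorted(missing)}")
    return outputs


# Example: contents of a staged file with two strings and one integer.
staged = "str1=hello\nstr2=world\ncount=42\n"
types = {"str1": "STRING", "str2": "STRING", "count": "INTEGER"}
result = parse_outputs(staged, types)
```

Keying every value by its output name means a downstream consumer asks for "str2" explicitly instead of relying on line order.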

What do you think of this idea? Do you think it has some potential?


On Mon, Jan 16, 2017 at 6:55 AM, Shameera Rathnayaka <shameerainfo@gmail.com> wrote:
Hi Ajinkya,

As and when a job completes, we save its output. I was wondering if we can get
that information given the process_id and task_id.
The Job table has process_id and task_id, so possibly we can fetch the output stored in the Job table.
If you check the JobModel, we don't associate job output with it, so there are no job outputs in the Job table.
We have to fetch the output and save it in the registry (I think that is what you were referring to as logging output
in our previous replies). The problem is how to fetch and save it if the job is completed and the output
is not a file. If we can find a solution to this, we can extend it to support multiple outputs
as well.


Also, in the case of multiple outputs, each task would know its output name, and possibly we can
use that name alongside the process_id to fetch the correct value. The workflow would know which output
to use as an input for another application. I hope I understood your concern correctly.

On Sun, Jan 15, 2017 at 11:59 PM, Shameera Rathnayaka <shameerainfo@gmail.com> wrote:
This approach sounds promising to me. If we have more than one non-file output, then we will
have more than one OutputLoggingTask. But how exactly does this OutputLoggingTask (please think of a
new name for this task) read the data? By the time OutputLoggingTask is invoked, the
actual job on the target compute resource has completed, so where is the output? If we have more than
one OutputLoggingTask, how does each read the value associated with it? E.g., if my job outputs two
strings, "str1" and "str2", and I am using "str2" for a downstream application in my workflow, how
can we guarantee that the downstream application always gets the correct value?

On Sun, Jan 15, 2017 at 1:02 PM Ajinkya Dhamnaskar <adhamnas@umail.iu.edu> wrote:
Hi Shameera,

If you check the org.apache.airavata.orchestrator.cpi.impl.SimpleOrchestratorImpl#createAndSaveOutputDataStagingTasks()
method, we create an output staging task only when the output data type is STDOUT, STDERR, or
URI. I am suggesting that in the default case we create a different task that points to OutputLoggingTask.

OutputLoggingTask is just another implementation of a task, similar to SCPDataStageTask,
where we stage files and log output as well. But in OutputLoggingTask we need not stage
any data; we would just log the data as it comes.
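
The proposed dispatch can be sketched as follows, in Python for brevity. This is an illustration of the idea only, not the actual body of createAndSaveOutputDataStagingTasks(): `OutputSpec`, `choose_output_task`, `plan_output_tasks`, and the string label "DataStagingTask" are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Data types that, per the description above, already get an output staging task.
STAGED_TYPES = {"STDOUT", "STDERR", "URI"}


@dataclass(frozen=True)
class OutputSpec:
    """Hypothetical description of one application output."""
    name: str
    data_type: str


def choose_output_task(output: OutputSpec) -> str:
    """Return a label for the kind of task that would handle this output."""
    if output.data_type in STAGED_TYPES:
        return "DataStagingTask"   # existing behaviour: stage the file
    return "OutputLoggingTask"     # proposed default: only record the value


def plan_output_tasks(outputs: List[OutputSpec]) -> List[Tuple[str, str]]:
    """Create one task label per output, regardless of its data type."""
    return [(o.name, choose_output_task(o)) for o in outputs]


plan = plan_output_tasks([
    OutputSpec("job_log", "STDOUT"),
    OutputSpec("result_file", "URI"),
    OutputSpec("count", "INTEGER"),
])
```

With this shape every output gets exactly one task, so a non-file output such as an INTEGER is no longer silently skipped.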

I am assuming the TaskId and ProcessId are sufficient to fetch the output (please correct me if you
don't think so).

Thanks Shameera, this discussion is helping me a lot.

On Sun, Jan 15, 2017 at 9:38 AM, Shameera Rathnayaka <shameerainfo@gmail.com> wrote:
Hi Ajinkya,

It is not clear to me how this "OutputLoggingTask" knows about the job output data. It would be
helpful if you could explain your suggested approach a bit more.

Best,
Shameera.

On Sat, Jan 14, 2017 at 1:07 PM Ajinkya Dhamnaskar <adhamnas@umail.iu.edu> wrote:
Hi Shameera,

One possible solution would be to introduce an OutputLoggingTask. We can create an output task irrespective
of the output data type, and if there isn't any file to stage we can call OutputLoggingTask.
Its sole purpose is to log data; that way we can account for each output type.

Please suggest a better solution if you can think of one.

Thanks in anticipation.


On Sat, Jan 14, 2017 at 9:46 PM, Shameera Rathnayaka <shameerainfo@gmail.com> wrote:
Hi Ajinkya,

Yes, that is the case. How do you plan to solve it?

Regards,
Shameera.

On Fri, Jan 13, 2017 at 6:37 AM, Ajinkya Dhamnaskar <adhamnas@umail.iu.edu> wrote:
Amila,

Thanks, Amila, for explaining. It really clarifies how things are mapped. I could see the output
against the job but could not figure out where exactly we log output for a process.

Shameera,

Yeah, that's true. So basically, if an application does not have an output staging task, it will
not log output for the respective process.
This means that if the output data type is not URI, we are not logging output against the process (please
correct me if I am wrong).

Probably, here we have an opportunity to improve.

Thanks in anticipation

On Fri, Jan 13, 2017 at 8:58 AM, Shameera Rathnayaka <shameerainfo@gmail.com> wrote:
Hi Ajinkya,

If you check org.apache.airavata.gfac.impl.task.SCPDataStageTask#outputDataStaging, you
will see that we save process outputs to the database (through the registry). You are probably testing
with local job submission, with org.apache.airavata.gfac.impl.task.DataStageTask as the data staging
task implementation. There we don't save process outputs. The first thing is to fix this and save
the process outputs to the database.

If you know the JobId, then you can retrieve the processId from the Job model. Using the processId you
can get all process outputs. See the PROCESS_OUTPUT case in the org.apache.airavata.registry.core.experiment.catalog.impl.ExperimentCatalogImpl#get(..,..)
method.
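
The lookup chain described above (JobId to processId to process outputs) can be illustrated with a small in-memory stand-in, in Python for brevity. Everything here is hypothetical: `InMemoryRegistry`, `JobRecord`, `ProcessOutput`, and their methods are not the Airavata registry API, and the real lookup goes through ExperimentCatalogImpl#get(..,..).

```python
from typing import Dict, NamedTuple


class JobRecord(NamedTuple):
    job_id: str
    process_id: str
    task_id: str


class ProcessOutput(NamedTuple):
    process_id: str
    name: str
    value: str


class InMemoryRegistry:
    """Hypothetical stand-in for the registry, backed by plain dictionaries."""

    def __init__(self) -> None:
        self._jobs: Dict[str, JobRecord] = {}
        self._outputs: Dict[str, Dict[str, ProcessOutput]] = {}

    def save_job(self, job: JobRecord) -> None:
        self._jobs[job.job_id] = job

    def save_process_output(self, out: ProcessOutput) -> None:
        # Outputs are stored per process and keyed by output name.
        self._outputs.setdefault(out.process_id, {})[out.name] = out

    def outputs_for_job(self, job_id: str) -> Dict[str, ProcessOutput]:
        # Step 1: job id -> process id. Step 2: process id -> all outputs.
        process_id = self._jobs[job_id].process_id
        return dict(self._outputs.get(process_id, {}))

    def output_value(self, job_id: str, name: str) -> str:
        return self.outputs_for_job(job_id)[name].value


registry = InMemoryRegistry()
registry.save_job(JobRecord("job-1", "proc-1", "task-1"))
registry.save_process_output(ProcessOutput("proc-1", "str1", "alpha"))
registry.save_process_output(ProcessOutput("proc-1", "str2", "beta"))
value = registry.output_value("job-1", "str2")
```

The sketch only works because the outputs were saved first, which is the step missing from the local DataStageTask path described above.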

Hope this will help you to move forward.

Best,
Shameera.

On Thu, Jan 12, 2017 at 3:48 PM Amila Jayasekara <thejaka.amila@gmail.com> wrote:
Hi Ajinkya,

I am not familiar with the context of your question but let me try to answer.

If you are referring to an application deployed on a supercomputer, then the application should
have a job id. On the supercomputer, each application runs as a separate batch job, and each
job is distinguished by its job id (similar to a process id on a PC). Usually, the job scheduler
returns this job id, and Airavata should be aware of that job id. Then you should be able
to use this job id to identify the output, provided the job script specifies instructions to generate
output.

I did not understand what you referred to as "process model" and "job model". I assume these
are database tables.

Thanks
-Amila



On Wed, Jan 11, 2017 at 1:17 PM, Ajinkya Dhamnaskar <adhamnas@umail.iu.edu> wrote:
Hello Dev,

I am trying to fetch application output (type: INTEGER) after experiment completion. As per
my understanding, each application runs as a process, and that process should have a final output.

So, ideally we should be able to get the final output from the process id itself (correct me if I
am wrong).
In my case, I am not seeing the final output in the database. Basically, we are not updating the process
model after job completion; we update the job model, though.

Am I missing anything here?

Any help is appreciated.

--
Thanks and regards,

Ajinkya Dhamnaskar
Student ID : 0003469679
Masters (CS)
+1 (812) 369-5416

--
Shameera Rathnayaka



--
Best Regards,
Shameera Rathnayaka.

email: shameera AT apache.org, shameerainfo AT gmail.com
Blogs: https://shameerarathnayaka.wordpress.com, http://shameerarathnayaka.blogspot.com/


