reef-dev mailing list archives

From "Julia (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (REEF-1797) Driver not shut down when running 1000 nodes on cluster for IMRU Example
Date Fri, 12 May 2017 02:30:04 GMT

     [ https://issues.apache.org/jira/browse/REEF-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julia updated REEF-1797:
------------------------
    Description: 
When I run the IMRU Example with 1000 nodes on the cluster using the latest master bits,
the driver does not shut down. The detailed logs show that not all CompletedEvaluator
events are received (1 or 2 are missing, varying between test runs), even though all
CompletedTask events are received and our code has called Dispose() on all active contexts.

The same 1000-node test shuts down successfully on the REEF bits from last December.
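To make the failure mode concrete, here is a minimal Python sketch, assuming a driver that records every allocated evaluator and only terminates once each one has produced a CompletedEvaluator event. The class `EvaluatorShutdownTracker` and its methods are hypothetical illustrations, not REEF's actual driver API. The sketch shows why losing 1 or 2 CompletedEvaluator events out of 1000 is enough to keep the driver alive, and how diffing the evaluators that reported CompletedTask against those that reported CompletedEvaluator narrows down which evaluator IDs to search for in the logs.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class EvaluatorShutdownTracker:
    """Hypothetical bookkeeping for a driver that may only shut down once
    every allocated evaluator has reported a CompletedEvaluator event."""

    allocated: Set[str] = field(default_factory=set)
    tasks_completed: Set[str] = field(default_factory=set)
    evaluators_completed: Set[str] = field(default_factory=set)

    def on_allocated(self, evaluator_id: str) -> None:
        self.allocated.add(evaluator_id)

    def on_completed_task(self, evaluator_id: str) -> None:
        self.tasks_completed.add(evaluator_id)

    def on_completed_evaluator(self, evaluator_id: str) -> None:
        self.evaluators_completed.add(evaluator_id)

    def missing_completed_evaluators(self) -> Set[str]:
        # Evaluators whose task finished but whose CompletedEvaluator
        # event has not (yet) arrived at the driver.
        return self.tasks_completed - self.evaluators_completed

    def can_shut_down(self) -> bool:
        # The driver stays alive while any allocated evaluator is unaccounted for.
        return self.allocated <= self.evaluators_completed


# Simulate 1000 evaluators where one CompletedEvaluator event is lost.
tracker = EvaluatorShutdownTracker()
for i in range(1000):
    eid = f"evaluator-{i}"
    tracker.on_allocated(eid)
    tracker.on_completed_task(eid)
    if i != 417:  # hypothetical lost event
        tracker.on_completed_evaluator(eid)

print(tracker.can_shut_down())                 # False
print(tracker.missing_completed_evaluators())  # {'evaluator-417'}
```

Keying the diagnostic on CompletedTask rather than on allocation isolates exactly the case described in this issue: the task finished and the context was disposed, but the evaluator's completion never reached the driver.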

  was:
When I run the IMRU Example with 1000 nodes on the cluster using the latest master bits,
the driver does not shut down. The detailed logs show that not all CompletedEvaluator
events are received (1 or 2 are missing, varying between test runs), even though all
CompletedTask events are received and our code has called Dispose() on all active contexts.

The test with 100 nodes on the REEF bits from last December shuts down successfully.


> Driver not shut down when running 1000 nodes on cluster for IMRU Example
> ------------------------------------------------------------------------
>
>                 Key: REEF-1797
>                 URL: https://issues.apache.org/jira/browse/REEF-1797
>             Project: REEF
>          Issue Type: Bug
>            Reporter: Julia



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
