singa-dev mailing list archives

From "Ngin Yun Chuan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SINGA-399) Rafiki cannot test rebuilt image
Date Sun, 28 Oct 2018 03:34:00 GMT

    [ https://issues.apache.org/jira/browse/SINGA-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16666273#comment-16666273 ]

Ngin Yun Chuan commented on SINGA-399:
--------------------------------------

Hi Zhu Lei,

Regarding your email with the following issue:
```
I want to add a new function to the predictor.py code. However, after I modify the code,
rebuild the Rafiki_predictor image from predictor.Dockerfile, and run the client_usage.py
code, I find that Rafiki does not run my new code. Instead, it seems that Rafiki is still
running the original predictor.py.
```
This is an issue we have run into many times. Docker Hub also hosts `rafikiai/predictor:0.0.4`
(https://hub.docker.com/u/rafikiai/), so when `scripts/start.sh` is run with a locally built
`rafikiai/predictor:0.0.4`, the Docker Hub image appears to be used instead of the local one.
My current workaround is to increment the version in `.env.sh` in your working directory to
the next version (e.g. 0.0.5). This works as long as that tag (e.g. `rafikiai/predictor:0.0.5`)
has not been pushed to Docker Hub. In the future, we should update the scripts so that
locally built images can be used even when such a version conflict exists.
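
For illustration, the version bump could be scripted roughly as below. This is only a sketch under assumptions: the variable name `RAFIKI_VERSION` and the `export NAME=value` line format are guesses and are not confirmed here, so check your own `.env.sh` for the actual name. The helpers `bump_patch` and `bump_env_text` are hypothetical and not part of Rafiki.

```python
import re


def bump_patch(version: str) -> str:
    """Increment the last numeric component of a dotted version string."""
    parts = version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def bump_env_text(env_text: str, var_name: str = "RAFIKI_VERSION") -> str:
    """Return env_text with the version assigned to var_name bumped by one.

    ASSUMPTION: var_name and the `export NAME=value` layout are hypothetical;
    adjust them to match the real contents of .env.sh.
    """
    pattern = re.compile(
        r"^([ \t]*(?:export[ \t]+)?" + re.escape(var_name) + r"=)"
        r"""(["']?)(\d+(?:\.\d+)*)\2[ \t]*$""",
        re.MULTILINE,
    )

    def repl(match: re.Match) -> str:
        prefix, quote, version = match.group(1), match.group(2), match.group(3)
        return f"{prefix}{quote}{bump_patch(version)}{quote}"

    new_text, count = pattern.subn(repl, env_text)
    if count == 0:
        raise ValueError(f"no version assignment found for {var_name}")
    return new_text


if __name__ == "__main__":
    sample = "export RAFIKI_VERSION=0.0.4\nexport OTHER=1\n"
    print(bump_env_text(sample), end="")
```

The function works on the file's text rather than on the file itself, so you can inspect the result before writing it back. The bumped tag still has to be one that has not been pushed to Docker Hub.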


> Rafiki cannot test rebuilt image
> --------------------------------
>
>                 Key: SINGA-399
>                 URL: https://issues.apache.org/jira/browse/SINGA-399
>             Project: Singa
>          Issue Type: Bug
>            Reporter: Zhu Lei
>            Priority: Major
>         Attachments: rafiki-1.PNG, rafiki-2.PNG, rafiki-3.PNG, rafiki-4.PNG
>
>
> After downloading the newest Rafiki code at commit 7b3b04e15c62233e515c4d82051cd5dfb799215f (commit message: "Add more error handling to notify user of invalid train job; compact exceptions"), I ran "bash ./scripts/build_images.sh" to build the new admin, advisor, predictor and worker images. The resulting images are shown in the attached image 'rafiki-1.PNG'. I then ran "bash ./scripts/start.sh" to create the containers, as shown in the attached image 'rafiki-2.PNG'. Finally, when I ran the client_usage.py example, I got the error shown in the attached image 'rafiki-3.PNG'.
> I was also very surprised to find that the admin, advisor, predictor and worker images I had just built had been replaced by images built weeks ago, as shown in the attached image 'rafiki-4.PNG'. Could you kindly explain why this happens? I really do not understand it.
> Finally, when I ran "bash ./scripts/stop.sh", left the swarm, and repeated the same procedure, there were no errors. I think the only difference between the two runs is the images. My speculation is that the current Rafiki code does not support newly built images.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
