
Putting my demo on MiniKF 0.5

  • Writer: Derek Ferguson
  • May 25, 2019
  • 2 min read

A couple of months ago, I wrote some blog posts about the creation of a simple Kubeflow Pipelines demo, which is available on GitHub at https://github.com/JavaDerek/FashionMnistKF. With the recent release of Kubeflow 0.5 and a version of MiniKF that supports it, I want to see what it will take to move my demo over to this new MiniKF version.

Before I start, let me recommend this blog post as the best guide for getting started with MiniKF's new Kubeflow 0.5 support.

So, following my own README, I started by spinning up a custom instance of Minio: I logged into the VM's SSH interface and installed it there. That seemed to run fine, so I continued on to setting up TensorFlow Serving. The first thing I notice is that I never put in a warning that the console is going to keep scrolling error messages until you run the pipeline to completion once. So, I add that.
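For readers who would rather not install the binary by hand over SSH, a standalone Minio can be sketched as a Docker Compose service instead. This is an assumed equivalent, not the README's actual commands -- the credentials and data path are placeholders:

```yaml
# Illustrative docker-compose sketch of a standalone Minio listening on
# port 9000 (the port the pipeline steps expect). Credentials are
# placeholders -- substitute your own.
version: "3"
services:
  minio:
    image: minio/minio
    command: server /data
    ports:
      - "9000:9000"
    environment:
      MINIO_ACCESS_KEY: minio-access-key
      MINIO_SECRET_KEY: minio-secret-key
```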

The next thing I notice is that I'm still using the old port for Kubeflow here -- need to change that to 8080, so I do.

Next is a slight verbiage change from "Start an Experiment" to "Create Experiment" -- easy.

The page auto-refreshes nowadays, so I removed the reference to that.

First really substantial discovery at this point - the Download step fails because it is trying to reach port 9000 on an IP that doesn't respond. I scroll back in my command history and discover that a different IP has been assigned to this VM, so Minio is now listening at a different address. I have to change the code to allow for this but, more importantly, I have to tell users to do this when they set up this example! So, I mod the README yet again.
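One way to spare future users this edit-and-rebuild cycle -- a hypothetical sketch, not code from the demo repo -- is to have the steps read the Minio endpoint from an environment variable, with the hard-coded IP kept only as a fallback:

```python
import os

# Hypothetical refactor: pull the Minio host/port from the environment so a
# reassigned VM IP means re-exporting a variable, not rebuilding the image.
# The default values below are placeholders, not the demo's real address.
MINIO_HOST = os.environ.get("MINIO_HOST", "10.10.10.10")
MINIO_PORT = os.environ.get("MINIO_PORT", "9000")

def minio_endpoint() -> str:
    """Return the endpoint URL the pipeline steps use to reach Minio."""
    return f"http://{MINIO_HOST}:{MINIO_PORT}"

print(minio_endpoint())
```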

Strangely... re-running after this change and rebuild hits the same issue. So, I manually pull the Download image from the command prompt and -- that step works fine. I suspect something is set that prevents images from being re-downloaded, even when newer ones are available.
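This matches Kubernetes' default image pull behavior: for an image with a fixed tag, the pull policy defaults to IfNotPresent, so a rebuilt image pushed under the same tag is never re-fetched from a node that already has a copy (only the :latest tag defaults to Always). A container spec can force a fresh pull on every run -- the step and image names here are illustrative stand-ins, not the demo's actual names:

```yaml
# Forcing Kubernetes to re-pull the step's image on every run.
spec:
  containers:
    - name: download
      image: example/fashion-mnist-download:latest
      imagePullPolicy: Always
```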

For this reason, I'm surprised to see "preprocess" run without needing an explicit pull but, I guess that is because that step never once ran correctly under this version, so there was no stale image cached. OK - fine.

Everything runs fine until the "evaluate" step, which bombs with a 500 error and very little in the error message to assist. I note that the console is still scrolling errors, though, so I can deduce that the updated model has not been picked up.

After a little digging, I realize it is our old friend again: the lack of a fresh image. I guess that is because I also started tfsbase before updating the IP. So, I manually pull the image from within the VM and restart tfsbase. Same error at first, but I quickly hit "10.10.10.10:5000" in a browser (as the evaluate step will) and ... it works!

So, now I rerun the pipeline - passing in "half" this time, to skip the download and pre-process steps that already worked properly. And... success!
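The "half" switch can be sketched roughly like this -- hypothetical parameter and step names, since the real pipeline's identifiers may differ:

```python
# Hypothetical sketch of the "half" parameter: skip the early steps that
# already succeeded and re-run only the later ones.
STEPS = ["download", "preprocess", "train", "evaluate"]

def steps_to_run(mode: str = "full") -> list:
    """Return the pipeline steps to execute for the given mode."""
    if mode == "half":
        # download and preprocess already produced their outputs
        return STEPS[2:]
    return list(STEPS)

print(steps_to_run("half"))
```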

I publish my updated GitHub README (so the items I mentioned changing above are already fixed there) and I'm ready for the presentation in 3 weeks in San Jose! :-)



©2018 by Machine Learning for Non-Mathematicians. Proudly created with Wix.com
