
First attempt at Kubeflow 0.6.1

  • Writer: Derek Ferguson
  • Jul 28, 2019
  • 4 min read

Kubeflow 0.6.1 came out two days ago, as I start this blog. Let's see how an install attempt goes.

So, as always - I will start by reinstalling the OS images and Kubernetes cluster from scratch. This time, though, I'm going to be a little smarter than last time. My Mac Mini from 2011 will be the Kubernetes master, and my powerful gaming PC from 2018 with the GPU will be my worker node. :-)

Fresh Ubuntu 18.04 is already installed. So I'll just follow this article to stand up the k8s master. Right away, I discover that "sudo" is inadequate for running the very first command, so I use "sudo passwd root" to give root a password and then switch to that account to continue the instructions.

Second thing -- I need to install curl before these instructions. So: "sudo apt-get install curl".

Third thing -- "swapoff -a" to turn off swap, and editing the swap entry out of /etc/fstab so it stays off after a reboot.
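Taken together, the prep on each fresh Ubuntu 18.04 box looked roughly like this (the sed line is just my shorthand for "editing it out of /etc/fstab" -- adjust to taste):

```shell
# prerequisites on each fresh Ubuntu 18.04 node
sudo apt-get update
sudo apt-get install -y curl

# kubelet refuses to run with swap enabled
sudo swapoff -a
# comment the swap entry out of /etc/fstab so it stays off after a reboot
sudo sed -i.bak '/ swap / s/^/#/' /etc/fstab
```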

The master runs through to completion. So, I copy the kubectl config files over as a normal user - as per the instructions, but then I install weave, because that is my preferred pod network instead of flannel ("kubectl apply -f https://git.io/weave-kube-1.6").
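For reference, the "copy the kubectl config files over" step is just the standard sequence that kubeadm prints when "kubeadm init" finishes, followed by the weave install:

```shell
# as the normal (non-root) user, after kubeadm init finishes
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# weave as the pod network, instead of flannel
kubectl apply -f https://git.io/weave-kube-1.6
```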

Joining the worker to the cluster also works without issue! Whoa! :-)

The only two changes when copying the kubectl config files over on the worker are that...

1) I have to rename the source file to look for "kubelet.conf" instead of "admin.conf."

2) I have to run "sudo chmod 777 /var/lib/kubelet/pki/kubelet-client-current.pem" to give the kubectl on the worker access to the PEM file
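So on the worker, the equivalent sequence ends up looking something like this (note that 777 is far more permissive than strictly necessary -- it just got me unblocked):

```shell
# on the worker node: kubelet.conf instead of admin.conf
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/kubelet.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# let kubectl on the worker read the client cert the kubeconfig points at
sudo chmod 777 /var/lib/kubelet/pki/kubelet-client-current.pem
```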

At this point, I have a 2-node K8S cluster running... time to try Kubeflow 0.6.1!

Something that isn't currently covered in the Kubeflow 0.6 installation instructions is that you must first have Istio running on your cluster in order to do this installation. So, we start there.

  1. I download the Istio bits using the instructions here.

  2. I install the bits using the instructions here. (This generates tons of warnings on the first step about how kubectl should be used to create some resources.)
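Since the links above don't survive here, the Istio install of that era boiled down to something like the following sketch (the version number is illustrative -- use whatever the docs point at):

```shell
# download Istio (version here is illustrative, not verified)
curl -L https://git.io/getLatestIstio | ISTIO_VERSION=1.1.9 sh -
cd istio-1.1.9
export PATH=$PWD/bin:$PATH

# install the CRDs first, then the demo profile --
# this is the step that spews the kubectl warnings
for f in install/kubernetes/helm/istio-init/files/crd*.yaml; do
  kubectl apply -f "$f"
done
kubectl apply -f install/kubernetes/istio-demo.yaml
```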

There are official releases of 0.6.1 in the Releases portion of GitHub, so I start by downloading those onto my Mac Mini (the master in my Kubernetes cluster).

curl -L https://github.com/kubeflow/kubeflow/releases/download/v0.6.1/kfctl_v0.6.1_linux.tar.gz --output kfctl_v0.6.1_linux.tar.gz

tar xvzf kfctl_v0.6.1_linux.tar.gz

This gives me a single file: kfctl. I try to follow the instructions at https://www.kubeflow.org/docs/started/getting-started-k8s/#Kubeflow-for-Existing-Clusters---by-Arrikto to install on my cluster.
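From memory, the sequence those instructions walk you through is roughly the following (KFAPP is an arbitrary name, and the config file is a placeholder for whichever one the page specifies):

```shell
# put kfctl on the PATH, then scaffold a deployment directory
export PATH=$PATH:$(pwd)
export KFAPP=kf-app    # name is arbitrary

kfctl init ${KFAPP} --config=<config-file-from-the-instructions> -V
cd ${KFAPP}
kfctl generate all -V
kfctl apply all -V
```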

Unfortunately, everything blows up when I attempt to run the "kfctl apply" part of the process. The reason given is that you need to have a load balancer installed... specifically "istio-ingressgateway". Strangely, I have this installed, but Kubeflow isn't finding it.

The instructions do say a load balancer is required, so let me try installing one, per the instructions at https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/ and see how that works out.

I don't get very far before I realize that these instructions are telling me how to create a load balancer for a specific app already running on my cluster. Clearly, when the instructions say "Load Balancer Support required," they mean something else - it's not too clear to me what.

So, I dig around a little in the Kubeflow issues and discover this issue, which leads me to this proposed document revision. The main difference in the new instructions appears to be that the "kfctl init" command uses a different configuration file, one which installs Istio from scratch. I try this and, because I am already running Istio, it fails. I delete the istio-system namespace (the recommended way to delete Istio) and get another error. I restart both machines in my cluster, retry all the Kubeflow install steps from scratch and... it runs to completion! But will it actually work?
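The operative difference, as best I can tell, is just the --config passed to "kfctl init" -- the revised doc points at a config that deploys Istio as part of Kubeflow (the exact URL is the one named in the doc revision, not reproduced here):

```shell
# remove my pre-existing Istio first
kubectl delete namespace istio-system

# re-init with the config that installs Istio as part of Kubeflow
kfctl init ${KFAPP} --config=<kfctl_k8s_istio-config-from-the-doc> -V
cd ${KFAPP}
kfctl generate all -V
kfctl apply all -V
```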

A couple of minutes later, some pods still list themselves as Pending or ContainerCreating...
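Watching the rollout is just the usual pair of commands (the pod name below is whichever one looks stuck):

```shell
# watch pods come up; Ctrl-C to stop
kubectl get pods -n kubeflow -w

# when one looks stuck, ask why
kubectl describe pod <pod-name> -n kubeflow
```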

Seven minutes later, I am starting to lose faith, so I describe one of the pods. It is listed as still pulling the relevant image, so -- fine. I should say that Comcast just "upgraded" my connection this past week (read: told me my slower, cheaper plan was no longer available, so I had to move to a more expensive, albeit faster, one) to something like 350 Mbps, so these must be some massive images.

After about 8 minutes, everything appears to have finished pulling images and about 90% of everything seems to be running, but after watching it for another 2 minutes, I've seen the remaining 10% of the pods cycle through various stages of health, crashing and restarting. Here is what they look like at the moment...

15 minutes later, they are all running. So, I guess the moral of this part of the story is: don't lose patience during the startup -- left to their own devices, the pods seem to reach a functional state eventually. Some of them still periodically show crashes, but I'll move forward and see what works and what doesn't.

Additional note: at this point, "kubectl get pods -n kubeflow" is taking much longer to respond. Previously it was instantaneous; now it takes about 30 seconds.

But now - can I actually make a connection, and to where should I connect? There's no more Ambassador to proxy a connection against, presumably because of Istio.

The instructions direct me to this article, which gives me a command I'm supposed to be able to run to find out how my services have been exposed for access. I run the recommended commands...
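I no longer have the exact commands, but the Istio ingress article of the time determined the host and port along these lines (the port name "http2" and the jsonpath details depend on your gateway service, so treat this as a sketch):

```shell
# NodePort on which istio-ingressgateway serves plain HTTP
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')

# on a bare-metal cluster, any node's address works as the host
export INGRESS_HOST=$(kubectl get nodes \
  -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')

echo http://$INGRESS_HOST:$INGRESS_PORT
```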

And open a browser against the host and port assigned, and... voila - it lives!! :-)

I poke around for a bit and things look OK until I get to the pipelines, which show an error.

I decide to try uploading the Fashion MNIST test pipeline that I described in my previous blog post. This fails with the same error. It would seem that something additional needs to be configured with the Istio proxy, perhaps?

To be continued!



©2018 by Machine Learning for Non-Mathematicians. Proudly created with Wix.com
