Distributed Tensorflow in Kubernetes
From ESS-WIKI
Revision as of 01:41, 16 November 2018
Introduction
Distributed TensorFlow (clustering) can speed up training. Running distributed TensorFlow on Kubernetes makes it easy to:
- Add Kubernetes (k8s) nodes to extend computing capacity
- Simplify the work of setting up a distributed TensorFlow cluster
This topic describes how to set up a distributed TensorFlow training job on Kubernetes.
Prerequisites
- You should understand the basic concepts of distributed TensorFlow; see: Distributed TensorFlow
- You should know how to write a distributed TensorFlow training program, for example with train_and_evaluate
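As a reminder of how a train_and_evaluate program learns its place in the cluster, the following is a minimal sketch, assuming the usual TF_CONFIG convention: a JSON environment variable with a "cluster" map and a "task" entry. The function name describe_task, the hostnames, and the port are hypothetical and used only for illustration; they are not taken from the Iris example.

```python
import json
import os


def describe_task(tf_config_json):
    """Return (task_type, task_index, peers) parsed from a TF_CONFIG JSON string."""
    config = json.loads(tf_config_json)
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    task_type = task.get("type", "")
    task_index = int(task.get("index", 0))
    # Addresses of every process that shares this task's role.
    # A role that is absent from the cluster map yields an empty list.
    peers = cluster.get(task_type, [])
    return task_type, task_index, peers


# Hypothetical three-process cluster; hostnames and ports are illustrative only.
EXAMPLE_TF_CONFIG = json.dumps({
    "cluster": {
        "chief": ["iris-chief-0:2222"],
        "worker": ["iris-worker-0:2222"],
        "ps": ["iris-ps-0:2222"],
    },
    "task": {"type": "worker", "index": 0},
})

if __name__ == "__main__":
    # Fall back to the example when TF_CONFIG is not set in the environment.
    raw = os.environ.get("TF_CONFIG", EXAMPLE_TF_CONFIG)
    task_type, task_index, peers = describe_task(raw)
    address = peers[task_index] if task_index < len(peers) else "unknown"
    print("role=%s index=%d address=%s" % (task_type, task_index, address))
```

Each process in the cluster receives the same "cluster" map but a different "task" entry, which is what lets one container image serve every role.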
Steps
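1. Create (or download) the source and Dockerfile (File:Iris train and eval.zip) and unzip them into the same folder.
2. Build the training image, where "ecgwc" is the Docker Hub username and "tf-iris:dist" is the image name and tag.
$ docker build -t ecgwc/tf-iris:dist .
3. Check that the training image works.
$ docker run --rm ecgwc/tf-iris:dist
File:Dist tf k8s-1.png
4. Push the image to Docker Hub.
$ docker push ecgwc/tf-iris:dist
5. Create (or download) the YAML file for distributed TensorFlow: File:Tf-dist-iris.zip
6. Deploy the YAML to k8s.
$ kubectl create -f tf-dist-iris.yaml
7. Check the training process.
File:Dist tf k8s-2.png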
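The contents of tf-dist-iris.yaml are not shown on this page. As a rough sketch of what such a manifest has to provide, each replica needs a TF_CONFIG value that lists every peer and names the replica's own role. The helper below generates those values under assumed conventions: the function name build_tf_configs, the "<job>-<type>-<index>" naming scheme, and port 2222 are all hypothetical and may differ from the actual YAML.

```python
import json


def build_tf_configs(job_name, replicas, port=2222):
    """Return {pod_name: TF_CONFIG JSON string} for every replica in the cluster.

    `replicas` maps a task type (e.g. "ps", "worker") to its replica count.
    Names follow a hypothetical "<job>-<type>-<index>" scheme.
    """
    # The cluster map is identical for every replica.
    cluster = {
        task_type: ["%s-%s-%d:%d" % (job_name, task_type, i, port) for i in range(count)]
        for task_type, count in replicas.items()
    }
    configs = {}
    for task_type, count in replicas.items():
        for index in range(count):
            pod_name = "%s-%s-%d" % (job_name, task_type, index)
            # Only the "task" entry differs from one replica to the next.
            configs[pod_name] = json.dumps(
                {"cluster": cluster, "task": {"type": task_type, "index": index}},
                sort_keys=True,
            )
    return configs


if __name__ == "__main__":
    example = build_tf_configs("iris", {"chief": 1, "ps": 1, "worker": 2})
    for pod, tf_config in sorted(example.items()):
        print(pod, tf_config)
```

Adding a worker then amounts to raising one replica count and regenerating the values, which is the kind of scaling that extra k8s nodes make possible.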
Reference
https://github.com/Azure/kubeflow-labs/tree/master/7-distributed-tensorflow