Difference between revisions of "Distributed Tensorflow in Kubernetes"


Revision as of 03:14, 16 November 2018

Introduction

Distributed TensorFlow (clustering) can speed up training by spreading the work across several machines. Running distributed TensorFlow on Kubernetes makes it easy to:

  1. Add k8s nodes to extend computing capacity
  2. Simplify the setup of a distributed TensorFlow cluster

This topic describes how to set up distributed TensorFlow training on Kubernetes.

Prerequisites

  1. You should understand the basic concepts of distributed TensorFlow; see: Distributed TensorFlow
  2. You should know how to write a distributed TensorFlow training script, for example with train_and_evaluate
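As a reminder of what such a training script expects: tf.estimator.train_and_evaluate reads the cluster layout from the TF_CONFIG environment variable, a JSON object with a "cluster" map and a "task" entry. The sketch below sets it by hand for one worker. The host names and ports are hypothetical, and in a Kubernetes deployment the variable would typically be set per pod (for example through the container's env section) rather than typed manually.

```shell
# Hypothetical host names and ports; replace them with the addresses of your replicas.
CHIEF_HOST="chief-0:2222"
PS_HOST="ps-0:2222"
WORKER_HOST="worker-0:2222"

# TF_CONFIG for the replica that acts as worker 0.
# "cluster" lists every replica by role; "task" says which one this process is.
TF_CONFIG=$(printf '{"cluster": {"chief": ["%s"], "ps": ["%s"], "worker": ["%s"]}, "task": {"type": "worker", "index": 0}}' \
  "$CHIEF_HOST" "$PS_HOST" "$WORKER_HOST")
export TF_CONFIG

echo "$TF_CONFIG"
```

Every replica gets the same "cluster" value and a different "task" value, which is how each process learns its own role.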

Steps

1. Create (or download) the source code and Dockerfile (File:Iris train and eval.zip) and unzip them into the same folder.

2. Build the training image, where "ecgwc" is the Docker Hub username and "tf-iris:dist" is the image name and tag.

$ docker build -t ecgwc/tf-iris:dist .

3. Check that the training image runs.

$ docker run --rm ecgwc/tf-iris:dist

[Screenshot: Dist tf k8s-1.png]

4. Push the image to Docker Hub.

$ docker push ecgwc/tf-iris:dist

5. Create (or download) the YAML file for distributed TensorFlow: File:Tf-dist-iris.zip

6. Deploy the YAML file to k8s.

$ kubectl create -f tf-dist-iris.yaml

7. Check the training progress.

[Screenshot: Dist tf k8s-2.png]
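One way to follow the training from the command line is sketched below. "kubectl get pods" and "kubectl logs -f" are standard kubectl commands; the check_training helper and the pod name passed to it are hypothetical, since the actual pod names depend on what tf-dist-iris.yaml defines.

```shell
# Hypothetical helper: list the pods, then stream the log of one of them.
# The pod name is an assumption; copy the real one from the `kubectl get pods` output.
check_training() {
  pod_name="$1"
  if [ -z "$pod_name" ]; then
    echo "usage: check_training <pod-name>" >&2
    return 2
  fi
  kubectl get pods              # status of every replica (Pending, Running, Completed, ...)
  kubectl logs -f "$pod_name"   # follow the training output of the chosen replica
}
```

For example, "check_training <pod-name>" with a name taken from the pod list prints the status of all replicas and then follows that replica's log until interrupted.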

Reference

https://github.com/Azure/kubeflow-labs/tree/master/7-distributed-tensorflow