Distributed TensorFlow - [Intro]

in #deep-learning, 8 years ago



Matthew Rahtz, a Master's student in Neuroscience and Machine Learning in Switzerland, has posted a very detailed introduction to distributed TensorFlow on the amid.fish website. In Matthew's words:

"Distributed TensorFlow allows us to share parts of a TensorFlow graph between multiple processes, possibly each on a different machine." [source]

One reason to do this is to use the power of more than one machine during training, with the parameters shared between all of the machines.

Matthew does not dwell on theory and goes straight to an implementation in TensorFlow. A key point about distributed TensorFlow is that, to share parameters between processes, the execution engines of those processes (possibly on different machines) need to be linked together.

Thus:

  • for each process there will be a TensorFlow server (execution engine)
  • servers are linked together in clusters
  • each server in the cluster is known as a task
  • each task is associated with a job (a collection of related tasks)
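To make the cluster / job / task vocabulary above more concrete, here is a minimal sketch in plain Python. It is my own illustration, not code from Matthew's post: the job names ("ps", "worker"), the hosts, the ports, and the helper functions `list_tasks` and `device_name` are all hypothetical. The dictionary has the same shape that TensorFlow's `tf.train.ClusterSpec` accepts (job name mapped to a list of task addresses), and the `/job:<name>/task:<index>` string is the form TensorFlow uses to refer to a task when placing operations.

```python
# Hypothetical cluster layout: job name -> list of "host:port" task addresses.
# The job names, hosts and ports are made up for illustration only.
cluster = {
    "ps": ["localhost:2221"],
    "worker": ["localhost:2222", "localhost:2223"],
}


def list_tasks(cluster):
    """Return one (job, task_index, address) tuple per task in the cluster."""
    tasks = []
    for job in sorted(cluster):
        # A task is identified by its job name plus its index within the job.
        for index, address in enumerate(cluster[job]):
            tasks.append((job, index, address))
    return tasks


def device_name(job, task_index):
    """Build the '/job:<name>/task:<index>' string that names a task."""
    return "/job:{}/task:{}".format(job, task_index)


if __name__ == "__main__":
    # Each line corresponds to one server (one process) in the cluster.
    for job, index, address in list_tasks(cluster):
        print(device_name(job, index), "->", address)
```

In a real setup, each of those tasks would be its own process running a TensorFlow server that was started with the same cluster description plus that process's own job name and task index.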

In the post, Matthew goes on to explain where variables are placed and how graphs work with distributed TF. He also covers some practical details that need to be accounted for, such as:

  • what happens when a server leaves the cluster
  • what happens if it returns to the cluster
  • who is responsible for variable initialization
  • and others.

So, if you want to learn more, you can read Matthew's entire post, linked below, or the official documentation for distributed TensorFlow:

Distributed Tensorflow - [Intro]


To stay in touch with me, follow @cristi


Cristi Vlad Self-Experimenter and Author
