
Deep learning with Spark and TensorFlow

In the past few years, neural networks have seen spectacular progress, and they are now the state of the art in image recognition and automatic translation. TensorFlow is a new framework for numerical computation and neural networks released by Google. In this blog post, we demonstrate how to use TensorFlow and Spark together to train and apply deep learning models.

You may be wondering: the highest-performing deep learning implementations today are single-node, so where does Spark come in? To answer this question, we walk through two examples and explain how to pair Spark with a cluster of machines running TensorFlow to improve deep learning pipelines:

  1. Hyperparameter tuning: using Spark to find the best set of training parameters for a neural network, reducing training time by a factor of ten and lowering the error rate by 34%.
  2. Deploying models at scale: using Spark to apply a trained neural network model to a large amount of data.

Hyperparameter tuning

Artificial neural networks are one example of an advanced machine learning (ML) technique. They take a complex input, such as an image or an audio recording, and apply complex mathematical transformations to these signals. The output of this transformation is a vector of numbers that is easier for other ML algorithms to manipulate. Artificial neural networks perform the transformation by mimicking the visual cortex of the human brain (in a rather simplified form).

Just as humans learn to interpret what they see, artificial neural networks need to be trained to recognize specific patterns that are "interesting". These can be simple patterns, such as edges and circles, but they can also be much more complex. Here, we will use the classic dataset provided by NIST to train a neural network to recognize these digits:


The TensorFlow library automates the creation of training algorithms for neural networks of various shapes and sizes. The actual process of building a neural network, however, is more complicated than simply running some function on a dataset. There are typically a number of very important hyperparameters (configuration parameters, loosely speaking) to set, and these affect how the model is trained. Picking the right parameters leads to high performance, while bad parameters can lead to prolonged training and poor performance. In practice, machine learning practitioners rerun the same model multiple times with different hyperparameters in order to find the best set. This classic technique is called hyperparameter tuning.

When building a neural network, there are many important hyperparameters that need to be chosen carefully. For example:

  • The number of neurons in each layer: too few neurons reduce the expressive power of the network, but too many substantially increase the running time and return noisy estimates.
  • The learning rate: if it is too high, the neural network focuses only on the last few samples seen and disregards all the experience accumulated before. If it is too low, it takes too long to reach a good state.
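To make the idea of a hyperparameter set concrete, a search space like the one described above can be enumerated as a simple grid. The specific layer sizes and learning rates below are hypothetical values chosen only to illustrate the shape of the search, not the ones used in the experiments:

```python
from itertools import product

# Hypothetical search grid: hidden-layer sizes and learning rates to try.
layer_sizes = [64, 128, 256]          # neurons in the hidden layer
learning_rates = [0.001, 0.01, 0.1]   # step size for gradient updates

# The cross product enumerates every configuration that must be trained.
param_grid = [
    {"hidden_neurons": n, "learning_rate": lr}
    for n, lr in product(layer_sizes, learning_rates)
]

print(len(param_grid))  # 9 configurations: 3 sizes x 3 learning rates
```

Each entry of `param_grid` is an independent training job, which is exactly what makes the tuning process easy to distribute.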

The interesting thing here is that even though TensorFlow itself is not distributed, the hyperparameter tuning process is "embarrassingly parallel" and can be distributed with Spark. In this case, we can use Spark to broadcast the common elements, such as the data and the model description, and then schedule the individual repetitive computations across a cluster of machines in a fault-tolerant manner.
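The distribution pattern is simple: map a training function over the list of parameter sets and keep the best result. The sketch below shows the structure with a placeholder scoring function (the real TensorFlow training loop is not reproduced here); a plain local `map` stands in for the distributed one, and the commented line shows the Spark equivalent:

```python
def train_and_score(params):
    # Placeholder for the real TensorFlow training run: train a model with
    # the given hyperparameters and return its test accuracy. Here a fake
    # score is computed so the end-to-end control flow can be demonstrated.
    fake_accuracy = 0.99 - abs(params["learning_rate"] - 0.01)
    return (params, fake_accuracy)

param_grid = [{"learning_rate": lr} for lr in (0.001, 0.01, 0.1)]

# On a Spark cluster this map is distributed across the workers:
#   results = sc.parallelize(param_grid).map(train_and_score).collect()
# Locally, a plain map has exactly the same shape:
results = list(map(train_and_score, param_grid))

# Pick the configuration with the highest score.
best_params, best_acc = max(results, key=lambda r: r[1])
```

Because each `train_and_score` call is independent, no communication is needed between tasks, which is what "embarrassingly parallel" means in practice.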


How does using Spark improve the accuracy? With the default hyperparameter settings, accuracy is 99.2%. Our best result on the test set is 99.47% accuracy, which amounts to a 34% reduction of the test error. Distributing the computation scales linearly with the number of nodes added to the cluster: with a 13-node cluster, we were able to train 13 models in parallel, which translates into a 7x speedup compared to training the models one at a time on a single machine. Here is the computation time (in seconds) with respect to the number of machines in the cluster:


More importantly, we gain insight into the sensitivity of the training process to the various hyperparameters. For example, we plot the final test performance with respect to the learning rate, for different numbers of neurons:


This shows a typical neural network tradeoff curve:

  • The learning rate is critical: if it is too low, the neural network does not learn anything (high test error). If it is too high, the training process may oscillate randomly and even diverge in some configurations.
  • The number of neurons is not as important for getting good performance, and networks with more neurons are much more sensitive to the learning rate. This is Occam's Razor at work: for most purposes, a simpler model tends to be "good enough". If you have the time and resources to chase the missing 1% of test error, you must be willing to invest a lot of resources in training and to find the proper hyperparameters, and that is what will make the difference.

Even with a sparse sampling of the parameter space, we can obtain a zero error rate under the best parameter set.

How do I use it?

Although TensorFlow can use all the cores on each worker, we run only one task at a time per worker and batch the tasks together in order to limit contention. The TensorFlow library can be installed on a Spark cluster as a regular Python library, following the instructions on the TensorFlow website. The following notebook shows how to install TensorFlow and lets users reproduce the experiments of this article:

Large scale deployment

TensorFlow models can be directly embedded within pipelines to perform complex recognition tasks on datasets. As an example, we show how we can label a set of images using a stock neural network model that has already been trained.

First, the model is distributed to the workers of the cluster using Spark's built-in broadcast mechanism:

with gfile.FastGFile('classify_image_graph_def.pb', 'rb') as f:
    model_data = f.read()
model_data_bc = sc.broadcast(model_data)

After that, the model is loaded on each node and applied to the images. This is a sketch of the code run on each node:

def apply_batch(image_url):
    # Creates a new TensorFlow graph of computation and imports the model
    with tf.Graph().as_default() as g:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(model_data_bc.value)
        tf.import_graph_def(graph_def, name='')

        # Loads the image data from the URL:
        image_data = urllib.request.urlopen(image_url, timeout=1.0).read()

        # Runs a TensorFlow session that loads the model
        with tf.Session() as sess:
            softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
            predictions = sess.run(softmax_tensor, {'DecodeJpeg/contents:0': image_data})
            return predictions

By batching the images together, this code can be made to run faster.
Here is an example image:
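The batching idea can be sketched with a simple chunking helper. On Spark, `mapPartitions` hands each task a whole iterator of images, so the TensorFlow graph is imported once per partition rather than once per image; the helper and URL list below are generic illustrations, not code from the original pipeline:

```python
def chunks(items, size):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

urls = ["img%d.jpg" % i for i in range(10)]

# On Spark, the equivalent distributed form is:
#   sc.parallelize(urls, num_partitions).mapPartitions(apply_batch_iter)
# so the model import happens once per partition, not once per URL.
batches = list(chunks(urls, 4))
print([len(b) for b in batches])  # [4, 4, 2]
```

Amortizing the graph import over a batch is what makes the per-image cost drop: the expensive setup happens once, and only the cheap inference step repeats.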


And here is the neural network's interpretation of this image, which is quite accurate:

('coral reef', 0.88503921),
('scuba diver', 0.025853464),
('brain coral', 0.0090828091),
('snorkel', 0.0036010914),
('promontory, headland, head, foreland', 0.0022605944)


We have demonstrated how to combine Spark and TensorFlow to train and deploy neural networks for handwritten digit recognition and image labeling. Even though the neural network framework we used only runs on a single node, we can use Spark to distribute the hyperparameter tuning process and the model deployment. This not only reduces the training time but also improves accuracy and gives us a better understanding of the sensitivity of the various hyperparameters.

Although this support is only available for Python, we look forward to providing deeper integration between TensorFlow and the rest of the Spark framework.

Original text: https://