Scale

Training and serving modern artificial-intelligence models relies on highly specialized software that transparently optimizes these processes. Among the many frameworks available, the most important are TensorFlow and PyTorch, the latter based on the Torch library.

Both of these frameworks address the need to easily train neural networks of arbitrary architecture. In the training phase, i.e. when the model continuously adjusts its parameters to better fit the patterns in the data, a number of very complex computations need to be performed. Without a framework, every modification of the architecture would require the code to be rewritten and re-optimized. Machine learning frameworks remove this need by automatically generating the required sequence of computations. Additionally, frameworks offer the capability to perform tensor computations (a tensor being a multi-dimensional matrix) and to scale those computations across CPUs and GPUs on server clusters, as sketched below.
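
As a minimal sketch of what this looks like in practice (using PyTorch, which is introduced below), the same tensor computation can run on a CPU or a GPU without changing the surrounding code:

import torch

# Two random matrices (tensors), 3x4 and 4x2
a = torch.rand(3, 4)
b = torch.rand(4, 2)

# Use a GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The same matrix multiplication runs unchanged on either device
result = a.to(device) @ b.to(device)
print(result.shape)  # torch.Size([3, 2])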

PyTorch

PyTorch is an open-source machine learning library based on the Torch library. It was developed primarily by Facebook's artificial-intelligence research group.

The history of PyTorch can be traced back to 2002, when the Torch library was first released as a software package scripted in Lua. Because Lua never gained popularity as a programming language in the machine-learning community, a Python-oriented version was created. PyTorch was first released in October 2016 and quickly gained popularity due to its ease of use and powerful features.

PyTorch is known for its two high-level features: tensor computing (like NumPy) with strong acceleration via graphics processing units (GPUs), and deep neural networks built on a tape-based autograd system. The latter allows for dynamic computation graphs, a feature that makes PyTorch particularly suited to deep learning.
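
As a small illustration of the tape-based autograd system (a minimal sketch): gradients are recorded while ordinary Python code runs, so the computation graph can even depend on runtime control flow:

import torch

x = torch.tensor(2.0, requires_grad=True)

# The graph is built on the fly, so ordinary Python control flow is allowed
if x > 0:
    y = x ** 2
else:
    y = -x

# Backpropagate through whatever operations were actually executed
y.backward()
print(x.grad)  # tensor(4.) because dy/dx = 2x at x = 2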

Example neural network in PyTorch

Below is the code for a feedforward neural network with one hidden layer:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 10)  # hidden layer: 10 input features, 10 hidden units
        self.fc2 = nn.Linear(10, 1)   # output layer: 10 hidden units, 1 output value

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

PyTorch is made available by importing it:

import torch
import torch.nn as nn
import torch.optim as optim

and the network can then be instantiated, together with a loss function and an optimizer:

net = Net()
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)
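
The training loop below expects input and target tensors to already exist; purely for illustration, they could be random placeholder data of the right shapes (hypothetical example data, not part of the original listing):

# Hypothetical example data: 32 samples with 10 features each, one target value per sample
inputs = torch.randn(32, 10)
targets = torch.randn(32, 1)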

To train the network, only a few lines of code are needed:

for epoch in range(100):  # loop over the dataset multiple times
    # zero the parameter gradients
    optimizer.zero_grad()

    # forward + backward + optimize
    outputs = net(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

    # print statistics
    print('Epoch: %d, loss: %.3f' % (epoch + 1, loss.item()))

To use the trained network on new data, simply pass the new data through the network:

# Make sure to put the model in evaluation mode
net.eval()

# In current PyTorch versions there is no need to wrap the data in a Variable;
# a plain tensor can be passed directly. Gradients are not needed for inference,
# so the forward pass can run inside torch.no_grad().
# new_data is assumed to be a tensor with 10 input features, e.g. of shape (1, 10).
with torch.no_grad():
    prediction = net(new_data)

# For a single sample, the actual predicted value can be extracted with .item()
predicted_value = prediction.item()

print(predicted_value)

TensorFlow

TensorFlow is an open-source software library developed by Google's Brain team for machine learning and artificial intelligence research. Its predecessor, DistBelief, which originated around 2011, was a proprietary system based on deep neural networks. TensorFlow can run on multiple CPUs or GPUs and even on mobile operating systems, and it offers several wrapper libraries that help developers at different levels of expertise. Google open-sourced TensorFlow under the Apache 2.0 license in November 2015, allowing developers worldwide to use the library to develop AI and machine-learning applications more easily.

Below is an example of a neural network using TensorFlow: a simple two-layer feed-forward network, created and trained on the MNIST digit-classification dataset. Note that it uses the legacy TensorFlow 1.x graph API (placeholders and sessions).

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load MNIST data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Parameters
learning_rate = 0.1
num_steps = 500
batch_size = 128
display_step = 100

# Network Parameters
n_hidden_1 = 256 # 1st layer number of neurons
n_hidden_2 = 256 # 2nd layer number of neurons
num_input = 784 # MNIST data input (img shape: 28*28)
num_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
X = tf.placeholder("float", [None, num_input])
Y = tf.placeholder("float", [None, num_classes])

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([num_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, num_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([num_classes]))
}

# Create model
def neural_net(x):
    # Hidden fully connected layer with 256 neurons
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    # Hidden fully connected layer with 256 neurons
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    # Output fully connected layer with a neuron for each class
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Construct model
logits = neural_net(X)

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate the model: fraction of predictions that match the labels
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, num_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))

    print("Optimization Finished!")

Its performance on the MNIST test set can then be evaluated, inside the same session, as:

    # Calculate accuracy for MNIST test images
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: mnist.test.images,
                                      Y: mnist.test.labels}))

Keras

Unfortunately, TensorFlow's original way of creating and building a neural network requires a fair amount of boilerplate code. For that reason Keras was developed, initially as a completely separate project, and later integrated into TensorFlow.

An MNIST classifier written with the Keras API is shorter and easier to understand, as seen below (here a small convolutional network is used instead of the fully connected one above):

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize the input data
x_train, x_test = x_train / 255.0, x_test / 255.0

# Reshape the data
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

# Convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

# Define the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adadelta(),
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train,
          batch_size=128,
          epochs=10,
          verbose=1,
          validation_data=(x_test, y_test))

# Evaluate the model
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

In this example the data is loaded and preprocessed, then the model is defined: a simple convolutional neural network with two convolutional layers, a max-pooling layer, and two dense layers. Finally, the model is compiled and trained for 10 epochs.
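
As with the PyTorch example, the trained model can then be used on new data; a minimal sketch, reusing the test images already loaded above:

import numpy as np

# Predict class probabilities for the first ten test images
probabilities = model.predict(x_test[:10])

# The predicted digit is the class with the highest probability
predicted_digits = np.argmax(probabilities, axis=1)
print(predicted_digits)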