Neural Networks
Neural networks are a family of learning algorithms, loosely modeled after the human brain, that are designed to perform multiple non-linear transformations of the input so that patterns in the dataset can be captured by an appropriate model of the original data. They operate only on numerical data, contained in vectors, into which all real-world data, be it images, sound, text, or time series, must be encoded. There are many different ways to encode real-world input into numerical vectors, and choosing a good encoding technique can make or break the ability to train a good model.
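As a concrete illustration of such an encoding, here is a minimal sketch of one-hot encoding, a common technique for turning words into numerical vectors; the vocabulary and sentence are hypothetical examples, not drawn from any particular dataset:

```python
import numpy as np

# A tiny hypothetical vocabulary mapping each word to an index.
vocab = {"cat": 0, "dog": 1, "runs": 2, "sleeps": 3}

def one_hot(word, vocab):
    """Encode a single word as a one-hot vector of length len(vocab)."""
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

# Encode a short sentence as a sequence of numerical vectors.
sentence = ["dog", "runs"]
encoded = np.stack([one_hot(word, vocab) for word in sentence])
print(encoded)
# [[0. 1. 0. 0.]
#  [0. 0. 1. 0.]]
```

Richer encodings, such as learned embeddings, usually work better for text, but the principle is the same: every input becomes a vector of numbers.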
Neural Network Architecture
Neural networks are composed of layers of nodes, or "neurons". Each layer's output is the subsequent layer's input. There are three main types of layers:
- Input Layer: This is where the network receives input from the dataset. It's often called the visible layer, because it's the only part that is exposed to the data in its encoded numerical format.
- Hidden Layer(s): These are the layers after the input layer. Deep learning networks may have dozens or even hundreds of hidden layers. They might be organized in a feedforward manner, one after another, or have more complex forward or recurrent interactions.
- Output Layer: This is the final layer, and its purpose is to translate the activations of the preceding hidden layers into predictions of the output variables. For instance, to guess whether an input image is a cat or a dog, the output layer might be a single node with a value between 0 and 1, with values closer to 1 denoting a dog. A minimal sketch of a full forward pass through these layers follows this list.
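To make the layer structure concrete, here is a minimal NumPy sketch of a forward pass through a network with one hidden layer; the layer sizes, random weights, and sigmoid activation are illustrative assumptions rather than a recommended design:

```python
import numpy as np

def sigmoid(z):
    """Squash pre-activations into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, 3 hidden neurons, 1 output neuron.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)  # input -> hidden
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)  # hidden -> output

x = np.array([0.5, -1.2, 0.3, 0.8])  # one encoded input record

hidden = sigmoid(W1 @ x + b1)        # hidden layer activations
output = sigmoid(W2 @ hidden + b2)   # e.g. closer to 1 denotes "dog"
print(output)
```

Each layer is just a matrix of weights, a vector of biases, and a non-linear activation applied to the previous layer's output.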
Neurons in a layer are connected to those in subsequent layers. Weights are applied to these connections, and biases are applied to each node. The weights and biases are learned by the neural network over time. This process is called training, and it involves taking records from the dataset and running them through the network to obtain the network's outputs. Measuring the error against the expected output allows us to correct the weights so as to achieve a smaller error in the next iteration. Repeating this results in an ever-improving process that allows us to find a good model. This process is the learning algorithm of the neural network.
Learning Process
The learning process of a neural network involves adjusting the weights and biases based on the error of the network's output. This is done through a process called backpropagation and an optimization algorithm like gradient descent.
- Backpropagation: This is the primary algorithm for computing gradients in neural networks. It calculates the gradient of the error function with respect to the network's weights at each layer, allowing us to adjust the weights using, for example, gradient descent.
- Gradient Descent: This is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient. A toy training loop combining both techniques is sketched after this list.
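To show how backpropagation and gradient descent work together, here is a toy training loop for the simplest possible network, a single sigmoid neuron, on made-up data; the learning rate, epoch count, and AND-like targets are arbitrary choices for the sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up dataset: 4 records with 2 features each, and binary targets.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])  # an AND-like pattern

rng = np.random.default_rng(0)
w, b = rng.normal(size=2), 0.0
lr = 0.5  # learning rate, an arbitrary choice

for epoch in range(2000):
    # Forward pass: predictions for every record.
    p = sigmoid(X @ w + b)
    # Backpropagation: for cross-entropy loss with a sigmoid output, the
    # gradient with respect to the pre-activation is simply (p - y).
    grad_z = (p - y) / len(y)
    grad_w = X.T @ grad_z
    grad_b = grad_z.sum()
    # Gradient descent: step against the gradient to reduce the error.
    w -= lr * grad_w
    b -= lr * grad_b

print(np.round(sigmoid(X @ w + b), 2))  # approaches [0, 0, 0, 1]
```

In a multi-layer network the same idea applies, with the chain rule propagating the gradient backwards through every layer.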
Types of Neural Networks
There are many types of neural networks, each with its own strengths and weaknesses. Here are a few of the most important ones:
- Feedforward Neural Networks (FNNs): These are the simplest type of artificial neural network. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any), and to the output nodes.
- Convolutional Neural Networks (CNNs): These are mainly used for image processing and pattern recognition. CNNs are composed of one or more convolutional layers with fully connected layers (matching those in typical artificial neural networks) on top.
- Recurrent Neural Networks (RNNs): These are used for applications that involve sequential data such as time series analysis, speech recognition, and machine translation. In RNNs, connections can form cycles, allowing information from earlier time steps to influence later ones; a minimal recurrence sketch follows this list.
- Long Short-Term Memory Networks (LSTMs): These are a special kind of RNN, capable of learning long-term dependencies, which makes them ideal for tasks that require learning from important events that happened many time steps in the past.
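To illustrate the recurrence that distinguishes RNNs from feedforward networks, here is a minimal NumPy sketch of a vanilla RNN cell processing a short sequence; the dimensions, random weights, and tanh activation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 3-dimensional inputs, 4-dimensional hidden state.
W_xh = rng.normal(size=(4, 3)) * 0.1  # input -> hidden
W_hh = rng.normal(size=(4, 4)) * 0.1  # hidden -> hidden (the recurrence)
b_h = np.zeros(4)

# A made-up sequence of 5 encoded time steps.
sequence = rng.normal(size=(5, 3))

h = np.zeros(4)  # hidden state carried across time steps
for x_t in sequence:
    # The same weights are reused at every step; the hidden state lets
    # information from earlier steps influence later ones.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h)  # final hidden state summarizing the whole sequence
```

An LSTM replaces this single tanh update with gated updates that control what is remembered and forgotten, which is what lets it track much longer dependencies.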
Problems and Challenges
Despite their power and flexibility, neural networks are not a silver bullet. They have several key challenges:
- Overfitting: This occurs when a neural network learns the training data too well, to the point where it performs poorly on data it hasn't seen before.
- Computational Cost: Neural networks, especially deep ones, require significant computational resources, which can make them impractical for certain applications.
- Explainability: Neural networks are often criticized for being "black boxes": it can be difficult to understand why a network is making a particular prediction. This is in stark contrast with, e.g., simple logistic regression models, where the feature in the dataset that most affects a certain outcome can be readily identified.
- Data Requirements: Neural networks require large amounts of data to train effectively, and the amount of data needed tends to grow rapidly with the number of features and the size of the network. In deep models this is particularly important. However, with the advent of pre-trained models a large barrier has been lifted, and it is now often easier to perform a process called fine-tuning instead of training from scratch; a minimal sketch follows this list.
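As a hedged sketch of what fine-tuning can look like in practice, here is a minimal PyTorch example that freezes a pre-trained torchvision backbone and retrains only a new output layer for the two-class cat/dog task mentioned earlier; the choice of ResNet-18, the optimizer settings, and the made-up batch are illustrative assumptions, and the weights API assumes a recent torchvision:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Load a backbone pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained weights so only the new head will be trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a fresh two-class head.
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new layer's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a made-up batch of images.
images = torch.randn(8, 3, 224, 224)  # batch of 8 RGB images
labels = torch.randint(0, 2, (8,))    # made-up cat/dog labels

logits = model(images)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(loss.item())
```

Because only the small head is trained, this needs far less data and compute than training the whole network from scratch.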