The Decision-Makers Of Neural Networks
I, Rushi Prajapati, welcome you to another blog in my “Simplifying Series”, in which I try to explain complex topics by simplifying them. In this series, I’ve written three blogs so far: one on computer vision, one on ML-DL, and one on neural networks. Today I’m presenting another blog, this time about neural network activation functions.
I’m assuming you’ve read my last blog on neural networks, because this one builds on it: I’m hoping you’re already familiar with the fundamentals of neural networks.
Just as our brains generate electrical signals to transmit and process information, neural networks rely on activation functions to determine whether a node should ‘fire’ its output or not.
Think of the activation function as the person who triggers the neural network’s “decision-making” process. Activation functions can be thought of as the “spark” that ignites and drives the neural network forward.
They introduce non-linearity into the network, allowing it to learn and represent complex relationships between inputs and outputs.
At their simplest, activation functions act like an “on-off” switch, making a neuron either fully active or completely inactive.
Without an activation function, a deep learning model would reduce to a linear combination of inputs and weights, no matter how many layers it has, which severely limits its ability to learn and model complex patterns in the data.
In simple terms, without the activation function, the output is just a weighted sum of the inputs, and the neural network has no real decision-making capability.
Flow Of An Activation Function
- In the neural network, each neuron computes a weighted sum of its inputs, which is passed through an activation function to produce its output.
- The activation function then determines whether the neuron should “fire” and produce an output based on the input it receives. Activation functions are the mathematical functions applied to the weighted sum (linear combination) computed by a neuron or layer, as in the sketch below.
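To make this flow concrete, here is a minimal sketch in Python with NumPy, assuming a single neuron; the `neuron_output` helper and the inputs, weights, and bias are all made up purely for illustration.

```python
import numpy as np

def neuron_output(inputs, weights, bias, activation):
    """Compute one neuron's output: weighted sum first, then the activation function."""
    z = np.dot(inputs, weights) + bias   # linear combination (weighted sum)
    return activation(z)                 # the activation decides what gets "fired"

# Illustrative values only: three inputs, three weights, one bias.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])
b = 0.2

step = lambda z: 1.0 if z >= 0 else 0.0  # a simple step activation as an example
print(neuron_output(x, w, b, step))      # 0.0 here, because the weighted sum is negative
```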
It’s important to note that different activation functions have different mathematical properties and characteristics. They can affect the network’s ability to learn. Therefore, the choice of activation function is crucial and should be carefully considered based on the problem you have.
Here are some popular activation functions:
- Step function
- Sigmoid function
- SoftMax function
- Hyperbolic tangent function (tanh)
- Rectified Linear Unit (ReLU)
- Leaky ReLU
So let’s understand some activation functions.
Step Function
The simplest activation function, used by the perceptron (neural network), is the step function, which produces a binary output (“1” or “0”).
It basically says that if the summed input ≥ 0 the neuron fires (“1”), and if the summed input < 0 it doesn’t fire (“0”).
It is also called the “Heaviside Step Function” or “Unit Step Function”, and it is useful for binary classification.
The Step Activation Function finds great utility in binary classification tasks. It can be employed to separate input data into two distinct classes based on a decision boundary defined by the threshold.
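A minimal sketch of the step function in Python (using NumPy); the `step` helper and the sample inputs are purely illustrative, with the threshold assumed to be 0 as described above.

```python
import numpy as np

def step(z):
    """Heaviside / unit step: fire (1) if the summed input is >= 0, otherwise stay silent (0)."""
    return np.where(z >= 0, 1.0, 0.0)

print(step(np.array([-2.0, -0.1, 0.0, 0.1, 2.0])))  # -> [0. 0. 1. 1. 1.]
```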
Sigmoid Function
One of the most common activation functions used in deep learning is the sigmoid function. It is also called the Logistic Function.
It is often used in binary classification to predict the probability of the class from two classes.
The sigmoid function squashes any real-valued input (range -∞ to +∞) into a value between 0 and 1, which can be read as a probability.
Where the step function produces a discrete answer (pass or fail), the sigmoid function produces the probability of passing and the probability of failing.
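As a rough illustration, here is the sigmoid in Python; the `sigmoid` helper and the sample values are made up for demonstration.

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the range (0, 1), which can be read as a probability."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(z))  # close to 0 for large negative inputs, 0.5 at 0, close to 1 for large positive inputs
```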
SoftMax Function
The SoftMax function is a generalization of the sigmoid function. It is used to obtain classification probabilities when we have more than two classes.
It forces the outputs of the neural network to sum to 1, with each individual output between 0 and 1.
In other words, the SoftMax function transforms the input values into probability values between 0 and 1.
SoftMax is the go-to function that you will often use at the output layer of a classifier when you need to predict one class out of more than two classes.
Even when classifying two classes, SoftMax performs well; in short, it behaves like a sigmoid function.
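A small sketch of how SoftMax could be computed in Python; the `softmax` helper and the example scores are made up, and subtracting the maximum is a common numerical-stability convention rather than something specific to this blog.

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)      # subtract the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

scores = np.array([2.0, 1.0, 0.1])   # raw outputs ("logits") for three classes
probs = softmax(scores)
print(probs, probs.sum())            # roughly [0.659 0.242 0.099], summing to 1.0
```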
Hyperbolic tangent function (tanh)
The hyperbolic tangent function is a scaled and shifted version of the sigmoid function. Instead of squeezing the signal values between 0 and 1, tanh squeezes all the values into the range -1 to 1.
In hidden layers, the tanh function almost always outperforms the sigmoid function because it centres your data so that the mean is closer to 0 than to 0.5, which makes learning a bit easier for the next layer.
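A quick sketch showing tanh’s output range and its relationship to the sigmoid (tanh(z) = 2·sigmoid(2z) − 1); the sample values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])

# tanh squeezes values into (-1, 1) and is centred around zero.
print(np.tanh(z))

# The "scaled and shifted sigmoid" relationship: tanh(z) = 2 * sigmoid(2z) - 1.
print(2 * sigmoid(2 * z) - 1)   # matches the tanh values above
```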
Rectified Linear Unit (ReLU)
At the time of writing, ReLU is considered a state-of-the-art activation function because it works well in many different situations and tends to train better than sigmoid and tanh in the hidden layers.
The rectified linear unit activation function activates a node only if the input is above 0.
If the input is below 0, the output is always 0; when the input is above 0, the output has a linear relationship with the input (the value is simply passed through).
The ReLU function eliminates all the negative values in the input by transforming them into zeros.
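A minimal ReLU sketch in Python; the `relu` helper and the sample inputs are purely illustrative.

```python
import numpy as np

def relu(z):
    """Pass positive values through unchanged; turn every negative value into 0."""
    return np.maximum(0.0, z)

print(relu(np.array([-3.0, -0.5, 0.0, 0.5, 3.0])))  # negatives become 0, positives pass through
```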
Leaky ReLU
Before understanding Leaky ReLU, let’s understand ReLU’s disadvantage. One disadvantage of ReLU is that its derivative is zero when x is negative, so those neurons stop updating during training (often called the “dying ReLU” problem).
Leaky ReLU is a variation of ReLU that mitigates this issue. Instead of outputting zero when x < 0, Leaky ReLU applies a small slope (around 0.01) to negative inputs, so the output is a small negative value rather than exactly zero.
It enables neurons to continue learning even when their output is negative, providing a more robust and adaptive behavior in neural networks.
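A minimal Leaky ReLU sketch; the 0.01 slope follows the figure mentioned above, and the `leaky_relu` helper name is just for illustration.

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    """Like ReLU for positive inputs, but keep a small slope for negative ones."""
    return np.where(z > 0, z, slope * z)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(leaky_relu(z))  # negatives become small negative values instead of exactly 0,
                      # so their gradient stays non-zero and the neuron can keep learning
```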
Choosing the Right Activation Function
As mentioned earlier, the choice of activation function is crucial and should be carefully considered based on the problem you have. Different activation functions have different characteristics that can affect the network’s learning ability, convergence speed, and representation power.
- Non-linearity: Choose an activation function that allows the network to understand complex patterns in the data, not just straight lines.
- Differentiability: Make sure the activation function is differentiable, so that gradients can flow during backpropagation and the network can update its weights effectively.
- Output range: Consider the requirements of your problem. If you’re doing binary classification, you’ll want an activation function that provides probabilities between 0 and 1.
- Avoiding gradient issues: Some activation functions can cause problems with gradients during training. Look for activation functions like ReLU that help prevent these issues, making learning easier for the network.
- Sparsity: Activation functions like ReLU can introduce sparsity by making some neurons inactive. This can be helpful in reducing computation and making the network more efficient.
By keeping these simple considerations in mind, you can choose an activation function that suits your problem and improves the network’s ability to learn and make accurate predictions.
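To make these differences (output range, zero-centring, sparsity) easy to eyeball, here is a rough side-by-side sketch that applies each activation discussed above to the same inputs; all names and values are illustrative.

```python
import numpy as np

z = np.linspace(-3, 3, 7)   # the same raw inputs fed to every activation

activations = {
    "step":       np.where(z >= 0, 1.0, 0.0),
    "sigmoid":    1.0 / (1.0 + np.exp(-z)),
    "tanh":       np.tanh(z),
    "relu":       np.maximum(0.0, z),
    "leaky_relu": np.where(z > 0, z, 0.01 * z),
}

for name, out in activations.items():
    print(f"{name:>10}: {np.round(out, 3)}")
```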
Conclusion
In this blog, we explored several popular activation functions, including the step function, sigmoid function, SoftMax function, hyperbolic tangent function (tanh), Rectified Linear Unit (ReLU), and Leaky ReLU. Each of these functions has its own characteristics and mathematical properties that can impact the performance and learning capabilities of the neural network.
Activation functions are just one piece of the puzzle in building successful neural networks, but understanding their role and properties is a crucial step towards mastering deep learning.
I hope this blog provided you with a simplified understanding of activation functions and their significance in neural networks.
Keep an eye out for more blogs in the “Simplifying Series.”