Deep Learning — VGGNet

VGGNet Architectures

Anurag
Jul 31, 2024

VGGNet is a convolutional neural network (CNN) architecture that was developed by the Visual Geometry Group (VGG) at the University of Oxford in 2014. The architecture is notable for its use of very small convolutional filters, as well as its very deep structure, with up to 19 weight layers. The network was trained on the ImageNet dataset, which contains over 1 million images, and achieved a top-5 error rate of 7.3%, which was among the best results at the time.

The main innovation of VGGNet is its use of very small convolutional filters, with a filter size of 3x3. This is in contrast to larger filter sizes, such as the 11x11 filter size used in the AlexNet architecture, which was the state-of-the-art at the time. The use of smaller filters allows for a deeper architecture without increasing the number of parameters or computation too much.
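The parameter saving is easy to check by hand. A quick sketch (the channel count of 64 is just an illustrative assumption, not a figure from the article): two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution, but with fewer weights.

```python
def conv_params(kernel, in_ch, out_ch, bias=False):
    """Number of weights in a single 2D convolution layer."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)

C = 64  # illustrative channel count

# two stacked 3x3 convs: 2 * (3*3*C*C) = 18*C*C weights
stacked_3x3 = 2 * conv_params(3, C, C)

# one 5x5 conv with the same receptive field: 25*C*C weights
single_5x5 = conv_params(5, C, C)

print(stacked_3x3, single_5x5)  # 73728 102400
```

So the stacked version uses 18/25 of the parameters of the single large kernel, while also inserting an extra non-linearity between the two layers.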

Another important aspect of VGGNet is the stacking of several convolutional layers with small filters before each max-pooling layer, rather than pooling after every convolution. This increases the capacity of the model and allows it to learn more complex features. The architecture of VGGNet consists of several of these blocks stacked together. Each block is composed of two or three (or, in the deepest variants, four) convolutional layers, followed by a max-pooling layer. The output of these blocks is then passed to three fully connected layers, the last of which feeds a softmax for classification.
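The block structure can be written down compactly, as in the original paper's configuration tables: numbers are the output channels of 3x3 convolutions, and 'M' marks a 2x2 max-pool. Since every 3x3 convolution uses padding 1 and stride 1 (preserving spatial size), only the pools shrink the feature map. A minimal sketch:

```python
# VGG-16 convolutional configuration: channel counts for 3x3 convs,
# 'M' for a 2x2 max-pool with stride 2
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

def feature_map_size(cfg, input_size=224):
    """Spatial size after the conv stack: 3x3/pad-1 convs preserve
    the size, each max-pool halves it."""
    size = input_size
    for v in cfg:
        if v == 'M':
            size //= 2
    return size

print(feature_map_size(VGG16_CFG))  # 7
```

Five pools reduce the 224x224 input to a 7x7x512 volume, which is then flattened and fed to the fully connected layers.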

The VGGNet architecture comes in different variations, such as VGG-11, VGG-13, VGG-16 and VGG-19, which differ in the number of layers and the number of filters in each convolutional layer.
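The variant names count weight layers: convolutions plus the three fully connected layers at the end. Using the configuration notation from above, a short sketch verifies the naming:

```python
# Convolutional configurations from the VGG paper (A, B, D, E columns);
# 'M' marks a 2x2 max-pool, numbers are 3x3 conv output channels
CFGS = {
    'VGG-11': [64, 'M', 128, 'M', 256, 256, 'M',
               512, 512, 'M', 512, 512, 'M'],
    'VGG-13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M',
               512, 512, 'M', 512, 512, 'M'],
    'VGG-16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
               512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG-19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M',
               512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

def weight_layers(cfg):
    """Conv layers in the config plus the 3 fully connected layers."""
    return sum(1 for v in cfg if v != 'M') + 3

for name, cfg in CFGS.items():
    print(name, weight_layers(cfg))  # VGG-11 11, VGG-13 13, VGG-16 16, VGG-19 19
```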

VGGNet was a key milestone in the development of deep learning for computer vision, and its innovations in the use of small filter sizes and very deep architectures helped to pave the way for future advances in the field, such as the ResNet architecture. The architecture of VGGNet has also been widely used as a feature extractor for other tasks, such as object detection, semantic segmentation, and fine-grained classification.

In summary, VGGNet is a convolutional neural network (CNN) architecture that was developed by the Visual Geometry Group (VGG) at the University of Oxford in 2014. It is notable for its use of very small convolutional filters and its very deep structure, with up to 19 layers. VGGNet was trained on the ImageNet dataset and was able to achieve a top-5 error rate of 7.3%. It was a key milestone in the development of deep learning for computer vision and its innovations in the use of small filter sizes and very deep architectures helped to pave the way for future advances in the field.

Disadvantage of traditional classifiers

If the hand-selected features lack the representational power required to distinguish the categories, the accuracy of the classification model suffers badly, irrespective of the classification strategy used.

Solution: CNNs

A CNN combines the feature extraction and classification modules into one integrated system: it learns to extract discriminative representations from the images and to classify them based on supervised data.

Popular CNN architectures –

  1. AlexNet [2012]
  2. VGGNet [2014]
  3. GoogLeNet [2014]
  4. ResNet [2015]

CNN Architectures

VGG 16

  1. For a given receptive field (the effective area of the input image on which an output depends), multiple stacked smaller kernels are better than a single larger kernel.
  2. Stacking layers adds multiple non-linearities and increases the depth of the network.
  3. This enables the network to learn more complex features, and at a lower computational cost.
  4. VGGNet achieved a 7.3% top-5 error rate at the ILSVRC 2014 competition (1st runner-up, behind GoogLeNet).
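Point 1 above follows from how receptive fields grow: each additional stride-1 3x3 convolution extends the receptive field by 2 pixels in each direction. A minimal sketch:

```python
def stacked_receptive_field(n_layers, kernel=3, stride=1):
    """Receptive field of n stacked stride-1 convolutions:
    each layer adds (kernel - 1) to the field."""
    rf = 1
    for _ in range(n_layers):
        rf += (kernel - 1) * stride
    return rf

print(stacked_receptive_field(2))  # 5 -> same field as one 5x5 kernel
print(stacked_receptive_field(3))  # 7 -> same field as one 7x7 kernel
```

So two 3x3 layers "see" the same 5x5 patch as a single 5x5 kernel, and three see a 7x7 patch, with fewer parameters and more non-linearities, which is exactly the trade-off VGG exploits.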
