A Guide to YOLOv8 in 2024
- Chris Lam
- Aug 6, 2024
- 5 min read
YOLOv8 is the newest model in the YOLO algorithm series – the most well-known family of object detection and classification models in the Computer Vision (CV) field. With the latest version, the YOLO legacy lives on by providing state-of-the-art results for image and video analytics within an easy-to-implement framework. In this article, we’ll discuss:
- The evolution of the YOLO algorithms
- Improvements and enhancements in YOLOv8
- Implementation details and tips
- Applications
What is YOLO?
You Only Look Once (YOLO) is an object-detection algorithm introduced in 2015 in a research paper by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. YOLO’s architecture was a significant revolution in the real-time object detection space, surpassing its predecessor – the Region-based Convolutional Neural Network (R-CNN).
YOLO is a single-shot algorithm that detects and classifies objects in a single pass, using one neural network to predict bounding boxes and class probabilities directly from a full image. The YOLO family of models is continuously evolving; several research teams have since released different YOLO versions, with YOLOv8 being the latest iteration. The following section briefly overviews the historical versions and their improvements.
A Brief History of YOLO
Before discussing YOLO’s evolution, let’s look at some basics of how a typical object detection algorithm works. The diagram below illustrates the essential mechanics of an object detection model.
The architecture consists of a backbone, neck, and head. The backbone is a pre-trained Convolutional Neural Network (CNN) that extracts low-, medium-, and high-level feature maps from an input image. The neck merges these feature maps using path aggregation blocks like the Feature Pyramid Network (FPN) and passes them on to the head, which classifies objects and predicts bounding boxes. The head can consist of one-stage or dense prediction models, such as YOLO or the Single-shot Detector (SSD). Alternatively, it can feature two-stage or sparse prediction algorithms like the R-CNN series.
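To make the backbone–neck–head split concrete, here is a minimal PyTorch-style sketch of a generic one-stage detector. All module choices and sizes are illustrative assumptions, not the actual YOLO implementation.

```python
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Illustrative backbone -> neck -> head skeleton (not a real YOLO)."""
    def __init__(self, num_classes=80, num_anchors=3):
        super().__init__()
        # Backbone: stacked convolutions that downsample and extract features
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Neck: mixes backbone features (real models fuse several scales, e.g. FPN)
        self.neck = nn.Conv2d(128, 128, 1)
        # Head: per-cell predictions -> 4 box coords + 1 objectness + class scores
        self.head = nn.Conv2d(128, num_anchors * (5 + num_classes), 1)

    def forward(self, x):
        features = self.backbone(x)
        fused = self.neck(features)
        return self.head(fused)  # dense prediction grid

model = TinyDetector()
out = model(torch.randn(1, 3, 416, 416))
print(out.shape)  # torch.Size([1, 255, 52, 52])
```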
YOLOv1
As mentioned, YOLO is a single-shot detection model that improved upon the standard R-CNN detection mechanism with faster inference and better generalization. The real change was how YOLOv1 framed the detection problem as a regression task to predict bounding boxes and class probabilities from a single pass of an image.
YOLO divides an image into a grid of cells and computes confidence scores and bounding boxes for each grid cell, reflecting the probability that an object is located within that particular cell. Next, given an object probability greater than zero, the algorithm computes the conditional class probabilities and multiplies them by the object probability to generate an overall probability score for each bounding box.
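As a toy numerical illustration of that multiplication (the numbers are made up, not from the paper):

```python
import numpy as np

# Hypothetical predictions for one grid cell
objectness = 0.9                         # P(object in this cell)
class_probs = np.array([0.7, 0.2, 0.1])  # P(class | object), e.g. dog/cat/bird

# Class-specific confidence for each candidate class
scores = objectness * class_probs
print(scores)           # [0.63 0.18 0.09]
print(scores.argmax())  # 0 -> predict the first class with score 0.63
```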
With this architecture, YOLOv1 surpassed R-CNN with a mean average precision (mAP) of 63.4 and an inference speed of 45 frames per second (FPS) on the open-source PASCAL Visual Object Classes (VOC) 2007 dataset.
YOLOv2
In 2016, Joseph Redmon and Ali Farhadi released YOLOv2, which could detect over 9,000 object categories. YOLOv2 introduced anchor boxes – predefined bounding boxes, called priors, that the model uses to pin down the ideal position of an object. The algorithm computes an Intersection over Union (IoU) score between each predicted bounding box and an anchor box; if the IoU reaches a threshold, the model generates a prediction. YOLOv2 achieved 76.8 mAP at 67 FPS on the VOC 2007 dataset.
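IoU is simply the area of overlap between two boxes divided by the area of their union. A minimal sketch, assuming boxes in (x1, y1, x2, y2) corner format:

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes in (x1, y1, x2, y2) format."""
    # Coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```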
YOLOv3
Joseph Redmon and Ali Farhadi published another paper in 2018 to release YOLOv3, which boasted higher accuracy than previous versions, with an mAP of 28.2 at 22 milliseconds on the more challenging MS COCO benchmark (COCO scores are not directly comparable to the VOC numbers above). To predict classes, the YOLOv3 model uses Darknet-53 as the backbone and replaces softmax with independent logistic classifiers trained with a Binary Cross-Entropy (BCE) loss, which allows multi-label predictions.
YOLOv4
In 2020, Alexey Bochkovskiy and other researchers released YOLOv4, which introduced the concepts of a Bag of Freebies (BoF) and a Bag of Specials (BoS). BoF is a group of techniques that increase accuracy at no additional inference cost, while BoS methods enhance accuracy significantly for a slight increase in inference cost. BoF included the CutMix, CutOut, and MixUp data augmentation techniques, as well as a new Mosaic method. Mosaic augmentation mixes four different training images to provide the model with better context information. BoS methods include features like non-linear activations and skip connections. The model achieved 43.5 mAP at approximately 65 FPS on the MS COCO dataset.
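To illustrate the idea behind mosaic augmentation, here is a deliberately simplified sketch: it tiles crops of four images into a fixed 2×2 grid, whereas the real augmentation picks a random center point, rescales each image, and remaps the bounding-box labels accordingly.

```python
import numpy as np

def naive_mosaic(imgs, size=416):
    """Tile crops of four images into one canvas (labels ignored for brevity)."""
    assert len(imgs) == 4
    half = size // 2
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    canvas[:half, :half] = imgs[0][:half, :half]  # top-left
    canvas[:half, half:] = imgs[1][:half, :half]  # top-right
    canvas[half:, :half] = imgs[2][:half, :half]  # bottom-left
    canvas[half:, half:] = imgs[3][:half, :half]  # bottom-right
    return canvas

four = [np.random.randint(0, 255, (416, 416, 3), dtype=np.uint8) for _ in range(4)]
print(naive_mosaic(four).shape)  # (416, 416, 3)
```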
YOLOv5
Without an official research paper, Ultralytics released YOLOv5 in June 2020, two months after the launch of YOLOv4. The model is easy to train and use since it is a PyTorch implementation. The architecture uses a Cross-stage Partial (CSP) Connection block as the backbone for better gradient flow at a reduced computational cost. Also, YOLOv5 uses YAML files (“YAML Ain’t Markup Language”) for model configuration instead of Darknet’s CFG format. Since YOLOv5 lacks an official research paper, there are no peer-reviewed results to compare its performance against previous versions and other object detection models.
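Because it is a native PyTorch project, a pretrained YOLOv5 model can be loaded in a couple of lines via torch.hub, mirroring the usage shown in the Ultralytics repository (the image path below is a placeholder):

```python
import torch

# Download and load a pretrained YOLOv5-small model from the Ultralytics repo
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Run inference; the source can be a path, URL, PIL image, or numpy array
results = model("path/to/image.jpg")  # placeholder path
results.print()  # summary of detected classes and confidences
```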
YOLOv6
YOLOv6 is another unofficial version of the YOLO series, introduced in 2022 by Meituan – a Chinese shopping platform. The company targeted the model at industrial applications with better performance than its predecessor. The significant differences include anchor-free detection and a decoupled head, meaning one branch of the head performs classification while the other regresses the bounding-box coordinates. The changes resulted in YOLOv6 (nano) achieving an mAP of 37.5 at 1187 FPS on the COCO dataset and YOLOv6 (small) achieving 45 mAP at 484 FPS.
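A decoupled head simply splits the final prediction into two parallel branches over shared features. A minimal PyTorch sketch, with illustrative (not actual YOLOv6) layer sizes:

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Two parallel branches: class scores and box-coordinate regression."""
    def __init__(self, in_channels=128, num_classes=80):
        super().__init__()
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, num_classes, 1),  # per-cell class logits
        )
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, 4, 1),  # per-cell box offsets (x, y, w, h)
        )

    def forward(self, features):
        return self.cls_branch(features), self.reg_branch(features)

head = DecoupledHead()
cls_out, box_out = head(torch.randn(1, 128, 52, 52))
print(cls_out.shape, box_out.shape)  # [1, 80, 52, 52] [1, 4, 52, 52]
```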
YOLOv7
In July 2022, a group of researchers released the open-source model YOLOv7, at the time the fastest and most accurate object detector, with an mAP of 56.8% at FPS ranging from 5 to 160. The Extended Efficient Layer Aggregation Network (E-ELAN) forms the backbone of YOLOv7, improving training by letting the model learn diverse features with efficient computation. Also, the model uses compound scaling for concatenation-based models to address the need for different inference speeds.
YOLOv8
We finally come to Ultralytics YOLOv8, the latest YOLO version, released in January 2023. Like v5 and v6, YOLOv8 has no official paper but boasts higher accuracy and faster speed. For instance, YOLOv8 (medium) reaches a 50.2 mAP score at 1.83 milliseconds on the COCO dataset with an A100 GPU and TensorRT. YOLOv8 also ships as a Python package with a CLI-based implementation, making it easy to use and develop. Let’s look closely at what YOLOv8 can do and explore a few of its significant developments.
A pre-trained YOLOv8 model can detect objects in an image or live video.
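Getting started takes only a few lines with the official ultralytics package (pip install ultralytics); the image path below is a placeholder:

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 nano detection model
model = YOLO("yolov8n.pt")

# Run inference; the source can also be a video file or a webcam stream
results = model("path/to/image.jpg")  # placeholder path

# The equivalent CLI command:
#   yolo predict model=yolov8n.pt source="path/to/image.jpg"
```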
YOLOv8 Tasks
YOLOv8 comes in five variants based on the number of parameters – nano (n), small (s), medium (m), large (l), and extra-large (x). You can use all the variants for classification, object detection, and segmentation.
Image Classification
Classification involves categorizing an entire image without localizing the objects present within it. You can implement classification with YOLOv8 by adding the -cls suffix to the model name. For example, use yolov8n-cls.pt for classification with the nano version.
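A short sketch using the ultralytics Python API (the image path is a placeholder):

```python
from ultralytics import YOLO

model = YOLO("yolov8n-cls.pt")        # nano classification model
results = model("path/to/image.jpg")  # placeholder path

probs = results[0].probs              # classification probabilities
print(model.names[probs.top1], float(probs.top1conf))  # top-1 class and confidence
```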
Object Detection
Object detection localizes an object within an image by drawing bounding boxes. You don’t have to add any suffix to use YOLOv8 for detection. The implementation only requires you to define the model as yolov8n.pt for object detection with the nano variant.
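For example, running detection and reading out the predicted boxes (placeholder image path):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # nano detection model
results = model("path/to/image.jpg")  # placeholder path

for box in results[0].boxes:
    cls_id = int(box.cls)                  # predicted class index
    conf = float(box.conf)                 # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # corner coordinates
    print(model.names[cls_id], conf, (x1, y1, x2, y2))
```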
Image Segmentation
Image segmentation goes a step further and identifies each pixel belonging to an object. Unlike object detection, segmentation is more precise in locating different objects within a single image. You can add the -seg suffix as yolov8n-seg.pt to implement segmentation with the YOLOv8 nano variant.
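Segmentation results expose per-object masks in addition to boxes (placeholder image path):

```python
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")        # nano segmentation model
results = model("path/to/image.jpg")  # placeholder path

masks = results[0].masks              # one binary mask per detected object
if masks is not None:
    print(masks.data.shape)           # (num_objects, height, width)
```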
YOLOv8 Major Developments
The main features of YOLOv8 include mosaic data augmentation, anchor-free detection, a C2f module, a decoupled head, and a modified loss function.
For more details, head over to https://viso.ai/deep-learning/yolov8-guide/#.