YOLO (You Only Look Once) has been a leading real-time object detection framework, with each iteration improving upon previous versions. The latest version, YOLO v12, introduces advancements that significantly enhance accuracy while maintaining real-time processing speeds. This article explores the key innovations in YOLO v12, highlighting how it surpasses previous versions while minimizing computational costs without compromising detection efficiency.
What’s New in YOLO v12?
Previously, YOLO models relied on Convolutional Neural Networks (CNNs) for object detection due to their speed and efficiency. However, YOLO v12 incorporates attention mechanisms, a concept widely known and used in Transformer models, which allow it to recognize patterns more effectively. While attention mechanisms have historically been too slow for real-time object detection, YOLO v12 successfully integrates them while maintaining YOLO's speed, resulting in an attention-centric YOLO framework.
Key Improvements Over Previous Versions
1. Attention-Centric Framework
YOLO v12 combines the power of attention mechanisms with CNNs, resulting in a model that is both faster and more accurate. Unlike its predecessors, which relied solely on CNNs, YOLO v12 introduces optimized attention modules to improve object recognition without adding unnecessary latency.
2. Superior Performance Metrics
Comparing performance metrics across different YOLO versions and real-time detection models shows that YOLO v12 achieves higher accuracy while maintaining low latency.
- The mAP (Mean Average Precision) values on datasets like COCO show YOLO v12 outperforming YOLO v11 and YOLO v10 while maintaining comparable speed.
- The model achieves a remarkable 40.6% mAP while processing images in just 1.64 milliseconds on an Nvidia T4 GPU. This performance is superior to YOLO v10 and YOLO v11 without sacrificing speed.
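As a refresher on what mAP is built from: each prediction is matched to ground truth by Intersection over Union (IoU), and predictions above an IoU threshold count as true positives. Below is a minimal, self-contained IoU sketch in plain Python, purely illustrative; real COCO evaluation uses pycocotools, not this toy function.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes overlapping by half their width share 50 of 150 total units:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.3333333333333333
```

At an IoU threshold of 0.5, this pair would not count as a match; mAP averages the resulting precision over recall levels, classes, and (for COCO) IoU thresholds from 0.5 to 0.95.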
3. Outperforming Non-YOLO Models
YOLO v12 not only surpasses previous YOLO versions; it also outperforms other real-time object detection frameworks, such as RT-DETR and RT-DETR v2. These alternative models have higher latency yet fail to match YOLO v12's accuracy.
Computational Efficiency Improvements
One of the major concerns with integrating attention mechanisms into YOLO models was their high computational cost and memory inefficiency. YOLO v12 addresses these issues through several key innovations:
1. Flash Attention for Memory Efficiency
Traditional attention mechanisms consume a large amount of memory, making them impractical for real-time applications. YOLO v12 adopts FlashAttention, a technique that reduces memory consumption and speeds up inference.
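The details of FlashAttention are beyond this article, but its core idea, computing exact attention block by block with an online softmax so the full N x N score matrix is never materialized, can be illustrated with a toy NumPy sketch (my own illustration of the streaming-softmax trick, not YOLO v12's kernel):

```python
import numpy as np

def naive_attention(q, k, v):
    """Standard softmax attention: materializes the full (n, n) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def blocked_attention(q, k, v, block=4):
    """Identical result, but streams over key/value blocks with an online softmax,
    keeping only per-row running max and sum instead of the full score matrix."""
    d = q.shape[-1]
    out = np.zeros_like(q)
    running_max = np.full(q.shape[0], -np.inf)
    running_sum = np.zeros(q.shape[0])
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)                   # scores for this block only
        new_max = np.maximum(running_max, s.max(axis=-1))
        correction = np.exp(running_max - new_max)  # rescale earlier partial sums
        p = np.exp(s - new_max[:, None])
        out = out * correction[:, None] + p @ vb
        running_sum = running_sum * correction + p.sum(axis=-1)
        running_max = new_max
    return out / running_sum[:, None]

rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 16, 8))
print(np.allclose(naive_attention(q, k, v), blocked_attention(q, k, v)))  # True
```

The math is exact, so the speed and memory win comes entirely from never storing the full score matrix, which is why the real kernel is tied to specific GPU (CUDA) setups.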
2. Area Attention for Lower Computation Cost
To further optimize efficiency, YOLO v12 employs Area Attention, which focuses only on relevant areas of an image instead of processing the entire feature map. This approach dramatically reduces computation costs while retaining accuracy.
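The savings can be sketched in NumPy: split the flattened feature map into a few areas and attend only within each one, so the score matrix shrinks from N x N to several (N/l) x (N/l) blocks. This is a toy sketch of the concept under that simplification, not YOLO v12's actual module:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def area_attention(x, num_areas=4):
    """Self-attention restricted to `num_areas` equal slices of the token sequence."""
    n, d = x.shape
    area = n // num_areas
    out = np.empty_like(x)
    for i in range(0, n, area):
        xa = x[i:i + area]                # tokens belonging to one area
        scores = xa @ xa.T / np.sqrt(d)   # (n/l, n/l) instead of (n, n)
        out[i:i + area] = softmax(scores) @ xa
    return out

x = np.random.default_rng(0).standard_normal((64, 16))
print(area_attention(x).shape)  # (64, 16)

# Score-matrix entries: global attention 64 * 64 = 4096,
# area attention with 4 areas: 4 * 16 * 16 = 1024, a 4x reduction.
```

With l areas the attention cost drops roughly by a factor of l, at the price of no cross-area interaction inside that layer.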
3. R-ELAN for Optimized Feature Processing
YOLO v12 also introduces R-ELAN (Re-Engineered ELAN), which optimizes feature propagation, making the model more efficient at handling complex object detection tasks without increasing computational demands.
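One ingredient of R-ELAN is a block-level residual shortcut whose branch output is damped by a small scaling factor, which helps attention-heavy blocks train stably. Below is a generic NumPy sketch of that scaled-residual idea only; the full R-ELAN aggregation layout is defined in the YOLO v12 paper, and the `scale` value here is illustrative:

```python
import numpy as np

def scaled_residual_block(x, transform, scale=0.01):
    """Block-level residual: the branch output is multiplied by a small `scale`,
    so initially the block behaves close to the identity mapping."""
    return x + scale * transform(x)

x = np.random.default_rng(0).standard_normal((8, 32))
y = scaled_residual_block(x, np.tanh)  # any bounded branch works for the demo

# Because tanh is bounded by 1 and scale is 0.01, the output stays near the input:
print(np.abs(y - x).max() < 0.02)  # True
```

Starting near the identity lets gradients flow through the shortcut from the first step, while the branch gradually learns a useful correction.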
YOLO v12 Model Variants
YOLO v12 comes in five variants, catering to different applications:
- N (Nano) & S (Small): Designed for real-time applications where speed is crucial.
- M (Medium): Balances accuracy and speed, suitable for general-purpose tasks.
- L (Large) & XL (Extra Large): Optimized for high-precision tasks where accuracy is prioritized over speed.
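In the Ultralytics API the variant is just a suffix in the weights filename (yolo12n.pt, yolo12s.pt, and so on). Here is a small, hypothetical helper reflecting the guidance above; `pick_variant` and its rules are my own illustration, not part of the library:

```python
VARIANTS = ["n", "s", "m", "l", "x"]  # nano, small, medium, large, extra-large

def pick_variant(realtime: bool, high_precision: bool) -> str:
    """Hypothetical helper: map deployment constraints to a YOLO v12 weights file."""
    if realtime and not high_precision:
        suffix = "n"   # nano: fastest, lowest accuracy
    elif high_precision and not realtime:
        suffix = "x"   # extra-large: most accurate, slowest
    else:
        suffix = "m"   # medium: balanced default
    return f"yolo12{suffix}.pt"

print(pick_variant(realtime=True, high_precision=False))   # yolo12n.pt
print(pick_variant(realtime=False, high_precision=True))   # yolo12x.pt
```

The returned filename could then be passed wherever the snippets below use `model="yolo12s.pt"`.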
Let's Compare the YOLO v11 and YOLO v12 Models
We'll experiment with the YOLO v11 and YOLO v12 small models to understand their performance across various tasks like object counting, heatmaps, and speed estimation.
1. Object Counting
YOLO v11
import cv2
from ultralytics import solutions

cap = cv2.VideoCapture("highway.mp4")
assert cap.isOpened(), "Error reading video file"

w, h, fps = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)), int(cap.get(cv2.CAP_PROP_FPS)))

# Define region points
region_points = [(20, 1500), (1080, 1500), (1080, 1460), (20, 1460)]  # Lower rectangle region counting

# Video writer (MP4 format)
video_writer = cv2.VideoWriter("object_counting_output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Init ObjectCounter
counter = solutions.ObjectCounter(
    show=False,  # Disable internal window display
    region=region_points,
    model="yolo11s.pt",
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    im0 = counter.count(im0)

    # Resize to fit screen (optional: scale down for large videos)
    im0_resized = cv2.resize(im0, (640, 360))  # Adjust resolution as needed

    # Show the resized frame
    cv2.imshow("Object Counting", im0_resized)
    video_writer.write(im0)

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
video_writer.release()
cv2.destroyAllWindows()
Output
YOLO v12
import cv2
from ultralytics import solutions

cap = cv2.VideoCapture("highway.mp4")
assert cap.isOpened(), "Error reading video file"

w, h, fps = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)), int(cap.get(cv2.CAP_PROP_FPS)))

# Define region points
region_points = [(20, 1500), (1080, 1500), (1080, 1460), (20, 1460)]  # Lower rectangle region counting

# Video writer (MP4 format)
video_writer = cv2.VideoWriter("object_counting_output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Init ObjectCounter
counter = solutions.ObjectCounter(
    show=False,  # Disable internal window display
    region=region_points,
    model="yolo12s.pt",
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    im0 = counter.count(im0)

    # Resize to fit screen (optional: scale down for large videos)
    im0_resized = cv2.resize(im0, (640, 360))  # Adjust resolution as needed

    # Show the resized frame
    cv2.imshow("Object Counting", im0_resized)
    video_writer.write(im0)

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
video_writer.release()
cv2.destroyAllWindows()
Output
2. Heatmaps
YOLO v11
import cv2
from ultralytics import solutions

cap = cv2.VideoCapture("mall_arial.mp4")
assert cap.isOpened(), "Error reading video file"

w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))

# Video writer
video_writer = cv2.VideoWriter("heatmap_output_yolov11.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# If you want to apply object counting + heatmaps, you can pass region points.
# region_points = [(20, 400), (1080, 400)]  # Define line points
# region_points = [(20, 400), (1080, 400), (1080, 360), (20, 360)]  # Define region points
# region_points = [(20, 400), (1080, 400), (1080, 360), (20, 360), (20, 400)]  # Define polygon points

# Init heatmap
heatmap = solutions.Heatmap(
    show=True,  # Display the output
    model="yolo11s.pt",  # Path to the YOLO11 model file
    colormap=cv2.COLORMAP_PARULA,  # Colormap of heatmap
    # region=region_points,  # If you want to do object counting with heatmaps, you can pass region_points
    # classes=[0, 2],  # If you want to generate a heatmap for specific classes, i.e. person and car
    # show_in=True,  # Display in counts
    # show_out=True,  # Display out counts
    # line_width=2,  # Adjust the line width for bounding boxes and text display
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    im0 = heatmap.generate_heatmap(im0)
    im0_resized = cv2.resize(im0, (w, h))
    video_writer.write(im0_resized)

cap.release()
video_writer.release()
cv2.destroyAllWindows()
Output
YOLO v12
import cv2
from ultralytics import solutions

cap = cv2.VideoCapture("mall_arial.mp4")
assert cap.isOpened(), "Error reading video file"

w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))

# Video writer
video_writer = cv2.VideoWriter("heatmap_output_yolov12.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# If you want to apply object counting + heatmaps, you can pass region points.
# region_points = [(20, 400), (1080, 400)]  # Define line points
# region_points = [(20, 400), (1080, 400), (1080, 360), (20, 360)]  # Define region points
# region_points = [(20, 400), (1080, 400), (1080, 360), (20, 360), (20, 400)]  # Define polygon points

# Init heatmap
heatmap = solutions.Heatmap(
    show=True,  # Display the output
    model="yolo12s.pt",  # Path to the YOLO12 model file
    colormap=cv2.COLORMAP_PARULA,  # Colormap of heatmap
    # region=region_points,  # If you want to do object counting with heatmaps, you can pass region_points
    # classes=[0, 2],  # If you want to generate a heatmap for specific classes, i.e. person and car
    # show_in=True,  # Display in counts
    # show_out=True,  # Display out counts
    # line_width=2,  # Adjust the line width for bounding boxes and text display
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    im0 = heatmap.generate_heatmap(im0)
    im0_resized = cv2.resize(im0, (w, h))
    video_writer.write(im0_resized)

cap.release()
video_writer.release()
cv2.destroyAllWindows()
Output
3. Speed Estimation
YOLO v11
import cv2
import numpy as np
from ultralytics import solutions

cap = cv2.VideoCapture("cars_on_road.mp4")
assert cap.isOpened(), "Error reading video file"

# Capture video properties
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS))

# Video writer
video_writer = cv2.VideoWriter("speed_management_yolov11.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Define speed region points (adjust for your video resolution)
speed_region = [(300, h - 200), (w - 100, h - 200), (w - 100, h - 270), (300, h - 270)]

# Initialize SpeedEstimator
speed = solutions.SpeedEstimator(
    show=False,  # Disable internal window display
    model="yolo11s.pt",  # Path to the YOLO model file
    region=speed_region,  # Pass region points
    # classes=[0, 2],  # Optional: filter specific object classes (e.g., cars, trucks)
    # line_width=2,  # Optional: adjust the line width
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break

    # Estimate speed and draw bounding boxes
    out = speed.estimate_speed(im0)

    # Draw the speed region on the frame
    cv2.polylines(out, [np.array(speed_region)], isClosed=True, color=(0, 255, 0), thickness=2)

    # Resize the frame to fit the screen
    im0_resized = cv2.resize(out, (1280, 720))  # Resize for better screen fit

    # Show the resized frame
    cv2.imshow("Speed Estimation", im0_resized)
    video_writer.write(out)

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
video_writer.release()
cv2.destroyAllWindows()
Output
YOLO v12
import cv2
import numpy as np
from ultralytics import solutions

cap = cv2.VideoCapture("cars_on_road.mp4")
assert cap.isOpened(), "Error reading video file"

# Capture video properties
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS))

# Video writer
video_writer = cv2.VideoWriter("speed_management_yolov12.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Define speed region points (adjust for your video resolution)
speed_region = [(300, h - 200), (w - 100, h - 200), (w - 100, h - 270), (300, h - 270)]

# Initialize SpeedEstimator
speed = solutions.SpeedEstimator(
    show=False,  # Disable internal window display
    model="yolo12s.pt",  # Path to the YOLO model file
    region=speed_region,  # Pass region points
    # classes=[0, 2],  # Optional: filter specific object classes (e.g., cars, trucks)
    # line_width=2,  # Optional: adjust the line width
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break

    # Estimate speed and draw bounding boxes
    out = speed.estimate_speed(im0)

    # Draw the speed region on the frame
    cv2.polylines(out, [np.array(speed_region)], isClosed=True, color=(0, 255, 0), thickness=2)

    # Resize the frame to fit the screen
    im0_resized = cv2.resize(out, (1280, 720))  # Resize for better screen fit

    # Show the resized frame
    cv2.imshow("Speed Estimation", im0_resized)
    video_writer.write(out)

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
video_writer.release()
cv2.destroyAllWindows()
Output
Also Read: Top 30+ Computer Vision Models For 2025
Expert Opinions on YOLOv11 and YOLOv12
Muhammad Rizwan Munawar, Computer Vision Engineer at Ultralytics
“YOLOv12 introduces flash attention, which enhances accuracy, but it requires careful CUDA setup. It’s a solid step forward, especially for complex detection tasks, though YOLOv11 remains faster for real-time needs. In short, choose YOLOv12 for accuracy and YOLOv11 for speed.”
LinkedIn Post – Is YOLOv12 really a state-of-the-art model? 🤪
Muhammad Rizwan recently tested YOLOv11 and YOLOv12 side by side to break down their real-world performance. His findings highlight the trade-offs between the two models:
- Frames Per Second (FPS): YOLOv11 maintains an average of 40 FPS, while YOLOv12 lags behind at 30 FPS. This makes YOLOv11 the better choice for real-time applications where speed is critical, such as traffic monitoring or live video feeds.
- Training Time: YOLOv12 takes about 20% longer to train than YOLOv11. On a small dataset with 130 training images and 43 validation images, YOLOv11 completed training in 0.009 hours, while YOLOv12 needed 0.011 hours. While this might seem minor for small datasets, the difference becomes significant for larger-scale projects.
- Accuracy: Both models achieved similar accuracy after fine-tuning for 10 epochs on the same dataset. YOLOv12 did not dramatically outperform YOLOv11 in terms of accuracy, suggesting the newer model's gains lie more in architectural enhancements than in raw detection precision.
- Flash Attention: YOLOv12 introduces flash attention, a powerful mechanism that speeds up and optimizes attention layers. However, there is a catch: this feature is not natively supported on the CPU, and enabling it with CUDA requires careful version-specific setup. For teams without powerful GPUs or those working on edge devices, this can become a roadblock.
The PC specifications used for testing:
- GPU: NVIDIA RTX 3050
- CPU: Intel Core i5-10400 @ 2.90GHz
- RAM: 64 GB
The model specifications:
- Models: YOLO11n.pt and YOLOv12n.pt
- Image size: 640 for inference
Conclusion
YOLO v12 marks a significant leap forward in real-time object detection, combining CNN speed with Transformer-like attention mechanisms. With improved accuracy, lower computational costs, and a range of model variants, YOLO v12 is poised to redefine the landscape of real-time vision applications. Whether for autonomous vehicles, security surveillance, or medical imaging, YOLO v12 sets a new standard for real-time object detection efficiency.
What's Next?
- YOLO v13 Possibilities: Will future versions push attention mechanisms even further?
- Edge Device Optimization: Can Flash Attention or Area Attention be optimized for lower-power devices?
To help you better understand the differences, I have attached code snippets and output results in the comparison section. These examples illustrate how both YOLOv11 and YOLOv12 perform in real-world scenarios, from object counting to speed estimation and heatmaps. I'm excited to see how you all perceive this new release! Are the improvements in accuracy and attention mechanisms enough to justify the trade-offs in speed? Or do you think YOLOv11 still holds its ground for most applications?