Mastering Image and Video Segmentation with SAM 2

Introduction

This guide walks you through what Segment Anything Model 2 (SAM 2) is, how it works, and how you can use it to segment objects in images and videos. SAM 2 offers state-of-the-art performance and flexibility in segmenting objects, making it a valuable resource for a variety of computer vision applications. The guide provides a detailed, step-by-step walkthrough for setting up and using SAM 2 to perform image segmentation. By following it, you will be able to generate segmentation masks for images using both box and point prompts.

Learning Objectives

Describe the key features and applications of the Segment Anything Model 2 (SAM 2) in image and video segmentation.
Successfully configure a CUDA-enabled environment, install the necessary dependencies, and clone the Segment Anything Model 2 repository for image segmentation tasks.
Apply SAM 2 to generate segmentation masks for images using both box and point prompts, and visualize the results effectively.
Evaluate how SAM 2 can revolutionize photo and video editing by enabling real-time segmentation, automating complex tasks, and democratizing content creation for a broader audience.

This article was published as a part of the Data Science Blogathon.

Prerequisites

Before you begin, make sure you have a CUDA-enabled GPU for faster processing. Also, verify that you have Python installed on your machine. This guide assumes you have some basic knowledge of Python and image processing concepts.

What is SAM 2?

Segment Anything Model 2 is an advanced tool for image segmentation created by Facebook AI Research (FAIR). On July 29th, 2024, Meta AI released SAM 2, an advanced image and video segmentation foundation model. SAM 2 lets users provide points or boxes in an image or video to generate segmentation masks for specific objects.


Key Features of SAM 2

Advanced Mask Generation: SAM 2 generates high-quality segmentation masks based on user inputs, such as points or bounding boxes.
Flexibility: The model supports both image and video segmentation.
Speed and Efficiency: With CUDA support, SAM 2 can perform segmentation tasks rapidly, making it suitable for real-time applications.

Core Components of SAM 2

Image Encoder: Encodes the input image for processing.
Prompt Encoder: Converts user-provided points or boxes into a format the model can use.
Mask Decoder: Generates the final segmentation masks based on the encoded inputs.

Applications of SAM 2

Let us now look at the applications of SAM 2:

Image and Video Editing: SAM 2 allows for precise object segmentation, enabling detailed edits and creative effects in photos and videos.
Autonomous Vehicles: In autonomous driving, SAM 2 can be used to identify and track objects such as pedestrians, vehicles, and road signs in real time.
Medical Imaging: SAM 2 can assist in segmenting anatomical structures in medical images, aiding diagnostics and treatment planning.

What is Image Segmentation?

Image segmentation is a computer vision technique that divides an image into multiple segments or regions to simplify its analysis. Each segment represents a different object or part of an object within the image, making it easier to identify and analyze specific elements.

Types of Image Segmentation

Semantic Segmentation: Classifies each pixel into a predefined class.
Instance Segmentation: Differentiates between different instances of the same object class.
Panoptic Segmentation: Combines semantic and instance segmentation.
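
To make the three types concrete, here is a minimal sketch on a made-up 4x4 scene. The arrays and the class and instance ids are illustrative assumptions, not output from SAM 2; they only show how the label maps differ in what they store per pixel.

```python
import numpy as np

# Toy 4x4 scene: background (class 0) and two separate "dog" objects (class 1).
# Semantic segmentation stores only the class of each pixel.
semantic = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
])

# Instance segmentation gives each object its own id (0 = no object).
instance = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [2, 2, 0, 0],
])

# Panoptic segmentation pairs every pixel with (class id, instance id).
panoptic = np.stack([semantic, instance], axis=-1)

num_classes = len(np.unique(semantic))      # background + dog
num_objects = len(np.unique(instance)) - 1  # ignore id 0 (no object)
print(num_classes, num_objects, panoptic.shape)
```

Both dogs share class 1 in the semantic map, while the instance map tells them apart; the panoptic array keeps both pieces of information for every pixel.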

Setting Up and Using SAM 2 for Image Segmentation

We’ll guide you through setting up the Segment Anything Model 2 (SAM 2) in your environment and using it for precise image segmentation tasks. From making sure your GPU is ready to configuring the model and applying it to real images, each step is covered in detail to help you get the most out of SAM 2.

Step 1: Check GPU Availability and Set Up the Environment

First, let’s make sure your environment is properly set up, starting with checking for GPU availability and recording the current working directory.

# Check GPU availability and CUDA version
!nvidia-smi
!nvcc --version

# Import necessary modules
import os

# Record the current working directory
HOME = os.getcwd()
print("HOME:", HOME)

Explanation

!nvidia-smi and !nvcc --version: These commands check whether your system has a CUDA-enabled GPU and show the CUDA version.
os.getcwd(): This function returns the current working directory, which can be used for managing file paths.
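
The `!` commands only work inside a notebook. As a rough sketch for a plain Python script, the same check can be approximated by looking for the two tools on the PATH. The helper name `find_cuda_tools` is hypothetical, and finding a tool on the PATH does not by itself prove that a working GPU is present.

```python
import os
import shutil

def find_cuda_tools(tools=("nvidia-smi", "nvcc")):
    """Map each tool name to its full path, or None when it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}

found = find_cuda_tools()
for tool, path in found.items():
    print(f"{tool}: {path or 'not found'}")

# Same working-directory lookup as in the notebook cell.
print("HOME:", os.getcwd())
```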

Step 2: Clone the SAM 2 Repository and Install Dependencies

Next, we need to clone the SAM 2 repository from GitHub and install the required dependencies.

# Clone the SAM 2 repository
!git clone https://github.com/facebookresearch/segment-anything-2.git

# Change to the repository directory
%cd segment-anything-2

# Install the SAM 2 package in editable mode
!pip install -e .

# Install additional packages
!pip install supervision jupyter_bbox_widget

Explanation

!git clone: Clones the SAM 2 repository to your local machine.
%cd: Changes the directory to the cloned repository.
!pip install -e .: Installs the SAM 2 package in editable mode.
!pip install supervision jupyter_bbox_widget: Installs additional packages required for visualization and bounding box widget support.
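
After installing, it can help to confirm that the packages are importable before moving on. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the module names are the ones imported later in this guide.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level modules this guide imports in later steps.
required = ["sam2", "supervision", "jupyter_bbox_widget", "torch", "cv2"]
missing = missing_packages(required)
print("missing:", missing or "none")
```

If anything is listed as missing, rerun the corresponding install command before continuing.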

Step 3: Download Model Checkpoints

Model checkpoints are essential, as they contain the trained parameters of SAM 2. We’ll download several checkpoints for different model sizes.

# Create a directory for checkpoints
!mkdir -p checkpoints

# Download the model checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_small.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_base_plus.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt -P checkpoints

Explanation

!mkdir -p checkpoints: Creates a directory for storing model checkpoints.
!wget -q … -P checkpoints: Downloads the model checkpoints into the checkpoints directory. Different checkpoints correspond to models of different sizes and capabilities.
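
Because `wget -q` runs quietly, a failed download can go unnoticed. Here is a small sketch that reports which checkpoint files are present and how large they are; the helper name `checkpoint_report` is hypothetical, and the file names are taken from the download URLs above.

```python
from pathlib import Path

CHECKPOINT_NAMES = [
    "sam2_hiera_tiny.pt",
    "sam2_hiera_small.pt",
    "sam2_hiera_base_plus.pt",
    "sam2_hiera_large.pt",
]

def checkpoint_report(directory, names=CHECKPOINT_NAMES):
    """Return {file name: size in bytes, or None if missing or empty}."""
    report = {}
    for name in names:
        path = Path(directory) / name
        size = path.stat().st_size if path.is_file() else 0
        report[name] = size or None
    return report

for name, size in checkpoint_report("checkpoints").items():
    status = f"{size / 1e6:.1f} MB" if size else "missing or empty"
    print(f"{name}: {status}")
```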

Step 4: Download Sample Images

For demonstration purposes, we’ll use some sample images. You can also use your own images by following similar steps.

# Create a directory for data
!mkdir -p data

# Download sample images
!wget -q https://media.roboflow.com/notebooks/examples/dog.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-2.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-3.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-4.jpeg -P data

Explanation

!mkdir -p data: Creates a directory for storing sample images.
!wget -q … -P data: Downloads the sample images into the data directory.
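
A quiet download can also leave behind an error page instead of an image. As a hedged sketch, the check below confirms that each downloaded file starts with the JPEG signature bytes. The helper name `looks_like_jpeg` is hypothetical, and the directory name `data` is assumed to be the one created in this step.

```python
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # first three bytes of a JPEG file

def looks_like_jpeg(path):
    """True if the file exists and starts with the JPEG signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(3) == JPEG_MAGIC

# Check every .jpeg file in the (assumed) sample image directory.
for image_path in sorted(Path("data").glob("*.jpeg")):
    print(image_path.name, looks_like_jpeg(image_path))
```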

Step 5: Set Up the SAM 2 Model and Load an Image

Now, we’ll set up the SAM 2 model, load an image, and prepare it for segmentation.

import cv2
import torch
import numpy as np
import supervision as sv

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

# Use the GPU if one is available, otherwise fall back to the CPU
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

if DEVICE.type == 'cuda':
    # Run inference under bfloat16 autocast
    torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()

    # Enable TF32 on GPUs with compute capability 8.x or newer
    if torch.cuda.get_device_properties(0).major >= 8:
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True

# Define the model checkpoint and configuration
CHECKPOINT = "checkpoints/sam2_hiera_large.pt"
CONFIG = "sam2_hiera_l.yaml"

# Build the SAM 2 model
sam2_model = build_sam2(CONFIG, CHECKPOINT, device=DEVICE, apply_postprocessing=False)

# Create the automatic mask generator
mask_generator = SAM2AutomaticMaskGenerator(sam2_model)

# Load an image for segmentation (replace the path with your own image)
IMAGE_PATH = "/content/WhatsApp Image 2024-08-02 at 14.17.11_2b223e01.jpg"
image_bgr = cv2.imread(IMAGE_PATH)
if image_bgr is None:
    # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

# Generate segmentation masks
sam2_result = mask_generator.generate(image_rgb)

Explanation

CUDA Setup: Enables CUDA for faster processing and sets the device to GPU if available.
Model Setup: Builds the SAM 2 model using the specified configuration and checkpoint.
Image Loading: Loads the sample image and converts it to RGB format.
Mask Generation: Uses the automatic mask generator to generate segmentation masks for the loaded image.
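
The automatic generator returns one dictionary per mask; the next step reads the 'segmentation' and 'area' entries from each. As a minimal sketch, the helper below drops very small masks before visualization. The name `filter_by_area`, the threshold, and the synthetic stand-in data are assumptions for illustration, not part of SAM 2.

```python
import numpy as np

def filter_by_area(results, min_fraction=0.01):
    """Keep masks covering at least `min_fraction` of the image, largest first."""
    kept = []
    for item in results:
        seg = item["segmentation"]
        if item["area"] / seg.size >= min_fraction:
            kept.append(item)
    return sorted(kept, key=lambda x: x["area"], reverse=True)

# Synthetic stand-in for the generator output: two masks on a 10x10 image.
big = np.zeros((10, 10), dtype=bool)
big[:5, :] = True          # covers half of the image
tiny = np.zeros((10, 10), dtype=bool)
tiny[0, 0] = True          # covers a single pixel

fake_result = [
    {"segmentation": tiny, "area": int(tiny.sum())},
    {"segmentation": big, "area": int(big.sum())},
]

kept = filter_by_area(fake_result, min_fraction=0.05)
print([m["area"] for m in kept])
```

With real output, the same call would take the list returned by the mask generator in place of `fake_result`.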

Step 6: Visualize the Segmentation Masks

We’ll now visualize the segmentation masks generated by SAM 2.

# Annotate the masks on the image
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
detections = sv.Detections.from_sam(sam_result=sam2_result)
annotated_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

# Plot the original and segmented images side by side
sv.plot_images_grid(
    images=[image_bgr, annotated_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)

# Extract the individual masks, largest area first
masks = [
    mask['segmentation']
    for mask in sorted(sam2_result, key=lambda x: x['area'], reverse=True)
]

# Plot the 16 largest masks in a 4x4 grid
sv.plot_images_grid(
    images=masks[:16],
    grid_size=(4, 4),
    size=(12, 12)
)

Explanation

Mask Annotation: Annotates the segmentation masks on the original image.
Visualization: Plots the original and segmented images side by side and also plots individual masks.
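
To show what overlaying a mask on an image amounts to, here is a minimal NumPy sketch of alpha blending a color into the masked pixels. It is an illustration of the general idea under assumed inputs, not the implementation used by the supervision library; the function name `overlay_mask` and the toy image are hypothetical.

```python
import numpy as np

def overlay_mask(image, mask, color=(0, 255, 0), alpha=0.5):
    """Blend `color` into `image` wherever the boolean `mask` is True."""
    out = image.astype(np.float32)
    tint = np.array(color, dtype=np.float32)
    # Only masked pixels are changed; the rest of the image is untouched.
    out[mask] = (1.0 - alpha) * out[mask] + alpha * tint
    return out.round().astype(np.uint8)

# Toy 4x4 gray image with a 2x2 masked square in the middle.
image = np.full((4, 4, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

blended = overlay_mask(image, mask, color=(0, 200, 0), alpha=0.5)
print(blended[0, 0], blended[1, 1])
```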

Step 7: Use Box Prompts for Segmentation

Box prompts allow us to specify regions of interest in the image for segmentation.

# Define the SAM 2 image predictor
predictor = SAM2ImagePredictor(sam2_model)

# Reload the image
image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

# Encode the image as a base64 data URI for the bounding box widget
import base64

def encode_image(filepath):
    with open(filepath, 'rb') as f:
        image_bytes = f.read()
    encoded = str(base64.b64encode(image_bytes), 'utf-8')
    return "data:image/jpg;base64,"+encoded

# Enable the custom widget manager in Colab
IS_COLAB = True

if IS_COLAB:
    from google.colab import output
    output.enable_custom_widget_manager()

from jupyter_bbox_widget import BBoxWidget

# Create a bounding box widget
widget = BBoxWidget()
widget.image = encode_image(IMAGE_PATH)

# Display the widget
widget

Explanation

Image Predictor: Defines the SAM 2 image predictor.
Image Encoding: Encodes the image for use with the bounding box widget.
Widget Setup: Sets up a bounding box widget for specifying regions of interest.
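
The encoding helper in this step always labels the payload as a JPEG. If you plan to load other formats, one possible variation is to guess the MIME type from the file name. The sketch below uses only the standard library; the function name `to_data_uri` is hypothetical, and the demo file contains placeholder bytes rather than a real image.

```python
import base64
import mimetypes
import os
import tempfile

def to_data_uri(filepath):
    """Encode a file as a base64 data URI, guessing the MIME type from its name."""
    mime, _ = mimetypes.guess_type(filepath)
    if mime is None:
        mime = "application/octet-stream"
    with open(filepath, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{payload}"

# Demo on a throwaway file; the bytes are placeholders, not a real image.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.png")
    with open(demo_path, "wb") as f:
        f.write(b"abc")
    demo_uri = to_data_uri(demo_path)

print(demo_uri)
```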

Step 8: Get Bounding Boxes and Perform Segmentation

After specifying the bounding boxes, we can use them to generate segmentation masks.

# Get the bounding boxes from the widget and convert them
# from (x, y, width, height) to (x_min, y_min, x_max, y_max)
boxes = widget.bboxes
boxes = np.array([
    [
        box['x'],
        box['y'],
        box['x'] + box['width'],
        box['y'] + box['height']
    ] for box in boxes
])

[{'x': 457, 'y': 341, 'width': 0, 'height': 0, 'label': ''},
 {'x': 205, 'y': 79, 'width': 0, 'height': 1, 'label': ''}]

# Set the image in the predictor
predictor.set_image(image_rgb)

# Generate masks using the bounding boxes
masks, scores, logits = predictor.predict(
    box=boxes,
    multimask_output=False
)

# Remove singleton dimensions from the mask array
masks = np.squeeze(masks)

# Annotate and visualize the masks
box_annotator = sv.BoxAnnotator(color=sv.Color.white())
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)

# sv.Detections takes the boolean masks through its `mask` field
detections = sv.Detections(
    xyxy=boxes,
    mask=masks.astype(bool)
)

source_image = box_annotator.annotate(scene=image_bgr.copy(), detections=detections)
segmented_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

# Plot the annotated images
sv.plot_images_grid(
    images=[source_image, segmented_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)


Explanation

Bounding Boxes: Retrieves the bounding boxes specified using the widget.
Mask Generation: Uses the bounding boxes to generate segmentation masks.
Visualization: Annotates and visualizes the masks on the original image.
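
The widget output shown in this step contains boxes with zero width or height, which typically come from a click without a drag. As a hedged sketch, the conversion can skip such degenerate boxes before they reach the predictor. The helper name `widget_boxes_to_xyxy`, the `min_size` threshold, and the second sample box are assumptions for illustration.

```python
import numpy as np

def widget_boxes_to_xyxy(bboxes, min_size=2):
    """Convert [{'x', 'y', 'width', 'height'}, ...] to an (N, 4) xyxy array,
    dropping boxes narrower or shorter than `min_size` pixels."""
    rows = [
        [b["x"], b["y"], b["x"] + b["width"], b["y"] + b["height"]]
        for b in bboxes
        if b["width"] >= min_size and b["height"] >= min_size
    ]
    return np.array(rows, dtype=np.float32).reshape(-1, 4)

sample = [
    # Zero-size box, as in the widget output shown in this step
    {"x": 457, "y": 341, "width": 0, "height": 0, "label": ""},
    # Hypothetical properly drawn box
    {"x": 68, "y": 247, "width": 555, "height": 678, "label": ""},
]

valid_boxes = widget_boxes_to_xyxy(sample)
print(valid_boxes)
```

An empty result (shape `(0, 4)`) signals that no usable box was drawn, which is worth checking before calling the predictor.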

Step 9: Use Point Prompts for Segmentation

Point prompts allow us to specify individual points of interest for segmentation.

# Create point prompts from the centers of the bounding boxes
input_point = np.array([
    [
        box['x'] + (box['width'] // 2),
        box['y'] + (box['height'] // 2)
    ] for box in widget.bboxes
])
# Label 1 marks every point as a foreground point
input_label = np.array([1] * len(input_point))

# Generate masks using the point prompts
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True
)

# Remove singleton dimensions from the mask array
masks = np.squeeze(masks)

# Annotate and visualize the masks
point_annotator = sv.PointAnnotator(color_lookup=sv.ColorLookup.INDEX)
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)

# sv.Detections takes the boolean masks through its `mask` field
detections = sv.Detections(
    xyxy=sv.mask_to_xyxy(masks=masks),
    mask=masks.astype(bool)
)

source_image = point_annotator.annotate(scene=image_bgr.copy(), detections=detections)
segmented_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

# Plot the annotated images
sv.plot_images_grid(
    images=[source_image, segmented_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)

Explanation

Point Prompts: Creates point prompts based on the bounding boxes.
Mask Generation: Uses the point prompts to generate segmentation masks.
Visualization: Annotates and visualizes the masks on the original image.
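
With `multimask_output=True`, the predictor returns several candidate masks together with a score for each. One common follow-up is to keep only the highest-scoring candidate. The sketch below assumes the masks are shaped (candidates, height, width) and the scores (candidates,); the function name `best_mask` and the synthetic arrays are stand-ins for illustration.

```python
import numpy as np

def best_mask(masks, scores):
    """Return the candidate mask with the highest score, plus that score."""
    masks = np.asarray(masks)
    scores = np.asarray(scores)
    idx = int(np.argmax(scores))
    return masks[idx].astype(bool), float(scores[idx])

# Synthetic stand-ins shaped like (candidates, height, width) and (candidates,).
fake_masks = np.zeros((3, 4, 4), dtype=np.float32)
fake_masks[1, :2, :2] = 1.0
fake_scores = np.array([0.42, 0.91, 0.67])

top_mask, top_score = best_mask(fake_masks, fake_scores)
print(int(top_mask.sum()), top_score)
```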

Key Points to Remember When Working with SAM 2

Let us now look at a few important points:

Revolutionizing Image and Video Editing

Potential to transform the photo and video editing industry.
Future enhancements may include improved precision, lower computational requirements, and advanced AI integration.

Real-Time Segmentation and Editing

Further evolution could lead to real-time segmentation and editing capabilities.
Allows seamless alterations in videos and images with minimal effort.

Creative Possibilities for All

Opens up new creative possibilities for both professionals and amateurs.
Simplifies the manipulation of visual content, the creation of stunning effects, and the production of high-quality media.

Automating Complex Tasks

Automates intricate segmentation tasks.
Significantly accelerates workflows, making sophisticated editing more accessible and efficient.

Democratizing Content Creation

Makes high-level editing tools available to a broader audience.
Empowers storytellers and inspires innovation across various sectors, including entertainment, advertising, and education.

Impact on VFX Industry

Enhances visual effects (VFX) production by streamlining complex processes.
Reduces the time and effort required to create intricate VFX, enabling more ambitious projects and improving overall quality.

Impressive Potential of SAM 2

The Segment Anything Model 2 (SAM 2) stands poised to revolutionize photo and video editing by introducing significant advancements in precision and computational efficiency. By integrating advanced AI capabilities, SAM 2 will enable more intuitive user interactions and real-time segmentation and editing, allowing seamless alterations with minimal effort. This technology promises to democratize content creation, empowering both professionals and amateurs to manipulate visual content, create stunning effects, and produce high-quality media with ease.

As SAM 2 automates complex segmentation tasks, it will accelerate workflows and make sophisticated editing accessible to a wider audience. This transformation will encourage innovation across various industries, from entertainment and advertising to education. In the realm of visual effects (VFX), SAM 2 will streamline intricate processes, reducing the time and effort needed to create elaborate VFX. This will enable more ambitious projects, elevate the quality of visual storytelling, and open up new creative possibilities in the VFX world.

Conclusion

By following this guide, you have learned how to set up and use the Segment Anything Model 2 (SAM 2) for image segmentation using both box and point prompts. SAM 2 provides powerful and flexible tools for segmenting objects in images, making it a valuable asset for various computer vision tasks. Feel free to experiment with your own images and explore the capabilities of SAM 2 further.

Key Takeaways

SAM 2 is an advanced tool developed by Meta AI that enables precise and flexible image and video segmentation using both box and point prompts.
The model can significantly enhance photo and video editing by automating complex segmentation tasks, making editing more accessible and efficient.
Setting up SAM 2 requires a CUDA-enabled GPU and a basic understanding of Python and image processing concepts.
SAM 2’s capabilities open new possibilities for both professionals and amateurs in content creation, offering real-time segmentation and creative control.
The model has the potential to transform various industries, including visual effects, entertainment, advertising, and education, by democratizing high-level editing tools.

Frequently Asked Questions

Q1. What is SAM 2?

A. SAM 2, or Segment Anything Model 2, is an image and video segmentation model created by Meta AI that lets users generate segmentation masks for specific objects by providing box or point prompts.

Q2. What are the prerequisites for using SAM 2?

A. To use SAM 2, you need a CUDA-enabled GPU for faster processing and Python installed on your machine. Basic knowledge of Python and image processing concepts is also helpful.

Q3. How do I set up SAM 2?

A. Set up SAM 2 by checking GPU availability, cloning the SAM 2 repository from GitHub, installing the required dependencies, and downloading model checkpoints and sample images for testing.

Q4. What types of prompts can be used with SAM 2 for segmentation?

A. SAM 2 supports both box prompts and point prompts. Box prompts involve specifying regions of interest using bounding boxes, while point prompts involve selecting specific points in the image.

Q5. How can SAM 2 impact photo and video editing?

A. SAM 2 can revolutionize photo and video editing by automating complex segmentation tasks, enabling real-time editing, and making advanced editing tools accessible to a broader audience, thereby enhancing creative possibilities and workflow efficiency.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.