Object Detection

Object Detection Nodes

Object detection nodes let you identify, locate, and segment objects in images. Use loader nodes to configure each model’s weights, then connect them to the corresponding detector or segmentation node.

Load Florence-2 Model (Loader)

Inputs

None

Widgets

Model Name: Select a Florence-2 checkpoint to load.

Outputs

Florence-2 Model: Model object passed to Florence-2 Object Detection.

Description

Downloads and loads Microsoft Florence-2 model weights, then passes the loaded model to the Florence-2 Object Detection node. Florence-2 is a universal vision model capable of detection, captioning, OCR, and more, all driven by a text prompt.

Managing Files

models/
- florence2/
  - florence-2-base-ft
  - florence-2-large-ft

Florence-2 Object Detection (Detector)

Inputs

Image: Input image.
Model: Florence-2 Model from Load Florence-2 Model node.

Widgets

Text Prompt: Comma-separated object classes or caption request.
Task: The vision task to perform.

Outputs

BBoxes: Bounding box coordinates.
Labels: Detected class labels or captions.
Image(s): Annotated output image.

Description

A universal vision node powered by Microsoft Florence-2. Unlike YOLOX, which is limited to the 80 COCO classes, Florence-2 is open-vocabulary: you describe what you want to detect in plain text and it finds it. It also supports captioning, OCR, and dense region proposals.

Parameters

Widget	Type	Description
Text Prompt	`Text`	Comma-separated object names (e.g., `person, car`) for detection, or a free-form query for captioning/OCR tasks.
Task	`Dropdown`	The vision task to execute. See task options below.

Task Options

Task	Description
Object Detection	Detect objects matching the text prompt.
Phrase Grounding	Ground a natural language phrase to image regions.
Open Vocabulary Detection	Detect any concept described in text.
Caption	Generate a short caption for the whole image.
Detailed Caption	Generate a longer, more descriptive caption.
More Detailed Caption	Maximum detail caption.
Dense Region Caption	Caption every detected region individually.
Region Proposal	Return object-like regions without labels.
OCR With Region	Extract text and its bounding box location.
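
The task names above line up with the task tokens listed in the public Florence-2 model card. The sketch below assumes that mapping, and assumes the prompt is built by appending the text to the token as the model card examples do. How this node handles the prompt internally is not documented here, so `TASK_TOKENS`, `parse_classes`, `build_prompt`, and `filter_by_label` are hypothetical helper names, and filtering detections by label is only one plausible way to make Object Detection "match" a comma-separated prompt.

```python
from typing import List, Sequence, Tuple

# Assumed mapping from the Task dropdown to Florence-2 task tokens.
TASK_TOKENS = {
    "Object Detection": "<OD>",
    "Phrase Grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "Open Vocabulary Detection": "<OPEN_VOCABULARY_DETECTION>",
    "Caption": "<CAPTION>",
    "Detailed Caption": "<DETAILED_CAPTION>",
    "More Detailed Caption": "<MORE_DETAILED_CAPTION>",
    "Dense Region Caption": "<DENSE_REGION_CAPTION>",
    "Region Proposal": "<REGION_PROPOSAL>",
    "OCR With Region": "<OCR_WITH_REGION>",
}

# Tasks whose token is followed by free text in the model prompt.
TASKS_WITH_TEXT = {"Phrase Grounding", "Open Vocabulary Detection"}


def parse_classes(text_prompt: str) -> List[str]:
    """Split a comma-separated Text Prompt into lowercase class names."""
    return [part.strip().lower() for part in text_prompt.split(",") if part.strip()]


def build_prompt(task: str, text_prompt: str = "") -> str:
    """Return the model prompt: the task token, plus text where the task uses it."""
    token = TASK_TOKENS[task]
    if task in TASKS_WITH_TEXT:
        return token + text_prompt.strip()
    return token


def filter_by_label(
    bboxes: Sequence[Sequence[float]], labels: Sequence[str], wanted: Sequence[str]
) -> Tuple[list, list]:
    """Keep only the detections whose label is in `wanted` (case-insensitive)."""
    wanted_set = {w.lower() for w in wanted}
    kept = [(b, l) for b, l in zip(bboxes, labels) if l.lower() in wanted_set]
    return [b for b, _ in kept], [l for _, l in kept]
```

For example, `build_prompt("Phrase Grounding", "a red car")` yields the token followed directly by the phrase, while `build_prompt("Caption")` yields the token alone.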

Load SAM2 Model (Loader)

Inputs

None

Widgets

Model: Select a SAM2 model size.

Outputs

SAM2 Model: Model object passed to SAM2 Segmentation.

Description

Loads the weights for Segment Anything Model 2 (SAM2). Connect the output to the SAM2 Segmentation node.

Managing Files

models/
- sam2/
  - sam2_hiera_tiny.pt
  - sam2_hiera_small.pt
  - sam2_hiera_large.pt

SAM2 Segmentation (Detector)

Inputs

Image: Input image.
SAM2 Model: From Load SAM2 Model node.
points: (Optional) Point prompts.

Widgets

confidence_threshold: Minimum confidence score a mask must reach to be kept.

Outputs

Masks: Segmentation masks.

Description

Runs Segment Anything Model 2 (SAM2), a state-of-the-art model for “cutting out” objects from images. It can be guided with point prompts or run automatically.
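
The exact format this node expects for point prompts is not documented here. As a rough illustration, the sketch below follows the convention used by the Segment Anything predictors, where each point is an (x, y) pixel coordinate with label 1 for foreground and 0 for background, and then applies a confidence threshold to per-mask scores. `build_point_prompts` and `filter_masks` are hypothetical helpers, not part of the node.

```python
def build_point_prompts(foreground, background, width, height):
    """Return (coords, labels) lists; label 1 = foreground, 0 = background."""
    coords, labels = [], []
    for label, points in ((1, foreground), (0, background)):
        for x, y in points:
            # Reject points that fall outside the image bounds.
            if not (0 <= x < width and 0 <= y < height):
                raise ValueError(
                    f"point ({x}, {y}) lies outside a {width}x{height} image"
                )
            coords.append([float(x), float(y)])
            labels.append(label)
    return coords, labels


def filter_masks(masks, scores, confidence_threshold):
    """Keep the masks whose score is at least `confidence_threshold`."""
    return [m for m, s in zip(masks, scores) if s >= confidence_threshold]
```

A single foreground click on the object is often enough to start with; background points help when the mask spills onto neighboring regions.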

Load YOLOX Model (Loader)

Inputs

None

Widgets

model_name: Model size selector.

Outputs

Model: YOLOX Model object.

Description

Loads the weights for the YOLOX object detection architecture. Connect the output to the YOLOX Object Detection node.

Parameters

Widget	Type	Description
model_name	`Dropdown`	`nano` (fastest) to `x` (most accurate).

Managing Files

models/
- yolox/
  - yolox_s.pth
  - yolox_x.pth
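
To confirm that files are where the loader nodes look for them, a small check like the one below can help. It is a sketch that assumes the layouts listed in the Managing Files sections on this page; the location of the `models` root depends on your installation, so it is passed in as an argument, and `EXPECTED_FILES` and `find_missing` are hypothetical names.

```python
from pathlib import Path

# Entries copied from the Managing Files sections on this page.
EXPECTED_FILES = {
    "florence2": ["florence-2-base-ft", "florence-2-large-ft"],
    "sam2": ["sam2_hiera_tiny.pt", "sam2_hiera_small.pt", "sam2_hiera_large.pt"],
    "yolox": ["yolox_s.pth", "yolox_x.pth"],
}


def find_missing(models_root):
    """Return the expected paths that do not exist under `models_root`."""
    root = Path(models_root)
    missing = []
    for subdir, names in EXPECTED_FILES.items():
        for name in names:
            path = root / subdir / name
            if not path.exists():
                missing.append(path)
    return missing
```

You only need the checkpoints you intend to select, so a non-empty result is not necessarily an error.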

YOLOX Object Detection (Detector)

Inputs

Model: YOLOX Model.
Image(s): Input Stream.

Widgets

Confidence: Min score (0.0-1.0).
NMS: Overlap removal.
Line/Text Color: Visualization colors.
Show Labels/Scores: Toggle overlays.

Outputs

BBoxes: Bounding Boxes.
Labels: Class names (e.g. “person”).
Scores: Confidence score (0-1).

Parameters

Widget	Type	Default	Description
confidence_threshold	`Float`	0.5	Minimum confidence score required to keep a detection.
nms_threshold	`Float`	0.45	IoU threshold for Non-Maximum Suppression. Boxes that overlap a higher-scoring box by more than this value are suppressed, so higher values keep more overlapping boxes.
box_color	`Color`	`#00FF00`	Color of the bounding box lines.
label_color	`Color`	`#000000`	Color of the label text.
show_labels	`Bool`	`True`	Draw class names (e.g. “person”) on the image.
show_scores	`Bool`	`True`	Draw confidence percentages on the image.
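
To make the interaction between `confidence_threshold` and `nms_threshold` concrete, here is a minimal pure-Python sketch of score filtering followed by greedy Non-Maximum Suppression. It is an illustration, not the node's implementation: it assumes boxes in `(x1, y1, x2, y2)` format and ignores class labels, whereas detectors commonly apply NMS per class. The function names `iou` and `select_detections` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_detections(boxes, scores, confidence_threshold=0.5, nms_threshold=0.45):
    """Return indices of boxes kept after score filtering and greedy NMS."""
    # Drop low-confidence boxes, then visit the rest from best to worst score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= confidence_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.3]
print(select_detections(boxes, scores))                     # [0, 2]
print(select_detections(boxes, scores, nms_threshold=0.7))  # [0, 1, 2]
```

The first two boxes overlap with an IoU of about 0.68, so the lower-scoring one is suppressed at the default threshold of 0.45 but survives when the threshold is raised to 0.7. The last box is removed earlier by the confidence filter.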

Description

Detects 80 different classes of objects (COCO dataset) in an image, returning their bounding boxes and labels.

YOLOX Detection Example