Object Detection

Object detection nodes let you identify, locate, and segment objects in images. Use the loader nodes to load each model’s weights, then connect their outputs to the corresponding detector or segmentation node.

Load Florence-2 Model

Inputs

None

Widgets

  • Model Name: Select a Florence-2 checkpoint to load.

Outputs

  • Florence-2 Model: Model object passed to Florence-2 Object Detection.

Downloads and loads Microsoft Florence-2 model weights. The loaded model is passed directly to the Florence-2 Object Detection node. Florence-2 is a universal vision model capable of detection, captioning, OCR, and more — all driven by a text prompt.

Model folder layout:

  • models/
    • florence2/
      • florence-2-base-ft
      • florence-2-large-ft
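One way a loader could fill its Model Name dropdown is to scan the folder layout above. The sketch below assumes that approach; the helper name `list_florence2_checkpoints` is hypothetical, and so is the assumption that the dropdown mirrors the contents of models/florence2. This is not the node's documented behavior.

```python
import tempfile
from pathlib import Path


def list_florence2_checkpoints(models_root):
    """Return the sorted checkpoint names found in <models_root>/florence2.

    Hypothetical helper: every non-hidden entry (file or folder) directly
    inside florence2/ is treated as one selectable checkpoint.
    """
    florence_dir = Path(models_root) / "florence2"
    if not florence_dir.is_dir():
        return []  # no Florence-2 weights installed yet
    return sorted(p.name for p in florence_dir.iterdir() if not p.name.startswith("."))


if __name__ == "__main__":
    # Build a throwaway copy of the documented layout and list it.
    with tempfile.TemporaryDirectory() as root:
        for name in ("florence-2-large-ft", "florence-2-base-ft"):
            (Path(root) / "florence2" / name).mkdir(parents=True)
        print(list_florence2_checkpoints(root))  # ['florence-2-base-ft', 'florence-2-large-ft']
```

Hidden entries such as a `.cache` folder are skipped, so they never show up as selectable checkpoints.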




Florence-2 Object Detection (Detector)

Inputs

  • Image: Input image.
  • Model: Florence-2 Model from Load Florence-2 Model node.

Widgets

  • Text Prompt: Comma-separated object classes or caption request.
  • Task: The vision task to perform.

Outputs

  • BBoxes: Bounding box coordinates.
  • Labels: Detected class labels or captions.
  • Image(s): Annotated output image.

A universal vision node powered by Microsoft Florence-2. YOLOX is limited to the 80 COCO classes, but Florence-2 is open-vocabulary: you describe what you want to detect in plain text, and the model locates it. It also supports captioning, OCR, dense region captioning, and region proposals.

| Widget | Type | Description |
|---|---|---|
| Text Prompt | Text | Comma-separated object names (e.g., person, car) for detection, or a free-form query for captioning/OCR tasks. |
| Task | Dropdown | The vision task to execute. See the task options below. |

| Task | Description |
|---|---|
| Object Detection | Detect objects matching the text prompt. |
| Phrase Grounding | Ground a natural language phrase to image regions. |
| Open Vocabulary Detection | Detect any concept described in text. |
| Caption | Generate a short caption for the whole image. |
| Detailed Caption | Generate a longer, more descriptive caption. |
| More Detailed Caption | Generate a caption with maximum detail. |
| Dense Region Caption | Caption every detected region individually. |
| Region Proposal | Return object-like regions without labels. |
| OCR With Region | Extract text together with its bounding box location. |
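Upstream Florence-2 selects each task with a special prompt token. Grounding-style tasks then append the user's text after that token. The sketch below pairs this node's Task options with those upstream tokens.

Several parts are assumptions:

- which tasks this node forwards the Text Prompt to;
- the label filtering shown for Object Detection;
- the names `TASK_TOKENS`, `build_prompt`, `parse_classes`, and `keep_requested`, which are hypothetical and not the node's code.

```python
# Upstream Florence-2 task tokens, keyed by this node's Task dropdown labels.
TASK_TOKENS = {
    "Object Detection": "<OD>",
    "Phrase Grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "Open Vocabulary Detection": "<OPEN_VOCABULARY_DETECTION>",
    "Caption": "<CAPTION>",
    "Detailed Caption": "<DETAILED_CAPTION>",
    "More Detailed Caption": "<MORE_DETAILED_CAPTION>",
    "Dense Region Caption": "<DENSE_REGION_CAPTION>",
    "Region Proposal": "<REGION_PROPOSAL>",
    "OCR With Region": "<OCR_WITH_REGION>",
}

# Assumption: only these tasks take free-form text after the task token.
TEXT_INPUT_TASKS = {"Phrase Grounding", "Open Vocabulary Detection"}


def parse_classes(text_prompt):
    """Split a comma-separated prompt such as 'person, car' into class names."""
    return [part.strip() for part in text_prompt.split(",") if part.strip()]


def build_prompt(task, text_prompt=""):
    """Return the prompt string handed to the Florence-2 processor."""
    token = TASK_TOKENS[task]  # raises KeyError for an unknown task name
    if task in TEXT_INPUT_TASKS and text_prompt.strip():
        return token + text_prompt.strip()
    return token


def keep_requested(bboxes, labels, classes):
    """Keep only detections whose label matches a requested class (case-insensitive).

    Hypothetical post-filter for the Object Detection task.
    """
    wanted = {c.lower() for c in classes}
    kept = [(box, label) for box, label in zip(bboxes, labels) if label.lower() in wanted]
    return [box for box, _ in kept], [label for _, label in kept]
```

For example, `build_prompt("Open Vocabulary Detection", "red car")` yields `<OPEN_VOCABULARY_DETECTION>red car`. Caption tasks return the bare token and ignore the text.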

Load SAM2 Model

Inputs

None

Widgets

  • Model: Select a SAM2 model size.

Outputs

  • SAM2 Model: Model object passed to SAM2 Segmentation.

Loads the weights for Segment Anything Model 2 (SAM2). Connect the output to the SAM2 Segmentation node.

Model folder layout:

  • models/
    • sam2/
      • sam2_hiera_tiny.pt
      • sam2_hiera_small.pt
      • sam2_hiera_large.pt




SAM2 Segmentation (Detector)

Inputs

  • Image: Input image.
  • SAM2 Model: From Load SAM2 Model node.
  • points: (Optional) Point prompts.

Widgets

  • confidence_threshold: Minimum confidence score a mask must reach to be kept.

Outputs

  • Masks: Segmentation masks.

Runs Segment Anything Model 2 (SAM2), a state-of-the-art model for “cutting out” objects from images. It can be guided with point prompts or run automatically.
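SAM-family predictors take point prompts as pixel (x, y) coordinates plus one label per point: 1 for foreground and 0 for background. They return a predicted quality (IoU) score for each mask.

The sketch below shows that prompt layout. It also shows one plausible reading of confidence_threshold as a score cutoff. `PointPrompt`, `to_sam_arrays`, and `filter_masks` are hypothetical names, and the node's real filtering may differ.

```python
from dataclasses import dataclass


@dataclass
class PointPrompt:
    x: float
    y: float
    foreground: bool = True  # SAM-style label: 1 = object, 0 = background


def to_sam_arrays(points):
    """Split prompts into the (coords, labels) pair SAM-style predictors expect."""
    coords = [[p.x, p.y] for p in points]
    labels = [1 if p.foreground else 0 for p in points]
    return coords, labels


def filter_masks(masks, scores, confidence_threshold):
    """Keep masks whose predicted score reaches the threshold, best first.

    Assumption: this is how confidence_threshold is applied.
    """
    kept = [(score, mask) for mask, score in zip(masks, scores) if score >= confidence_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [mask for _, mask in kept], [score for score, _ in kept]
```

The masks are treated as opaque objects here. In practice they would be arrays returned by the predictor.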





Inputs

None

Widgets

  • model_name: Model size selector.

Outputs

  • Model: YOLOX Model object.

Loads the weights for the YOLOX object detection architecture. Connect the output to the YOLOX Object Detection node.

| Widget | Type | Description |
|---|---|---|
| model_name | Dropdown | nano (fastest) to x (most accurate). |

Model folder layout:

  • models/
    • yolox/
      • yolox_s.pth
      • yolox_x.pth
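The model_name choices correspond to the upstream YOLOX family (nano, tiny, s, m, l, x). The checkpoint files above follow a `yolox_<variant>.pth` pattern.

The sketch below resolves a choice to a file. It assumes the dropdown offers all six upstream variants and that the files not shown above follow the same naming. `checkpoint_filename` and `checkpoint_path` are hypothetical helpers. The input sizes are those used by the upstream YOLOX releases.

```python
from pathlib import Path

# Upstream YOLOX variants, ordered from fastest to most accurate.
YOLOX_VARIANTS = ["nano", "tiny", "s", "m", "l", "x"]

# Square input resolution used by the upstream YOLOX release for each variant.
INPUT_SIZE = {"nano": 416, "tiny": 416, "s": 640, "m": 640, "l": 640, "x": 640}


def checkpoint_filename(model_name):
    """Map a model_name choice to a file name such as 'yolox_s.pth' (assumed scheme)."""
    if model_name not in YOLOX_VARIANTS:
        raise ValueError(f"unknown YOLOX variant: {model_name!r}")
    return f"yolox_{model_name}.pth"


def checkpoint_path(models_root, model_name):
    """Full path under <models_root>/yolox for the chosen variant."""
    return Path(models_root) / "yolox" / checkpoint_filename(model_name)
```

Rejecting unknown names early gives a clearer error than a missing-file failure later during loading.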




YOLOX Object Detection (Detector)

Inputs

  • Model: YOLOX Model.
  • Image(s): Input image or image stream.

Widgets

  • Confidence: Minimum detection score (0.0 to 1.0).
  • NMS: Overlap threshold for Non-Maximum Suppression (removes duplicate boxes).
  • Line/Text Color: Visualization colors.
  • Show Labels/Scores: Toggle overlays.

Outputs

  • BBoxes: Bounding box coordinates.
  • Labels: Class names (e.g. “person”).
  • Scores: Confidence scores (0 to 1).

| Widget | Type | Default | Description |
|---|---|---|---|
| confidence_threshold | Float | 0.5 | Minimum certainty required to detect an object. |
| nms_threshold | Float | 0.45 | IoU threshold for Non-Maximum Suppression. A box that overlaps a higher-scoring box by more than this value is suppressed, so higher values remove overlapping boxes less aggressively. |
| box_color | Color | #00FF00 | Color of the bounding box lines. |
| label_color | Color | #000000 | Color of the label text. |
| show_labels | Bool | True | Draw class names (e.g. “person”) on the image. |
| show_scores | Bool | True | Draw confidence percentages on the image. |
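The confidence_threshold and nms_threshold rows describe a standard post-processing pass.

The sketch below shows that pass under three assumptions:

- boxes are in (x1, y1, x2, y2) pixel format;
- a box is suppressed only by a kept box of the same class (class-aware greedy NMS);
- `iou` and `filter_and_nms` are hypothetical helper names.

Upstream YOLOX performs this step with torchvision in PyTorch; the pure-Python version only illustrates the logic.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, labels, confidence_threshold=0.5, nms_threshold=0.45):
    """Drop low-confidence boxes, then run greedy class-aware NMS.

    Returns the indices of the boxes to keep, highest score first.
    """
    # Step 1: confidence filter, then sort survivors by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= confidence_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Step 2: keep i unless it overlaps an already-kept box of the same
        # class by more than nms_threshold.
        if all(labels[k] != labels[i] or iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep
```

With the default 0.45, two "person" boxes with an IoU of 0.81 collapse to the higher-scoring one. Raising nms_threshold to 0.9 keeps both.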

Detects objects from the 80 classes of the COCO dataset in an image, returning their bounding boxes, labels, and confidence scores.

Figure: YOLOX detection example.
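The YOLOX drawing options (box_color, label_color, show_labels, show_scores) combine into a color and a caption for each box.

The sketch below shows that formatting. `hex_to_rgb` and `overlay_text` are hypothetical helpers. The caption layout, a label followed by a rounded percentage, is an assumption based on the show_labels and show_scores descriptions.

```python
def hex_to_rgb(color):
    """Convert a '#RRGGBB' string such as '#00FF00' to an (r, g, b) tuple."""
    value = color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected #RRGGBB, got {color!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def overlay_text(label, score, show_labels=True, show_scores=True):
    """Build the caption drawn next to a box, e.g. 'person 87%'.

    Assumed layout: class name first, then the score as a whole percentage.
    """
    parts = []
    if show_labels:
        parts.append(label)
    if show_scores:
        parts.append(f"{score:.0%}")  # 0.874 -> '87%'
    return " ".join(parts)
```

Turning both toggles off yields an empty caption, so only the box outline would be drawn.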