Overview

The ComfyStream toolkit adds powerful real-time video and audio capabilities to ComfyUI, making it easy to build interactive, AI-powered media workflows. It extends ComfyUI with specialized tools for streaming, live processing, and on-the-fly workflow updates, including:

  • ComfyStream – A custom node that streams audio and video from your webcam and microphone into ComfyUI for real-time AI processing, then returns the processed output.
  • ComfyUI-Stream-Pack – A collection of custom nodes designed to support advanced real-time audio and video workflows.

To help you get started, ComfyStream also includes a knowledge base with foundational workflows, optimization tips, and in-depth guides on all tools and nodes.

Getting Started

To get started, follow the installation guide below or explore the stream pack for additional nodes.

How It Works

ComfyStream enables real-time processing of audio and video streams by combining three components:

  • A WebRTC server for low-latency, bidirectional communication with the client.
  • A custom tensor-based pipeline that converts media frames to and from tensors.
  • ComfyUI’s EmbeddedComfyClient, which runs AI inference on those tensors.
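The frame-to-tensor conversion step can be sketched in a few lines. This is a minimal illustration, assuming frames arrive as HxWx3 uint8 arrays; the function names are illustrative, not ComfyStream's actual API:

```python
import numpy as np

def frame_to_tensor(frame: np.ndarray) -> np.ndarray:
    """Normalize an HxWx3 uint8 video frame to a float32 tensor in [0, 1]."""
    return frame.astype(np.float32) / 255.0

def tensor_to_frame(tensor: np.ndarray) -> np.ndarray:
    """Convert a float32 tensor in [0, 1] back to a uint8 video frame."""
    # Round before casting so the uint8 values survive the round trip exactly.
    return np.clip(np.rint(tensor * 255.0), 0, 255).astype(np.uint8)

# Round-trip a dummy frame through the conversion.
frame = np.full((480, 640, 3), 200, dtype=np.uint8)
tensor = frame_to_tensor(frame)
restored = tensor_to_frame(tensor)
```

The real pipeline performs this conversion per frame on both the inbound and outbound legs, so keeping it to a normalize/denormalize pair is what makes real-time throughput feasible.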

Data Flow Overview

Here’s how the system processes live audio and video end-to-end:

  1. Input: WebRTC receives video, audio, and control data from the client.
  2. Workflow Injection: The pipeline dynamically modifies the ComfyUI workflow by replacing standard input/output nodes with custom tensor nodes.
  3. Inference: The EmbeddedComfyClient processes incoming tensors in real-time using the updated workflow.
  4. Output Conversion: Processed tensors are converted back into video and audio frames and streamed to the client via WebRTC.
  5. Live Control: A control channel allows the client to update the workflow or modify parameters on the fly, without restarting the session.
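Step 2, Workflow Injection, can be sketched as a dictionary rewrite. A ComfyUI workflow in API format is a dict of node id to `{"class_type": ..., "inputs": ...}`; the pipeline swaps the disk-based input/output nodes for tensor nodes. The `LoadTensor`/`SaveTensor` names below are illustrative, not necessarily ComfyStream's exact node classes:

```python
import copy

def inject_tensor_nodes(workflow: dict) -> dict:
    """Replace file-based I/O nodes with tensor nodes (illustrative sketch)."""
    patched = copy.deepcopy(workflow)  # leave the original workflow untouched
    for node in patched.values():
        if node["class_type"] == "LoadImage":
            node["class_type"] = "LoadTensor"  # receives frames from WebRTC
            node["inputs"] = {}                # no filename needed
        elif node["class_type"] == "SaveImage":
            node["class_type"] = "SaveTensor"  # emits frames back to WebRTC
            node["inputs"].pop("filename_prefix", None)
    return patched

# A toy workflow in ComfyUI's API format.
workflow = {
    "1": {"class_type": "LoadImage", "inputs": {"image": "example.png"}},
    "2": {"class_type": "VAEEncode", "inputs": {"pixels": ["1", 0]}},
    "3": {"class_type": "SaveImage",
          "inputs": {"images": ["2", 0], "filename_prefix": "out"}},
}
patched = inject_tensor_nodes(workflow)
```

Because only the endpoints are rewritten, the rest of the graph (and its node-to-node links, like `["1", 0]` above) runs unchanged, which is what lets an ordinary ComfyUI workflow be reused for streaming.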

This high-level overview is visualized below: