Overview

ComfyStream adds real-time video and audio capabilities to ComfyUI, so you can build interactive, AI-powered media workflows. It extends ComfyUI with tools for streaming, live processing, and on-the-fly workflow updates, including:
  • ComfyStream – A custom node that streams audio and video from your webcam and microphone into ComfyUI for real-time AI processing, then returns the processed output.
  • ComfyUI-Stream-Pack – A collection of custom nodes designed to support advanced real-time audio and video workflows.

Getting Started

To get started, follow the installation guide below, or explore ComfyUI-Stream-Pack for additional nodes.

How It Works

ComfyStream processes audio and video streams in real time by combining three components: a WebRTC server for low-latency, bidirectional communication; a custom tensor-based pipeline that converts media frames to and from tensors; and ComfyUI’s EmbeddedComfyClient for AI inference.
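To make the frame-to-tensor step concrete, here is a minimal, dependency-free sketch of the round trip. It assumes packed 8-bit RGB frames and a height x width x channel layout with floats in [0, 1], which is the usual convention for ComfyUI image tensors. The function names `frame_to_tensor` and `tensor_to_frame` are hypothetical and are not ComfyStream's API; a real pipeline would presumably use an array library rather than nested lists.

```python
from typing import List

# Assumed layout: tensor[y][x][channel], with float values in [0.0, 1.0].
Tensor = List[List[List[float]]]


def frame_to_tensor(rgb: bytes, width: int, height: int) -> Tensor:
    """Convert packed 8-bit RGB bytes into a nested H x W x 3 float tensor."""
    if len(rgb) != width * height * 3:
        raise ValueError("buffer size does not match width * height * 3")
    tensor: Tensor = []
    for y in range(height):
        row = []
        for x in range(width):
            i = (y * width + x) * 3  # offset of this pixel's red byte
            row.append([rgb[i + c] / 255.0 for c in range(3)])
        tensor.append(row)
    return tensor


def tensor_to_frame(tensor: Tensor) -> bytes:
    """Convert an H x W x 3 float tensor back into packed 8-bit RGB bytes."""
    out = bytearray()
    for row in tensor:
        for pixel in row:
            for value in pixel:
                # Model output can drift outside [0, 1], so clamp before scaling.
                clamped = min(max(value, 0.0), 1.0)
                out.append(round(clamped * 255.0))
    return bytes(out)
```

Clamping on the way out matters because inference results are not guaranteed to stay inside the input range, and an unclamped value would not fit in a byte.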

Data Flow Overview

Here’s how the system processes live audio and video end-to-end:
  1. Input: WebRTC receives video, audio, and control data from the client.
  2. Workflow Injection: The pipeline dynamically modifies the ComfyUI workflow by replacing standard input/output nodes with custom tensor nodes.
  3. Inference: The EmbeddedComfyClient processes incoming tensors in real time using the updated workflow.
  4. Output Conversion: Processed tensors are converted back to video and audio, and streamed back to the client via WebRTC.
  5. Live Control: A control channel allows the client to update the workflow or modify parameters on the fly, without restarting the session.
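
Steps 2 and 5 can be sketched on a workflow in ComfyUI's API format, where each node ID maps to a `class_type` and an `inputs` dictionary, and a link to another node is a two-element list of node ID and output index. This is an illustrative sketch, not ComfyStream's implementation: the tensor node names `LoadTensor` and `SaveTensor`, the mapping tables, the helper functions, and the example workflow are all assumptions.

```python
import copy
from typing import Any, Dict

Workflow = Dict[str, Dict[str, Any]]

# Assumed mapping from standard I/O node types to tensor node types.
INPUT_NODES = {"LoadImage": "LoadTensor"}
OUTPUT_NODES = {"SaveImage": "SaveTensor", "PreviewImage": "SaveTensor"}


def inject_tensor_nodes(workflow: Workflow) -> Workflow:
    """Step 2: return a copy with standard I/O nodes swapped for tensor nodes."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        class_type = node.get("class_type")
        if class_type in INPUT_NODES:
            node["class_type"] = INPUT_NODES[class_type]
            node["inputs"] = {}  # frames arrive from the stream, not from a file
        elif class_type in OUTPUT_NODES:
            node["class_type"] = OUTPUT_NODES[class_type]
            # Keep links to upstream nodes; drop file settings such as prefixes.
            node["inputs"] = {
                name: value
                for name, value in node.get("inputs", {}).items()
                if isinstance(value, list)
            }
    return patched


def apply_param_update(workflow: Workflow, node_id: str, name: str, value: Any) -> Workflow:
    """Step 5: return a copy with one node input changed, as a control message might request."""
    if node_id not in workflow:
        raise KeyError(f"unknown node id: {node_id}")
    patched = copy.deepcopy(workflow)
    patched[node_id]["inputs"][name] = value
    return patched


# Illustrative workflow: load an image, blur it, save the result.
workflow: Workflow = {
    "1": {"class_type": "LoadImage", "inputs": {"image": "example.png"}},
    "2": {"class_type": "ImageBlur", "inputs": {"image": ["1", 0], "blur_radius": 1, "sigma": 1.0}},
    "3": {"class_type": "SaveImage", "inputs": {"images": ["2", 0], "filename_prefix": "out"}},
}

streaming = inject_tensor_nodes(workflow)
streaming = apply_param_update(streaming, "2", "blur_radius", 4)
```

Both helpers return a modified copy instead of mutating their argument, so the workflow the client submitted stays available if an update has to be rejected or rolled back.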

This high-level overview is visualized below: