Immersion: Real Meets AI

Experimental music video merging live-action with AI-animated characters while preserving performance fidelity

Immersion: Real Meets AI - Experimental Music Video thumbnail showing the blend of live action and AI animation

Project Overview

Following the innovative path laid by my Ex Machina AI rendering experiment, I undertook a new project to explore the evolving landscape of technology in the creative field. "Immersion: Real Meets AI" is an experimental music video that aims to improve performance fidelity in AI filmmaking by merging live-action footage with 2D animation, inspired by the style of "Who Framed Roger Rabbit."

The twist in this project was integrating AI to animate the 2D character, creating a unique blend of human and artificial creativity. This endeavor allowed me to develop and refine a new video-to-video (vid2vid) workflow, pushing the limits of how AI can stylize and render existing videos into animated sequences. The project showcased my compositing and VFX skills while allowing me to practice model fine-tuning to preserve the actress's likeness and expressivity.

What makes this project truly innovative is the character consistency I achieved by combining a LoRA trained on my actress with IP adapter technology, solving one of the most persistent challenges in AI-generated video. Through "Immersion: Real Meets AI," I aimed to bridge the gap between traditional filmmaking and cutting-edge AI technology, opening new avenues for creativity and storytelling in the digital age.

🎯 Goal

Create a new AI workflow that preserves character consistency and performance fidelity while merging live-action and animated elements.

⏱️ Timeline

4 months (August 2023 - December 2023), developed during USC's "Directing in a Virtual World" class.

🧠 Role

Creator, Director, VFX Artist, AI Engineer, Compositor, and Technical Director for the entire workflow.

🛠️ Tools & Technologies

ComfyUI, AnimateDiff, Green Screen, Stable Diffusion, LoRA Training, IP Adapter, After Effects, HDRIs, Video-to-Video Processing.

Challenge & Solution

The Challenge

Creating a believable interaction between live-action and AI-generated animated characters posed several significant technical and creative challenges:

  • Character Consistency: Maintaining a consistent character appearance across frames and shots is a common problem in AI video; standard frame-by-frame generation produces significant variation from one frame to the next.
  • Preserving Performance: AI processing tends to flatten emotional expressivity and nuance from performances, resulting in lifeless characters.
  • Technical Complexity: Transitioning from Auto1111 to ComfyUI required developing entirely new workflows in a more complex but more powerful environment.
  • Filming Logistics: Directing actors to interact believably with non-existent characters required innovative approaches to blocking, tracking, and performance.
  • Compositing Challenges: Seamless integration between live-action and AI-generated elements required advanced keying and compositing techniques.

The Solution

I developed a comprehensive approach that addressed each challenge with innovative technical and creative solutions:

  • LoRA + IP Adapter Combination: I trained a custom LoRA model on my actress and combined it with IP adapter technology to achieve unprecedented character consistency across all shots (a minimal sketch follows this list).
  • Performance Preservation: Building on my previous research in facial expressivity preservation, I applied specialized techniques to maintain the emotional nuance of the original performance.
  • ComfyUI + AnimateDiff Workflow: I created a custom node-based workflow in ComfyUI that incorporated the newly released AnimateDiff technology, significantly improving results over my previous Auto1111 approach.
  • Novel Filming Techniques: I implemented a specialized laser dot tracking system on green screen that provided precise tracking markers without affecting the key, and used acrylic props for realistic interaction points.
  • Creative Compositing Solutions: I used AI to help improve chromakey results, implemented HDRI backgrounds with subtle movement, and created real reflections using projection techniques rather than compositing.
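The production pipeline for this lived in ComfyUI, but the core idea of the LoRA + IP adapter combination can be sketched in a few lines with the diffusers library. The model IDs, file names, and parameter values below are illustrative placeholders rather than the exact assets used on the project:

```python
# Minimal diffusers sketch of the LoRA + IP adapter combination.
# Model IDs, file names, and weights are illustrative placeholders;
# the actual production workflow was built as a ComfyUI node graph.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Custom LoRA trained on the actress (hypothetical local file).
pipe.load_lora_weights(".", weight_name="actress_lora.safetensors")

# IP adapter conditions generation on a reference photo of the actress.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)

reference = load_image("actress_closeup.png")  # closeup used as the identity reference

image = pipe(
    prompt="anime-style demon girl, consistent character design",
    ip_adapter_image=reference,
    cross_attention_kwargs={"scale": 0.8},  # LoRA strength
    num_inference_steps=30,
).images[0]
image.save("consistent_character_frame.png")
```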
Three-way comparison video thumbnail

Three-way comparison: Original green screen footage (left), Ex Machina method (center), and new AnimateDiff-enhanced result (right)

Process & Methodology

This project evolved significantly throughout its development, as I adapted to rapidly changing AI technologies and refined my approach based on results. What began as an evolution of my Ex Machina workflow transformed into something entirely new as I incorporated cutting-edge tools like AnimateDiff and developed innovative filming techniques.

1. Concept Development & Story Evolution

The project began as an experiment for USC's "Directing in a Virtual World" class. My initial idea evolved from using the LED wall as a rough background with AI rendering for polish, to using it as a physical barrier between characters, and finally to a "Who Framed Roger Rabbit"-inspired approach with green screen. My actress and I developed a story about a demon from another dimension who crash-lands on Earth, allowing for organic interaction between the characters.

Early version of the storyboard for Immersion | One of the first concept images generated to convey the idea

Left: Early storyboard version; Right: Initial concept image showing the visual style

2. Character Creation & Technical Development

I approached character creation by first exploring a 3D MetaHuman of my actress that could be processed with AI. I also developed a novel ComfyUI workflow that incorporated nodes for character consistency and expressivity preservation. When AnimateDiff was released during production, I quickly adapted my workflow to incorporate this powerful new technology, significantly improving the quality of the animated character.

Original metahuman of the actress before AI processing | Same metahuman after AI processing

Before/After: Metahuman of the actress before and after AI processing
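As a rough illustration of the "MetaHuman processed with AI" step shown above, an img2img pass pushes a rendered frame toward a 2D style while keeping its pose and framing. This is a minimal diffusers sketch with placeholder model and file names; the production version ran through the ComfyUI graph:

```python
# Illustrative img2img pass over a MetaHuman render, approximating the
# before/after AI processing step shown above. Model and file names are
# placeholders; the production processing ran inside ComfyUI.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = load_image("metahuman_render.png")  # hypothetical MetaHuman frame

styled = pipe(
    prompt="2D anime-style character, clean line art, flat shading",
    image=init,
    strength=0.5,          # low enough to keep the pose and framing
    guidance_scale=7.0,
    num_inference_steps=30,
).images[0]
styled.save("metahuman_stylized.png")
```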

LoRA Training Process

Sample training image with creative pose

Training data: Creative pose

Sample training image - front facing image of face

Training data: Front facing reference

Early result of LoRA training

Early LoRA result

Final version of LoRA with higher quality result

Final LoRA result with improved quality

LoRA training progression: From varied training data to increasingly refined results, showing how character consistency was achieved
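Before a LoRA like this can be trained, each training image typically needs a sidecar caption that includes a rare trigger token for the character. The snippet below is a generic data-prep sketch, not the project's actual trainer configuration; the folder layout, trigger token, and caption wording are assumptions:

```python
# Sketch of the caption-file step that typically precedes SD LoRA training:
# each training image gets a sidecar .txt file containing a rare trigger
# token plus a short description. Paths and the token are placeholders.
from pathlib import Path

DATASET_DIR = Path("lora_dataset/actress")   # folder of training photos
TRIGGER = "imrsn_actress"                    # hypothetical trigger token

for image_path in sorted(DATASET_DIR.glob("*.png")):
    caption = f"photo of {TRIGGER} woman, neutral background"
    image_path.with_suffix(".txt").write_text(caption, encoding="utf-8")
    print(f"{image_path.name}: {caption}")
```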

Technical walkthrough of the ComfyUI workflow

3. Green Screen Filming & Actor Direction

I implemented a specialized green screen setup with laser dot tracking markers that provided precise tracking without affecting the green screen key. To capture realistic interactions, I used specific marks, C-stands, and a transparent piece of acrylic where actors could place their hands. I filmed the actors both together and separately for maximum flexibility, allowing for genuine interaction while still having clean isolated takes that could be processed individually.

Crew working on the green screen set

4. Character Consistency & Advanced Compositing

For the crucial eye reflection scene, instead of using post-production compositing, I projected a pre-rendered idle animation of the demon girl onto a large screen and filmed the actor looking at that projection, creating a genuine reflection in his eye. I also trained a LoRA model on my actress's face and combined it with IP adapter technology to keep the character consistent from shot to shot. The backgrounds were created using HDRIs with subtle AI-generated movement to enhance the sense of immersion.

Test of actress before processing | Same test after processing

Before/after test showing transformation

Actress on set in front of green screen with laser tracking points | Processed image showing AI-generated character

Left: Actress with tracking points; Right: Processed result showing character consistency

Results & Impact

The final result was a revolutionary approach to character animation that bridges traditional VFX techniques with cutting-edge AI technology. "Immersion: Real Meets AI" successfully demonstrated a new workflow for creating animated characters with consistent appearance and preserved performance fidelity, while substantially reducing the time and resources traditionally required.

Immersion: Real Meets AI - Final Video thumbnail

Final video: "Immersion: Real Meets AI" experimental music video

  • 100% Character Consistency: Achieved complete character consistency across all shots, addressing a major challenge in AI-generated video.
  • ~70% Time Reduction: Compared to traditional animation or rotoscoping methods for integrating 2D characters with live action.
  • 3+ Innovative Techniques: Developed multiple new approaches to AI filmmaking, including the LoRA + IP adapter combination.

Qualitative Outcomes

Beyond the technical achievements, this project delivered several significant creative and professional results:

  • Professional Recognition: The project impressed industry professionals, including a Netflix director who was particularly interested in the character consistency technique.
  • Creative Storytelling: Successfully told an emotionally engaging story through the seamless integration of live-action and AI-animated characters.
  • Technical Innovation: Pioneered a new approach to character animation that combines the expressivity of human performance with the visual flexibility of AI generation.
  • Future Applications: Established a foundation for a new breed of CG characters that maintain real performances without requiring motion capture volumes, markers, or suits.
Closeup of actress face used for IP adapter

Actress closeup used for IP adapter reference

3D scan of the actress used for metahuman in early tests, and as training data

Technical Deep Dive

The technological breakthrough in this project came from the unique combination of several cutting-edge AI techniques and traditional filmmaking methods. Below is a detailed exploration of the key technical innovations that made this project possible.

ComfyUI Workflow Architecture

Unlike my previous Ex Machina project, which used Auto1111, this project utilized ComfyUI's node-based approach to create a more flexible and powerful workflow. The workflow incorporated:

  • ControlNet Integration: Multiple ControlNet nodes were used to maintain structural consistency while allowing for stylistic transformation
  • IP Adapter: This technology allowed for reference-based generation, helping maintain the likeness of the actress
  • Custom LoRA: A specialized model trained on the actress's face to preserve her specific features
  • AnimateDiff: Integrated late in the project, this technology dramatically improved temporal consistency between frames (a simplified sketch of the combined pipeline follows this list)
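A rough diffusers analogue of that node graph is sketched below: the AnimateDiff motion adapter is attached to an SD 1.5 base model, the custom character LoRA and an IP adapter reference are loaded, and a short run of frames is re-rendered as a video-to-video pass. Model IDs, file paths, and parameter values are placeholders, and the ControlNet conditioning used in the real workflow is omitted for brevity:

```python
# Rough diffusers analogue of the ComfyUI vid2vid graph: AnimateDiff motion
# module + custom LoRA + IP adapter over the green-screen frames. Model IDs
# and file names are placeholders; ControlNet conditioning is omitted here.
import torch
from diffusers import AnimateDiffVideoToVideoPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import load_image, export_to_gif
from PIL import Image

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
pipe = AnimateDiffVideoToVideoPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", motion_adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config, beta_schedule="linear", clip_sample=False,
    timestep_spacing="linspace", steps_offset=1,
)

pipe.load_lora_weights(".", weight_name="actress_lora.safetensors")  # custom LoRA
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.5)

# Source performance: a short run of green-screen frames (hypothetical paths).
frames = [Image.open(f"plates/frame_{i:04d}.png").convert("RGB") for i in range(16)]
reference = load_image("actress_closeup.png")

result = pipe(
    video=frames,
    prompt="anime-style demon girl, expressive face, 2D animation",
    ip_adapter_image=reference,
    strength=0.6,                 # how far the frames are pushed toward the style
    num_inference_steps=25,
).frames[0]
export_to_gif(result, "stylized_clip.gif")
```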

Early test of character interaction showing the progression of the technique

Novel Filming Techniques

The production utilized several innovative approaches to filming that enhanced the final result:

  • Laser Dot Tracking: Instead of traditional tracking markers, which can interfere with green screen keying, I used precisely placed laser dots that provided tracking data without affecting the key (see the sketch after this list)
  • Transparent Acrylic Props: For scenes requiring physical interaction, I used transparent acrylic surfaces where actors could place their hands to create a believable point of contact with the animated character
  • Dual Performance Capture: By filming actors both together and in isolation, I created a reference for natural interaction while maintaining clean plates for processing
  • Real Reflections: For the eye reflection scene, I projected pre-rendered animation onto a screen and captured the actual reflection in the actor's eye, avoiding the artificial look of composited reflections
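On the project the laser dots were tracked in compositing, but the idea is easy to prototype: isolate the dots by color and take each blob's centroid as a 2D marker. The sketch below assumes red laser dots on a green-screen plate; the thresholds and file path are illustrative:

```python
# One way to pull laser-dot centroids out of a green-screen plate for 2D
# tracking, assuming red laser dots. Thresholds and the input path are
# illustrative; the actual tracking on the project was done in compositing.
import cv2

frame = cv2.imread("plates/frame_0001.png")            # hypothetical plate
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Red wraps around the hue axis, so combine two hue bands.
lower = cv2.inRange(hsv, (0, 120, 180), (10, 255, 255))
upper = cv2.inRange(hsv, (170, 120, 180), (180, 255, 255))
mask = cv2.bitwise_or(lower, upper)

# Treat each small bright blob as a tracking marker.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
markers = []
for c in contours:
    if 2 < cv2.contourArea(c) < 200:                    # ignore noise and large spills
        m = cv2.moments(c)
        markers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))

print(f"Found {len(markers)} candidate markers:", markers)
```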
Testing green screen with stand-in actor | Another green screen test on the same set

Left: Testing green screen with stand-in; Right: Additional green screen test from the same production

Compositing Innovations

The final stage of the process involved several technical innovations in compositing:

  • AI-Enhanced Chromakey: After the AI render process made the green screen "dirty," I used specialized AI methods to improve the key quality
  • HDRI Background Animation: To create living environments, I extracted elements from static HDRI backgrounds and gave them subtle movement using img2video techniques (see the sketch after this list)
  • Performance-Preserving Post-Processing: Building on my previous work in facial expressivity preservation, I balanced the need for visual enhancement with maintaining the emotional performance
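The write-up doesn't name the img2video tool, so the following is just one plausible way to add subtle motion to a static HDRI crop, using Stable Video Diffusion through diffusers; the model choice, file names, and parameters are assumptions:

```python
# One way to give a static HDRI crop subtle motion, using Stable Video
# Diffusion via diffusers. The project notes only "img2video techniques",
# so the model choice, paths, and parameters here are assumptions.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")

hdri_crop = load_image("hdri_background_crop.png").resize((1024, 576))

frames = pipe(
    hdri_crop,
    motion_bucket_id=32,        # low value keeps the movement subtle
    noise_aug_strength=0.02,
    decode_chunk_size=4,
    num_frames=25,
).frames[0]
export_to_video(frames, "hdri_background_motion.mp4", fps=12)
```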

Reflection & Learnings

This project represented a significant evolution in my exploration of AI for creative filmmaking. The rapid pace of AI development during the project period—including the release of AnimateDiff—created both challenges and opportunities that shaped the final outcome in ways I couldn't have anticipated at the outset.

What Worked Well

  • Character Consistency Approach: The combination of LoRA training and IP adapter proved extremely effective at maintaining character consistency across shots
  • Real-World Integration: The physical filming techniques (laser tracking, acrylic props, projection for eye reflections) created a more convincing blend of real and AI elements
  • Workflow Flexibility: The node-based ComfyUI approach allowed for rapid adaptation when AnimateDiff was released, significantly improving the final result

Challenges & Solutions

  • Crew Management: The project had a larger crew than technically necessary, which created coordination challenges. In the future, a smaller team might be more efficient for this type of technically specialized work
  • Evolving Technology: The release of AnimateDiff during production required quickly learning and integrating new technology. The flexibility of ComfyUI made this adaptation possible
  • Artistic Direction: While I had originally envisioned a demon-like character, the AI training resulted in more of an anime-style girl. I adapted the story to work with this aesthetic rather than fighting against the AI's tendencies

Future Applications

  • Performance-Based Character Creation: This technique could lead to a new approach to CG character creation that maintains the actor's performance without traditional mocap
  • Cost-Effective Production: Using rented GPU resources proved to be significantly more cost-effective than traditional animation approaches while maintaining quality
  • AI/Human Collaboration: The most successful elements came from finding the right balance between human creative direction and AI generation capabilities

Personal Takeaway

This project reinforced my belief that the most powerful applications of AI in creative fields come not from replacing traditional techniques, but from creating new workflows that combine the best of both worlds. By using AI to handle the labor-intensive aspects of animation while preserving the human performance at its core, we can create expressive characters with unprecedented efficiency. The positive reception from industry professionals, particularly regarding the character consistency techniques, validates this approach and suggests exciting possibilities for the future of filmmaking.