Custom AI Training and Fine-Tuning for ExMachina Project

Training specialized AI models to capture character likenesses and cinematic style for groundbreaking rendering

AI generated image of Ava from Ex Machina using a custom trained LoRA model

Project Overview

Building upon my background in neural networks from previous USC coursework, I undertook the challenge of training custom LoRA (Low-Rank Adaptation) models as a crucial component of my innovative ExMachina project. My goal was to accurately capture the likenesses of Alicia Vikander's Ava and Oscar Isaac's Nathan, along with the distinctive cinematographic style of the film "Ex Machina." This training process represented a significant technical undertaking in early 2023, when AI model training resources were limited and best practices were still emerging.

My journey began with general AI experimentation and textual embeddings before I discovered that LoRAs offered the perfect balance of training efficiency and visual quality for character-specific needs. Through methodical development and numerous iterations, I created specialized models that could transform my Unity-based 3D renders into photorealistic scenes that precisely matched the original film's characters and visual aesthetic. This model development became the cornerstone of my comprehensive AI-powered rendering pipeline for the project.

🎯

Goal

Create custom AI models that could accurately render Ex Machina characters and style for my film recreation project

⏱️

Timeline

February - April 2023, during the early days of LoRA training technology

🧠

Role

Dataset Creator, ML Engineer, and Visual Director responsible for all aspects of the training process

🛠️

Tools & Technologies

Stable Diffusion 1.5, Auto1111, Kohya LoRA Training, Google Colab, Dataset Creation & Captioning Tools

Challenge & Solution

The Challenge

Creating photorealistic, character-accurate renders for my ExMachina project required models specifically trained on the film's characters and visual style, but several significant challenges stood in the way:

  • Limited Documentation: In early 2023, AI training documentation was sparse and often too technical for non-ML specialists
  • Computing Resources: Training even modest models required significant GPU resources that weren't readily available
  • Character-Specific Challenges: Ava's unique appearance (metal skull, transparent abdomen) presented particular difficulties
  • Dataset Complexity: Building effective training datasets required careful selection and captioning
  • Training Balance: Finding the optimal training parameters to avoid overtraining while ensuring accurate likenesses

The Solution

Through persistent experimentation and learning from the emerging AI community, I developed a comprehensive solution:

  • Strategic Dataset Creation: Captured 20 images per character with diverse lighting, backgrounds, and angles
  • Negative Captioning Strategy: Developed a technique of captioning what doesn't belong to the concept rather than what does
  • Training Evolution: Progressed from basic embeddings to specialized LoRA models through over 45 training variations
  • Concept Consolidation: Created a unified LoRA that balanced character likenesses, environment details, and cinematic style
  • Emergent Capability Identification: Discovered and enhanced the AI's emergent ability to simulate physics like Ava's glowing abdomen
Early AI generated image of Nathan from Ex Machina using a custom LoRA

One of my first Nathan LoRA test outputs

Example training image showing Ava with hair, causing issues

Problematic training image: Ava with hair created dataset conflicts

Process & Methodology

My approach to LoRA training evolved significantly throughout the project as I experimented with different techniques and learned from both successes and failures.

1

Initial Exploration & Learning

I began by diving deep into the emerging world of AI image generation and model training. As I noted in my journal on January 3, 2023: "spent most of the day learning to use/install stable diffusion" and January 5: "dedicated two straight days to learning stable diffusion and I'm still not done."

This learning phase included:

  • Installing and configuring Auto1111 Stable Diffusion WebUI
  • Learning core machine learning concepts like epochs, stochastic gradient descent, and backpropagation
  • Experimenting with basic image generation techniques
  • Researching the still-emerging field of custom model training
Early AI image generation test using InvokeAI

One of my first ever AI image generations using InvokeAI

2

Dataset Development & Embedding Attempts

My initial attempts at training focused on textual embeddings, which were easier to create but produced limited results. As I documented in my journal on March 4, 2023: "continued trying to train my embedding. After removing images and fiddling values I finally got it to work... In conclusion: I'm not sure why she turns dark-skinned sometimes. I think it's the AI confusing the skin tone when she was under shadows in the training set..."

The dataset creation process involved:

  • Manually capturing and selecting screenshots of my character
  • Creating diverse samples with varied lighting, angles, and contexts
  • Processing images for optimal training (removing backgrounds, enhancing quality)
  • Developing a systematic approach to image captioning
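The preprocessing step above can be sketched in a few lines. This is a minimal illustration, not my actual tooling: it assumes plain PNG screenshots and the 512×512 resolution that Stable Diffusion 1.5 trains on; the folder layout is hypothetical.

```python
# Sketch of the dataset prep step: center-crop screenshots to a square
# (so faces aren't stretched) and resize to 512x512 for SD 1.5 training.
# Paths and folder names are illustrative, not the project's real layout.
from pathlib import Path

from PIL import Image


def prepare_dataset(src_dir: str, dst_dir: str, size: int = 512) -> int:
    """Center-crop and resize every PNG in src_dir into dst_dir; return the count."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).glob("*.png")):
        img = Image.open(path).convert("RGB")
        # Crop to the largest centered square before resizing.
        w, h = img.size
        side = min(w, h)
        left, top = (w - side) // 2, (h - side) // 2
        img = img.crop((left, top, left + side, top + side)).resize((size, size))
        img.save(out / path.name)
        count += 1
    return count
```

In practice I also did manual curation on top of this kind of batch step, since which frames make it into the dataset matters more than how they are resized.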
AI image generated using early textual embedding training attempt

Early embedding test result with my original character (used as practice)

3

LoRA Training & Optimization

After concluding that embeddings weren't suitable for my needs, I pivoted to an entirely different approach. I discovered LoRA training through a detailed Reddit guide by user "plum" that marked a complete change in my methodology. This new phase required different tools, techniques, and understanding compared to my previous embedding experiments.

My journal on March 12, 2023, captures this transition: "managed to install a tool on my PC to train a LORA. It took way too long to install the fukin thing. Very complicated. Then I gathered the dataset. Now I have to crop it and label it."

And on March 14: "The overnight training worked and I had a Nathan Lora. The results were quite good, but a bit distorted... I later found out I forgot to download the lora extension. After downloading it results were actually very good."

Key developments during this phase:

  • Following detailed LoRA training guides and tutorials
  • Experimenting with Kohya's training suite, which proved powerful but complex
  • Testing outputs and identifying training issues (like Nathan's profile views)
  • Eventually finding a streamlined Google Colab notebook that simplified the process
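For readers unfamiliar with what a LoRA actually changes, the core idea can be shown as a toy numerical sketch (this is illustrative math, not the project's training code; dimensions and values are arbitrary): instead of fine-tuning a large frozen weight matrix W, LoRA trains two small matrices whose product forms a low-rank update.

```python
# Toy illustration of Low-Rank Adaptation: the pretrained weight W stays
# frozen; only the small matrices A and B are trained, and their product
# (scaled by alpha / rank) is added to W at inference time.
import numpy as np

d_out, d_in, rank, alpha = 64, 64, 8, 8.0

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-initialized


def lora_forward(x: np.ndarray) -> np.ndarray:
    # Effective weight is W + (alpha / rank) * B @ A.
    return (W + (alpha / rank) * B @ A) @ x


x = rng.standard_normal(d_in)
# With B initialized to zero the adapter contributes nothing yet,
# so the output matches the untouched base model exactly.
assert np.allclose(lora_forward(x), W @ x)
```

This is why LoRA training is so much cheaper than full fine-tuning: for this 64×64 layer, the adapter trains 2 × 64 × 8 values instead of 64 × 64, and the same ratio scales up to the huge layers in Stable Diffusion.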
Early AI generated full figure of Ava using LoRA

Early AI generated portrait of Ava using LoRA

Early Ava LoRA outputs - full figure (left) and portrait (right) showing consistent character likeness

4

Dataset Refinement & Advanced Captioning

Through my extensive experimentation, I discovered that the captioning strategy was as important as the dataset quality itself. I developed a novel approach to captioning that focused on what didn't belong to the concept rather than what did.

This phase involved:

  • Creating specialized datasets for characters, environments, and film style
  • Developing advanced captioning strategies (e.g., "Ava with hair behind a glass wall with a reflection in a red room full of red light, with a hand gesture, in the style of ExMachina")
  • Training separate LoRAs for different concepts to enable precise control
  • Creating a unified LoRA that incorporated characters, environments, and style
Sample training image of Ava in red light with caption

Caption: "Ava with hair behind a glass wall with a reflection in a red room full of red light, with a hand gesture, in the style of ExMachina"

Sample training image of Ex Machina hallway style with caption

Caption: "a MachinaHall hallway with a black door with a light on it, in the style of ExMachina"

Sample training image closeup of Ava with caption

Caption: "a closeup of Ava looking near the camera, with a white background, in the style of ExMachina"

Example dataset images with their specific captions for LoRA training
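Mechanically, captions like the ones above live in sidecar `.txt` files next to each training image, which is the convention Kohya-style trainers read. A minimal sketch of that step (the file names here are hypothetical examples, not my actual dataset):

```python
# Sketch of the captioning convention: one .txt file per training image,
# where the caption names the variable elements (lighting, pose, background)
# and the trigger words ("Ava", "MachinaNathan", "ExMachina") stand for the
# concept being trained. Image file names are hypothetical.
from pathlib import Path


def write_captions(dataset_dir: str, captions: dict[str, str]) -> list[str]:
    """Write one sidecar caption .txt per image; return the file names written."""
    written = []
    for image_name, caption in captions.items():
        txt = Path(dataset_dir) / (Path(image_name).stem + ".txt")
        txt.write_text(caption, encoding="utf-8")
        written.append(txt.name)
    return written


captions = {
    "ava_red_room.png": ("Ava with hair behind a glass wall with a reflection "
                         "in a red room full of red light, with a hand gesture, "
                         "in the style of ExMachina"),
    "hallway_door.png": ("a MachinaHall hallway with a black door with a light "
                         "on it, in the style of ExMachina"),
}
```

The "negative captioning" logic lives entirely in how the caption text is written: anything described in the caption is treated as variable, so everything left undescribed gets absorbed into the trigger word.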

5

Integration & Video Workflow Testing

The final phase involved integrating the trained LoRAs into my video rendering workflow. This required optimization and testing to ensure they worked effectively on video sequences.

As noted in my journal on March 29, 2023: "I spent the time preparing Nathan's part of my ExMachina thing" and on March 31: "I almost gave up on the experiment today. I left it processing and realized it wasn't working well. I then realized that the corridor method of leaving most of the work to a dreambooth model and doing minimal prompting might be best. I was feeling awful, I thought I had wasted all this time. But then I looked at the processed result and it looked incredible. This reignited me."

Key achievements in this phase:

  • Successfully testing LoRAs in video context, identifying and addressing issues
  • Discovering emergent capabilities, like the AI's ability to render Ava's transparent, glowing abdomen correctly
  • Creating prompt templates that effectively leveraged the trained LoRAs
  • Fine-tuning the rendering process for optimal visual quality and character fidelity
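The prompt templates mentioned above can be sketched as a small helper. The Auto1111 WebUI activates a LoRA with a `<lora:name:weight>` tag in the prompt; the LoRA file name `exmachina_v45` and the template wording here are hypothetical stand-ins, not my exact production prompts.

```python
# Sketch of a prompt-template helper for batch-processing video frames
# through Auto1111. The "<lora:name:weight>" tag is the WebUI's built-in
# LoRA activation syntax; the LoRA name and keywords are hypothetical.
def build_prompt(subject: str, scene: str,
                 lora_name: str = "exmachina_v45", weight: float = 0.8) -> str:
    """Compose a frame prompt with trigger keywords and a LoRA activation tag."""
    return (f"{subject}, {scene}, in the style of ExMachina, "
            f"<lora:{lora_name}:{weight}>")


prompt = build_prompt("Ava", "behind a glass wall in a red room")
```

Keeping the template fixed and varying only the subject and scene per shot was important for video work: consistent prompting across frames reduces flicker in the processed sequence.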

Testing Nathan LoRA on video sequences - revealing issues with profile views that needed addressing

Results & Impact

After months of experimentation and dozens of training iterations, I successfully created LoRA models that could accurately capture the likenesses of Ava and Nathan while preserving the distinctive visual style of Ex Machina. These models became the foundation of my AI-powered rendering workflow.

45+
LoRA Variations Created
Trained dozens of variations to find the optimal balance of character accuracy and stylistic fidelity
20+
Images Per Character
Carefully curated training datasets with diverse angles, lighting, and contexts
100%
Character Recognition
Achieved perfect recognizability of both main characters in the final LoRA models

Key Achievements

Beyond the technical metrics, this project resulted in several notable outcomes:

  • Character Accuracy: Successfully trained AI models that could accurately render the likeness of both actors, responding effectively to prompting keywords
  • Emergent Physics Capabilities: Discovered the AI's ability to render complex physical attributes like Ava's transparent, glowing abdomen without specific physics training
  • Style Replication: Captured the film's unique cinematographic style, enhancing the overall visual fidelity of the ExMachina project
  • Multi-Concept LoRAs: Pioneered techniques for training multiple concepts into a single LoRA for practicality and balanced weighting

"I think in the end I made somewhere around 45 variations of the lora... I decided to make a general lora for bodies style and environment. I decided on training all those concepts into a single lora for practicality. including several keywords into a single lora was a challenge but it was necessary so that all concepts had equal weight."

— From my project notes, March 2023

Technical Deep Dive

Dataset Creation Strategy

My approach to dataset creation evolved significantly through experimentation:

  • Character Diversity: Captured 20 images per character including a variety of lighting conditions, backgrounds, and body angles
  • Style Isolation: Created a separate dataset exclusively for the Ex Machina cinematographic style
  • Negative Captioning: Developed a technique of captioning the elements that weren't part of the concept rather than those that were
  • Preprocessing: Used Topaz and other tools to enhance image quality and isolate specific elements
Dataset sample: Ava grey background

Caption: "Ava in front of a grey background looking at a white shirt in the lower left of the frame"

Dataset sample: Nathan hand gesture

Caption: "MachinaNathan standing in a MachinaHall hallway with glasses making a hand gesture with a wall in the background"

Training Evolution

My LoRA training process went through several stages of development:

  1. Initial Embeddings: Started with textual embeddings, but found the process too time-intensive for subpar results
  2. Kohya Experiments: Moved to Kohya's suite, which offered more options but was overwhelmingly complex
  3. Reddit Guide: Followed a comprehensive guide by Reddit user "plum" that provided a more structured approach
  4. Google Colab: Eventually found a streamlined Google Colab notebook that made the process more accessible
  5. Prompt Engineering: Developed specialized prompting templates to maximize the effectiveness of the trained LoRAs

Technical Challenges

Several technical hurdles required creative solutions:

  • Ava's Baldness: As noted in my documentation: "It was a challenge to keep her bald because of her steel skull." Training images with hair created consistent problems.
  • Nathan's Profile View: The LoRA struggled with profile views: "Upon experimenting with this lora on video I realized I may not have trained enough on the back of his head, or maybe I had to carefully prompt sections where he turned around or moved a lot."
  • Computing Limitations: Training required significant GPU resources that weren't always available, requiring optimization of training parameters.
  • Concept Balance: Including multiple concepts (characters, style, environment) in a single LoRA required careful weighting and training strategies.
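One concrete way to handle the concept-balance problem is through per-concept repeat counts: Kohya-style trainers read dataset subfolders named `{repeats}_{concept}`, so smaller datasets can be given more repeats to keep each concept's total steps per epoch roughly equal. The image counts and target below are illustrative, not my actual numbers.

```python
# Hedged sketch of balancing multiple concepts inside a single LoRA.
# Kohya-style trainers weight a concept by (repeats * image_count), so
# choosing repeats ~ target_steps / image_count equalizes the concepts.
def balance_repeats(image_counts: dict[str, int], target_steps: int = 200) -> dict[str, int]:
    """Choose per-concept repeat counts so repeats * images ~= target_steps."""
    return {concept: max(1, round(target_steps / n))
            for concept, n in image_counts.items()}


repeats = balance_repeats({"Ava": 20, "MachinaNathan": 20, "MachinaHall": 10})
# Folder names would then look like "10_Ava", "10_MachinaNathan", "20_MachinaHall".
```

This is the mechanical version of the note quoted earlier about giving all concepts equal weight within one LoRA.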
Early AI generated image of Ava showing improved facial detail

Another early Ava test showing improving quality in facial details

Reflection & Learnings

This project expanded my technical understanding of AI training and opened new creative possibilities, while also revealing important lessons about the field.

What Worked Well

  • Negative Captioning: Focusing on what a concept wasn't rather than what it was produced better training results
  • Manual Dataset Curation: Carefully selected training images with diverse angles and lighting significantly improved model quality
  • Iterative Testing: Creating multiple LoRA versions allowed for progressive improvement based on observed outputs

Challenges & Solutions

  • Character Inconsistencies: Adding more training examples of problematic angles and scenarios improved handling of difficult poses
  • Competing Concepts: Balancing different elements within a single LoRA required careful dataset management
  • Computing Resources: Finding optimized training routines and leveraging Google Colab helped overcome hardware limitations

Future Applications

  • Emergent Physics Understanding: The AI's ability to render Ava's transparent abdomen suggests potential for depicting physical effects without explicit simulation
  • Style Preservation: Methods developed for capturing cinematic style could apply to other visual media projects
  • Character Likeness Fidelity: Techniques for actor likeness could benefit future performance-based productions

Personal Takeaway

This project taught me that in AI training, quality far outweighs quantity. A small, carefully curated dataset with strategic captioning produced dramatically better results than larger but less refined collections. The discovery of emergent capabilities - like the AI understanding the physical properties of Ava's transparent components without explicit training - revealed the potential for AI to internalize complex visual information from limited examples. This experience fundamentally changed my approach to AI, emphasizing careful preparation and strategic training over brute-force methods.