
How to Build an AI Role Play App with Character Images

A practical guide for developers building AI role play, virtual companion, or character chat apps. How to add consistent character images to conversations without building an image pipeline from scratch.

roleplay · use-case · architecture

The Gap in AI Role Play Apps

AI role play apps are booming — Character.AI, Chai, Janitor AI, and dozens of others. Users create characters, have conversations, and build emotional connections.

But most of these apps have a visual problem: the characters either have a single static avatar, or no visual representation at all. When a user says "show me what you look like at the beach," the app can't respond with an image. When the story moves to a new scene, there's no visual to match.

The apps that do generate images hit the consistency wall — every image shows a different person. The character you fell in love with in the cafe looks like a stranger at the beach.

If you're building a role play or character chat app, adding consistent character images is a massive differentiator. Here's how to do it.

Architecture: Where Images Fit in the Conversation Loop

Your app already has a conversation loop:

User message → LLM → Response text → Display

Adding images extends this to:

User message → LLM → Response text
                  ↓
            Should this response include an image?
                  ↓ yes
            What kind? (new scene / edit existing / show reference)
                  ↓
            Image generation API → Image URL → Display with text

The LLM decides when an image adds value. Not every message needs one — but scene changes, outfit descriptions, and "show me" requests should trigger generation.

Step 1: Character Onboarding with Identity Baseline

When a user creates a character, you need a face reference. Options:

  • User uploads a photo — For "real person" characters
  • Generate a face from description — Use any image gen model for the initial face, then lock it
  • Pre-made character library — Offer a gallery of base characters

Once you have a face, generate an identity baseline — a standardized reference that anchors the character's appearance:

import requests

def create_character_baseline(face_url: str, api_key: str) -> str:
    """Generate a 4-in-1 ID photo as the identity anchor."""
    resp = requests.post(
        "https://www.aurashot.art/v1/character/id-photo",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"images": {"face": face_url}},
        timeout=120,  # image generation can take a while
    )
    resp.raise_for_status()
    return resp.json()["outputs"][0]["url"]

# Store this URL in your database — it's the identity anchor
id_photo_url = create_character_baseline(face_url, api_key)

Store the ID photo URL in your character record. Every future image generation references it.

Step 2: Triggering Image Generation from Conversations

Your LLM needs to know when to generate an image. Two approaches:

Approach A: LLM Decides (Recommended)

Add instructions to your system prompt:

When the conversation involves a visual scene change, outfit description,
or the user asks to "show" something, output a JSON block with image
generation parameters:

{"generate_image": true, "prompt": "scene description", "type": "generate"}

For editing a previous image:
{"generate_image": true, "prompt": "edit description", "type": "edit"}

Your backend parses the LLM output, extracts the image generation request, calls the API, and includes the image in the response.

Approach B: Keyword Detection

Simpler but less flexible — detect keywords like "show me," "what do I look like," "change outfit," "go to the beach" and trigger generation.
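A minimal version of the keyword approach, with an illustrative (not exhaustive) trigger list:

```python
# Illustrative trigger phrases — tune this list for your app's vocabulary.
IMAGE_TRIGGERS = (
    "show me", "what do you look like", "what do i look like",
    "change outfit", "put on", "let's go to",
)

def should_generate_image(message: str) -> bool:
    """Naive substring matching — cheap, but misses paraphrases."""
    lowered = message.lower()
    return any(trigger in lowered for trigger in IMAGE_TRIGGERS)
```

The trade-off is exactly what you'd expect: no extra LLM tokens, but "could I see that?" won't trigger while "show me around" will.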

Step 3: Generating Scene Images

When the LLM triggers an image, call the generation API with the character's identity anchor:

def generate_scene(character_id_photo: str, scene_prompt: str, api_key: str) -> str:
    """Generate a character scene with identity consistency."""
    resp = requests.post(
        "https://www.aurashot.art/v1/character/generate",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "prompt": scene_prompt,
            "images": {"face": character_id_photo},
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["outputs"][0]["url"]

# In your conversation handler:
image_url = generate_scene(
    character.id_photo_url,
    "sitting in a cozy cafe, wearing a casual sweater, warm smile",
    api_key,
)

The face stays the same. The scene, outfit, and expression change based on the prompt.

Step 4: Editing Previous Images

When the user says "change the background" or "try a different pose," edit the previous image instead of generating from scratch:

def edit_image(target_url: str, face_url: str, edit_prompt: str, api_key: str) -> str:
    """Edit an existing image while preserving character identity."""
    resp = requests.post(
        "https://www.aurashot.art/v1/character/edit",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "prompt": edit_prompt,
            "images": {"target": target_url, "face": face_url},
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["outputs"][0]["url"]

Editing is faster and preserves more context from the original image.

Data Model

Minimal schema for character images in a role play app:

-- Characters table
CREATE TABLE characters (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    name TEXT NOT NULL,
    face_reference_url TEXT NOT NULL,
    id_photo_url TEXT NOT NULL,
    description TEXT,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Generated images linked to conversations
CREATE TABLE character_images (
    id UUID PRIMARY KEY,
    character_id UUID REFERENCES characters(id),
    conversation_id UUID,
    image_url TEXT NOT NULL,
    prompt TEXT,
    image_type TEXT CHECK (image_type IN ('id-photo', 'generate', 'edit')),
    created_at TIMESTAMPTZ DEFAULT now()
);

Cost Management

Image generation is billed per image, so a role play app needs to be deliberate about when to generate:

Generate images for:

  • Scene changes ("let's go to the beach")
  • Outfit changes ("put on a red dress")
  • Explicit "show me" requests
  • Key story moments

Don't generate images for:

  • Every message in the conversation
  • Pure text exchanges
  • Repeated similar scenes (cache and reuse)

Pre-generate common scenes during off-peak hours: morning selfie, cafe, evening outfit, goodnight. Serve cached images for common scenarios, generate fresh ones for unique requests.
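The cache-and-reuse idea can be sketched as follows. This uses an in-memory dict for illustration; in production you would back it with Redis or the `character_images` table from the data model, and the `generate` callable stands in for `generate_scene`.

```python
import hashlib

# In-memory cache keyed by character + normalized prompt (illustrative only;
# swap in Redis or your database in production).
_image_cache: dict[str, str] = {}

def cache_key(character_id: str, prompt: str) -> str:
    """Normalize the prompt so trivial variations hit the same cache entry."""
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(f"{character_id}:{normalized}".encode()).hexdigest()

def get_or_generate(character_id: str, prompt: str, generate) -> str:
    """Serve a cached image for repeated scenes; call the API only on a miss."""
    key = cache_key(character_id, prompt)
    if key not in _image_cache:
        _image_cache[key] = generate(prompt)
    return _image_cache[key]
```

Exact-match caching only helps for repeated scenarios (the morning selfie, the usual cafe); unique one-off requests always pay for a fresh generation.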

With AuraShot Pro ($7.90/mo for 200 images), that budget works out to roughly 6-7 unique images per day per active character.

The Agent Skill Shortcut

If you're building on an agent framework that supports skills (Claude, Cursor, or any OpenClaw-compatible runtime), you can skip the API integration entirely:

clawhub install aurashot-character-skill

The skill handles intent routing, parameter assembly, and local asset management. Your agent generates character images through natural conversation — no API code needed.

What Users Actually Experience

When this works well, the conversation feels alive:

User: Let's go to the beach today

Character: I'd love that! Let me change into something more appropriate...

[Image: the character in a summer outfit at the beach, same face as always]

User: The sunset is beautiful. Can you turn around and look at it?

[Image: same character, same outfit, now facing the sunset]

The character has a persistent visual identity. Every image reinforces the connection. That's the experience that keeps users coming back.

Getting Started

  1. Get a free API key — 5 images to prototype
  2. Generate an ID photo for a test character
  3. Try generating 3-4 different scenes with the same face reference
  4. Integrate into your conversation loop

The technical integration is straightforward — the hard part is deciding when images add value to the conversation. Start simple, measure engagement, and expand from there.

Ready to give your AI agent a face?
