How to Build an AI Role Play App with Character Images
A practical guide for developers building AI role play, virtual companion, or character chat apps, covering how to add consistent character images to conversations without building an image pipeline from scratch.
The Gap in AI Role Play Apps
AI role play apps are booming — Character.AI, Chai, Janitor AI, and dozens of others. Users create characters, have conversations, and build emotional connections.
But most of these apps have a visual problem: the characters either have a single static avatar, or no visual representation at all. When a user says "show me what you look like at the beach," the app can't respond with an image. When the story moves to a new scene, there's no visual to match.
The apps that do generate images hit the consistency wall — every image shows a different person. The character you fell in love with in the cafe looks like a stranger at the beach.
If you're building a role play or character chat app, adding consistent character images is a massive differentiator. Here's how to do it.
Architecture: Where Images Fit in the Conversation Loop
Your app already has a conversation loop:
```
User message → LLM → Response text → Display
```
Adding images extends this to:
```
User message → LLM → Response text
                      ↓
    Should this response include an image?
                      ↓ yes
    What kind? (new scene / edit existing / show reference)
                      ↓
    Image generation API → Image URL → Display with text
```
The LLM decides when an image adds value. Not every message needs one — but scene changes, outfit descriptions, and "show me" requests should trigger generation.
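The loop above can be sketched as a small dispatcher. The helper names (`llm_respond`, `generate_image`) and the reply shape are assumptions for illustration, not part of any specific framework:

```python
def handle_message(user_msg: str, llm_respond, generate_image) -> dict:
    """Run one conversation turn, attaching an image only when the LLM asks for one."""
    reply = llm_respond(user_msg)  # assumed shape: {"text": str, "image_request": dict | None}
    result = {"text": reply["text"], "image_url": None}
    request = reply.get("image_request")
    if request:  # the LLM decided an image adds value this turn
        result["image_url"] = generate_image(request["type"], request["prompt"])
    return result

# Stub dependencies to show the flow end to end:
fake_llm = lambda msg: {
    "text": "Here you go!",
    "image_request": {"type": "generate", "prompt": "beach scene"},
}
fake_gen = lambda kind, prompt: f"https://cdn.example.com/{kind}.png"
out = handle_message("show me the beach", fake_llm, fake_gen)
```

Keeping the image decision in the LLM's structured output (rather than in backend heuristics) means the same loop works unchanged as your prompt evolves.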
Step 1: Character Onboarding with Identity Baseline
When a user creates a character, you need a face reference. Options:
- User uploads a photo — For "real person" characters
- Generate a face from description — Use any image gen model for the initial face, then lock it
- Pre-made character library — Offer a gallery of base characters
Once you have a face, generate an identity baseline — a standardized reference that anchors the character's appearance:
```python
import requests

def create_character_baseline(face_url: str, api_key: str) -> str:
    """Generate a 4-in-1 ID photo as the identity anchor."""
    resp = requests.post(
        "https://www.aurashot.art/v1/character/id-photo",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"images": {"face": face_url}},
    )
    resp.raise_for_status()  # fail loudly instead of parsing an error body
    return resp.json()["outputs"][0]["url"]

# Store this URL in your database — it's the identity anchor
id_photo_url = create_character_baseline(face_url, api_key)
```
Store the ID photo URL in your character record. Every future image generation references it.
Step 2: Triggering Image Generation from Conversations
Your LLM needs to know when to generate an image. Two approaches:
Approach A: LLM Decides (Recommended)
Add instructions to your system prompt:
```
When the conversation involves a visual scene change, outfit description,
or the user asks to "show" something, output a JSON block with image
generation parameters:

{"generate_image": true, "prompt": "scene description", "type": "generate"}

For editing a previous image:

{"generate_image": true, "prompt": "edit description", "type": "edit"}
```
Your backend parses the LLM output, extracts the image generation request, calls the API, and includes the image in the response.
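A minimal sketch of that parsing step, assuming the LLM emits the JSON block inline and without nested braces (the regex and function name are illustrative):

```python
import json
import re

# Matches a flat JSON object containing the "generate_image" key
IMAGE_BLOCK = re.compile(r'\{[^{}]*"generate_image"[^{}]*\}')

def extract_image_request(llm_output: str):
    """Split LLM output into display text and an optional image request dict."""
    match = IMAGE_BLOCK.search(llm_output)
    if not match:
        return llm_output, None
    try:
        request = json.loads(match.group(0))
    except json.JSONDecodeError:
        return llm_output, None  # malformed block: degrade to text-only
    text = llm_output.replace(match.group(0), "").strip()
    return text, request

text, request = extract_image_request(
    'Let me show you! {"generate_image": true, "prompt": "beach at sunset", "type": "generate"}'
)
```

Degrading to text-only on a malformed block matters in practice: LLMs occasionally emit broken JSON, and the conversation should never error out over a missing image.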
Approach B: Keyword Detection
Simpler but less flexible — detect keywords like "show me," "what do I look like," "change outfit," "go to the beach" and trigger generation.
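A sketch of the keyword approach; the trigger list is illustrative and would grow with real usage data:

```python
# Illustrative trigger phrases; a production list comes from observed user messages
IMAGE_TRIGGERS = ("show me", "what do i look like", "change outfit", "go to the beach")

def should_generate_image(message: str) -> bool:
    """Naive substring check; misses paraphrases that the LLM approach would catch."""
    lowered = message.lower()
    return any(trigger in lowered for trigger in IMAGE_TRIGGERS)
```

This catches "Show me your new dress" but not "Let's see what you're wearing" — which is exactly why Approach A is recommended.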
Step 3: Generating Scene Images
When the LLM triggers an image, call the generation API with the character's identity anchor:
```python
def generate_scene(character_id_photo: str, scene_prompt: str, api_key: str) -> str:
    """Generate a character scene with identity consistency."""
    resp = requests.post(
        "https://www.aurashot.art/v1/character/generate",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "prompt": scene_prompt,
            "images": {"face": character_id_photo},
        },
    )
    resp.raise_for_status()
    return resp.json()["outputs"][0]["url"]

# In your conversation handler:
image_url = generate_scene(
    character.id_photo_url,
    "sitting in a cozy cafe, wearing a casual sweater, warm smile",
    api_key,
)
```
The face stays the same. The scene, outfit, and expression change based on the prompt.
Step 4: Editing Previous Images
When the user says "change the background" or "try a different pose," edit the previous image instead of generating from scratch:
```python
def edit_image(target_url: str, face_url: str, edit_prompt: str, api_key: str) -> str:
    """Edit an existing image while preserving character identity."""
    resp = requests.post(
        "https://www.aurashot.art/v1/character/edit",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "prompt": edit_prompt,
            "images": {"target": target_url, "face": face_url},
        },
    )
    resp.raise_for_status()
    return resp.json()["outputs"][0]["url"]
```
Editing is faster and preserves more context from the original image.
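One way to tie Steps 3 and 4 together is a small router that prefers editing when a previous image exists. Here `generate_scene` and `edit_image` are passed in (and stubbed) so the sketch stands alone; the request dict uses the shape from Step 2:

```python
def handle_image_request(request: dict, id_photo_url: str, last_image_url,
                         api_key: str, generate_scene, edit_image) -> str:
    """Prefer editing the previous image (faster, keeps context); else generate fresh."""
    if request["type"] == "edit" and last_image_url:
        return edit_image(last_image_url, id_photo_url, request["prompt"], api_key)
    return generate_scene(id_photo_url, request["prompt"], api_key)

# Stubs standing in for the API wrappers:
gen = lambda face, prompt, key: f"generated:{prompt}"
edit = lambda target, face, prompt, key: f"edited:{prompt}"
url = handle_image_request(
    {"type": "edit", "prompt": "facing the sunset"},
    "https://example.com/id.png", "https://example.com/prev.png", "key", gen, edit,
)
```

Falling back to `generate` when there is no previous image covers the first message of a conversation without special-casing it in the LLM prompt.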
Data Model
Minimal schema for character images in a role play app:
```sql
-- Characters table
CREATE TABLE characters (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    name TEXT NOT NULL,
    face_reference_url TEXT NOT NULL,
    id_photo_url TEXT NOT NULL,
    description TEXT,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Generated images linked to conversations
CREATE TABLE character_images (
    id UUID PRIMARY KEY,
    character_id UUID REFERENCES characters(id),
    conversation_id UUID,
    image_url TEXT NOT NULL,
    prompt TEXT,
    image_type TEXT CHECK (image_type IN ('id-photo', 'generate', 'edit')),
    created_at TIMESTAMPTZ DEFAULT now()
);
```
Cost Management
Image generation is billed per image, so a role play app needs to be deliberate about when to generate:
Generate images for:
- Scene changes ("let's go to the beach")
- Outfit changes ("put on a red dress")
- Explicit "show me" requests
- Key story moments
Don't generate images for:
- Every message in the conversation
- Pure text exchanges
- Repeated similar scenes (cache and reuse)
Pre-generate common scenes during off-peak hours: morning selfie, cafe, evening outfit, goodnight. Serve cached images for common scenarios, generate fresh ones for unique requests.
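A minimal in-memory version of that cache, keyed by character and normalized prompt (swap the dict for Redis or your database in production; all names here are illustrative):

```python
import hashlib

class SceneCache:
    """Cache generated image URLs so repeated similar scenes cost nothing."""

    def __init__(self):
        self._cache = {}

    def _key(self, character_id: str, prompt: str) -> str:
        # Normalize whitespace and case so "Cozy cafe" and "cozy  cafe" collide
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(f"{character_id}:{normalized}".encode()).hexdigest()

    def get_or_generate(self, character_id: str, prompt: str, generate_fn) -> str:
        key = self._key(character_id, prompt)
        if key not in self._cache:  # only pay for genuinely new scenes
            self._cache[key] = generate_fn(prompt)
        return self._cache[key]

calls = []
gen = lambda p: calls.append(p) or f"img:{p}"  # records each real generation
cache = SceneCache()
a = cache.get_or_generate("char-1", "Cozy cafe", gen)
b = cache.get_or_generate("char-1", "cozy  cafe", gen)  # cache hit, no new call
```

Exact-match normalization only catches literal repeats; a fuzzier approach (e.g. embedding similarity on prompts) would catch more, at the cost of occasionally serving a slightly-wrong cached scene.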
With AuraShot Pro ($7.90/mo for 200 images), that works out to roughly 6-7 unique images per day for a single active character.
The Agent Skill Shortcut
If you're building on an agent framework that supports skills (Claude, Cursor, or any OpenClaw-compatible runtime), you can skip the API integration entirely:
```
clawhub install aurashot-character-skill
```
The skill handles intent routing, parameter assembly, and local asset management. Your agent generates character images through natural conversation — no API code needed.
What Users Actually Experience
When this works well, the conversation feels alive:
User: Let's go to the beach today
Character: I'd love that! Let me change into something more appropriate...
[Image: the character in a summer outfit at the beach, same face as always]
User: The sunset is beautiful. Can you turn around and look at it?
[Image: same character, same outfit, now facing the sunset]
The character has a persistent visual identity. Every image reinforces the connection. That's the experience that keeps users coming back.
Getting Started
- Get a free API key — 5 images to prototype
- Generate an ID photo for a test character
- Try generating 3-4 different scenes with the same face reference
- Integrate into your conversation loop
The technical integration is straightforward — the hard part is deciding when images add value to the conversation. Start simple, measure engagement, and expand from there.