How to Build an AI Role Play App with Character Images
A practical guide for developers building AI role play, virtual companion, or character chat apps, covering how to add consistent character images to conversations without building an image pipeline from scratch.
The Gap in AI Role Play Apps
AI role play apps are booming — Character.AI, Chai, Janitor AI, and dozens of others. Users create characters, have conversations, and build emotional connections.
But most of these apps have a visual problem: the characters either have a single static avatar, or no visual representation at all. When a user says "show me what you look like at the beach," the app can't respond with an image. When the story moves to a new scene, there's no visual to match.
The apps that do generate images hit the consistency wall — every image shows a different person. The character you fell in love with in the cafe looks like a stranger at the beach.
If you're building a role play or character chat app, adding consistent character images is a massive differentiator. Here's how to do it.
Architecture: Where Images Fit in the Conversation Loop
Your app already has a conversation loop:
```
User message → LLM → Response text → Display
```
Adding images extends this to:
```
User message → LLM → Response text
                      ↓
    Should this response include an image?
                      ↓ yes
    What kind? (new scene / edit existing / show reference)
                      ↓
    Image generation API → Image URL → Display with text
```
The LLM decides when an image adds value. Not every message needs one — but scene changes, outfit descriptions, and "show me" requests should trigger generation.
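The loop above can be sketched as a small dispatcher. The helper names (`llm_respond`, `generate_image`) and the reply shape are assumptions for illustration, not part of any specific framework:

```python
def handle_message(user_msg: str, llm_respond, generate_image) -> dict:
    """Run one conversation turn, attaching an image only when the LLM asks for one."""
    reply = llm_respond(user_msg)  # assumed shape: {"text": str, "image_request": dict | None}
    result = {"text": reply["text"], "image_url": None}
    request = reply.get("image_request")
    if request:  # the LLM decided an image adds value this turn
        result["image_url"] = generate_image(request["type"], request["prompt"])
    return result

# Stub dependencies to show the flow end to end:
fake_llm = lambda msg: {
    "text": "Here you go!",
    "image_request": {"type": "generate", "prompt": "beach scene"},
}
fake_gen = lambda kind, prompt: f"https://cdn.example.com/{kind}.png"
out = handle_message("show me the beach", fake_llm, fake_gen)
```

Keeping the image decision in the LLM's structured output (rather than in backend heuristics) means the same loop works unchanged as your prompt evolves.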
Step 1: Character Onboarding with Identity Baseline
When a user creates a character, you need a face reference. Options:
- User uploads a photo — For "real person" characters
- Generate a face from description — Use any image gen model for the initial face, then lock it
- Pre-made character library — Offer a gallery of base characters
Once you have a face, generate an identity baseline — a standardized reference that anchors the character's appearance:
```python
import requests

def create_character_baseline(face_url: str, api_key: str) -> str:
    """Generate a 4-in-1 ID photo as the identity anchor."""
    resp = requests.post(
        "https://www.aurashot.art/v1/character/id-photo",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"images": {"face": face_url}},
    )
    resp.raise_for_status()  # fail loudly instead of parsing an error body
    return resp.json()["outputs"][0]["url"]

# Store this URL in your database — it's the identity anchor
id_photo_url = create_character_baseline(face_url, api_key)
```
Store the ID photo URL in your character record. Every future image generation references it.
Step 2: Triggering Image Generation from Conversations
Your LLM needs to know when to generate an image. Two approaches:
Approach A: LLM Decides (Recommended)
Add instructions to your system prompt:
```
When the conversation involves a visual scene change, outfit description,
or the user asks to "show" something, output a JSON block with image
generation parameters:

{"generate_image": true, "prompt": "scene description", "type": "generate"}

For editing a previous image:

{"generate_image": true, "prompt": "edit description", "type": "edit"}
```
Your backend parses the LLM output, extracts the image generation request, calls the API, and includes the image in the response.
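A minimal sketch of that parsing step, assuming the LLM emits the JSON block inline and without nested braces (the regex and function name are illustrative):

```python
import json
import re

# Matches a flat JSON object containing the "generate_image" key
IMAGE_BLOCK = re.compile(r'\{[^{}]*"generate_image"[^{}]*\}')

def extract_image_request(llm_output: str):
    """Split LLM output into display text and an optional image request dict."""
    match = IMAGE_BLOCK.search(llm_output)
    if not match:
        return llm_output, None
    try:
        request = json.loads(match.group(0))
    except json.JSONDecodeError:
        return llm_output, None  # malformed block: degrade to text-only
    text = llm_output.replace(match.group(0), "").strip()
    return text, request

text, request = extract_image_request(
    'Let me show you! {"generate_image": true, "prompt": "beach at sunset", "type": "generate"}'
)
```

Degrading to text-only on a malformed block matters in practice: LLMs occasionally emit broken JSON, and the conversation should never error out over a missing image.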
Approach B: Keyword Detection
Simpler but less flexible — detect keywords like "show me," "what do I look like," "change outfit," "go to the beach" and trigger generation.
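A sketch of the keyword approach; the trigger list is illustrative and would grow with real usage data:

```python
# Illustrative trigger phrases; a production list comes from observed user messages
IMAGE_TRIGGERS = ("show me", "what do i look like", "change outfit", "go to the beach")

def should_generate_image(message: str) -> bool:
    """Naive substring check; misses paraphrases that the LLM approach would catch."""
    lowered = message.lower()
    return any(trigger in lowered for trigger in IMAGE_TRIGGERS)
```

This catches "Show me your new dress" but not "Let's see what you're wearing" — which is exactly why Approach A is recommended.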
Step 3: Generating Scene Images
When the LLM triggers an image, call the generation API with the character's identity anchor:
```python
def generate_scene(character_id_photo: str, scene_prompt: str, api_key: str) -> str:
    """Generate a character scene with identity consistency."""
    resp = requests.post(
        "https://www.aurashot.art/v1/character/generate",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "prompt": scene_prompt,
            "images": {"face": character_id_photo},
        },
    )
    resp.raise_for_status()
    return resp.json()["outputs"][0]["url"]

# In your conversation handler:
image_url = generate_scene(
    character.id_photo_url,
    "sitting in a cozy cafe, wearing a casual sweater, warm smile",
    api_key,
)
```
The face stays the same. The scene, outfit, and expression change based on the prompt.
Step 4: Editing Previous Images
When the user says "change the background" or "try a different pose," edit the previous image instead of generating from scratch:
```python
def edit_image(target_url: str, face_url: str, edit_prompt: str, api_key: str) -> str:
    """Edit an existing image while preserving character identity."""
    resp = requests.post(
        "https://www.aurashot.art/v1/character/edit",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "prompt": edit_prompt,
            "images": {"target": target_url, "face": face_url},
        },
    )
    resp.raise_for_status()
    return resp.json()["outputs"][0]["url"]
```
Editing is faster and preserves more context from the original image.
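One way to tie Steps 3 and 4 together is a small router that prefers editing when a previous image exists. Here `generate_scene` and `edit_image` are passed in (and stubbed) so the sketch stands alone; the request dict uses the shape from Step 2:

```python
def handle_image_request(request: dict, id_photo_url: str, last_image_url,
                         api_key: str, generate_scene, edit_image) -> str:
    """Prefer editing the previous image (faster, keeps context); else generate fresh."""
    if request["type"] == "edit" and last_image_url:
        return edit_image(last_image_url, id_photo_url, request["prompt"], api_key)
    return generate_scene(id_photo_url, request["prompt"], api_key)

# Stubs standing in for the API wrappers:
gen = lambda face, prompt, key: f"generated:{prompt}"
edit = lambda target, face, prompt, key: f"edited:{prompt}"
url = handle_image_request(
    {"type": "edit", "prompt": "facing the sunset"},
    "https://example.com/id.png", "https://example.com/prev.png", "key", gen, edit,
)
```

Falling back to `generate` when there is no previous image covers the first message of a conversation without special-casing it in the LLM prompt.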
Data Model
Minimal schema for character images in a role play app:
```sql
-- Characters table
CREATE TABLE characters (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    name TEXT NOT NULL,
    face_reference_url TEXT NOT NULL,
    id_photo_url TEXT NOT NULL,
    description TEXT,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Generated images linked to conversations
CREATE TABLE character_images (
    id UUID PRIMARY KEY,
    character_id UUID REFERENCES characters(id),
    conversation_id UUID,
    image_url TEXT NOT NULL,
    prompt TEXT,
    image_type TEXT CHECK (image_type IN ('id-photo', 'generate', 'edit')),
    created_at TIMESTAMPTZ DEFAULT now()
);
```
Cost Management
Image generation is billed per image, so a role play app needs to be deliberate about when to generate:
Generate images for:
- Scene changes ("let's go to the beach")
- Outfit changes ("put on a red dress")
- Explicit "show me" requests
- Key story moments
Don't generate images for:
- Every message in the conversation
- Pure text exchanges
- Repeated similar scenes (cache and reuse)
Pre-generate common scenes during off-peak hours: morning selfie, cafe, evening outfit, goodnight. Serve cached images for common scenarios, generate fresh ones for unique requests.
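A minimal in-memory version of that cache, keyed by character and normalized prompt (swap the dict for Redis or your database in production; all names here are illustrative):

```python
import hashlib

class SceneCache:
    """Cache generated image URLs so repeated similar scenes cost nothing."""

    def __init__(self):
        self._cache = {}

    def _key(self, character_id: str, prompt: str) -> str:
        # Normalize whitespace and case so "Cozy cafe" and "cozy  cafe" collide
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(f"{character_id}:{normalized}".encode()).hexdigest()

    def get_or_generate(self, character_id: str, prompt: str, generate_fn) -> str:
        key = self._key(character_id, prompt)
        if key not in self._cache:  # only pay for genuinely new scenes
            self._cache[key] = generate_fn(prompt)
        return self._cache[key]

calls = []
gen = lambda p: calls.append(p) or f"img:{p}"  # records each real generation
cache = SceneCache()
a = cache.get_or_generate("char-1", "Cozy cafe", gen)
b = cache.get_or_generate("char-1", "cozy  cafe", gen)  # cache hit, no new call
```

Exact-match normalization only catches literal repeats; a fuzzier approach (e.g. embedding similarity on prompts) would catch more, at the cost of occasionally serving a slightly-wrong cached scene.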
With AuraShot Pro ($7.90/mo for 200 images), that works out to roughly 6-7 unique images per day for a single active character.
The Agent Skill Shortcut
If you're building on an agent framework that supports skills (Claude, Cursor, or any OpenClaw-compatible runtime), you can skip the API integration entirely:
```
clawhub install aurashot-character-skill
```
The skill handles intent routing, parameter assembly, and local asset management. Your agent generates character images through natural conversation — no API code needed.
What Users Actually Experience
When this works well, the conversation feels alive:
User: Let's go to the beach today
Character: I'd love that! Let me change into something more appropriate...
[Image: the character in a summer outfit at the beach, same face as always]
User: The sunset is beautiful. Can you turn around and look at it?
[Image: same character, same outfit, now facing the sunset]
The character has a persistent visual identity. Every image reinforces the connection. That's the experience that keeps users coming back.
Getting Started
- Get a free API key — 5 images to prototype
- Generate an ID photo for a test character
- Try generating 3-4 different scenes with the same face reference
- Integrate into your conversation loop
The technical integration is straightforward — the hard part is deciding when images add value to the conversation. Start simple, measure engagement, and expand from there.