The Complete Guide

AI UGC:
The Ultimate
Playbook

Master AI-generated User Generated Content — from realistic models to talking videos, product ads, and beyond.

Chapter 1 AI Image Basics

What Is AI Image Generation?

AI image generation is the process of turning a text description into a visual. You write what you want to see, and the model produces an image based on your input. It sounds simple, and at its core it is, but the part most beginners get wrong is expecting the result to be perfect on the first try. That is not how this works. Your role is not to write one magical sentence and get a flawless output. Your role is to guide the model, step by step, until the result matches what you had in mind. Think of it less like typing a search query and more like directing a creative partner who is incredibly fast but needs clear instructions.

How It Actually Works

When you type a prompt, the model does not "draw" the way a human would. What actually happens is a multi step process. First, your prompt gets broken down into concepts like subject, style, lighting, mood, and composition. Then the model starts from random visual noise and begins predicting what the image should look like, pixel by pixel, refining the picture over many tiny steps until a final image appears.

There are two things worth keeping in mind from the start. The model does not "understand" your words the way a person would. It predicts visual outcomes based on patterns it learned from massive amounts of training data. And because of that, results are probabilistic. The same prompt can give you different outputs every time. That is completely normal and expected.

Your prompt is parsed into concepts (subject, style, lighting, mood, composition)

The model begins from random visual noise

It predicts pixels step by step

The picture becomes clearer over many tiny refinement steps

A final image appears

Why Your Results Keep Changing

One of the most common frustrations beginners have is that the same prompt can produce completely different images each time. If you are used to tools like Google, where the same input always gives the same output, this feels broken. But AI image generation does not work like a search engine. It works more like a creative assistant. You give it a direction, it interprets that direction in its own way, you look at the result, adjust your instructions, and try again. The randomness you see is not a flaw in the system. It is a fundamental part of how these models operate. Once you accept that, you stop chasing the perfect first result and start focusing on what actually matters: learning to iterate quickly and making small, deliberate adjustments between generations.

The Core Workflow

No matter which tool you end up using, the underlying workflow is always the same. You write a prompt, generate the image, evaluate what came out, adjust your prompt based on what you liked or did not like, and generate again. That loop is the entire process. If you internalize this cycle early on, switching between tools later becomes easy because the thinking behind it never changes. The tool is just the surface. The workflow is the skill.

1Prompt

2Generate

3Evaluate

4Adjust

5Generate again

Chapter 1 AI Image Basics

Tools: 3 Categories That Explain Everything

There are a lot of AI tools out there and the number keeps growing. But if you zoom out, almost all of them fall into three categories. Understanding these three buckets will save you from confusion and help you know exactly what each tool is for.

1) Image Models

Image models generate still images from a text prompt or from an image you provide as a reference. Examples include Nano Banana (Nano Banana 2 and Nano Banana Pro), GPT Image (sometimes listed as GPT img), Midjourney, and Seedream (4.0 and newer). Each model has its own visual style and strengths, so results will look different depending on which one you use. But the core workflow stays the same no matter which model you pick.

2) Video Models

Video models generate short video clips from text, images, or a combination of both. Examples include Kling, Veo, Sora, Wan, and Seedance. Video models vary a lot in how they handle motion, consistency, and visual quality. The best way to learn any video model is to start with short, simple tests so you get a feel for how it moves before trying anything complex.

3) Platforms

Platforms are workspaces where you can access multiple models in one place. They often come with extra features like galleries, workflow tools, and side by side comparisons. Examples include Higgsfield, ImagineArt, Weavy (also known as Figma Weave after acquisition), and FLORA. The biggest advantage of platforms is convenience. You can test and compare different models without jumping between apps, which makes iteration much faster.

Beginner Rule (So You Don’t Tool Hop)

There is no legitimate tool that gives you unlimited generations for free. If you want to experiment seriously, a paid plan is almost always necessary. So keep it simple and pick one platform first. It is cheaper, more convenient, and it helps you learn faster because everything stays in one place. In this book we use Higgsfield, and it is the platform I recommend. If you want to follow along with every workflow and run your own experiments, I suggest getting the Pro subscription.

What Makes a Good Beginner Prompt

A beginner prompt should be:

Simple
Specific
Easy to edit

Use this structure:

SubjectWho or what

SceneWhere & doing what

LightingLight source & mood

StyleVisual feel

Example:

Prompt Example

A girl sitting in a cute cafe outside, natural daylight, modern photography style, realistic details

That is enough to start.

Try making one now!

Open Nano Banana in Higgsfield

Go to higgsfield.ai and select Image → Nano Banana Pro (or Nano Banana 2).

Paste this prompt

Prompt Example

A girl sitting in a cute cafe outside, natural daylight, modern photography style, realistic details

Set your first settings

Aspect ratio: 1:1
Quality: 2K
Outputs: 4 images
Click Generate

You’ll get four different images even though the prompt is identical.

That variation is normal. It comes from the model’s randomness and the prompt still being a little broad, which gives the AI room to interpret details.

And that’s a win.

Congrats, you just generated your first AI images.

The One Skill That Makes You Good Fast

The single most important skill in AI image generation is iteration. After every generation, ask yourself three questions. What do I like about this result? What is wrong with it? And what is the smallest change I can try next? Small edits between generations will always beat big rewrites because they help you understand exactly what changed and why. This is how you build real control over your outputs.

Common Beginner Mistakes

Most beginners make the same mistakes early on. They try to force perfection on the very first generation. They copy complex prompts from the internet without understanding what each part does. They blame the tool and switch to a new one when the real issue is the prompt. And they change ten things at once, then have no idea which change actually made a difference. If you can avoid these habits from the start, you will improve much faster than most people.

Chapter 2 AI Video Basics

What Is AI Video Generation?

AI video generation follows the same core idea as image generation, but with one extra challenge: time. Instead of producing a single still frame, the model has to generate many frames in sequence that all feel like they belong to the same scene. Every frame needs to stay consistent with the one before it, which makes video significantly harder for AI to get right. That is also why video results tend to be less predictable than images, and why learning to work with video takes a bit more patience.

How It Works

Most video models follow this logic:

Read your input (text, image, or both)

Decide what the scene should look like

Predict what should happen next over time

Generate frames in sequence

Stitch those frames into a short video

Two Things to Remember

1. AI does not “record” motion. It predicts motion.

2. Video is even more probabilistic than images. Small changes can cause big differences.

That is why iteration is not optional. It is the workflow.

Three Video Modes You’ll See Everywhere

Most AI video tools work in one of three modes. Understanding the difference early will help you pick the right approach for each project.

Image to Video is where you start from a still image and the model animates it. This is the best option when you want realism, consistent identity, and controlled results.

Text to Video generates the entire scene from scratch using only a text prompt. This is great for testing concepts and fast ideation, but the tradeoff is that results are less predictable and faces or fine details can drift between frames.

Talking or Audio Driven Video is where you provide speech or audio and the character lip syncs and emotes to match. This mode is ideal for UGC style clips, dialogue scenes, and talking head content.

The Quality Tradeoff (Know This Early)

In video generation, you are almost always trading one thing for another. Understanding this tradeoff early will save you a lot of frustration.

	More Realism	Cheaper and Faster
Cost	Higher	Lower
Generation time	Longer	Shorter
Detail level	Rich and accurate	Less refined
Face accuracy	Strong	Can drift or distort
Visual artifacts	Fewer	More common
Reruns needed	More	Fewer

This is not a limitation of any specific tool. It is how video generation works right now across the board. Once you accept that, you stop fighting the system and start making smarter decisions about when to prioritize quality and when speed is good enough.

The Video Prompt Formula

The biggest difference between an image prompt and a video prompt is that video needs to describe change over time, not just a static scene. A good video prompt tells the model what is happening, not just what things look like.

A simple structure that works across most tools:

SceneWhere it happens

SubjectWho or what

MotionWhat they do

CameraAngle & movement

MoodFeel & tone

Keep the motion simple at first. You can always add complexity later once you are getting clean results. For example:

Prompt Example

A person walking forward calmly, natural movement, soft daylight, steady camera, realistic motion

That is enough to get started. Notice how the prompt does not ask for anything dramatic. No spinning camera, no complex choreography. Simple motion is the fastest way to get believable output, especially when you are still learning how a model handles movement.

Try Making One Now

We’ll do a simple Image to Video test using the image you generated in the previous chapter.

Open Higgsfield

Go to higgsfield.ai and select Video then choose a model (this time Kling 2.6).

Kling 2.6 is not the best model but it is good for quick testing.

Upload your image

Use one of your cafe images from the AI Image Basics chapter.

You can simply download the image, drag and drop into the first frame.

And insert this prompt in the prompt box. ”In a cafe, a person is sitting naturally. Static camera, natural mood..”

Set the time 5s, quality 1080p and hit generate.

If your output looks like this, that’s a success!

How to Evaluate Your Result

After every generation, run through this quick checklist before deciding whether to keep or redo the output.

Quick Checklist

Does the motion feel natural?
Does the face stay consistent throughout the clip?
Does the camera movement feel believable?

If something looks off, do not start rewriting your entire prompt. Try these fixes first:

Reduce the amount of motion
Remove any fancy camera moves
Regenerate a few more variations with the same prompt

Most early problems come from asking the model to do too much at once. Simple motion is the fastest way to get clean, believable results.

Chapter 3 What is AI UGC

What “UGC” Means

UGC stands for User Generated Content. In marketing, UGC is content that feels like it was made by a normal person rather than a brand or production team. It is the kind of content you scroll past on TikTok or Instagram and think someone just filmed it on their phone because they genuinely wanted to share something. That casual, unpolished, honest feeling is exactly what makes it work.

UGC Feels Like	UGC Does Not Feel Like
Casual and spontaneous	Scripted and rehearsed
Slightly imperfect	Perfectly polished
Shot on a phone	Shot in a studio
Honest and relatable	Salesy and branded
Everyday life moments	High production commercials
A friend recommending something	A company selling something

UGC vs Influencer Content

UGC and influencer content might look similar at first glance, but they create very different feelings in the viewer. Influencer content tends to feel polished and intentional, while UGC feels closer to a real customer sharing a genuine experience. That difference matters because people trust content that feels authentic, which is why UGC often outperforms influencer content in ads.

	UGC	Influencer Content
Tone	Casual, unscripted	Polished, intentional
Who it feels like	A real customer	A paid promotion
Production quality	Low, phone style	High, curated
Trust level	Higher	Lower
Viewer reaction	"This feels real"	"This is an ad"
Best for	Conversions and trust	Awareness and reach
Best for	Conversions and trust	Awareness and reach

So What Is AI UGC?

AI UGC is content that has the look and feel of traditional user generated content, but is created or assisted by AI. The goal is not to make content that looks like AI made it. The goal is the exact opposite. You want content where the person feels real, the moment feels real, and a viewer scrolling past would genuinely believe this could be a real video filmed by a real person on their phone. If someone watches it and thinks "this is AI," you have not reached the standard yet. The entire point of AI UGC is that it is indistinguishable from the real thing.

The Core Risk

If your AI UGC looks AI generated, it can actively hurt you. UGC works because of trust. The moment a viewer feels like something is fake, the effect flips. Instead of building trust, it destroys it. Instead of driving interest, it creates suspicion. Instead of converting, it pushes people away. That is why realism is not a nice bonus or something you optimize for later. Realism is the entire game. Every technique in this book exists to get you closer to that standard.

Risks (If It Looks AI)	Opportunities (If It Looks Real)
Viewers lose trust immediately	Builds instant credibility with the audience
Content feels deceptive and manipulative	Feels authentic and relatable
Lower engagement and conversions	Higher engagement and click through rates
Brand reputation damage	Scalable content without hiring creators
Wasted time and ad spend	Faster production at a fraction of the cost
Comments calling out "this is AI"	Content that blends seamlessly into feeds
Comments calling out "this is AI"	Content that blends seamlessly into feeds

What “Ultra Realism” Means in AI UGC

Ultra realism means the output has photographic credibility. When someone looks at it, their first instinct should be that a real camera captured this moment. It is not about making something that looks "pretty" or "high quality" in an artistic sense. It is about making something that looks like it actually happened in the real world.

Ultra realistic results usually include:

Natural skin texture with visible pores and subtle imperfections
Believable lighting that matches a real environment
Correct facial proportions and slight asymmetry
Real world details like tiny shadows, skin blemishes, and uneven tones

A simple way to check your output is to ask what it should not look like. If the skin feels overly smooth, it is not there yet. If the image looks stylized or artistic, it is not there yet. If the person looks plastic or too perfect, it is not there yet. Real people are not flawless, and your AI output should not be either.

The Simple Test

The easiest way to judge your AI UGC is to imagine someone scrolling through their feed and seeing your content for a split second. If their gut reaction is "this could be a real photo or video," you have hit the standard. If anything makes them pause and think "this looks AI generated," you are not there yet. That quick gut reaction is the benchmark for everything you create from this point forward.

Why This Matters for Everything Later

Every workflow you will learn in this book builds on top of realism. Product holding shots only work if the person looks real. Talking head videos only convert if the viewer believes someone is actually speaking. Ads only perform if the audience trusts what they are seeing. Consistent characters only matter if they look like real people in the first place. None of the advanced techniques later in this book will work if the foundation is not there. Get realism right first. Everything else follows from that.

Chapter 4 Tools to Use

Too Many Tools Is the Fastest Way to Get Stuck

There are more AI tools available now than anyone could reasonably try, and new ones launch every week. But for beginners, more tools almost always means more confusion, slower progress, and less actual skill building. You end up spending your time comparing tools instead of learning how to use any of them well. That is why this book limits the stack on purpose. We use a small, focused set of tools that cover everything you need. You can always explore more later once you have the fundamentals down. But first, build your skills with a simple setup that lets you focus on what actually matters.

The Core Stack We Use in This Book

Platform: Higgsfield

This is the main workspace where you can access models in one place.

Why we use it:

Less switching between sites
Easier to test and compare
Faster iteration

Image Generation model: Nano Banana

This is our default image model that you can use within Higgsfield.

Why we use it:

Great for realistic images
Strong results with editing capability
Good base for later video workflows

Video Generation: Veo and Kling

We use two video models because video has different strengths depending on the goal.

Why we use them:

High quality motion and realism with audio
Useful for both subtle animation and cinematic movement
Widely used by serious creators

Prompting and Planning: ChatGPT

ChatGPT is not the image model. You could also use other LLM like Gemini but ChatGPT thinking mode is my favorite so far.

It is your prompt builder and thinking partner.

Why we use it:

Helps you write clear prompts fast
Helps you simplify when results fail
Helps you iterate logically instead of guessing

Beginner Rule

Start with one platform, one image model, one or two video models, and ChatGPT for prompt writing. That is your entire stack. Resist the urge to add more until you can consistently get good results with this setup. Once you reach that point, you will have enough understanding to evaluate new tools properly and actually know whether they improve your workflow. Until then, keeping it simple is the fastest path forward.

Chapter 5 Aspect Ratio & Settings

What Aspect Ratio Means

Aspect ratio is the shape of your image or video.

It controls how wide or tall the frame is.

Same prompt, same model, different aspect ratio can change:

composition
cropping
how “UGC like” it feels

The 3 Most Useful Ratios

1:1

Square

Instagram feed posts
Carousel thumbnails
Simple product shots

9:16

Vertical

TikTok
Instagram Reels
YouTube Shorts
Most UGC ads

16:9

Horizontal

YouTube videos
Website heroes
Cinematic scenes
Landscape shots

If you are making AI UGC, 9:16 is your default most of the time.

Why It Matters for AI UGC

UGC is designed to feel like it was filmed on a phone, and phones are vertical. That means the aspect ratio you choose has a direct impact on how authentic your content feels. If you generate UGC style content in 16:9, it can instantly feel like a production shoot instead of a casual phone clip, even if everything else looks perfect. As a rule of thumb, UGC style content should almost always be 9:16 unless you have a strong reason to do otherwise.

Two More Settings You’ll See Everywhere

Quality or Resolution

Higher quality means more detail, but also longer generation time and sometimes higher cost. When you are still experimenting with a prompt, start at mid quality first. Get the prompt working, then increase the resolution for your final output. This saves both time and credits.

Variations

Most tools let you generate multiple outputs at once from the same prompt. Use this. Because generation is probabilistic, there is no single correct result. One variation might look great while another from the exact same prompt looks off. Generate at least four variations whenever you can. It speeds up learning and improves your odds fast.

Chapter 6 Writing Good Prompts

How AI Understands Visual Language

AI does not simulate a real camera. Instead, it links words to visual patterns it has seen in training data. So when you write things like "cinematic lighting," "35mm look," or "shallow depth of field," the model is not doing physics. It is recalling visual examples that match those phrases. That is why approximate terms work, visual language beats technical precision, and consistency matters more than being "correct."

Director Mindset Beats Camera Mindset

A camera operator thinks in numbers. A director thinks in feeling and visuals. AI responds better to director thinking. Instead of asking "what aperture should I use," think about the feeling you want. Should the background feel soft or sharp? Should this feel intimate or epic? Should this feel like a casual phone clip or a movie scene? This mental shift instantly improves your prompts because you start describing what you want to see instead of trying to configure a virtual camera.

Describing the Scene (Visual Vocabulary)

You can use all of these terms without any real camera knowledge. Think of them as building blocks you mix and match depending on the shot you want.

Shot and Framing

Term	What It Looks Like	Best For in UGC
Extreme close up	Eyes, lips, skin texture	Skincare, makeup, emotional moments
Close up	Face focused	Talking head videos, selfie style
Medium shot	Upper body	Product holding, casual conversation
Medium wide	Body plus environment	Lifestyle, outfit reveals
Wide shot	Full scene	Room tours, full body fashion
Centered framing	Subject in the middle, clean	Direct to camera UGC
Off center framing	Subject slightly to one side	Natural, candid feel

Lighting

Term	What It Does	UGC Feel
Natural daylight	Realistic outdoor or window light	Everyday, authentic
Soft lighting	Flattering, even, minimal shadows	Clean and approachable
Hard lighting	Strong shadows, dramatic contrast	Bold, editorial
Backlighting	Light source behind the subject	Dreamy, golden hour
Rim light	Glowing edge around the subject	Cinematic but natural
High key	Bright, almost no shadows	Commercial, clean
Low key	Dark, moody, deep shadows	Nighttime, intimate
Diffused light	Soft shadows, even spread	Realistic indoor UGC

Camera Feel and Motion

For UGC, camera feel is one of the most important details. It is what separates "this looks like a real phone video" from "this looks like a production."

Term	What It Does	When to Use
Static shot	No movement at all	Simple product shots
Handheld camera	Natural shake, vlog style	Most UGC talking videos
Locked off shot	Very stable, no drift	Clean product demos
Slow push in	Subtle forward movement	Cinematic moments
Micro movements	Tiny natural drift	Realistic selfie footage

Focus and Depth

Term	What It Looks Like
Shallow depth of field	Subject sharp, background blurry
Deep depth of field	Everything in focus
Selective focus	Only the subject is sharp
Soft background blur	Cinematic bokeh look

For UGC, shallow depth of field with soft background blur is your default. It mimics how phone portrait mode looks in real life.

Style and Finish

Term	What It Does
Photorealistic	Real world accuracy, no stylization
Cinematic color grading	Movie like color tones
High contrast	Strong separation between light and dark
Low contrast	Soft, muted tones
Warm tones	Friendly, emotional, golden
Cool tones	Futuristic, dramatic, blue
Natural skin texture	Visible pores and real detail
Visible imperfections	Blemishes, slight asymmetry

For UGC, always lean toward photorealistic, natural skin texture, and visible imperfections. If your output looks too polished or too stylized, it stops feeling like UGC.

Describing the Person (The Most Important Part)

In UGC, the person is the content. If the person looks wrong, nothing else matters. This section gives you a practical vocabulary for describing every part of your subject so the AI generates someone who looks real, specific, and consistent.

Body Type and Build

Be specific about body type. Vague descriptions give the AI room to default to generic results.

Term	What It Means
Slim	Lean, narrow frame
Athletic	Toned, visible muscle definition
Average build	Normal everyday body
Curvy	Fuller figure with visible curves
Stocky	Broad, compact frame
Tall and lean	Long limbs, narrow
Petite	Small frame, shorter

Skin

Skin is where realism lives or dies in AI UGC. The more specific you are, the less "AI smooth" your output will look.

What to Specify	Options
Skin tone	Fair, light, medium, olive, tan, brown, dark brown, deep
Texture	Visible pores, natural unevenness, subtle shine, matte
Imperfections	Freckles, moles, acne scars, redness, under eye circles
Finish	Dewy, matte, natural oil sheen, no makeup look

Always add "realistic human skin texture" and "subtle imperfections" to your UGC prompts. This single habit will dramatically improve your results.

Hair

Hair is one of the biggest giveaways of AI content. Perfect, uniform hair looks fake immediately. Describe it with natural variation.

What to Specify	Options
Length	Pixie, short, chin length, shoulder length, mid back, long
Texture	Straight, wavy, curly, coily, kinky
Style	Loose, tousled, messy bun, ponytail, braided, slicked back, wet look
Volume	Flat, moderate, voluminous
Part	Center, side, deep side, no part
Color	Natural tones, highlights, ombre, roots showing
Imperfections	Flyaways, frizz, uneven sections, baby hairs

Facial Expression

Expression is what makes UGC feel real and human. A blank face or a generic smile will kill the authenticity of any shot.

What to Specify	Options
Overall mood	Happy, serious, contemplative, confident, playful, relaxed
Eyes	Direct gaze, looking away, squinting, bright, tired, focused
Mouth	Slight smile, open smile, neutral, pursed lips, talking mid sentence
Eyebrows	Raised, relaxed, furrowed, arched
Authenticity	Genuine, candid, natural, caught mid moment

For UGC, the best expressions are usually "genuine slight smile with direct eye contact" or "natural expression, caught mid sentence." Avoid anything that feels posed or overly polished.

Clothing and Style

Clothing sets the entire context of a UGC shot. It tells the viewer who this person is and what situation they are in.

What to Specify	Options
Style	Casual, streetwear, athleisure, minimalist, cozy, smart casual
Specific items	Oversized hoodie, leather jacket, white t-shirt, knit sweater
Fit	Oversized, fitted, loose, cropped, tailored
Colors	Neutral tones, black, earth tones, pastels, bold accent
Condition	Pristine, slightly worn, lived in, wrinkled

For UGC, clothing should feel effortless. Think "what would this person actually wear at home" not "what would they wear for a photoshoot." Oversized hoodies, simple t-shirts, and cozy knits work better than anything that looks styled.

Accessories

Small details like accessories make a character feel like a real person with a real life.

Category	Options
Jewelry	Small stud earrings, delicate chain necklace, simple rings, watch
Eyewear	Reading glasses, sunglasses on head, no glasses
Tech	Holding a phone, AirPods in, laptop nearby
Other	Hair clip, scrunchie on wrist, tote bag, coffee cup

Putting It All Together: UGC Prompt Examples

Here are complete prompts that combine everything above for different UGC scenarios.

Casual talking head (TikTok style):

Prompt Example

A realistic iPhone selfie of a 28 year old woman with medium brown skin, shoulder length curly hair with natural flyaways, wearing an oversized cream knit sweater. She has a genuine slight smile and is looking directly at the camera. Natural morning light from a window, cozy bedroom background, shallow depth of field, visible skin texture with subtle imperfections, handheld phone feel, 9:16 vertical.

Product review (holding something):

Prompt Example

A candid phone photo of a 30 year old man with short dark hair and light stubble, wearing a plain black t-shirt. He is holding a small product close to his face with a relaxed expression, slight smirk, eye contact with camera. Soft indoor lighting, clean apartment background, natural skin pores visible, slight under eye circles, realistic and unfiltered.

Lifestyle moment (not looking at camera):

Prompt Example

A realistic photo of a 25 year old woman with long straight blonde hair, freckles, and a relaxed expression. She is sitting cross legged on a couch wearing joggers and a cropped tank top, looking down at her phone with a soft natural smile. Warm afternoon sunlight, living room background, shallow depth of field, natural skin texture, candid unposed moment.

Gym or fitness UGC:

Prompt Example

A realistic gym selfie of a 26 year old athletic man with a medium build, short fade haircut, light brown skin, and a confident expression. He is wearing a fitted dark grey tank top, slightly sweaty with visible skin texture. Bright overhead gym lighting, mirrors in background, handheld phone angle, slight motion blur, authentic and unfiltered.

Morning routine (bathroom selfie):

Prompt Example

A realistic bathroom mirror selfie of a 24 year old woman with dark brown skin, short natural coily hair, wearing a white towel robe loosely tied. She is holding her phone at chest height, relaxed expression with slightly sleepy eyes, no makeup. Soft warm bathroom lighting, slightly steamy mirror, toothbrush on the counter in background, natural skin texture with visible pores, dewy skin, authentic morning energy.

Unboxing or first reaction:

Prompt Example

A candid overhead phone photo of a 27 year old woman with olive skin, messy shoulder length dark wavy hair in a loose bun, wearing an oversized grey hoodie. She is sitting on the floor looking down at an open package with a surprised genuine smile, mouth slightly open. Soft natural daylight from a nearby window, living room carpet floor, scattered packaging around her, realistic skin texture, spontaneous unposed moment, 9:16 vertical.

Coffee shop casual:

Prompt Example

A realistic iPhone selfie of a 32 year old man with fair skin, medium length messy brown hair, light beard, and tired but friendly eyes. He is sitting at a small cafe table holding a paper coffee cup, wearing a dark green flannel shirt with a plain white tee underneath. Warm indoor cafe lighting, blurred people in background, slight natural smile, visible under eye texture, natural skin redness around nose, handheld phone angle, authentic and unfiltered.

Skincare or beauty UGC:

Prompt Example

A realistic extreme close up selfie of a 23 year old woman with light brown skin, natural freckles across her nose and cheeks, straight black hair pulled back with a claw clip. She is applying moisturizer to one cheek with her fingertips, looking at camera with a calm relaxed expression. Soft diffused bathroom lighting, clean white background, visible pores, slight natural shine on forehead, no makeup, dewy realistic skin, iPhone front camera feel.

Cooking or kitchen moment:

Prompt Example

A candid phone photo of a 29 year old man with dark brown skin, short cropped hair with a clean lineup, wearing a simple white t-shirt with a small food stain on the chest. He is standing in a small apartment kitchen holding a wooden spoon, mid laugh with eyes slightly squinted. Warm overhead kitchen lighting, messy counter with ingredients in background, natural skin texture, genuine expression, slight motion blur on the hand, casual and real.

Outdoor walk or errand run:

Prompt Example

A realistic handheld selfie of a 26 year old woman with fair skin, long auburn wavy hair blowing slightly in the wind, small stud earrings, wearing a puffer jacket and a beanie. She is walking outside on a sidewalk, looking at camera with a slight smirk, one AirPod visible in her ear. Overcast natural daylight, blurred street and trees in background, rosy cheeks from cold, natural skin texture with slight redness on nose, candid movement feel, 9:16 vertical.

Let ChatGPT Write Prompts For You

This is the fastest shortcut and the most powerful way.

Open ChatGPT in Thinking Mode and describe what you want in normal language.

Be extremely specific, even if it feels messy. I recommend the voice input.

Add three key instructions:

Which tool you will use

Example: “Write this for Nano Banana Pro” or “Write this for Kling 3.0”

The role you want ChatGPT to take

Example: “You are a professional art director and photographer”

Any important constraints

Example: “UGC style, vertical 9:16, natural phone look, realistic skin texture”

For Beginners: Use Our Custom GPT

If writing detailed prompts from scratch feels overwhelming right now, we built a custom GPT that does it for you. Just describe what you want in plain everyday language, and it will turn your idea into a detailed, professional UGC prompt ready to paste into Nano Banana. It follows all the realism rules from this book automatically, so you do not have to memorize anything.

Try it here: https://chatgpt.com/g/g-69bc3acf48fc8191a62968f22e03da0c-ugc-image-prompt-generator

Try it here: [https://chatgpt.com/g/g-69bc3acf48fc8191a62968f22e03da0c-ugc-image-prompt-generator](https://chatgpt.com/g/g-69bc3acf48fc8191a62968f22e03da0c-ugc-image-prompt-generator)

Chapter 7 Create a Realistic Model

What “Model” Means Here

In this book, when we say "model" we are not talking about the AI software. We are talking about your main person or character. This is the person you will reuse across all your content, from AI UGC clips and talking videos to product holding shots and full scene setups. Think of your model as the face of your content. The more realistic your base character is from the start, the easier every single workflow in this book becomes later.

Two Ways to Create a Realistic Person

Option 1: Prompt From Scratch (Fast and Flexible)

This is the simplest and fastest method. You describe the person you want in a prompt, generate a few variations, and pick the best one. It is great for quick experiments, learning how prompt changes affect the output, and testing different looks before committing to a character. The easiest way to do this is to use ChatGPT in Thinking Mode. Describe the person you want in normal everyday language and ask it to turn that into a proper prompt.

For example: "Act as a professional casting director and photographer. Write a Nano Banana prompt for a realistic 25 year old woman, natural phone photo style, realistic skin texture, subtle imperfections."

Option 2: Use Higgsfield AI Influencer (Most Controlled)

Higgsfield has a built in feature called AI Influencer that lets you build a character using visual selection controls instead of writing a prompt from scratch. You choose key traits like gender, ethnicity, skin tone, eye color, age range, skin conditions, face and body structure, hairstyle, and overall style. This gives you much more control over the result because you are making specific decisions instead of hoping the model interprets your text correctly. You get a strong, consistent base character without needing to write a single line of prompt.

How To Use AI Influencer (Simple Workflow)

Open Higgsfield and go to AI Influencer

Use the builder to select the main traits

Generate a few variations

Pick the most realistic result as your base character

Example:

Human, Female, European, Green eyes, Freckles, Adult, Human skin, Slim

Chapter 8 Edit the Character

What “Editing” Means in AI

Editing a character in AI is not like editing a photo in Photoshop where you have pixel level control. It means asking the model to change specific parts of an image while keeping the same person intact. That could be swapping a hairstyle, changing an outfit, adding tattoos or accessories, or shifting the overall color and vibe of the shot. The key challenge is control. You want the change you asked for without losing the identity of your character. The face, proportions, and overall feel should stay the same. Only the detail you targeted should change.

Start With A Reference Image

You need one strong base image first.

It can be:

generated from scratch
created with AI Influencer in Higgsfield
a real photo you own and want to use

Download the image you want to edit.

This becomes your reference.

As an example we use the one from “Create a realistic model”

The Simple Editing Workflow (Higgsfield + Nano Banana)

Go to Image and choose Nano Banana Pro on Higgsfield

Drag and drop your character image

Write what you want to change

Keep it specific and minimal.

Example edits:

“Change the hairstyle to a short bob haircut.”
“Change the outfit to a black turtleneck and beige coat.”
“Add small tattoos on the forearm.”

Generate multiple variations and pick the cleanest one.

Example output:

The More Powerful Method: Edit With Image References

Text edits work, but the most controlled method is using reference images.

Instead of only describing the change, you show it.

Examples:

upload your character image as the base
upload a second image as a hairstyle reference

upload another image as an outfit reference

Then prompt like this:

Prompt Example

Keep the same person and face. Use the hairstyle from reference image 2. Use the outfit style from reference image 3. Photorealistic, natural lighting, realistic skin texture.

Output example:

Make sure to generate multiple runs.

This method is great for:

copying a specific haircut
matching a fashion style
swapping tops and bottoms
adding tattoos that look real etc

Chapter 9 Fix the Character

What “Fixing” Means

Fixing a character means locking down their identity so you can place them in any new scene without the face, body, or overall look changing between generations. Without this step, every new image risks producing a slightly different version of your character. The goal is simple: one person, same identity, unlimited scenes. The best way to achieve this is by creating a strong anchor image that serves as your consistency reference for every future generation.

Make Your Anchor (Character Sheet)

A character sheet is a multi angle reference image.

It becomes your consistency base for future generations.

Open Higgsfield

Go to Image and select Nano Banana 2

Set your settings

Aspect ratio: 3:2 (good for multi panel layouts)
Quality: 4K

Upload your character image

Use the best face you made earlier (from AI Influencer or from scratch).

Paste this prompt and generate a few times

Pick the cleanest result.

Character sheet prompt:

Create a professional character reference sheet based strictly on the uploaded reference image. Use a clean, neutral plain background and present the sheet as a technical model turnaround while matching the exact visual style of the reference. Arrange the composition into two horizontal rows. Top row: four full body standing views side by side in this order: front view, left profile view (facing left), right profile view (facing right), back view. Bottom row: three close up portraits aligned beneath in this order: front portrait, left profile portrait (facing left), right profile portrait (facing right). Maintain strong identity consistency across every panel. Keep the subject in a relaxed A pose with consistent scale and alignment between views, accurate anatomy, and clear silhouette. Ensure even spacing, clean separation, uniform framing, consistent head height across the full body lineup, and consistent facial scale across portraits. Use consistent lighting across all panels with natural controlled shadows. Output a crisp, sharp, print ready reference sheet.

Example output:

Selection Checklist (Pick The Best Anchor)

Choose the sheet where:

the face looks like the same person in every angle
hairline and hairstyle stay consistent
outfit and accessories stay consistent
background is clean and simple

Tip:

If the anchor is weak, everything later becomes harder.

Regenerate until you get a strong one.

Generate New Scenes Using The Anchor

Once you have your character sheet, you can create consistent scenes.

Keep Nano Banana selected (2 or Pro)

Upload your chosen character sheet image

Write one simple scene prompt For example, shes playing bowling

Generate

Output example

Pro Tips (Stronger Realism + Better Skin)

1) Add a Face Close Up Anchor

Your character sheet is great for full body consistency, but it is not always the best for skin texture.

If you want ultra realistic faces, you should create one extra anchor:

a close up face reference with detailed skin.

Why this helps:

the face becomes clearer
skin texture becomes more consistent
you reduce the “smooth plastic” look

2) Generate a Hyper Realistic Close Up

Keep Nano Banana selected and upload your character sheet as the reference.

Then use a prompt like this:

“give me close up photo that has hyper realistic skin of this girl ”

Generate a few times and pick the best one.

Example:

3) Use Two References For New Scenes

From now on, when you generate new images, use:

Reference 1: the character sheet (identity and body consistency)
Reference 2: the close up face photo (skin and facial realism)

This two reference setup is one of the easiest ways to get both: consistency and realism at the same time.

Chapter 10 Realistic Image from Real Photos

What This Method Does

This workflow lets you recreate the look of a reference photo without copying the original person.

You will:

extract a detailed style blueprint from a photo using ChatGPT as JSON
use that JSON to generate new images with the same composition, lighting, and vibe
reuse the same blueprint to place your own character into the exact same scene

Use cases:

copy a mood and shot setup quickly
keep one character consistent across multiple scenes

Pick a Reference Photo

Choose one photo with a look you want to copy.

Extract the Style Blueprint (ChatGPT JSON)

Open ChatGPT and do this:

Upload the reference photo

Paste your Visual Analysis JSON template

Run it in Thinking Mode if available

Goal:

You get one clean JSON object describing the style and a ready to use generation prompt inside it.

Use your same Visual Analysis template here. ”””

## Context & Goal
You are an expert visual analysis specialist with 30+ years of experience in digital art, photography, graphic design, and AI image generation. You excel at deconstructing visual elements and translating artistic styles into technical specifications.

Your task: Analyze uploaded images and return comprehensive JSON profiles for recreating the visual style. Make sure composition, lighting, texture, angle and everything should exactly the same in detail and ultra realistic.

## Input Variable
Required Input:
- `{{$json.image}}` - Single uploaded image file

## Output Format
Return ONLY valid JSON. No explanations, no commentary, no markdown formatting.

Output format:
{JSON_ANALYSIS_OBJECT}
## Core JSON Schema
{
"metadata": {
"confidence_score": "high/medium/low - assessment of analysis accuracy",
"image_type": "photograph/digital art/illustration/graphic design/mixed media",
"primary_purpose": "marketing/editorial/social media/product/portrait/landscape/abstract"
},
"composition": {
"rule_applied": "rule of thirds/golden ratio/center composition/symmetry/asymmetry",
"aspect_ratio": "width:height ratio or format description",
"layout": "grid/single subject/multi-element/layered",
"focal_points": [
"Primary focal point location and element",
"Secondary focal point if present"
],
"visual_hierarchy": "Description of how eye moves through the image",
"balance": "symmetric/asymmetric/radial - with description"
},
"color_profile": {
"dominant_colors": [
{
"color": "Specific color name",
"hex": "#000000",
"percentage": "approximate percentage of image",
"role": "background/accent/primary subject"
}
],
"color_palette": "complementary/analogous/triadic/monochromatic/split-complementary",
"temperature": "warm/cool/neutral - overall feeling",
"saturation": "highly saturated/moderate/desaturated/black and white",
"contrast": "high contrast/medium contrast/low contrast/soft"
},
"lighting": {
"type": "natural window/artificial/mixed/studio/practical lights",
"source_count": "single source/multiple sources - number and placement",
"direction": "front/45-degree side/90-degree side/back/top/bottom/diffused from above",
"directionality": "highly directional/moderately directional/diffused/omni-directional",
"quality": "hard light/soft light/dramatic/even/gradient/sculpted",
"intensity": "bright/moderate/low/moody/high-key/low-key",
"contrast_ratio": "high contrast (dramatic shadows)/medium contrast/low contrast (flat)",
"mood": "cheerful/dramatic/mysterious/calm/energetic/professional/casual",
"shadows": {
"type": "harsh defined edges/soft gradual edges/minimal/dramatic/absent",
"density": "deep black/gray/transparent/faint",
"placement": "under subject/on wall/from objects/cast patterns",
"length": "short/medium/long - shadow projection distance"
},
"highlights": {
"treatment": "blown out/preserved/subtle/dramatic/specular",
"placement": "on face/hair/clothing/background - where light hits strongest"
},
"ambient_fill": "present/absent - secondary fill light reducing shadows",
"light_temperature": "warm (golden)/neutral/cool (blue) - color cast"
},
"technical_specs": {
"medium": "digital photography/3D render/digital painting/vector/photo manipulation/mixed",
"style": "realistic/hyperrealistic/stylized/minimalist/maximalist/abstract/surreal",
"texture": "smooth/grainy/sharp/soft/painterly/glossy/matte",
"sharpness": "tack sharp/slightly soft/deliberately soft/bokeh effect",
"grain": "none/film grain/digital noise/intentional grain - level",
"depth_of_field": "shallow/medium/deep - with subject isolation description",
"perspective": "straight on/low angle/high angle/dutch angle/isometric/one-point/two-point"
},
"artistic_elements": {
"genre": "portrait/landscape/abstract/conceptual/commercial/editorial/street/fine art",
"influences": [
{
"type": "sans-serif/serif/script/display/handwritten",
"weight": "thin/light/regular/medium/bold/black",
"characteristics": "modern/vintage/playful/serious/technical"
}
],
"placement": "overlay/integrated/border/corner - with strategic description",
"integration": "subtle/prominent/dominant/background"
},
"subject_analysis": {
"primary_subject": "Main subject description",
"positioning": "center/left/right/top/bottom/rule of thirds placement",
"scale": "close-up/medium/full/environmental/macro",
"interaction": "isolated/interacting with environment/multiple subjects",
"facial_expression": {
"mouth": "closed smile/open smile/slight smile/neutral/serious/pursed - exact mouth position",
"smile_intensity": "no smile/subtle/moderate/broad/wide - degree of smile",
"eyes": "direct gaze/looking away/squinting/wide/relaxed/intense - eye expression",
"eyebrows": "raised/neutral/furrowed/relaxed - brow position",
"overall_emotion": "happy/content/serious/playful/confident/approachable/guarded/warm/cold",
"authenticity": "genuine/posed/candid/formal/natural"
},
"hair": {
"length": "pixie/short/chin-length/shoulder-length/mid-back/long/very long - specific measurement",
"cut": "blunt/layered/shaggy/undercut/fade/tapered/disconnected - exact style name",
"texture": "straight/wavy/curly/coily/kinky - natural pattern with specific wave type (loose waves/tight curls/s-waves)",
"texture_quality": "smooth/coarse/fine/thick/thin - hair strand thickness",
"natural_imperfections": "flyaways/frizz/uneven sections/growth patterns/cowlicks - observable natural variation",
"styling": "sleek/tousled/wet look/blow-dried/natural/product-heavy/messy/textured - exact current state",
"styling_detail": "Degree of styling: heavily styled/lightly styled/unstyled, product visibility, movement quality",
"part": "center/side/deep side/no part/zigzag - exact location with precision",
"volume": "flat/moderate volume/voluminous - root lift and overall fullness",
"details": "Specific features: bangs type, face-framing layers, buzzed sections, faded areas, length variations, texture inconsistencies"
},
"hands_and_gestures": {
"left_hand": "Exact position and gesture - touching face/holding object/resting on surface/in pocket/behind back/clasped/visible or not visible",
"right_hand": "Exact position and gesture - touching face/holding object/resting on surface/in pocket/behind back/clasped/visible or not visible",
"finger_positions": "Specific details: pointing/peace sign/thumbs up/relaxed/gripping/spread/interlaced/curled",
"finger_interlacing": "if hands clasped: natural loose interlacing/tight formal interlacing/fingers overlapping/thumbs position",
"hand_tension": "relaxed/tense/natural/posed/rigid - muscle tension observable",
"interaction": "What hands are doing: holding phone/touching hair/on hip/crossed/clasped at waist/clasped at chest/gesturing",
"naturalness": "organic casual gesture/deliberately posed/caught mid-motion/static formal pose"
},
"body_positioning": {
"posture": "standing/sitting/leaning/lying - exact position",
"angle": "facing camera/45 degree turn/profile/back to camera",
"weight_distribution": "leaning left/right/centered/shifted",
"shoulders": "level/tilted/rotated/hunched/back"
}
},
"background": {
"setting_type": "indoor/outdoor/studio/natural environment - specific location",
"spatial_depth": "shallow/medium/deep - layers description",
"elements_detailed": [
{
"item": "Specific object name (if plant: species like monstera/pothos/bird of paradise/fern)",
"position": "left/right/center/top/bottom - exact placement with quadrant",
"distance": "foreground/midground/background",
"size": "dominant/medium/small - relative scale and proportion",
"condition": "new/worn/vintage/pristine/wilted/thriving - state description",
"specific_features": "For plants: flower color, leaf pattern, pot type; For objects: brand, wear, details"
}
],
"wall_surface": {
"material": "painted drywall/concrete/brick/wood paneling/tile/wallpaper/plaster - exact base material",
"surface_treatment": "smooth paint/textured paint/raw concrete/polished concrete/exposed brick/finished/unfinished",
"texture": "perfectly smooth/slightly textured/rough/patterned/brushed - tactile quality",
"finish": "matte/satin/glossy/flat - sheen level",
"color": "Specific color with undertones (e.g., warm gray, cool blue-gray, off-white)",
"color_variation": "uniform/gradient/patchy/streaked - color consistency",
"features": "clean/water stains/vertical streaks/horizontal marks/cracks/patches/fixtures/artwork/scuffs - ALL observable surface details",
"wear_indicators": "pristine/aged/weathered/industrial/residential - condition and style"
},
"floor_surface": {
"material": "wood/tile/carpet/concrete/grass - exact type",
"color": "Specific color",
"pattern": "solid/checkered/striped/herringbone - if present"
},
"objects_catalog": "List every visible object with position: furniture pieces, decorative items, functional objects, natural elements",
"background_treatment": "blurred/sharp/minimal/detailed/gradient/textured"
},
"generation_parameters": {
"prompts": [
"Detailed technical prompt for recreating this style",
"Alternative angle or variation prompt"
],
"keywords": [
"keyword1",
"keyword2",
"keyword3",
"keyword4",
"keyword5"
],
"technical_settings": "Recommended camera/render settings description for recreation",
"post_processing": "Color grading, filters, or editing techniques applied"
}
}
{
"type": "sans-serif/serif/script/display/handwritten",
"weight": "thin/light/regular/medium/bold/black",
"characteristics": "modern/vintage/playful/serious/technical"
}
],
"placement": "overlay/integrated/border/corner - with strategic description",
"integration": "subtle/prominent/dominant/background"
},
"subject_analysis": {
"primary_subject": "Main subject description",
"positioning": "center/left/right/top/bottom/rule of thirds placement",
"scale": "close-up/medium/full/environmental/macro",
"interaction": "isolated/interacting with environment/multiple subjects",
"facial_expression": {
"mouth": "closed smile/open smile/slight smile/neutral/serious/pursed - exact mouth position",
"smile_intensity": "no smile/subtle/moderate/broad/wide - degree of smile",
"eyes": "direct gaze/looking away/squinting/wide/relaxed/intense - eye expression",
"eyebrows": "raised/neutral/furrowed/relaxed - brow position",
"overall_emotion": "happy/content/serious/playful/confident/approachable/guarded/warm/cold",
"authenticity": "genuine/posed/candid/formal/natural"
},
"hair": {
"length": "pixie/short/chin-length/shoulder-length/mid-back/long/very long - specific measurement",
"cut": "blunt/layered/shaggy/undercut/fade/tapered/disconnected - exact style name",
"texture": "straight/wavy/curly/coily/kinky - natural pattern with specific wave type (loose waves/tight curls/s-waves)",
"texture_quality": "smooth/coarse/fine/thick/thin - hair strand thickness",
"natural_imperfections": "flyaways/frizz/uneven sections/growth patterns/cowlicks - observable natural variation",
"styling": "sleek/tousled/wet look/blow-dried/natural/product-heavy/messy/textured - exact current state",
"styling_detail": "Degree of styling: heavily styled/lightly styled/unstyled, product visibility, movement quality",
"part": "center/side/deep side/no part/zigzag - exact location with precision",
"volume": "flat/moderate volume/voluminous - root lift and overall fullness",
"details": "Specific features: bangs type, face-framing layers, buzzed sections, faded areas, length variations, texture inconsistencies"
},
"hands_and_gestures": {
"left_hand": "Exact position and gesture - touching face/holding object/resting on surface/in pocket/behind back/clasped/visible or not visible",
"right_hand": "Exact position and gesture - touching face/holding object/resting on surface/in pocket/behind back/clasped/visible or not visible",
"finger_positions": "Specific details: pointing/peace sign/thumbs up/relaxed/gripping/spread/interlaced/curled",
"finger_interlacing": "if hands clasped: natural loose interlacing/tight formal interlacing/fingers overlapping/thumbs position",
"hand_tension": "relaxed/tense/natural/posed/rigid - muscle tension observable",
"interaction": "What hands are doing: holding phone/touching hair/on hip/crossed/clasped at waist/clasped at chest/gesturing",
"naturalness": "organic casual gesture/deliberately posed/caught mid-motion/static formal pose"
},
"body_positioning": {
"posture": "standing/sitting/leaning/lying - exact position",
"angle": "facing camera/45 degree turn/profile/back to camera",
"weight_distribution": "leaning left/right/centered/shifted",
"shoulders": "level/tilted/rotated/hunched/back"
}
},
"background": {
"setting_type": "indoor/outdoor/studio/natural environment - specific location",
"spatial_depth": "shallow/medium/deep - layers description",
"elements_detailed": [
{
"item": "Specific object name (if plant: species like monstera/pothos/bird of paradise/fern)",
"position": "left/right/center/top/bottom - exact placement with quadrant",
"distance": "foreground/midground/background",
"size": "dominant/medium/small - relative scale and proportion",
"condition": "new/worn/vintage/pristine/wilted/thriving - state description",
"specific_features": "For plants: flower color, leaf pattern, pot type; For objects: brand, wear, details"
}
],
"wall_surface": {
"material": "painted drywall/concrete/brick/wood paneling/tile/wallpaper/plaster - exact base material",
"surface_treatment": "smooth paint/textured paint/raw concrete/polished concrete/exposed brick/finished/unfinished",
"texture": "perfectly smooth/slightly textured/rough/patterned/brushed - tactile quality",
"finish": "matte/satin/glossy/flat - sheen level",
"color": "Specific color with undertones (e.g., warm gray, cool blue-gray, off-white)",
"color_variation": "uniform/gradient/patchy/streaked - color consistency",
"features": "clean/water stains/vertical streaks/horizontal marks/cracks/patches/fixtures/artwork/scuffs - ALL observable surface details",
"wear_indicators": "pristine/aged/weathered/industrial/residential - condition and style"
},
"floor_surface": {
"material": "wood/tile/carpet/concrete/grass - exact type",
"color": "Specific color",
"pattern": "solid/checkered/striped/herringbone - if present"
},
"objects_catalog": "List every visible object with position: furniture pieces, decorative items, functional objects, natural elements",
"background_treatment": "blurred/sharp/minimal/detailed/gradient/textured"
},
"generation_parameters": {
"prompts": [
"Detailed technical prompt for recreating this style",
"Alternative angle or variation prompt"
],
"keywords": [
"keyword1",
"keyword2",
"keyword3",
"keyword4",
"keyword5"
],
"technical_settings": "Recommended camera/render settings description for recreation",
"post_processing": "Color grading, filters, or editing techniques applied"
}
}
## Analysis Rules

### Composition Analysis
- Identify grid systems, alignment, and spatial relationships
- Note use of negative space and breathing room
- Describe visual flow and eye movement path
- Identify focal points using color, contrast, size, or placement
- Assess balance between elements

### Color Analysis
- Extract dominant colors with approximate hex values
- Identify color relationships (complementary, analogous, etc.)
- Assess temperature bias (warm vs cool)
- Note saturation levels and contrast intensity
- Describe how color creates mood and directs attention

### Lighting Assessment
- Determine light source type, direction, and quality
- Analyze shadow characteristics and depth
- Assess highlight preservation or blown-out areas
- Describe overall lighting mood and emotional impact
- Note light's role in creating dimension and form

### Technical Evaluation
- Identify creation medium and technique
- Assess texture, sharpness, and grain characteristics
- Evaluate depth of field and focus points
- Analyze perspective and viewpoint
- Note any technical limitations or intentional choices

### Artistic Context
- Identify genre and artistic influences
- Assess mood, atmosphere, and emotional tone
- Describe visual style (minimal, maximalist, etc.)
- Note any cultural or temporal references
- Evaluate overall aesthetic cohesion

### Typography (if present)
- Identify font styles and weights
- Assess placement and integration strategy
- Evaluate readability and hierarchy
- Describe relationship to other visual elements

### Subject Treatment
- **Hair Analysis (CRITICAL)**:
- Measure exact length using body reference points (ears, shoulders, etc.)
- Identify specific cut style by name (bob, shag, fade, undercut, etc.)
- Note texture pattern AND texture quality - distinguish between natural imperfect texture vs "AI smooth" perfect texture
- Capture natural imperfections: flyaways, frizz, uneven sections, growth patterns, texture variation
- Describe part location with precision (1 inch from center, deep left, etc.)
- Document all details: bang style, layering, undercut sections, faded areas
- Specify product use evidence (wet look, matte, glossy, natural) and degree of styling
- Note volume and movement quality
- AVOID: Describing hair as "perfect" or overly uniform - real hair has natural variation
- **Hand & Gesture Analysis (CRITICAL)**:
- Describe EACH hand separately with exact position
- Note if hands are visible or hidden (in pockets, behind back, out of frame)
- Document specific finger positions and shapes
- For clasped hands: describe interlacing style (natural loose vs formal tight), thumb positions, finger overlap patterns
- Assess hand tension: relaxed vs tense, organic vs posed
- Describe what hands are interacting with (phone, face, object, clothing, each other)
- Note natural vs posed quality of gesture
- Specify pressure/contact points (lightly touching vs gripping)
- Evaluate overall naturalness: organic casual vs deliberately posed
- **Background Elements (CRITICAL)**:

- Catalog EVERY visible object with exact position and quadrant placement
- For plants: identify species (monstera, pothos, bird of paradise, fern, etc.)
- Describe spatial relationships between objects and their depth layers
- **Wall analysis is CRITICAL**:
- Distinguish between painted drywall vs concrete vs brick vs other materials
- Note surface treatment: smooth paint vs textured vs raw vs polished
- Document finish: matte vs glossy vs satin
- Identify any surface features: water stains, streaks, cracks, patches, wear
- Assess condition: pristine residential vs industrial weathered vs aged
- Document floor type, color, pattern
- Specify distance/depth layer for each element
- Note condition and state of objects (new/worn/vintage/thriving/wilted)
- Describe any text, artwork, or decorative elements
- Include architectural features (windows, doors, molding, fixtures, frames)
- **Lighting Analysis (CRITICAL)**:

- Distinguish between dramatic directional lighting vs flat even lighting
- Assess directionality: highly directional (strong shadows) vs diffused (soft minimal shadows)
- Document shadow characteristics: harsh defined edges vs soft gradual edges vs minimal
- Note contrast ratio: high contrast (dramatic) vs low contrast (flat)
- Identify if ambient fill light is present reducing shadow depth
- Describe how light sculpts the subject vs evenly illuminates
- Document cast shadows from objects (like plants) on walls
- Note shadow density: deep black vs gray vs faint
- **Facial Expression (CRITICAL)**:

- Capture exact mouth position: closed smile vs slight smile vs neutral vs serious
- Quantify smile intensity: no smile/subtle/moderate/broad
- Note eye expression and gaze direction
- Assess overall emotional tone: warm approachable vs serious neutral
- Distinguish between genuine natural expression vs posed formal expression
- Analyze primary subject and positioning

- Assess scale and framing choices
- Describe subject-background relationship
- Note any secondary subjects or supporting elements
### Generation Parameters

- Create actionable technical prompts for recreation
- Extract relevant keywords for searchability
- Recommend technical settings for similar results
- Describe post-processing techniques applied
## Output Requirements

- **Format**: Valid JSON only

- **No markdown**: No ```json``` blocks, no backticks
- **No commentary**: No explanatory text before/after JSON
- **No instructions**: No "Here is your analysis" or "Copy this"
- **Clean structure**: Properly formatted, parseable JSON
- **Single object**: Return one complete JSON analysis object
- **Comprehensive**: All sections must be populated with detailed analysis
- **Specific**: Use precise technical terminology, not vague descriptions
- **Actionable**: Generation parameters must be detailed enough for recreation
## Processing Logic

1. Receive image input from `{{$json.image}}`

2. Perform comprehensive visual analysis across all categories
3. Extract technical specifications and artistic elements
4. Generate recreation prompts and parameters
5. Output single, complete JSON object
## Quality Standards

- **Confidence score**: Honest assessment of analysis certainty

- **Hex codes**: Approximate but reasonable color values
- **Specific descriptions**: Avoid generic terms like "nice" or "good"
- **Technical accuracy**: Use correct terminology for medium and technique
- **Completeness**: Every JSON field must contain meaningful analysis
- **Actionability**: Prompts and keywords must be specific enough to recreate style
### CRITICAL Accuracy Requirements:

**Hair Description Must Include:**

- Exact length measurement using reference points
- Specific cut style name (not just "short" or "long")
- Texture type AND texture quality (avoid "perfect" - describe natural variation)
- Natural imperfections: flyaways, frizz, uneven sections, texture inconsistencies
- Styling state with product evidence and degree of styling
- Part location with precision
- Volume and movement quality
- All unique features (bangs, layers, fades, undercuts)
- **AVOID**: "AI smooth" or overly perfect descriptions - real hair has variation
**Hand & Gesture Description Must Include:**

- Position of BOTH hands (even if one is hidden)
- Exact finger configurations
- For clasped hands: interlacing pattern (loose/tight), thumb position, finger overlap
- Hand tension assessment (relaxed vs tense)
- What hands are interacting with
- Natural vs posed quality with specific evidence
- Contact points and pressure
- Overall gesture naturalness assessment
**Background Description Must Include:**

- Every visible object cataloged with quadrant position
- Plant species identification (not just "plant")
- **Wall material distinction** (painted drywall vs concrete vs brick - THIS IS CRITICAL)
- Wall surface treatment (smooth paint vs raw concrete vs textured)
- Wall finish (matte vs glossy) and condition (pristine vs weathered vs industrial)
- Any surface features: water stains, vertical streaks, cracks, patches
- Floor materials with exact colors
- Spatial depth layers (foreground/mid/background) for each element
- Object conditions and states
- Architectural features
- Text or decorative elements
**Lighting Description Must Include:**

- Directionality assessment (highly directional vs diffused)
- Contrast ratio (high/medium/low)
- Shadow characteristics: edge quality (harsh vs soft), density (deep vs faint), placement
- Cast shadows from objects onto walls
- Presence or absence of ambient fill light
- Whether lighting is dramatic and sculpting vs flat and even
- This distinction is CRITICAL for recreation accuracy
**Facial Expression Must Include:**

- Exact mouth position (closed smile/slight smile/neutral/serious)
- Smile intensity quantification
- Eye expression details
- Overall emotional tone (warm vs neutral vs serious)
- Genuine vs posed quality assessment

Generate the Look in Higgsfield (Nano Banana Pro)

Open Higgsfield

Select Image → Nano Banana Pro

Paste the prompt from your JSON output

Match the aspect ratio to the reference photo (You can just choose auto)

Generate multiple outputs

Optional: Put Your Own Character Into the Same Scene

This is where it gets powerful.

Goal:

Keep the scene constant, swap the person.

Workflow:

Prepare your character references

Reuse the same JSON scene blueprint

Paste the same prompt blueprint you extracted from the reference scene.

Add your character as the subject using references

Upload your character references in Nano Banana Pro, then generate variations.

Generate multiple outputs and pick the best.

Chapter 11 How to Make Them Talk

What You’re Making

In this chapter, you will learn how to create a realistic UGC style image or video where your model holds an iPhone with actual content visible on the screen. This is perfect for app demos, landing pages, and product showcases. The technique is simpler than it sounds. You generate the scene with a green screen on the phone first, then replace that green screen with whatever content you want to display. This two step approach gives you full control over what appears on screen while keeping the rest of the image looking natural and realistic.

Step 1: Create the Base Selfie Image (Higgsfield)

Of course you can use your own model, I show you some other way here.

Open Higgsfield

Go to Image and choose Soul 2.0

Choose “General”

Use a prompt like this

Example prompt for a realistic selfie vibe:

Prompt Example

A realistic, candid iPhone selfie portrait of a super beautiful 25 year old white American woman standing in a cozy New York bar, casually holding a simple Coke bottle. She has a blond mid length wavy wet look surfer hairstyle and wears a luxury black leather jacket resembling Off White, casual but stylish. Warm bar lighting creates natural reflections on her skin and jacket with realistic textures and natural skin details. Shot slightly from above with a 0.5x wide angle iPhone selfie feel, visible shoulder and hand, spontaneous imperfect framing, natural smirk, unfiltered everyday authenticity, no heavy beauty filters.

Generate a few variations and choose the most believable one.

Example:

Generate the Talking Video

Open Higgsfield Video

Go to Video and select Kling 3.0

Upload your final image

Drag and drop the selfie image you prepared.

Paste a talking video prompt

You can decide what will happen the video, gesture, script etc.

Example prompt:

Prompt Example

She casually talks how ai creator.co is good for ai creator. You can actually learn anything about ai visual creation there. As if casually recording to her friends. Super realistic. Talks calmly

Set your settings and generate

Aspect ratio: 9:16
Duration: 8 seconds (short is easier to make realistic)

Result:

Chapter 12 UGC Ad with Product in Hand

Prepare the Model and Background

First, you need a realistic base frame.

Use the same workflow from the section:

Realistic Images From a Real Photo

Quick version:

Find a UGC reference image you like (pose, framing, vibe)

Extract the style blueprint with ChatGPT JSON

Generate a matching frame in Nano Banana Pro

Where to find good references:

Search Pinterest for keywords like:

UGC product

UGC holding product

TikTok UGC ad

Pick a reference that clearly shows:

the hand position
the product angle
phone style framing

Example model generated:

Prepare the Product Image

Choose the product you want them to hold.

In this example, we use Labubu.

Try to use a product image that is:

clean
high resolution
clear shape
easy to recognize

Combine Both Images in Nano Banana Pro

Now you will blend the model frame and the product into one realistic image.

Open Higgsfield

Go to Image → Nano Banana Pro

Upload two images

Image 1: your realistic model frame
Image 2: the product image

Add a simple prompt

Keep it short and direct.

Example prompt:

“Her holding this super naturally.”

Generate multiple variations and choose the most natural one.

Choose the Most Natural Result

Pick the image where:

the fingers look normal
the grip makes sense
the product scale feels right
shadows and reflections match the room lighting
the face still looks like the same person

If hands look weird, regenerate more.

Hands are one of the hardest parts in AI.

Example:

Turn It Into a Talking UGC Video (Veo 3.1)

Now follow the same workflow as the How to Make Them Talk section.

Go to Higgsfield Video

Select Veo 3.1

Upload your final image

Use the one where the product holding looks best.

Use a prompt like this

Example prompt:

Prompt Example

Photorealistic TikTok style vertical video of a young woman sitting casually in her cozy bedroom, talking warmly to camera while holding a small brown Labubu plush. The moment feels completely unscripted and real, like a genuine video she recorded for her followers. Natural daylight through sheer curtains, soft colors, slightly messy desk in the background. She introduces Labubu with an affectionate smile, showing its cute weird face. Handheld iPhone selfie feel, slight natural shake, realistic micro expressions, natural skin texture, casual pacing, no scripted vibe.

Generate a few versions and choose the most believable one.

Chapter 13 Realistic Animation

What You're Making

In this chapter, you will learn how to create a realistic dance animation where your AI character follows the exact movement from a real video. The core idea is straightforward: you find a real dance clip with the motion you want, then swap in your own character while keeping the movement identical.

Pick a Dance Video to Copy

Find any short dance clip.

Pinterest works well.

Search keyword:

“dance”

Pick a video where:

the full body is visible
the background is not too chaotic
the movement is clear

Extract the First Frame

We need the first frame as the pose reference.

Open frame-extractor.com

Upload your dance video

Set frames to extract: 1

Extract and download the frame

This frame becomes your pose and camera reference.

Download the frame.

Recreate the Same Pose With Your Character (Nano Banana Pro)

Now we swap the dancer into your consistent character.

Open Higgsfield

Go to Image → Nano Banana Pro

Upload three references

your character sheet
your face close up anchor (optional but recommended)
the first frame you extracted from the dance video

Use a simple prompt like this

Prompt Example

The girl from reference 1 and 2 is posing exactly like reference 3. Keep full body framing. White background.

Generate multiple variations and pick the one where:

pose matches closely
the body proportions look correct
the face still feels like your character

You can change the background later.

For learning, white background makes everything easier.

Output:

Animate With Motion Control (Kling 3.0)

Now we apply the real motion.

Open Higgsfield Video

Select Kling 3.0 Motion Control

Set your references

Left side: the original dance video
Right side: your generated character pose frame

Set the scene background

If you want a white background, set it to Image.

Generate

The output should follow the original motion while keeping your character.

Output:

Pro Tips (Cleaner Motion, Less Weirdness)

Start short. Keep your clips between 5 and 8 seconds. Short sequences are much easier for the model to handle than long choreography.

Pick simple, readable motion. Too much spinning or fast arm movement can break hands and limbs. If the original clip has complex motion, the output will almost certainly have issues.

Fix the pose frame first. If the result looks off, do not keep regenerating the video. Go back and improve your starting image. Motion control quality depends heavily on how clean your first frame is.

Chapter 14 iPhone with Content Inside

What You're Making

Prepare the Base Model Scene

Since we are near the end of the book, this is a good moment to show that the workflows you have learned are not locked to one model. Everything we have covered applies no matter which tool you use. So for this chapter, we will use Soul 2.0 instead of Nano Banana. You can of course still use Nano Banana if you prefer, but this is a good example of how flexible the approach really is.

Go to Higgsfield and choose the Soul model.

Example prompt:

Prompt Example

UGC-style portrait of a woman in her mid 20s, shot on a high-end smartphone. She has a natural Northern European look with fair skin, realistic human skin texture, soft facial features, and expressive light-colored eyes. Her blonde hair is worn down in a relaxed, slightly imperfect style that feels natural and lived-in, not styled for a shoot. She is wearing a simple but stylish outfit, something cozy and modern, effortlessly fashionable without looking staged. The photo is taken in her bedroom, with soft natural morning light coming through a nearby window, creating a warm, gentle glow across her face. She looks directly at the camera with a genuine, confident smile that feels like she is about to give advice to a close friend. Her expression is open, intelligent, and subtly excited, with a sense of passion and enthusiasm behind it. The image feels candid and personal, as if taken spontaneously. Shallow depth of field, slight background blur. Ultra realistic appearance with detailed, human skin texture. No artificial smoothness, no AI look. Authentic, intimate, and trustworthy.

Generate multiple and pick the most realistic one.

Example:

Add an iPhone With a Green Screen

Now we create the same image but with a phone in hand.

Stay in Nano Banana Pro

Upload the image you just generated

Use this prompt:

Prompt Example

The person in the image is holding and showing an iPhone with a green screen on its display towards the camera. entire iphone screen is visible

Example:

Replace the Green Screen With Your Content

Now bring the content you want to show.

Example:

Choose Nano Banana 2

Upload two images:

the iPhone green screen image
the screenshot you want on the phone

Use this prompt:

Prompt Example

replace the green screen with the webpage exactly as it appears in real life. Do not change, redesign, or alter any part of the webpage. Keep all text, layout, colors, proportions, and details fully accurate. Adjust the perspective and tilt so the webpage fits naturally and realistically onto the screen.

Generate a few versions and pick the cleanest screen replacement.

Example:

Turn It Into a Talking Video (Veo 3.1)

Now you can turn it into a video.

Go to Video → Veo 3.1

Upload the final image with the real screen content

Prompt it like a natural iPhone selfie clip

Example prompt:

Prompt Example

REFERENCE_IMAGE: A woman in her mid 20s with a natural Northern European appearance. She has fair, realistic human skin texture, expressive light-colored eyes, and long blonde hair worn down in a casual, natural style. Her skin looks detailed and organic, clearly human, not AI-like. She is wearing a cozy but stylish outfit that feels modern and effortless. She is holding a smartphone in both hands, with the screen visible enough to recognize a website interface. Preserve her facial features, hair texture, clothing, and the phone in her hands. SHOT: Medium close up, framed from the chest up. 35mm lens feel. Natural smartphone perspective. CAMERA: Subtle handheld drift. Frame remains mostly stable. No cinematic sweeps or exaggerated movement. Feels like a real person holding their phone. ENVIRONMENT: A cozy, minimalist bedroom in the morning. She is sitting comfortably in bed, gently propped up against soft white linen pillows. The space feels lived in, calm, and personal. LIGHTING: Soft natural morning light coming from a side window, creating a warm, flattering glow on her face. No harsh shadows, no artificial lighting look. ACTION: She looks directly into the camera with a warm, genuine smile that feels like she is talking to a close friend. TIMING: s0–s4: She begins speaking while maintaining eye contact with the camera. s4–s8: Still speaking, she briefly glances down at her phone and casually tilts the screen slightly toward the camera, revealing part of the AICreator website UI. The UI looks real and natural, not like an inserted overlay. STYLE: Cinematic UGC. High quality smartphone video. Authentic, intimate, and personal. Feels like a video she would actually send to a friend. AUDIO: Speaker: the woman. Tone: conversational, excited but grounded, intelligent and honest. Dialogue: “Okay, real talk. AI was honestly overwhelming for me until I found AICreator. Now everything finally makes sense, and creating feels exciting again.” NEGATIVE: No dramatic camera movement. No polished commercial look. No text overlays. No exaggerated acting. No artificial skin smoothing. No perfect or floating screen UI.”

Example

“replace the green screen with the webpage exactly as it appears in real life. Do not change, redesign, or alter any part of the webpage. Keep all text, layout, colors, proportions, and details fully accurate. Adjust the perspective and tilt so the webpage fits naturally and realistically onto the screen.”

Pro Tips (So It Looks Real)

Make the phone screen fully visible. If the screen is partially cut off or angled too far away, the content becomes unreadable and the whole shot loses its purpose.

Match the brightness. The screen brightness should feel natural within the lighting of the room. A glowing bright screen in a dark moody scene will instantly look fake.

Keep movement small in video. Big hand movements often distort or break the phone shape during generation. Small, subtle motions feel more realistic and keep the screen content readable throughout the clip.

Chapter 15 How to Make Long Video

What You Are Making

In this chapter, you will learn the full pipeline for creating a longer, hyper realistic talking video. You will generate a perfect first frame, animate it into a short clip, then repeat the process to create additional clips and stitch them all together. The key is consistency: composition stays locked, face stays centered, and motion stays tiny so it feels like real front camera footage.

Step 1: Pick a Reference (Optional but Recommended)

Find a real selfie reference that matches your goal. You could use any apparoch that we covered in previous sections. Look for something that is:

Vertical 9:16
Shot from above, phone held over face
White pillow background
Tight crop (hair near top edge, shoulders at bottom)

Save it to your device. If you already know the exact look you want, you can skip the reference and just use your prompt.

Step 2: Get a Clean JSON Prompt in ChatGPT

The goal here is to produce a Nano Banana ready JSON prompt that contains only image generation instructions, no text overlays.

Open ChatGPT

Upload your reference image (if using one)

Paste this message:

Prompt Example

Analyze this reference photo. Output a JSON object with keys: prompt, negative_prompt, settings. The prompt must replicate the composition and lighting. Do not include any text, captions, UI, watermarks, or borders. Settings must include aspect_ratio: 9:16 and photoreal style hints. Return only JSON, nothing else.

Before moving on, do a quick check:

Positive prompt describes the exact composition and realism
Negative prompt bans text, UI, subtitles, logos, watermarks
Aspect ratio is 9:16

Example output:

Example Output

{
  "prompt": "Ultra photorealistic vertical 9:16 iPhone front-camera selfie. A young adult man lying on his back in bed, head resting on a white pillow, shot from above with the phone held directly over the face. Match the reference composition exactly: face centered, tight crop with hair near the top edge, shoulders and upper chest visible at the bottom, pillow filling the background. Soft natural daylight, neutral indoor tone, gentle shadows, no dramatic lighting. Expression: relaxed, slightly open mouth as if speaking mid-sentence, eyes looking straight into the camera. Hair: dark, slightly wet or freshly washed, wavy texture with realistic clumping and shine, individual strands and flyaways clearly visible. Beard: short stubble with natural density variation, crisp individual whiskers along jawline, cheeks, and upper lip. Skin: extremely realistic human skin with visible pores, fine vellus hairs, subtle blemishes/acne marks, mild redness around the nose, natural unevenness and micro specular highlights, realistic under-eye texture, no smoothing, no beauty retouch. Background: white pillow with visible fabric weave and soft folds, off-white wall above the pillow. The final image must look like a real photo taken on a phone, zero AI look, zero artifacts.",
  "negative_prompt": "any text, captions, subtitles, UI, status bar, app interface, comment box, watermark, logo, timestamp, borders, frames, stickers, filters, beauty smoothing, plastic skin, waxy skin, airbrushed, uncanny face, asymmetrical eyes, deformed mouth, extra teeth, blurred beard, fake hair, helmet hair, banding, over-sharpened, HDR look, cartoon, illustration, anime, CGI, 3D render, unnatural lighting, weird shadows, noise patterns",
  "settings": {
    "aspect_ratio": "9:16",
    "style": "photoreal",
    "detail_level": "maximum",
    "skin_texture": "maximum realism",
    "camera": "phone front camera selfie",
    "lighting": "soft natural daylight",
    "composition_lock": "match reference exactly"
  }
}

Step 3: Generate the First Frame in Nano Banana (Higgsfield)

Open Higgsfield

Go to Nano Banana Pro

Paste your JSON prompt

Set:

Aspect ratio: 9:16
Highest quality available
Photoreal or realistic mode if there is a toggle

Generate multiple outputs (at least 4 to 8)

How to pick the best image:

Choose the one where:

Skin texture looks natural (pores, tiny imperfections), not waxy
Hair strands look real, not clumped like plastic
Beard stubble is crisp with individual whiskers visible
Pillow fabric texture looks real (weave and folds)
No artifacts around lips, nostrils, hairline, or jaw

Save the best image in high resolution.

Example:

Step 4: Animate in Veo 3.1

On Higgsfield, select Veo 3.1

Set First Frame to your chosen Nano Banana image

Paste your Veo prompt

Set:

Resolution: 1080 x 1920 vertical
Length: 4 seconds first for testing, then 6 or 8 seconds once it looks right
Quality: highest you can afford for the final version

Generate

Example prompt:

Prompt Example

No subtitles. No music. SHOT: Ultra-tight vertical 9:16 framing, front-camera selfie angle with the young man's face centered, hair near the top edge, and shoulders/upper chest visible at the bottom, replicating the example photo composition. SUBJECT: Young adult male lying on his back on a white pillow, wearing a light-colored casual t-shirt with a soft texture. Skin is ultra-realistic showing pores, subtle blemishes, natural oil sheen, fine facial hairs, and a crisp beard stubble. Hair is slightly damp, dark, wavy with individual strands clearly visible. SCENE: Bedroom interior during soft natural daylight streaming from a window visible at the upper-left edge. Ambient indoor colors are neutral and calm, with natural soft shadows. Pillow and bedding are pure white and smooth, enhancing the daylight effect. VISUAL DETAILS: The pillow fabric is slightly wrinkled with realistic cloth texture. Hair shows fine moisture detail and slightly uneven strands due to dampness. Skin lighting reveals subtle micro-details like small blemishes and tiny facial hairs. No CGI or smoothing artefacts. ACTION AND CAMERA MOTION: The subject exhibits gentle micro-movements: natural blinking, tiny subtle head shifts, relaxed breathing rhythm. During speech, one hand raises naturally to briefly touch or rest behind the head, then stays there. The handheld phone camera shows very slight, authentic shakiness consistent with sleepy selfie footage. CINEMATOGRAPHY: Soft natural lighting with slight directional daylight from window, no harsh shadows. Color palette is neutral indoor tones with warm skin highlights. Emotional tone is casual, sleepy at start, shifting to a faint gentle smile at the last phrase. AUDIO: Voice clear and natural, medium volume, casual relaxed tone. Language: English. Spoken lines: 'Sorry, I just woke up. I have something to tell you. I am not real.' Ambient room silence with subtle breathing sounds. No music or added effects. STYLE: Photorealistic visual style, 4K resolution, no artificial smoothing or enhancements, no text overlays, watermarks, subtitles, or UI elements. Maximum realism with light stabilization allowing slight handheld motion.

Example output:

Step 5: Make Additional Clips With the Same Character

Now you create another scene. The most important thing here is keeping the character, the vibe, and the environment the same across clips. To do this, upload the first image you generated with the JSON prompt. That becomes your core identity reference.

Open Nano Banana Pro

Upload your original first frame as the identity reference

Write a prompt that keeps the identity locked but changes the scene slightly

Prompt Example

Refer to this identity and keep the face, hair, skin, shirt, and room consistent. Change the image into a realistic low-angle phone selfie while he is lying on a bed with both arms extended toward the camera in the foreground. Wider framing, direct eye contact, lips slightly open, soft daylight from the window, white bedding, natural iPhone selfie look, realistic skin texture, no stylization.

Adjust the prompt depending on the scene you want to create, but always keep the core identity details the same: face, hair, skin, clothing, and room.

Generate multiple variations and pick the best one. Then animate it with Kling 3.0 or Veo 3.1 using the same approach from Step 4.

Example:

Step 6: Edit and Combine in CapCut

Import all clips into CapCut (or your preferred editor)

Put them in order

Match the pacing:

Trim dead space at the start of each clip
Keep micro pauses between lines for realism

Put subtittles or effects if needed

Export in 9:16 at high bitrate

Bonus Organize Everything in Flora

Why a Visual Workflow Matters

By now you have learned how to generate images, create characters, animate them, and stitch clips together. But as your projects grow, keeping track of everything gets messy fast. Which reference image did you use for that character? Which prompt produced the best result? What was the exact settings combo that worked?

This is where a node-based workflow tool changes the game. Instead of managing files across folders and tabs, you lay everything out on a single visual canvas. Every generation, every reference, every connection between steps is visible at a glance. You can see your entire project the way a director sees a storyboard.

What Flora Does

Flora is a node-based creative platform where you connect your AI generations visually on an infinite canvas. Think of it as a whiteboard where each node is an image, a video, a prompt, or a generation step. You drag connections between them to show how one output feeds into the next.

Everything we covered in this book (character creation, editing, fixing, product ads, talking videos, animations) can be recreated inside Flora. The difference is that you see the full pipeline at once instead of jumping between browser tabs.

Flora workflow canvas showing AI UGC generations organized visually

Above is a real Flora project for an AI influencer. Every image you see is a generation node, and the connections between them show exactly how the character was built, edited, and iterated. You can zoom in on any step and see the prompt, settings, and result.

See this project live in Flora

Higgsfield vs Flora

Both platforms let you generate AI content, but they work very differently. Here is how they compare:

	Higgsfield	Flora
Interface	Linear, one generation at a time	Visual canvas, everything on one board
Workflow style	Step-by-step, tab-based	Node-based, connected visually
Seeing the full picture	Scroll through history	Zoom out and see everything at once
Organizing iterations	Manual folder management	Drag and connect on canvas
Character consistency	Upload references each time	Link reference nodes, always visible
Collaboration	Single user	Share projects with a link
Best for	Quick single generations, beginners	Complex projects, visual thinkers
Built-in models	Nano Banana, Veo, Kling, Soul	Multiple models via integrations

Neither tool is "better." Higgsfield is faster for quick one-off generations. Flora is better when you are building a full character system or managing a project with many connected outputs. Many creators use both: Higgsfield for the actual generation, Flora for organizing and planning the pipeline.

When to Use Flora

Flora makes the most sense when your project gets complex. If you are creating a single talking head video, Higgsfield alone is fine. But when you are building an AI influencer with a consistent look across 20+ scenes, or managing a product ad campaign with multiple characters and variations, Flora keeps everything visible and connected.

Here are the situations where Flora shines:

Character development. Lay out your anchor sheet, face close-ups, and scene variations on one canvas. See which references produce the best results without scrolling through folders.
Multi-step workflows. Connect your image generation to your video generation to your editing. The entire pipeline is one visual flow.
A/B testing. Generate multiple variations side by side. Compare them visually without opening separate tabs.
Client projects. Share a Flora project link with your client so they can see the full creative process and approve specific outputs.

If you are the kind of person who thinks visually and likes to see how everything connects, Flora will feel natural. If you prefer a simpler, more linear workflow, stick with Higgsfield and come back to Flora when your projects grow.

Now Go Make Something Real

If you have made it this far, you now have something most people do not: a complete, working system for creating AI UGC from scratch. You know how to generate realistic images, build consistent characters, animate them with real motion, put products in their hands, make them talk, and even show content on a phone screen. That is not beginner knowledge anymore. That is a real skill set.

But here is the thing. The tools will keep changing. New models will launch every month. Features will get better, interfaces will shift, and what feels cutting edge today will feel basic six months from now. That is fine. Because what you learned in this book is not about any specific tool. It is about the workflow, the thinking, and the ability to iterate until something looks real. Those skills transfer to any tool, any model, and any platform that comes next.

The biggest mistake you can make now is to stop after reading. The difference between someone who "knows about AI UGC" and someone who is actually good at it is simple: reps. Generate more images. Make more videos. Fail more. Every bad output teaches you something if you pay attention to what went wrong and make one small adjustment.

Start with one workflow from this book. Get comfortable with it. Then stack the next one on top. Before you know it, you will be combining techniques without even thinking about it. You have the system. Now go use it.

AI UGC:The UltimatePlaybook

What Is AI Image Generation?

How It Actually Works

Your prompt is parsed into concepts (subject, style, lighting, mood, composition)

The model begins from random visual noise

It predicts pixels step by step

The picture becomes clearer over many tiny refinement steps

A final image appears

Why Your Results Keep Changing

The Core Workflow

Tools: 3 Categories That Explain Everything

1) Image Models

2) Video Models

3) Platforms

Beginner Rule (So You Don’t Tool Hop)

What Makes a Good Beginner Prompt

Try making one now!

Open Nano Banana in Higgsfield

Paste this prompt

Set your first settings

The One Skill That Makes You Good Fast

Common Beginner Mistakes

What Is AI Video Generation?

How It Works

Read your input (text, image, or both)

Decide what the scene should look like

Predict what should happen next over time

Generate frames in sequence

Stitch those frames into a short video

Three Video Modes You’ll See Everywhere

The Quality Tradeoff (Know This Early)

The Video Prompt Formula

Try Making One Now

Open Higgsfield

Upload your image

How to Evaluate Your Result

What “UGC” Means

UGC vs Influencer Content

So What Is AI UGC?

The Core Risk

What “Ultra Realism” Means in AI UGC

The Simple Test

Why This Matters for Everything Later

Too Many Tools Is the Fastest Way to Get Stuck

The Core Stack We Use in This Book

Platform: Higgsfield

Image Generation model: Nano Banana

Video Generation: Veo and Kling

Prompting and Planning: ChatGPT

Beginner Rule

What Aspect Ratio Means

The 3 Most Useful Ratios

1:1

9:16

16:9

Why It Matters for AI UGC

Two More Settings You’ll See Everywhere

Quality or Resolution

Variations

How AI Understands Visual Language

Director Mindset Beats Camera Mindset

Describing the Scene (Visual Vocabulary)

Shot and Framing

Lighting

Camera Feel and Motion

Focus and Depth

Style and Finish

Describing the Person (The Most Important Part)

Body Type and Build

Skin

Hair

Facial Expression

Clothing and Style

Accessories

Putting It All Together: UGC Prompt Examples

Let ChatGPT Write Prompts For You

Which tool you will use

The role you want ChatGPT to take

Any important constraints

For Beginners: Use Our Custom GPT

AI UGC:
The Ultimate
Playbook