3 ways to use photo-to-video in Gemini

Lights. Camera. AI-Action. 🎬

As a creative producer at Google, I bring our stories to life for social posts, video series and events for Googlers. I’m always looking for new ways to create content and engage with audiences around the world.

Enter from stage left: Gemini’s photo-to-video capability, powered by Veo 3. From just an image or written prompt, Gemini will generate an eight-second video clip with sound — including sound effects, ambient background noise and speech.

Here are three ways I use photo-to-video in Gemini, plus some beginner tips for prompting your own videos.

1. Animate illustrations

Turn an illustration into an animation for more compelling visuals in presentations, newsletters and videos.

Prompt: The bicycle rides through an illustration-style 
desert, weaving through cacti.

Videos are generated in a 16:9 landscape orientation. If your image has a different aspect ratio, the video is padded with a black border. Getting the result you want can sometimes take more than one try, but don’t be discouraged! Prompting takes practice, and our Veo models are also learning and improving.
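If you want a rough idea of how much black border to expect before you upload, the 16:9 padding can be estimated with a little arithmetic. The Python sketch below is only an illustration: it assumes the image is scaled to fit inside the frame and centered, and the 1280x720 frame size and the function name `letterbox_padding` are hypothetical choices, not details of how Gemini actually works.

```python
from fractions import Fraction


def letterbox_padding(img_w, img_h, frame_w=1280, frame_h=720):
    """Estimate the border needed to fit an image into a 16:9 frame.

    Assumption (not confirmed by Gemini): the image is scaled uniformly
    to fit inside the frame and centered. The 1280x720 default is a
    hypothetical 16:9 frame size used only for illustration.

    Returns (scaled_w, scaled_h, pad_x, pad_y), where pad_x is the border
    on each of the left and right sides and pad_y is the border on each
    of the top and bottom.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")

    # Use the smaller of the two ratios so the whole image fits the frame.
    scale = min(Fraction(frame_w, img_w), Fraction(frame_h, img_h))
    scaled_w = int(img_w * scale)
    scaled_h = int(img_h * scale)

    # Split the leftover space evenly between the two opposite sides.
    pad_x = (frame_w - scaled_w) // 2
    pad_y = (frame_h - scaled_h) // 2
    return scaled_w, scaled_h, pad_x, pad_y


if __name__ == "__main__":
    # A square illustration leaves borders on the left and right.
    print(letterbox_padding(1000, 1000))
    # A portrait phone photo leaves much wider borders.
    print(letterbox_padding(1080, 1920))
```

Under these assumptions, an image that is already 16:9 needs no border at all, so cropping your illustration to landscape before uploading is one way to fill the whole frame.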

2. Turn photography into a motion picture

Transform photos into lifelike video clips, or use your imagination to add whimsy. Start with a simple, high-level prompt, and Gemini will fill in the gaps.

Prompt: The dinosaur skeleton comes to life.

Take it up a notch and add detailed directions in your prompt to make your own vision shine through. To make the scene more dynamic, try adding new characters and sequencing their actions.

Prompt: The figure waves at the camera. While the figure 
is distracted and waving, a golden retriever dog enters the frame 
from the right, panting and wagging its tail. The dog eats the 
ice cream cone out of the figure's other hand. The figure is startled by 
this, and stares at the dog in surprise. The dog is happily wagging 
its tail and licking its lips. The figure looks at the camera.

Your image will be the first frame of the video. The closer and clearer your subject, the easier it is for the model to progress the scene and create a high-quality result. If you’re worried the results appear a little too real, videos have an invisible SynthID digital watermark and a visible watermark to indicate they are AI-generated.

3. Articulate an artistic vision

Pitching (and landing!) creative ideas is an important part of my day-to-day. Realistic renderings from Gemini can better visualize my concept for others, making my pitches more effective.

In this case, the prompt needs to be detailed and precise. While writing it may be more time-consuming, I find that starting from a photo is faster than constructing the whole scene from a text-only prompt. Gemini’s output based on our real set is also more helpful than using sample photos that may only partially convey my vision. If you need a hand, ask Gemini to help refine and add camera control instructions to your prompt for even better results.

Prompt: Open the scene with the image and hold for one second.
Then, the wall color changes to a bright blue, and a wooden coffee 
table appears in front of the two arm chairs in the image. On the 
coffee table appear two large podcasting microphones. The rest of the 
room is unchanged. Hold for one second. Then, the wall color changes 
to a light gray, and the microphones disappear from the coffee table. 
Next, on the table appears: a black tablecloth, two plates of chicken 
wings, and several bottles of hot sauce. The rest of the room is 
unchanged. Hold for one second. Then, the wall color changes to a 
vibrant pink. The plates of chicken wings, bottles of hot sauce, and 
black tablecloth disappear from the coffee table. Then, on the table 
appears: a bright blue table cloth and a birthday cake with lit candles. 
Birthday balloons appear and float in the background. The rest of the 
room is unchanged. Throughout the video plays an instrumental track 
of an upbeat pop song.

I still fluctuate between feeling excited and uneasy about using AI for creative projects. In these cases, the art wouldn’t have existed otherwise, whether due to lack of resources, time or skill level. That lets the AI-generated media articulate and elevate my work, rather than replace it.