AI development is keeping creators on their toes. Every other week there's a new AI tool, platform, or feature to explore. At the top of my list was animating character faces for a couple of upcoming projects, which call for consistent, layered, and complex imagery. For example, D-ID is a web app that combines real-time face animation with advanced text-to-speech to create an immersive, human-like conversational AI experience.
Midjourney's "/describe" feature lets you transform images into words. It generates four different descriptions based on an image you upload and makes it easy to generate new variations. The four numbers at the bottom are remix buttons, each matching the corresponding description. Clicking a number remixes the image based on that description. You can then copy and paste the descriptions you like and revise the text as well.
"We think this tool will transform your linguistic-visual process both in terms of creative power and discovery."
- Midjourney team
First, I composited existing images, then used the /describe output as text prompts to create new images. Later, I turned to other tools, such as the CLIP Interrogator, to generate new prompts for use with Stable Diffusion 2.0 via the ViT-H-14 OpenCLIP model. I used Midjourney to create an image that became the base or source image for the other tool (see below). Then I copied and pasted the CLIP description output into Midjourney and composited the results in Adobe Photoshop.
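For readers who prefer scripting this step over using the hosted demo, the image-to-prompt stage above can be sketched with the open-source `clip-interrogator` Python package. This is my own minimal sketch, not the exact workflow above; the model name follows the package's recommendation for Stable Diffusion 2.0 prompts, and the image filename is a placeholder.

```python
"""Sketch: turn a source image into a text prompt with the CLIP Interrogator.

Assumes `pip install clip-interrogator pillow`; the filename below is
hypothetical -- substitute your own Midjourney export.
"""

def describe_image(path: str) -> str:
    # Imports are kept inside the function because clip-interrogator
    # downloads large model weights on first use.
    from PIL import Image
    from clip_interrogator import Config, Interrogator

    # ViT-H-14 is the OpenCLIP model commonly paired with Stable Diffusion 2.0.
    ci = Interrogator(Config(clip_model_name="ViT-H-14/laion2b_s32b_b79k"))
    return ci.interrogate(Image.open(path).convert("RGB"))

if __name__ == "__main__":
    # Hypothetical source image exported from Midjourney.
    prompt = describe_image("midjourney_base.png")
    print(prompt)  # paste this back into Midjourney or Stable Diffusion
```

The returned string is an ordinary text prompt, so it can be pasted straight into Midjourney (or fed to a Stable Diffusion pipeline) and then edited by hand, just like the /describe output.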
I really like how central text is to AI image generation. It can be used to create unique characters for animated sequences, and paired with audio recordings it can drive face animations. Here's my first attempt using D-ID: