A few days ago I was revisiting a text prompt in Midjourney that was inspired by blues musician and songwriter Robert Johnson, whose singing, guitar playing, and songwriting have influenced generations of musicians. The tool offered up four variations, including a startling image of a blue-skinned, guitar-playing man-ape. I was so upset that I immediately deleted it.
It was the first time I experienced what scholars refer to as algorithmic injustice.
[R]esearch on algorithmic injustice shows how ML automates and perpetuates historical, often unjust and discriminatory, patterns. The negative consequences of algorithmic systems, especially on marginalized communities, have spurred work on algorithmic fairness. — Birhane 2021
Another paper, Diversity is Not a One-Way Street (2023), offers a similar thesis: text-to-image generation models reflect underlying societal biases in their training data. The authors examined visually stereotypical output from three widely used models: DALL-E 2, Midjourney, and Stable Diffusion. Midjourney is my preferred tool, so I was immediately intrigued and wanted to test their findings for myself.
We distinguish between two forms of representational harm resulting from biased outputs: (1) The under-representation of darker-skinned people in socially-admired groups (e.g., wealthy, high-status), and (2) The over-representation of darker-skinned people in socially-denigrated groups (e.g., criminal, low-status). — Fraser et al. 2023
The authors of the second paper (Fraser et al. 2023) also looked at the effectiveness of an “ethical intervention” strategy designed to “promote diversity in the output, by essentially ‘reminding’ the system that the given prompt can apply to all people, regardless of skin color.” They found this strategy was only effective in one direction:
[I]t can improve the representation of darker-skinned people for socially positive prompts, but it does not reduce their representation in response to socially negative prompts.
I found this to be true when I tried the “portrait of a felon” prompt in three different text-to-image generators (see top image), with the exception of Midjourney. DALL-E 2 gave me three monkey/animal images along with one image of a Black man for the “felon” prompt. It is also important to note the exclusion of women across all of the tools. The authors note that further work is needed to develop effective methods to promote equity, diversity, and inclusion in the output of image generation systems. With this in mind, I attempted my own interventions.
I made sure to include only their names (not their occupations) as prompts, to show that referencing actual people can avoid the linguistic and racial blind spots in GenAI tools and systems. I also use Adobe Photoshop to adjust and revise portraits so they look more like the people I’m referencing in the text prompts (and to make sure the hands look realistic). Users can also upload image prompts to guide the process.
I often don’t need to rely solely on generative AI tools to get the results I’m looking for; I can draw on my color knowledge and Photoshop skills. I’m also not locked into one artistic style, and I appreciate the variety of styles I’ve been able to call up from the trained models. Although I am disheartened by what I’ve seen from these tools, I have been able to circumvent bias through my own input and modifications.