As AI-generated art continues to grow, more and more critics are accusing AI art tools of stealing the works of living artists. The uneducated or uninitiated hear “stealing” or “theft” and are driving the backlash forward. But someone has to be really, obviously intentional to steal someone else’s image. That just isn’t how the AI works. It begins with data (or metadata) and how the systems process this data.
Big data refers to data that is so large, fast or complex that it’s difficult or impossible to process using traditional methods. The act of accessing and storing large amounts of information for analytics has been around for a long time. The main difference between big data and metadata is that big data is a massive amount of data that cannot be stored and managed by traditional data handling mechanisms while metadata contains informative and relevant descriptions about data. Let’s refer to these descriptions as labels.
Everything you post online is labeled.
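To make “labels” concrete, here is a minimal Python sketch of what labeled posts might look like and how one label ends up pointing at many images. The file names, labels, and the `build_label_index` helper are all hypothetical, made up for illustration; this is not how any particular image generator stores its data.

```python
from collections import defaultdict

# Hypothetical posts: each image carries metadata labels describing it.
posts = [
    {"file": "garden_01.jpg", "labels": ["butterfly", "flower", "macro"]},
    {"file": "salad.png", "labels": ["tomato", "food"]},
    {"file": "meadow.jpg", "labels": ["butterfly", "meadow", "flower"]},
]


def build_label_index(posts):
    """Map every label to the list of files that carry it."""
    index = defaultdict(list)
    for post in posts:
        for label in post["labels"]:
            index[label].append(post["file"])
    return dict(index)


label_index = build_label_index(posts)
print(label_index["butterfly"])
```

Even in this tiny example, “butterfly” already points at more than one image, and each image carries several labels at once.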
Natural language processing programs (NLPs) process all of this data to make new images. These include Midjourney, Stable Diffusion, and DALL-E. These systems access lots of different labels: labels nested in labels, reused all over the place; a network of labels. Existing images on the Web are used to train the NLP/AI models, but a model is not copying specific works when it generates images based on the labels; it approximates images, or parts of images, based on labels.
So this is why you cannot individually attribute an image to an artist. You could potentially link all artists to the labels, but this is akin to paying every amateur or artist that took a picture of a tomato. This would mean that everyone should get an amazingly small portion. How do you keep up with that? What if the label changes? Not practical. — Simon Maynard
So how does this NLP + Art thing work?
All NLPs start with tokenization, which breaks text down into smaller units called tokens: whole words, pieces of words, and punctuation. Each token is matched against a fixed vocabulary, which is what allows the software to recognize it. Meaning and context come in at a later stage, when the model relates the tokens to one another and to the labels it saw during training.
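Here is a toy sketch of tokenization in Python. It is a simplification under stated assumptions: the vocabulary `TOY_VOCAB` and the helper functions are hypothetical, and the greedy longest-match splitting stands in for the learned subword vocabularies that real systems use.

```python
import re

# Hypothetical toy vocabulary; real systems learn theirs from huge text corpora.
TOY_VOCAB = {"butter", "fly", "flies", "love", "i", "s"}

# Give every vocabulary entry a number; unknown tokens get -1.
TOKEN_IDS = {token: i for i, token in enumerate(sorted(TOY_VOCAB))}


def word_tokenize(text):
    """Split text into lowercase words and individual punctuation marks."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())


def subword_tokenize(word, vocab=TOY_VOCAB):
    """Greedily split a word into the longest pieces found in the vocabulary.

    A character that matches nothing becomes a token on its own.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


def tokenize(prompt):
    """Turn a prompt into a flat list of subword tokens."""
    tokens = []
    for word in word_tokenize(prompt):
        tokens.extend(subword_tokenize(word))
    return tokens


def encode(tokens):
    """Replace each token with its vocabulary number."""
    return [TOKEN_IDS.get(token, -1) for token in tokens]


tokens = tokenize("I love butterflies.")
print(tokens)          # ['i', 'love', 'butter', 'flies', '.']
print(encode(tokens))  # [3, 4, 0, 1, -1]
```

Notice that “butterflies” is not stored as one unit. It is rebuilt from smaller pieces the system already knows, and from this point on the system works with numbers rather than words.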
For example, I love butterflies. I added “butterflies” in one of my Midjourney text prompts and got this result:
The system approximates “butterflies” in my image. In fact, everything in this image is an approximation. Sometimes this works well, as with butterflies, and sometimes it doesn’t (e.g., hands, fingers, feet). Where did Midjourney get the butterflies? Well, how many images online are labeled as such? Thousands… millions?
But what if I upload M.C. Escher’s print of butterflies?
Well, we can see how the system sampled visual information from M.C. Escher’s work, but is it a copy? No. The system is still approximating butterflies. Certain elements (e.g., the black-and-white, monochromatic palette) are taken from Escher’s piece, but the result is not the same as this:
Approximation is very much like sampling in rap music. Some sampling is done well and I appreciate what I hear. Other attempts at sampling are not very creative and there is, arguably, more of this kind than the good stuff. Personally, I prefer the image without the Escher reference. This is not to say that some people aren’t intentionally trying to copy from other artists using AI. But those people would do that no matter what the medium is. Sure, AI makes it easier to do, but the results are as unoriginal as the creators are. It comes with the territory.
Do we throw the entire thing out because of a few unoriginal users?
I think not.