[Written by ChatGPT. Main image: “tessellation, M.C. Escher,” at cfg_scale 1]
At Neural Imaginarium, we’ve been fascinated by the “Prompt Strength” (or cfg_scale) setting in Stable Diffusion, and have been testing how it affects the final image the AI produces. This setting determines how closely the AI should adhere to the prompt it is given. The range is 1 to 30 and the default is 7, with 1 being furthest from the prompt and 30 being closest. At least, that’s what it’s supposed to be.
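For readers who want to see what the number does under the hood: cfg_scale is the weight in classifier-free guidance. At each denoising step the model makes one prediction with the prompt and one without it, and the scale controls how far the result is pushed from the second toward (and past) the first. The sketch below is a toy illustration of that blend on plain Python lists; the function name and the sample values are our own, and real implementations apply the same arithmetic to large tensors.

```python
def apply_guidance(uncond, cond, cfg_scale):
    """Blend two noise predictions with classifier-free guidance.

    uncond: prediction made without the prompt
    cond:   prediction made with the prompt
    Returns uncond + cfg_scale * (cond - uncond), element by element.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy values, made up for illustration only.
uncond = [0.10, -0.20, 0.30]
cond = [0.30, -0.10, 0.20]

for scale in (1, 7, 30):
    print(scale, apply_guidance(uncond, cond, scale))
```

Note that a scale of 1 simply returns the prompt-conditioned prediction unchanged, so it is better read as "no extra push" than as "ignore the prompt". Anything above 1 extrapolates beyond what the model predicted for the prompt.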
Our tests, however, have yielded interesting results.
Water Lilies and the Van Gogh Surprise
We began with our Water Lilies prompt – “Pond, water lilies, weeping willow, reflections, atmospheric, Impressionism, Claude Monet”. At a cfg_scale of 30, we expected the image to be as close to the prompt as possible. What we got, however, was a work that was more Van Gogh than Monet. It was as if Starry Night had decided to take a trip to Monet’s garden!
When we dropped the cfg_scale to 1, the setting furthest from the prompt, we got Monet-like paintings, but they lacked definition. It was as if we were looking at Monet’s paintings through a foggy window.
With the default setting of 7, we were able to achieve something reasonably close to the Impressionist style we were aiming for.
The Curious Case of the Girl with a Pearl Earring
Moving on, we turned our attention to Johannes Vermeer’s masterpiece, using the prompt “girl, blue and gold turban, looking over shoulder, large pearl earring, soft lighting, Dutch Golden Age, Johannes Vermeer”. As with Water Lilies, a cfg_scale of 7 gave us an image that bore a good resemblance to the original painting.
A setting of 1, however, resulted in less-defined images, and 30 brought out vibrant, unnatural colors and dark lines – a far cry from Vermeer’s delicate touch.
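If you want to repeat this kind of comparison, it helps to change only cfg_scale between runs. Here is a small sketch that lays out the six runs described above (two prompts, three settings). The Run class, the build_runs helper, and the fixed seed are our own assumptions for illustration; the seed in particular is a suggestion for keeping comparisons fair, and the list still has to be handed to whatever Stable Diffusion interface you use.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Run:
    """One planned generation: a prompt, a cfg_scale, and a seed."""
    prompt: str
    cfg_scale: int
    seed: int


PROMPTS = [
    "Pond, water lilies, weeping willow, reflections, atmospheric, "
    "Impressionism, Claude Monet",
    "girl, blue and gold turban, looking over shoulder, large pearl earring, "
    "soft lighting, Dutch Golden Age, Johannes Vermeer",
]
CFG_SCALES = [1, 7, 30]
SEED = 1234  # Assumption: an arbitrary fixed seed so only cfg_scale varies.


def build_runs(prompts=PROMPTS, scales=CFG_SCALES, seed=SEED):
    """Return every prompt paired with every cfg_scale, all sharing one seed."""
    return [Run(p, s, seed) for p, s in product(prompts, scales)]


for run in build_runs():
    print(f"cfg_scale={run.cfg_scale:>2} seed={run.seed} prompt={run.prompt[:30]}...")
```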
The Paradox of Over-Saturation
While these values do seem to affect how closely the model adheres to the prompt, it’s not in the way we initially thought. Anything over 7 seems to result in a sort of “over-saturation” of the image, where certain features are exaggerated. It’s as if the AI gets over-enthusiastic, focusing too much on certain aspects of the prompt and overlooking others.
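One possible reading of the over-saturation is that cfg_scale works as a multiplier rather than a dial toward a target. In classifier-free guidance the result is uncond + cfg_scale * (cond - uncond), so the size of the push away from the prompt-free prediction grows in direct proportion to the scale. The sketch below measures that push on toy numbers; the function names and values are ours, and whether this fully explains the exaggerated colors and lines we saw is an assumption on our part.

```python
import math


def guided(uncond, cond, cfg_scale):
    """Classifier-free guidance blend: uncond + cfg_scale * (cond - uncond)."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


def push_from_unconditional(uncond, cond, cfg_scale):
    """Euclidean distance between the guided and prompt-free predictions."""
    g = guided(uncond, cond, cfg_scale)
    return math.sqrt(sum((gi - ui) ** 2 for gi, ui in zip(g, uncond)))


# Toy values, made up for illustration only.
uncond = [0.0, 0.0]
cond = [3.0, 4.0]

for scale in (1, 7, 30):
    print(scale, push_from_unconditional(uncond, cond, scale))
```

On these toy values the push at 30 is thirty times the push at 1, which fits the impression that the highest setting exaggerates whatever the prompt emphasizes instead of settling on a more faithful image.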
While our experiments with cfg_scale have been enlightening, it’s clear that this tool isn’t as straightforward as we initially assumed. In the world of AI image generation, it seems, there are no hard and fast rules. Each prompt and setting is a new adventure, bringing us one step closer to understanding the intricate workings of these fascinating tools.