Write Me AI Sonnet

Using AI to create a spoken sonnet based on its evaluation and description of a single image.


Neil Pullman


AI & Machine Learning models, CLIP/BLIP, LLM (OpenAI GPT-4), Voice synthesis

In this experiment, we chain several AI models to turn a single photo into a spoken sonnet. The process begins with a CLIP model analyzing an uploaded photo and extracting its key elements and features. To dig deeper into the image’s details, a language model then generates specific questions based on this initial analysis, covering, for example, the subject, their attire, and the background.
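As a rough illustration of these first two steps, the sketch below ranks candidate labels against an image embedding by cosine similarity, which is how CLIP-style models compare images and text, and then turns the top labels into questions. The embeddings here are toy vectors rather than real model output, and `rank_labels`, `build_questions`, and the question template are hypothetical names used only for this example; in the experiment itself, a language model writes the questions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_labels(image_vec, label_vecs, top_k=2):
    """Return the top_k labels whose embeddings are closest to the image embedding."""
    scored = sorted(
        label_vecs.items(),
        key=lambda item: cosine(image_vec, item[1]),
        reverse=True,
    )
    return [label for label, _ in scored[:top_k]]


def build_questions(labels):
    """Stand-in for the language model step: one templated question per label."""
    return [f"What does the {label} look like?" for label in labels]


# Toy embeddings (assumption: a real system would get these from a CLIP encoder).
image_vec = [0.9, 0.1, 0.3]
label_vecs = {
    "person": [1.0, 0.0, 0.2],
    "beach": [0.1, 1.0, 0.0],
    "hat": [0.5, 0.2, 0.9],
}

labels = rank_labels(image_vec, label_vecs)
questions = build_questions(labels)
print(questions)
```

With these toy vectors, "person" and "hat" score highest, so the questions focus on those two elements.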

Next, a BLIP model answers the questions gathered in the previous step, building a fuller description of the photo. A GPT model then takes these answers and crafts a sonnet that incorporates the image’s details along with additional prompts. Finally, a voice synthesis model turns the sonnet text into audio and streams the spoken result back to the user. The experiment shows how separate models can be combined to produce engaging, immersive content from a single image.
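The whole chain can be sketched as a small pipeline in which each model is a pluggable function. This is a minimal sketch of the flow described above, not the project's actual code: `SonnetPipeline`, its field names, the prompt wording, and the stub functions are all assumptions, and the stubs stand in for real CLIP, BLIP, GPT, and voice synthesis calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class SonnetPipeline:
    """Chains the five stages; each field is a hypothetical wrapper around one model."""
    describe: Callable[[bytes], List[str]]    # CLIP-style analysis -> key elements
    ask: Callable[[List[str]], List[str]]     # language model -> questions
    answer: Callable[[bytes, str], str]       # BLIP-style visual question answering
    write: Callable[[str], str]               # GPT-style model -> sonnet text
    speak: Callable[[str], Iterator[bytes]]   # voice synthesis -> audio chunks

    def run(self, image: bytes) -> Iterator[bytes]:
        elements = self.describe(image)
        questions = self.ask(elements)
        # Pair each question with its answer so the writer sees full context.
        facts = [f"{q} {self.answer(image, q)}" for q in questions]
        prompt = "Write a 14-line sonnet about this photo.\n" + "\n".join(facts)
        sonnet = self.write(prompt)
        return self.speak(sonnet)


# Stub stages (assumptions) so the flow can be run without any model.
def stub_describe(image: bytes) -> List[str]:
    return ["person", "hat"]


def stub_ask(elements: List[str]) -> List[str]:
    return [f"What does the {e} look like?" for e in elements]


def stub_answer(image: bytes, question: str) -> str:
    return "It is bright and cheerful."


def stub_write(prompt: str) -> str:
    return "\n".join(f"Line {i} of the sonnet" for i in range(1, 15))


def stub_speak(text: str) -> Iterator[bytes]:
    # A real voice model would yield audio; here each line becomes one chunk.
    for line in text.splitlines():
        yield line.encode("utf-8")


pipeline = SonnetPipeline(stub_describe, stub_ask, stub_answer, stub_write, stub_speak)
chunks = list(pipeline.run(b"fake image bytes"))
print(len(chunks), "chunks streamed")
```

Returning an iterator from the final stage lets playback start as soon as the first audio chunk is ready, rather than waiting for the whole sonnet to be synthesized.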