Image Description

Surya Saketh E -

Over the past week, I’ve been working on code that calls an AI vision API (Application Programming Interface). I can now pass an image into my Python code in IntelliJ and get back a description of that image. For example, here is the code:

import base64
import os
import requests

# OpenAI API key, read from the OPENAI_API_KEY environment variable
# so the secret is never stored in the source file
api_key = os.environ["OPENAI_API_KEY"]

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "/Users/saketherramilli/Downloads/IMG_0326.PNG"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What’s in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300
}

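# Send the request (with a timeout so a stalled connection cannot hang the script)
response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload, timeout=60)

# Print the raw JSON response
print(response.json())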
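
The printed JSON wraps the description in extra metadata. Here is a minimal sketch for pulling out just the text, assuming the response follows the usual chat completions shape (a `choices` list whose first entry holds `message.content`). The helper name `extract_description` and the `sample` dictionary are made up for illustration and are not part of my script:

```python
def extract_description(response_json):
    """Return the model's text reply, or raise if the API reported an error."""
    if "error" in response_json:
        # Error responses carry an "error" object instead of "choices".
        raise RuntimeError(response_json["error"].get("message", "unknown API error"))
    return response_json["choices"][0]["message"]["content"]


# Stand-in for response.json(), trimmed to the fields used here.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "The image shows sheet music."}}
    ]
}
print(extract_description(sample))
```

In the real script, this would be called as `extract_description(response.json())`.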



If I pass a screenshot of a piece of sheet music from my band class as a random test sample, here’s what the code outputs:

‘The image shows a screenshot of a webpage displaying sheet music for major scales written for Bb trumpet. These are the scales listed:\n\n1. C Major Scale (“Bb” Concert Major)\n2. F Major Scale (“Eb” Concert Major)\n3. Bb Major Scale (“Ab” Concert Major)\n4. Eb Major Scale (“Db” Concert Major)\n5. Ab Major Scale (“Gb” Concert Major)\n6. Db Major Scale (“B” Concert Major)\n7. F# Major Scale (“E” Concert Major)\n8. B Major Scale (“A” Concert Major)\n9. E Major Scale (“D” Concert Major)\n10. A Major Scale (“G” Concert Major)\n11. D Major Scale (“C” Concert Major)\n12. G Major Scale (“F” Concert Major)\n\nEach scale is notated with music staff, clef, key signature, and notes. There are no clef signs visible in the image, but typically for trumpet, the music would be written in treble clef. The “Concert Major” note next to each scale name indicates the scale pitch as it would sound in concert pitch, which is important for transposing instruments like the trumpet. The website address “psstrings.com” suggests that the content relates to string instruments or possibly a retailer or educational resource associated with music. The top of the image also indicates signal strength, battery level, and time, typical of a smartphone status bar.’

 

The next step is to pass the API a theme prompt to interpret, along with a series of flower images, to see how accurately it analyzes emotion in drawings.
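
As a rough sketch of what that request could look like, the payload below puts one text prompt and several images in a single user message. I haven't run this against the API yet. The helper names `image_to_data_url` and `build_payload`, the prompt wording, and the flower file names are all placeholders, and the files written here contain dummy bytes rather than real drawings, so this only demonstrates the payload structure:

```python
import base64
import mimetypes
import os
import tempfile


def image_to_data_url(image_path):
    """Encode an image file as a data URL, guessing the MIME type from the extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        mime_type = "image/png"  # assumption: fall back to PNG for unknown extensions
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_payload(theme_prompt, image_paths, max_tokens=300):
    """Build a request body with one text prompt followed by one entry per image."""
    content = [{"type": "text", "text": theme_prompt}]
    for path in image_paths:
        content.append({"type": "image_url", "image_url": {"url": image_to_data_url(path)}})
    return {
        "model": "gpt-4-vision-preview",
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }


# Demo with dummy files so the sketch runs without real drawings.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("flower1.png", "flower2.png"):
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(b"placeholder bytes")
        paths.append(path)
    payload = build_payload("Which emotion does each flower drawing express?", paths)

# One text entry plus one entry per image.
print(len(payload["messages"][0]["content"]))
```

The resulting `payload` could then be sent with the same `requests.post` call and headers as in the script above.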
