The AiEdge Newsletter

How To Build An AI Sports Commentator With The Latest GPT-4 Vision and OpenAI Text-to-Speech

Damien Benveniste
Nov 13, 2023

  • Asking questions about an image

  • Describing videos

  • Converting the text to speech


Asking questions about an image

Finally, the GPT-4 Vision model is available through the OpenAI API! Many of us may have forgotten that GPT-4 is actually a multi-modal model. It can take text inputs as well as image inputs.

This is mostly so we can ask questions about an image. Let’s play with it! Let’s first make sure we have the latest version of the OpenAI Python package:

pip install -U openai

I also set up my OpenAI API key in my environment variables:

import os
os.environ["OPENAI_API_KEY"] = ...
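The `...` above stands in for your own key. The `OpenAI()` client reads `OPENAI_API_KEY` from the environment by default, so another option is to export the variable in the shell and only check for it in Python. A minimal sketch, where `require_api_key` is a hypothetical helper name and not part of the openai package:

```python
import os


def require_api_key(env=None, name='OPENAI_API_KEY'):
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get(name)
    if not key:
        raise RuntimeError(f'{name} is not set; export it before creating the client')
    return key
```

Calling `require_api_key()` once at the top of a notebook surfaces a missing key with an explicit message and keeps the key itself out of the code.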

From a URL

I am going to use an image from Google Images at the following URL:

[Image: 2018 Turing Award]

Let’s see if it can describe the image. We use the client.chat.completions.create function:

from openai import OpenAI

client = OpenAI()

prompt = 'Describe the image'
url = 'https://awards.acm.org/binaries/content/gallery/acm/ctas/awards/turing-2018-bengio-hinton-lecun.jpg'

result = client.chat.completions.create(
    model='gpt-4-vision-preview',
    max_tokens=500,
    messages=[{
        'role': 'user',
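        'content': [
            # Content parts in the documented format: one text part, one image part
            {'type': 'text', 'text': prompt},
            {'type': 'image_url', 'image_url': {'url': url}}
        ]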
    }]
)

result.choices[0].message.content

The image features three men standing side by side, each wearing a suit and tie. They are posing for a formal photograph with smiles on their faces. The background is a plain, neutral color. The men are identified as the recipients of the 2018 ACM Turing Award, which is considered to be the "Nobel Prize of Computing." Their names are Bengio, Hinton, and LeCun, and they are recognized for their work in the field of artificial intelligence and deep learning.

Pretty good!
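The same call comes back for every image in this post, so it may be convenient to wrap it. Here is a minimal sketch, assuming the content-part format from the OpenAI vision documentation (a `text` part plus an `image_url` part). The names `build_vision_messages` and `ask_about_image` are hypothetical helpers, not part of the openai package:

```python
def build_vision_messages(prompt, image_url):
    """Build a single user message with a text part and an image_url part."""
    return [{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': prompt},
            {'type': 'image_url', 'image_url': {'url': image_url}},
        ],
    }]


def ask_about_image(client, prompt, image_url, max_tokens=500):
    """Send the prompt and image to the model and return the text answer.

    `client` is assumed to be an `openai.OpenAI()` instance created elsewhere.
    """
    result = client.chat.completions.create(
        model='gpt-4-vision-preview',
        max_tokens=max_tokens,
        messages=build_vision_messages(prompt, image_url),
    )
    return result.choices[0].message.content
```

With the client created earlier, a call looks like `ask_about_image(client, 'Describe the image', url)`.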

Let’s try something more difficult with the following chart:

prompt = 'What is this chart about?'
url = 'https://www.mongodb.com/docs/charts/images/charts/stacked-bar-chart-reference-small.png'

result = client.chat.completions.create(
    model='gpt-4-vision-preview',
    max_tokens=500,
    messages=[{
        'role': 'user',
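        'content': [
            # Content parts in the documented format: one text part, one image part
            {'type': 'text', 'text': prompt},
            {'type': 'image_url', 'image_url': {'url': url}}
        ]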
    }]
)

result.choices[0].message.content

The chart appears to be a stacked bar chart that represents data in a visual format. It is divided into three different colored segments, which likely represent different categories or variables being measured. The chart has an X-axis with labels (which are not visible in the provided image) and a Y-axis with numerical values, suggesting that the chart is used to compare quantities or frequencies of the categories across different groups or time periods. The exact topic or data represented in the chart is not specified in the image or the provided link.

It doesn’t seem to be able to read the text in the chart. Let’s see if it can read the text in the following image:

[Image: Solving Equations - GCSE Maths - Steps, Examples & Worksheet]

prompt = 'What is written?'
url = 'https://thirdspacelearning.com/wp-content/uploads/2021/03/Solving-Equations-What-is.png'

result = client.chat.completions.create(
    model='gpt-4-vision-preview',
    max_tokens=500,
    messages=[{
        'role': 'user',
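        'content': [
            # Content parts in the documented format: one text part, one image part
            {'type': 'text', 'text': prompt},
            {'type': 'image_url', 'image_url': {'url': url}}
        ]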
    }]
)

result.choices[0].message.content
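
When the model struggles with small text, as it seemed to with the chart earlier, one knob that may be worth trying is the `detail` option that the OpenAI vision documentation describes for `image_url` parts (`low`, `high`, or `auto`). A minimal sketch, where `image_part` is a hypothetical helper and whether `high` actually helps on these images is untested here:

```python
VALID_DETAIL = ('low', 'high', 'auto')


def image_part(image_url, detail='auto'):
    """Build an image_url content part with an explicit detail level."""
    if detail not in VALID_DETAIL:
        raise ValueError(f'detail must be one of {VALID_DETAIL}, got {detail!r}')
    return {
        'type': 'image_url',
        'image_url': {'url': image_url, 'detail': detail},
    }


# Example message asking the model to read the text at high detail
messages = [{
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'What is written?'},
        image_part(
            'https://thirdspacelearning.com/wp-content/uploads/2021/03/Solving-Equations-What-is.png',
            detail='high',
        ),
    ],
}]
```

The resulting `messages` list can be passed to `client.chat.completions.create` in place of the one used so far. Keep in mind that high detail uses more tokens per image.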
