An X user shows us how he turned the iPhone, paired with ChatGPT, into an improved assistant
First of all, I would like to say that this entire thread is thanks to the X user Álvaro Cintas, a specialist in artificial intelligence, cybersecurity and technology who also holds a PhD in Computer Science and Engineering... no small feat. All the credit is his alone.
In the following Twitter thread—excuse me if I don't call it X—the user @dr_cintas, whom I highly recommend you follow, shows us how he managed to use GPT-4o's vision mode on his iPhone through the Shortcuts app.
As we saw when the new ChatGPT language model was introduced, this LLM, or Large Language Model, will be able to identify what it sees through the camera in real time and give a description of it.
It is not something entirely new, since we have already seen on this website how the Rabbit r1's vision mode works. In this case, while GPT-4o rolls out all of its features, let's see how to use vision mode through a shortcut.
Turn your iPhone into the best assistant with ChatGPT
First of all, here is the original thread:
How it works and steps to follow
It is as simple as it is ingenious. As seen in the video, the user opens the camera, which serves as the "eyes": double-tapping the back of the iPhone triggers a shortcut that captures the screen and sends it to ChatGPT, which then tells us what it sees.
Have your iPhone and an API key ready
First of all, you need an iPhone—but you already knew that—and an OpenAI API key, which you can get at the following link. Also make sure you have credits in your account to be able to carry out the entire process.
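If you want to confirm that the key works before building anything, a quick call to the API will do. This is just a sanity check of my own, not part of the original thread: listing the available models requires a valid key, so a 401 response means something is wrong with it.

```swift
import Foundation

// Sanity check (my own addition): a valid key can list the available models.
let apiKey = "sk-..."  // placeholder: paste your OpenAI API key here

var keyCheck = URLRequest(url: URL(string: "https://api.openai.com/v1/models")!)
keyCheck.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")

let waiter = DispatchSemaphore(value: 0)
URLSession.shared.dataTask(with: keyCheck) { _, response, _ in
    if let http = response as? HTTPURLResponse {
        // 200 means the key is accepted; 401 means it is invalid
        print("HTTP status:", http.statusCode)
    }
    waiter.signal()
}.resume()
waiter.wait()  // keep the script alive until the response arrives
```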
Build the Shortcut
Open the Shortcuts app on your iPhone, press the "+" button and tap "Add Action" to create a new shortcut that triggers the action we want to execute.
To build the new shortcut we will need two lines of text:
1- On the one hand, the prompt that we are going to use, which will be the following:
Provide a summary of the main topic in the screenshot and discuss its importance, interest, or humor. Limit your answer to 3 concise points using • for each point. Focus on the content, not the user interface, and include all relevant information.
2- On the other hand, the line of text that contains the OpenAI API key you obtained in the previous step.
Next, we are going to add the API call in the "Get Contents of URL" action and configure its fields as shown in the image. Make sure yours looks exactly the same.
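To make it clearer what each field corresponds to, here is a minimal Swift sketch of the request that "Get Contents of URL" ends up sending, assuming the standard GPT-4o chat completions endpoint. The variable names and the file path are placeholders of my own, and the prompt is abridged:

```swift
import Foundation

// Sketch of the request behind "Get Contents of URL" (assumed: the chat completions endpoint).
let apiKey = "sk-..."  // the second text line: your OpenAI API key (placeholder)
let prompt = "Provide a summary of the main topic in the screenshot..."  // the first text line, abridged

// The screenshot travels inline as base64; the file path here is only for illustration.
let screenshot = try! Data(contentsOf: URL(fileURLWithPath: "screenshot.jpg"))
let base64Image = screenshot.base64EncodedString()

// Message parts in the format GPT-4o expects for image input.
let textPart: [String: Any] = ["type": "text", "text": prompt]
let imagePart: [String: Any] = [
    "type": "image_url",
    "image_url": ["url": "data:image/jpeg;base64,\(base64Image)"]
]
let body: [String: Any] = [
    "model": "gpt-4o",
    "messages": [["role": "user", "content": [textPart, imagePart]] as [String: Any]],
    "max_tokens": 300
]

var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
request.httpMethod = "POST"
request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
request.httpBody = try! JSONSerialization.data(withJSONObject: body)
```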
We're almost done. Finally, add a "Get Dictionary Value" action to extract the response and an action to show the generated description (for example, "Show Result"). Only the final step remains.
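Continuing the sketch above, this is roughly what those two actions do with the response: OpenAI returns the description at choices[0].message.content, and the shortcut simply extracts and displays it. The Swift below is again my own illustration, not the literal shortcut:

```swift
// Continuing the sketch above: send the request and extract the description,
// mirroring what "Get Dictionary Value" and the final "show" step do in the shortcut.
let done = DispatchSemaphore(value: 0)
URLSession.shared.dataTask(with: request) { data, _, _ in
    defer { done.signal() }
    guard
        let data = data,
        let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
        let choices = json["choices"] as? [[String: Any]],
        let message = choices.first?["message"] as? [String: Any],
        let content = message["content"] as? String
    else { return }
    print(content)  // the three bullet points generated by GPT-4o
}.resume()
done.wait()  // wait for the response before the script exits
```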
Add double tap
To activate this shortcut with a double tap, do the following: go to Settings -> Accessibility -> Touch -> Back Tap -> Double Tap and add your shortcut, so that when you double-tap the back of your iPhone it takes the screenshot and analyzes what you are seeing.
And that's it: with the double tap set up, ChatGPT will give you a detailed description of what it is seeing through the camera, a feature that will arrive natively on devices in the future, although it is not yet known when.