Image-enabled chat models, also known as large multimodal models (LMMs), are AI models that analyze images and provide textual responses to questions about them.
These models combine natural language processing with visual understanding, allowing them to interpret both textual and visual inputs. Not every large language model is multimodal, however; models without this capability cannot analyze images.
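To make the distinction concrete, here is a minimal sketch of what a multimodal chat request looks like under the hood. It uses the widely adopted OpenAI-style message format purely as an illustration; this is not TextCortex's API, and the prompt text and image URL are hypothetical placeholders:

```python
# Sketch: a multimodal chat message pairs text with an image in one request.
# OpenAI-style message format used only as an illustration -- NOT TextCortex's
# API. The question and image URL below are hypothetical placeholders.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this chart?"},
        {
            "type": "image_url",
            "image_url": {"url": "https://example.com/chart.png"},
        },
    ],
}

# A text-only model accepts only a plain string as "content"; an
# image-enabled (multimodal) model also accepts this mixed list of parts.
part_types = {part["type"] for part in message["content"]}
print(sorted(part_types))  # both a text part and an image part are present
```

The point of the mixed `content` list is that the model receives the question and the image in a single turn, so its answer can reference the visual input directly.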
Image-Enabled Models by TextCortex
With TextCortex, you can access image-enabled models with powerful capabilities. To find out which models support image input, follow these steps:
- Open the TextCortex Web App
- Click on “Tools” within the chat interface
- Move the cursor to the Model section at the bottom
- Check the “Multimodal” section by hovering the cursor over the models
If the Multimodal section shows a “Yes” indicator, the model supports image input; if it shows “No,” it does not.
Here is a list of image-enabled large language models supported by TextCortex:
- Claude 4.5 Haiku
- Gemini 2.5 Flash
- Gemini 2.0 Flash
- GPT-4o Mini
- Claude 4.6 Sonnet
- Kimi K2.5
- Claude 4.5 Sonnet
- Grok 4
- Claude 4 Sonnet
- GPT-4.1
- GPT-4o
- GPT-5.2
- Kimi K2.5 Thinking
- Gemini 3 Pro
- Gemini 3 Flash
- GPT-5.1
- GPT-5 Mini
- GPT-5
- Claude 4 Sonnet Thinking
- Gemini 2.5 Flash Thinking
- Gemini 2.5 Pro
How to Use Image-Enabled Chat Models
To use image-enabled models on TextCortex, simply select your desired model within the chat interface.
Afterwards, you can upload your images via drag-and-drop, by clicking the paperclip icon next to the chat box, or by pasting the URL of an online image into the chat.
Use Cases & Examples
Image-enabled models are useful for tasks such as:
- Explaining images
- Data extraction
- Creating presentations
- Image to text
- Prompt writing