Anthropic Claude Models
Claude Opus 4 and Claude Sonnet 4.
These are multimodal models. They accept text, images, PDF files, and office documents as input. Their output is always text.
They understand text in images, analyze charts and diagrams, and recognize schematics and screenshots. You can upload images in JPEG, PNG, GIF, or WebP format; documents in DOCX, TXT, HTML, ODT, RTF, or EPUB format; spreadsheets in XLSX or CSV format; and PDF files, which are analyzed in full (text and visual elements) when they are up to 100 pages long.
Important limitation: file size must not exceed 30 megabytes. PDFs longer than 100 pages are processed for text only; visual elements are ignored.
Claude models do not work with ZIP and RAR archives, nor with Apple Pages, Numbers, and Keynote files.
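The upload rules above can be sketched as a pre-upload check. This is a minimal illustration in Python, not the service's actual validation: the function name check_claude_upload, its parameters, and the returned notes are hypothetical, the page count is assumed to be known in advance, and the .jpg and .key extensions are assumed spellings for JPEG and Keynote files.

```python
from pathlib import Path

# Formats and limits copied from the text above; all names are hypothetical.
IMAGE_EXTS = {".jpeg", ".jpg", ".png", ".gif", ".webp"}
DOCUMENT_EXTS = {".docx", ".txt", ".html", ".odt", ".rtf", ".epub"}
SPREADSHEET_EXTS = {".xlsx", ".csv"}
UNSUPPORTED_EXTS = {".zip", ".rar", ".pages", ".numbers", ".key"}
MAX_SIZE_BYTES = 30 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
MAX_VISUAL_PDF_PAGES = 100


def check_claude_upload(filename, size_bytes, pdf_pages=None):
    """Return (accepted, note) for a file a user wants to send to Claude."""
    ext = Path(filename).suffix.lower()
    if ext in UNSUPPORTED_EXTS:
        return False, "unsupported format"
    if size_bytes > MAX_SIZE_BYTES:
        return False, "file larger than 30 MB"
    if ext == ".pdf":
        # Long PDFs are still accepted, but only their text is processed.
        if pdf_pages is not None and pdf_pages > MAX_VISUAL_PDF_PAGES:
            return True, "text only, visual elements ignored"
        return True, "full analysis"
    if ext in IMAGE_EXTS | DOCUMENT_EXTS | SPREADSHEET_EXTS:
        return True, "full analysis"
    return False, "format not listed as supported"
```

For example, check_claude_upload("report.pdf", 2_000_000, pdf_pages=250) is accepted with the note that only text is processed, while any .zip file is rejected regardless of size.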
Video generation models — RunwayML
Gen4 Turbo.
This model takes an image and a text prompt as input. They work together: the image serves as the first frame, and the text prompt describes the desired motion. The output is a video lasting 5 or 10 seconds in MP4 format.
Gen4 Turbo is an accelerated version for rapid prototyping.
Video generation models — Veo
Veo 3.1 Fast, Veo 3.1, and Veo 3.
All Veo models can generate video in two ways: from a text-only prompt, or from an image plus a text prompt. The output is video with embedded sound, including dialogue, sound effects, and music.
Veo 3.1 Fast — accelerated version for fast generation. Resolution up to 1080p, duration 4, 6, or 8 seconds.
Veo 3.1 — flagship version with support for 4K resolution, improved motion physics, and the ability to specify start and end frames.
Veo 3 — previous version. Veo 2 and Veo 3 models will be fully discontinued on June 30, 2026. It is recommended to use Veo 3.1.
Video generation models — Gen3A Turbo
Gen3A Turbo.
This model takes an image and a text prompt as input. Both are required. A text prompt without an image is not accepted. The output is video.
On-screen hints
Next to each model name, a hint indicates the processing format:
Model name — what it takes as input — what it outputs.
Hints show whether you can attach an image to the model, whether text alone is sufficient, or whether both image and text are required together.
How to choose a model for your query or prompt
If you have text, an image, a PDF, or a document and need a text response → choose Claude Opus 4 or Claude Sonnet 4.
If you have an image and a text description of desired motion and need a video → choose Gen4 Turbo.
If you have text only and need a video based on that description → choose any Veo model with the hint Text → Video.
If you have an image and need a video based on it → choose any Veo model with the hint Image + Text → Video.
If you have an image and a text prompt and need a fast video without complex settings → choose Gen3A Turbo.
Note: If you upload an image to Claude, the model will describe it in text, not create a video. For video output, choose Veo or RunwayML.
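The selection rules in this section can be written as a small decision function. This is only a sketch of the rules as stated here: the function name choose_model and its parameters are hypothetical, and the returned strings are labels for this example, not identifiers used by any API.

```python
def choose_model(has_image, has_text, want_video, fast=False):
    """Map the available inputs and the desired output to the models named above."""
    if not want_video:
        # Text, images, PDFs, and documents with a text answer go to Claude.
        return ["Claude Opus 4", "Claude Sonnet 4"]
    if has_image and has_text:
        if fast:
            return ["Gen3A Turbo"]
        return ["Gen4 Turbo", "Veo (Image + Text → Video)"]
    if has_text:
        return ["Veo (Text → Video)"]
    # An image without a text prompt matches none of the rules above;
    # the video models described here all expect a text prompt as well.
    return []
```

An empty result means a text prompt has to be added before a video model can be used.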
File formats
For Claude: images (JPEG, PNG, GIF, WebP); documents (DOCX, TXT, HTML, ODT, RTF, EPUB); spreadsheets (XLSX, CSV); PDF files (visual analysis up to 100 pages; longer PDFs are processed for text only). File size no larger than 30 MB.
For video generation via Gen4 Turbo, Gen3A Turbo, or Veo: input image in PNG or JPG format. The text prompt is entered as plain text.
Output of all video generation models: video in MP4 format.
Video duration: for Gen4 Turbo — 5 or 10 seconds; for Veo — up to 8 seconds.
Any uploaded file must not exceed 50 MB; for files sent to Claude, the stricter 30 MB limit applies.
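The duration and input-image rules for video generation can be checked before a request is sent. The following Python sketch assumes a hypothetical helper, check_video_request, and hypothetical problem messages; it encodes only the values stated in this document and leaves Gen3A Turbo's duration unchecked because no value is given for it.

```python
# Durations stated in the text; Gen3A Turbo has no stated duration.
FIXED_DURATIONS = {
    "Gen4 Turbo": {5, 10},
    "Veo 3.1 Fast": {4, 6, 8},
}
VEO_MAX_SECONDS = 8
INPUT_IMAGE_EXTS = (".png", ".jpg")


def check_video_request(model, duration_s, image_name=None):
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if model in FIXED_DURATIONS:
        if duration_s not in FIXED_DURATIONS[model]:
            problems.append("duration not offered for this model")
    elif model.startswith("Veo") and not 0 < duration_s <= VEO_MAX_SECONDS:
        problems.append("Veo clips are limited to 8 seconds")
    # The image is optional here because Veo also accepts text-only prompts.
    if image_name is not None and not image_name.lower().endswith(INPUT_IMAGE_EXTS):
        problems.append("input image must be PNG or JPG")
    return problems
```

Returning a list rather than raising on the first failure lets a form show every problem at once.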