I Tried to Construct Picture Captioning App With OpenAI Codex CLI

OpenAI Codex CLI is an open‑supply command-line instrument that brings the ability of OpenAI’s newest reasoning fashions on to your terminal. Consider it as a light-weight AI coding assistant that lives in your shell: it could learn your code, modify recordsdata, and even execute instructions in your mission surroundings. This implies you’ll be able to ask it to construct options, repair bugs, or clarify unfamiliar code with out leaving your improvement workflow. In brief, it’s chat-driven improvement – you work together with Codex in pure language and it responds with code edits or command outcomes, successfully providing you with ChatGPT-level reasoning plus the flexibility to run code and see outcomes in real-time​. Sounds intriguing, proper? Additional on this article, I’ll inform you about how you can entry it and use on your queries.

Key Options of OpenAI Codex CLI

OpenAI Codex CLI comes with a number of highly effective options that make it a useful companion for builders. One of many greatest benefits of Codex CLI is that it runs totally in your native machine. Your supply code and recordsdata keep in your surroundings and aren’t uploaded wholesale to a cloud service​. Solely your prompts and high-level context (like summarized diffs or related snippets) are despatched to the OpenAI API for producing responses​. As a result of the CLI is open-source and works regionally, it offers you privateness and management by design – your workflow and code stay non-public. This makes Codex CLI particularly interesting for codebases you could’t or don’t need to share, whereas nonetheless leveraging highly effective AI help.

By integrating immediately into the terminal, Codex CLI suits naturally right into a developer’s day-to-day work. You’ll be able to chat with the AI assistant proper subsequent to your git instructions, textual content editor, and construct instruments, which implies much less context-switching in comparison with utilizing a separate chat interface. The instrument is designed for fast iteration: ask a query or give an instruction, let it suggest or apply a change, run the code, and repeat – multi function place​.

Listed here are a few of the highlights:

1. Zero-Setup Set up

Codex CLI is extraordinarily straightforward to get working. All you want is Node.js and an OpenAI API key – a single command like npm set up -g @openai/codex installs the CLI globally, with no different setup required​. There’s no complicated configuration or surroundings fiddling; convey your API key and it “simply works”​. (You’ll be able to even replace to the most recent model at any time with a easy codex –improve command​.)

2. Terminal-Native Design

Codex runs totally in your terminal, so it seems like a pure extension of your shell surroundings​. You’ll be able to invoke it out of your mission listing and have it work together together with your native recordsdata and instruments. This terminal-native strategy means you don’t have to change to a browser or GUI – good for sustaining movement and context whereas coding​. The CLI offers an interactive chat-like interface in textual content, so that you see the AI’s responses (like code diffs or command outputs) proper within the console.

3. Multimodal Inputs

Not like plain text-only instruments, Codex CLI accepts multimodal inputs – you’ll be able to cross not simply textual content prompts, but in addition photos resembling screenshots or diagrams to information the assistant​. For instance, you might drag a screenshot of an error message or a UI sketch into the terminal, and Codex can interpret it and act on it. It is a distinctive functionality that lets the AI use visible data to generate or edit code accordingly​. Beneath the hood, it makes use of vision-enabled fashions to know photos, enabling use circumstances like debugging from a screenshot of a stack hint or constructing a structure from a wireframe.

4. Wealthy Approvals Workflow

Codex CLI offers you fine-grained management over what it could do autonomously by way of a wealthy approval system. You’ll be able to select between three modes (Counsel, Auto Edit, Full Auto) that decide whether or not the AI’s proposed code adjustments or instructions are auto-executed or require your affirmation​. This versatile workflow allows you to resolve how hands-on you need to be: you can begin conservatively (handbook approvals for every part) and dial as much as full automation for repetitive duties. We’ll dive deeper into these modes within the subsequent part, however the important thing level is that Codex gained’t make adjustments you’re uncomfortable with – you’re all the time answerable for approvals.

5. Native Execution and Privateness

All code execution and file enhancing occurs on your machine, inside your mission’s surroundings. Apart from the mannequin queries, nothing is shipped out – the CLI doesn’t add your codebase to OpenAI. This implies you keep full privateness. You’ll be able to safely use Codex CLI on proprietary or delicate code figuring out that the instrument isn’t retaining or sharing your knowledge. Even when utilizing essentially the most autonomous mode, Codex runs in a sandboxed surroundings with no community entry, guaranteeing any actions it takes keep native to your system​. In brief, you get the advantages of an AI pair programmer with out giving up privateness or safety.

Codex CLI Modes that You Should Know

Github Hyperlink: openai/codex

A standout function of Codex CLI is its approval workflow – primarily, you resolve how a lot freedom the AI has to make adjustments or run instructions. There are three approval modes: Counsel, Auto Edit, and Full Auto. Every mode strikes a special steadiness between automation and person oversight, so you’ll be able to choose what suits your consolation degree for the duty at hand​. Right here’s an summary of how they work:

1. Counsel Mode (Default)

That is essentially the most conservative mode, superb for whenever you need to fastidiously evaluate every part. The AI can learn your mission recordsdata and recommend code edits or terminal instructions, but it surely gained’t apply adjustments or execute something with out your express approval​ . Basically, Codex will work together with you want an skilled advisor: it would suggest a patch diff for a bug repair or present a shell command to run checks, after which ask on your affirmation. Use Counsel mode for secure exploration – e.g. studying a brand new codebase or doing a code evaluate – the place you need to see suggestions however apply them manually​ .

2. Auto Edit Mode

In Auto Edit, Codex is allowed to mechanically apply code adjustments (it could edit/write to recordsdata by itself) however nonetheless should ask earlier than working any shell instructions​ . This mode is nice for duties like refactoring or making repetitive edits throughout a codebase. You get the effectivity of the AI immediately modifying code for you, whereas retaining a checkpoint of management earlier than any program execution. For instance, Codex may rewrite a operate in a number of recordsdata and save the adjustments instantly, but when it needs to run your take a look at suite or begin the dev server, it can pause and ask on your go-ahead. Auto Edit mode is a steadiness: sooner coding iterations, but you continue to supervise side-effects like instructions​ .

3. Full Auto Mode

Full Auto offers the AI essentially the most autonomy. Codex can learn and write recordsdata and in addition execute shell instructions by itself with out stopping for approval​ . On this mode, it turns into a very automated agent – you might ask it to carry out a fancy job after which sit again whereas it really works by way of the steps. To maintain issues secure, Full Auto runs in a restricted sandbox: all instructions are executed with community entry disabled and scoped to your mission listing (it could’t wander outdoors or entry the web)​. This mode is good for longer duties the place you belief the AI to iterate, as an example, fixing a damaged construct or prototyping a brand new function whilst you take a brief break​ . In fact, it is best to use Full Auto with warning – it’s highly effective, however you’ll need to make sure you’ve backed up or version-controlled your code (the CLI will really warn you when you’re not in a git repo when beginning Auto Edit or Full Auto)​​ .

Comparability of Modes

The variations between the three modes are summarized within the desk beneath, together with typical use circumstances for every:

Mode What the Agent Can Do When to Use (Use Circumstances)
Counsel (default) – Learn any recordsdata in your repo<br/>- Suggest edits and shell instructions (requires your approval to use/execute)​  Secure exploration of codebases, code opinions, studying a brand new mission’s construction the place you need full management over adjustments​ .
Auto Edit – Learn and modify recordsdata (apply edits mechanically)<br/>- Suggest shell instructions (execution nonetheless requires approval)​  Refactoring code or making bulk edits whereas maintaining a tally of unwanted effects. Nice for repetitive adjustments the place handbook file enhancing is tedious however you continue to need to approve any instructions​ .
Full Auto – Learn, write, and execute instructions autonomously (all actions auto-approved)<br/>- Runs in sandbox (no community, confined to mission listing)​  Giant or time-consuming duties like fixing all checks in a damaged construct or scaffolding a brand new app from scratch. Helpful whenever you need to delegate execution totally to the AI (e.g. fast prototyping)​ .

In observe, you’ll be able to choose the mode that is sensible on your state of affairs. By default, when you simply run codex it begins in Counsel mode. To explicitly select a mode, you’ll be able to launch the CLI with a flag: for instance, use –auto-edit or –full-auto to start out in these modes​ . There’s additionally an interactive command (/mode) to toggle modes throughout a session​ . This manner, you may start in Counsel mode to see what Codex plans to do, then swap to Auto Edit when you’re comfy with its strategies, and possibly kick into Full Auto for the ultimate stretch of a job. The essential factor is that you management the extent of autonomy always.

System Necessities for Codex CLI

Earlier than putting in Codex CLI, make sure that your improvement surroundings meets the minimal necessities. The instrument is cross-platform, however at present works greatest on Unix-like methods. Listed here are the minimal and really helpful specs:

Requirement Minimal Really helpful
Working System macOS 12+ or Ubuntu 20.04+/Debian 10+ (Linux); Home windows 11 through WSL2 Newest OS updates (newest macOS or LTS Linux launch; Home windows with newest WSL2) for greatest compatibility.
Node.js 22 (or newer)​ Newest LTS model of Node.js (>= 22) for stability.
Git (elective) 2.23+ (if utilizing model management options)​ Latest Git out there (elective, however really helpful for full performance like PR helpers).
Reminiscence (RAM) 4 GB minimal 8 GB or extra (for smoother efficiency on massive duties)​.

Codex CLI has been examined on macOS and Linux. Home windows customers can run it through WSL2 (Home windows Subsystem for Linux) since native Home windows help remains to be experimental​ . You’ll additionally want an OpenAI API key (out of your OpenAI account) to authenticate the CLI – we’ll cowl that subsequent. Apart from these, no different particular {hardware} is required; when you can run fashionable Node.js, you’re possible good to go.

Be aware: It’s really helpful to have your mission underneath supply management (git) when utilizing Codex CLI, particularly for Auto modes. Whereas Git isn’t strictly required to run the CLI, having model management will can help you simply evaluate adjustments and rollback if wanted. In truth, Codex will remind you with a warning when you attempt to use Auto Edit or Full Auto in a listing that’s not a git repo​

How you can Use OpenAI Codex CLI?

Step 1: Set up Node.js

  1. Obtain Node.js v22+ from nodejs.org.
  2. Set up utilizing default settings.
  3. Confirm set up:
bash
node --version  # Ought to present v22+
npm --version   # Ought to present v10

Step 2: Set up Codex CLI

bash
npm set up -g @openai/codex
  • Troubleshooting: In case you see permission denied errors:
    • Home windows: Run PowerShell as Administrator.
    • Linux/macOS: Use sudo npm set up -g @openai/codex (not really helpful; repair npm permissions as an alternative).

Step 3: Set OpenAI API Key

Terminal

For PowerShell (Home windows):

Powershell

$env:OPENAI_API_KEY = "your-api-key-here"

To make it everlasting:

Powershell

setx OPENAI_API_KEY "your-api-key-here"

For Git Bash/MINGW64:

bash
export OPENAI_API_KEY="your-api-key-here"

To make it everlasting, add to ~/.bash_profile:

bash
nano ~/.bash_profile  # Add "export OPENAI_API_KEY=..."
supply ~/.bash_profile

Step 4: Repair “sh.exe” Errors (Home windows Solely)

  1. Set up Git for Home windows from git-scm.com.
  2. Throughout set up:
    • Choose “Use Git and Unix instruments within the Command Immediate”.
    • Allow “Allow symbolic hyperlinks”.
  3. Restart your terminal.

Step 5: Run Codex

Interactive Mode

Run interactively:

Codex

Arms-on OpenAI Codex CLI to Construct Sport and Picture Captioning APP

Activity 1: Primary Immediate Execution

first task CODEX CLI

I began with a easy job—asking Codex to jot down 2–3 sentences about myself. The CLI responded rapidly and precisely, producing coherent, grammatically sound output in simply seconds. It demonstrated robust immediate understanding and fluency, even with minimal enter.

Activity 2: Picture Captioning App with OpenAI Mannequin

Subsequent, I attempted constructing a extra complicated utility: a picture captioning instrument the place customers add a picture and obtain a descriptive caption generated by an OpenAI mannequin. Whereas Codex supplied a good place to begin, the code was outdated—referencing deprecated code and lacking key parts for file dealing with and mannequin integration. I needed to step in and replace the code myself. (I’ve included a screenshot for reference.) This highlighted a limitation: for newer or less-documented APIs, Codex may fall again on older patterns or incomplete implementations.

Error with Codex CLI

ERROR IN IMAGE CAPTIONING

Activity 3: Tetris Sport with Python and Pygame

Output

For the ultimate job, I requested Codex to construct a Tetris recreation utilizing Python and Pygame. This time, it nailed it. The code was well-structured, absolutely purposeful, and required no main edits. The sport ran easily and included all of the core mechanics—block motion, rotation, line clearing, and scoring. A stable demonstration of Codex’s potential to deal with interactive, graphics-based tasks when working with well-established libraries like Pygame.

Use Circumstances for Codex CLI

Codex CLI can supercharge your improvement workflow throughout a number of frequent duties:

  1. Bug Fixing: While you hit a bug or failing take a look at, use Counsel mode to ask issues like “Why is the login operate throwing an error?” Codex analyzes the code, spots points (like a unsuitable variable or lacking examine), and suggests fixes. You evaluate and approve the patch. For trickier points, Full Auto mode lets Codex repair a number of failures by iteratively working checks and making use of adjustments. You continue to confirm the outcomes, but it surely handles the heavy lifting.
  2. Code Refactoring: Refactoring throughout recordsdata—like switching from callbacks to async/await—will be tedious. In Auto Edit mode, Codex can apply constant adjustments all through your codebase. For instance, say “Refactor the API routes to async/await,” and it’ll deal with the file edits, pausing provided that wanted. You supervise the adjustments through diffs, letting Codex do the grunt work whilst you oversee high quality.
  3. Studying a New Codebase: Simply cloned a repo? Use Counsel mode to ask, “What does the Scheduler class do?” or “How does authentication work?” Codex reads the code and explains in plain language, serving to you navigate unfamiliar tasks rapidly. You’ll be able to request summaries, perceive module obligations, and discover performance with out making adjustments.
  4. Prototyping and Scaffolding: Need to kickstart a brand new mission or function? Full Auto mode can generate code and set every part up. Ask it to “Create a easy TODO internet app in Flask,” and it’ll generate recordsdata, set up dependencies, and run the app—mechanically. For brand new options like “Add CSV export to this CLI instrument,” Codex writes and integrates the code, providing you with a working baseline to construct on.

Codex CLI acts like an AI pair-programmer—serving to with every part from mundane edits to complicated automation. You management how hands-on or autonomous it’s, relying on the duty.

Conclusion

With the OpenAI Codex CLI, builders achieve a pleasant AI accomplice proper within the terminal – one that may cause about code and deal with the mechanics of enhancing and working it. I’ve lined what Codex CLI is and the way it works, from its zero-effort set up to the intelligent approval modes that maintain you in management. You’ve seen how you can get began and run some fundamental instructions, and the way it may also help in real-world use circumstances like fixing bugs, refactoring, studying codebases, and prototyping new concepts. In essence, Codex CLI brings the ChatGPT expertise into your improvement surroundings, turning pure language directions into working code, all whilst you stay in cost. It’s an thrilling instrument that embodies the way forward for AI-assisted software program improvement: quick, versatile, and constructed with developer empowerment in thoughts. Give it a attempt in your subsequent mission!

Hello, I’m Pankaj Singh Negi – Senior Content material Editor | Obsessed with storytelling and crafting compelling narratives that remodel concepts into impactful content material. I like studying about know-how revolutionizing our life-style.

Login to proceed studying and luxuriate in expert-curated content material.