Merged
2 changes: 2 additions & 0 deletions docs.json
@@ -11,6 +11,7 @@
{ "source": "/auth/agent/programmatic", "destination": "/auth/programmatic" },
{ "source": "/auth/agent/faq", "destination": "/auth/faq" },
{ "source": "/browsers/hardware-acceleration", "destination": "/browsers/gpu-acceleration" },
{ "source": "/integrations/computer-use", "destination": "/integrations/computer-use/overview" },
{ "source": "/browsers/create-a-browser", "destination": "/introduction/create" },
{ "source": "/introduction", "destination": "/" },
{ "source": "/quickstart", "destination": "/" },
@@ -206,6 +207,7 @@
{
  "group": "Computer Use",
  "pages": [
    "integrations/computer-use/overview",
    "integrations/computer-use/anthropic",
    "integrations/computer-use/gemini",
    "integrations/computer-use/openagi",
23 changes: 23 additions & 0 deletions integrations/computer-use/anthropic.mdx
@@ -18,6 +18,29 @@ Choose `TypeScript` or `Python` as the programming language.

Then follow the [deploy](/apps/deploy) and [invoke](/apps/invoke) guides to deploy and run your Computer Use automation on Kernel's infrastructure.

## Build your own agent

For full control over the loop, drive Claude from TypeScript with [`@onkernel/cua-agent`](/integrations/computer-use/overview#build-your-own-agent):

```ts
import Kernel from "@onkernel/sdk";
import { CuaAgent } from "@onkernel/cua-agent";

const client = new Kernel({ apiKey: process.env.KERNEL_API_KEY! });
const browser = await client.browsers.create({ stealth: true });

const agent = new CuaAgent({
  browser,
  client,
  initialState: {
    model: "anthropic:claude-opus-4-7",
    systemPrompt: "You are a careful browser automation agent.",
  },
});

await agent.prompt("Open news.ycombinator.com and summarize the top story.");
```

## Benefits of using Kernel with Computer Use

- **No local browser management**: Run Computer Use automations without installing or maintaining browsers locally
25 changes: 24 additions & 1 deletion integrations/computer-use/gemini.mdx
@@ -2,7 +2,7 @@
title: "Gemini"
---

- [Gemini 2.5 Computer Use](https://blog.google/technology/google-deepmind/gemini-computer-use-model/) is Google's groundbreaking capability that enables AI models to interact with computers the way humans doby looking at screens, moving cursors, clicking buttons, and typing text. This powerful feature allows AI agents to control web browsers, navigate interfaces, and perform complex tasks across applications.
+ [Gemini 2.5 Computer Use](https://blog.google/technology/google-deepmind/gemini-computer-use-model/) is Google's groundbreaking capability that enables AI models to interact with computers the way humans do by looking at screens, moving cursors, clicking buttons, and typing text. This powerful feature allows AI agents to control web browsers, navigate interfaces, and perform complex tasks across applications.

By integrating Gemini 2.5 Computer Use with Kernel, you can run these AI-powered browser automations on cloud-hosted infrastructure, eliminating the need for local browser management and enabling scalable, reliable AI agents.

@@ -16,6 +16,29 @@ kernel create --name my-computer-use-app --language typescript --template gemini

Then follow the [deploy](/apps/deploy) and [invoke](/apps/invoke) guides to deploy and run your Computer Use automation on Kernel's infrastructure.

## Build your own agent

For full control over the loop, drive Gemini from TypeScript with [`@onkernel/cua-agent`](/integrations/computer-use/overview#build-your-own-agent):

```ts
import Kernel from "@onkernel/sdk";
import { CuaAgent } from "@onkernel/cua-agent";

const client = new Kernel({ apiKey: process.env.KERNEL_API_KEY! });
const browser = await client.browsers.create({ stealth: true });

const agent = new CuaAgent({
  browser,
  client,
  initialState: {
    model: "google:gemini-3-flash-preview",
    systemPrompt: "You are a careful browser automation agent.",
  },
});

await agent.prompt("Open news.ycombinator.com and summarize the top story.");
```

## Benefits of using Kernel with Computer Use

- **No local browser management**: Run Computer Use automations without installing or maintaining browsers locally
25 changes: 24 additions & 1 deletion integrations/computer-use/openai.mdx
@@ -2,7 +2,7 @@
title: "OpenAI"
---

- [Computer Use](https://openai.com/index/computer-using-agent/) is OpenAI's feature that enables AI models to interact with computers the way humans doby looking at screens, moving cursors, clicking buttons, and typing text. This powerful feature allows AI agents to control web browsers, navigate interfaces, and perform complex tasks across applications.
+ [Computer Use](https://openai.com/index/computer-using-agent/) is OpenAI's feature that enables AI models to interact with computers the way humans do by looking at screens, moving cursors, clicking buttons, and typing text. This powerful feature allows AI agents to control web browsers, navigate interfaces, and perform complex tasks across applications.

By integrating Computer Use with Kernel, you can run these AI-powered browser automations on cloud-hosted infrastructure, eliminating the need for local browser management and enabling scalable, reliable AI agents.

@@ -18,6 +18,29 @@ Choose `TypeScript` or `Python` as the programming language.

Then follow the [deploy](/apps/deploy) and [invoke](/apps/invoke) guides to deploy and run your Computer Use automation on Kernel's infrastructure.

## Build your own agent

For full control over the loop, drive OpenAI's CUA from TypeScript with [`@onkernel/cua-agent`](/integrations/computer-use/overview#build-your-own-agent):

```ts
import Kernel from "@onkernel/sdk";
import { CuaAgent } from "@onkernel/cua-agent";

const client = new Kernel({ apiKey: process.env.KERNEL_API_KEY! });
const browser = await client.browsers.create({ stealth: true });

const agent = new CuaAgent({
  browser,
  client,
  initialState: {
    model: "openai:gpt-5.5",
    systemPrompt: "You are a careful browser automation agent.",
  },
});

await agent.prompt("Open news.ycombinator.com and summarize the top story.");
```

## Benefits of using Kernel with Computer Use

- **No local browser management**: Run Computer Use automations without installing or maintaining browsers locally
110 changes: 110 additions & 0 deletions integrations/computer-use/overview.mdx
@@ -0,0 +1,110 @@
---
title: "Overview"
description: "Run computer use agents on Kernel cloud browsers"
---

Computer use models are vision-language models (VLMs) that operate a browser the way a person does: they look at a screenshot, decide what to do next, and emit a concrete action such as moving the mouse, clicking, typing, scrolling, or dragging. Kernel runs these agents on cloud browsers, so you don't install or maintain anything locally, and gives the model the low-level [Computer Controls API](/browsers/computer-controls) it needs to see the screen and act on it.

## How computer use works on Kernel

Every computer use integration runs the same action-observation loop:

1. **Capture** a screenshot of the current browser state with the [Computer Controls API](/browsers/computer-controls#take-screenshots).
2. **Predict** the next action by sending that screenshot to your model.
3. **Execute** the returned action (click, type, scroll, drag, or key press) through Computer Controls.
4. **Repeat** until the task is complete.
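
The four steps above can be sketched as a small driver function. This is an illustration only: `Action`, `Computer`, `Predict`, and `runLoop` are hypothetical names invented for this example, not part of the Kernel SDK or `@onkernel/cua-agent`, and the `maxSteps` cap is an added assumption so a stuck model cannot loop forever.

```ts
// Hypothetical action shape; real models each define their own action schema.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "scroll"; x: number; y: number; deltaY: number }
  | { kind: "done"; summary: string };

// Hypothetical wrapper around whatever captures screenshots and sends input.
interface Computer {
  screenshot(): Promise<Uint8Array>; // step 1: capture
  execute(action: Exclude<Action, { kind: "done" }>): Promise<void>; // step 3: execute
}

// Hypothetical model call: screenshot in, next action out (step 2: predict).
type Predict = (
  task: string,
  screenshot: Uint8Array,
  history: Action[],
) => Promise<Action>;

async function runLoop(
  task: string,
  computer: Computer,
  predict: Predict,
  maxSteps = 25, // assumed safety cap, not a Kernel default
): Promise<string> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const shot = await computer.screenshot();
    const action = await predict(task, shot, history);
    if (action.kind === "done") return action.summary; // step 4: stop when complete
    await computer.execute(action);
    history.push(action);
  }
  throw new Error(`Task not finished after ${maxSteps} steps`);
}
```

Passing the action history back to the model is one possible design; some models track prior turns themselves, in which case `history` can be dropped.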

Computer Controls emulates native keyboard and mouse input at the OS level (with human-like [Bézier curves](/browsers/computer-controls#move-the-mouse) by default) instead of driving the page over the Chrome DevTools Protocol (CDP). This keeps the loop close to real user input and reduces the automation signals that [bot detection](/browsers/bot-detection/overview) systems look for.

The loop works with any VLM that predicts actions from pixels. The models below are the ones we maintain ready-to-deploy templates and guides for.

## Supported models

<CardGroup cols={2}>
<Card title="Anthropic" icon="robot" href="/integrations/computer-use/anthropic">
Claude's computer use tool
</Card>
<Card title="Gemini" icon="google" href="/integrations/computer-use/gemini">
Google's Gemini 2.5 Computer Use model
</Card>
<Card title="OpenAGI" icon="wand-magic-sparkles" href="/integrations/computer-use/openagi">
OpenAGI's Lux model
</Card>
<Card title="OpenAI" icon="circle-nodes" href="/integrations/computer-use/openai">
OpenAI's computer-using agent (CUA)
</Card>
<Card title="Tzafon" icon="bolt" href="/integrations/computer-use/tzafon">
Tzafon's Northstar CUA Fast model
</Card>
<Card title="Yutori" icon="location-arrow" href="/integrations/computer-use/yutori">
Yutori's Navigator n1.5 pixels-to-actions model
</Card>
</CardGroup>

Using a model that isn't listed here? Any VLM works; wire its predicted actions straight to the [Computer Controls API](/browsers/computer-controls) and run the same loop.
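
As a rough sketch of what "wiring predicted actions" might look like, the dispatcher below maps a model's output onto input calls and clamps coordinates to the viewport first. `ModelAction`, `Controls`, `Viewport`, and `dispatch` are hypothetical names for this example; the real Computer Controls method names and the action schema of your model will differ, so treat this as a shape to adapt rather than an API reference.

```ts
interface Viewport {
  width: number;
  height: number;
}

// Hypothetical stand-in for a client that wraps the Computer Controls API.
interface Controls {
  click(x: number, y: number): Promise<void>;
  typeText(text: string): Promise<void>;
  pressKey(key: string): Promise<void>;
  scroll(x: number, y: number, deltaY: number): Promise<void>;
  drag(path: Array<[number, number]>): Promise<void>;
}

// Hypothetical shape for a model's predicted action.
type ModelAction =
  | { type: "click"; x: number; y: number }
  | { type: "type"; text: string }
  | { type: "key"; key: string }
  | { type: "scroll"; x: number; y: number; deltaY: number }
  | { type: "drag"; path: Array<[number, number]> };

// Round to whole pixels and keep the point inside [0, max - 1].
const clamp = (v: number, max: number): number =>
  Math.min(Math.max(Math.round(v), 0), max - 1);

async function dispatch(
  action: ModelAction,
  controls: Controls,
  vp: Viewport,
): Promise<void> {
  switch (action.type) {
    case "click":
      return controls.click(clamp(action.x, vp.width), clamp(action.y, vp.height));
    case "type":
      return controls.typeText(action.text);
    case "key":
      return controls.pressKey(action.key);
    case "scroll":
      return controls.scroll(
        clamp(action.x, vp.width),
        clamp(action.y, vp.height),
        action.deltaY,
      );
    case "drag":
      return controls.drag(
        action.path.map(([x, y]): [number, number] => [
          clamp(x, vp.width),
          clamp(y, vp.height),
        ]),
      );
    default: {
      // Compile-time exhaustiveness check; at runtime, reject anything unexpected.
      const unknown: never = action;
      throw new Error(`Unsupported action: ${JSON.stringify(unknown)}`);
    }
  }
}
```

Clamping is a defensive choice for models that occasionally predict points just outside the screenshot; if your model emits normalized coordinates, scale them to pixels before calling `dispatch`.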

## Get started

Each model page includes a one-command template so you can deploy a working agent in minutes. For example, to scaffold the Anthropic integration:

```bash
kernel create --name my-computer-use-app --template computer-use
```

Pick a model above to get its template, then follow the [deploy](/apps/deploy) and [invoke](/apps/invoke) guides to run your agent on Kernel.

## Build your own agent

For full control over the loop, [`@onkernel/cua-agent`](https://github.com/kernel/cua/tree/main/packages/agent) is a TypeScript library that runs it against a Kernel browser for you. You point it at a model, give it a task, and it handles the screenshots, actions, and follow-up turns.

```bash
npm install @onkernel/cua-agent @onkernel/cua-ai @onkernel/sdk
```

```ts
import Kernel from "@onkernel/sdk";
import { CuaAgent } from "@onkernel/cua-agent";

const client = new Kernel({ apiKey: process.env.KERNEL_API_KEY! });
const browser = await client.browsers.create({ stealth: true });

const agent = new CuaAgent({
  browser,
  client,
  initialState: {
    model: "anthropic:claude-opus-4-7", // swap to target another provider
    systemPrompt: "You are a careful browser automation agent.",
  },
});

await agent.prompt("Open news.ycombinator.com and summarize the top story.");
```

Switch providers by changing the `model` ref:

| Provider | Model ref |
| --- | --- |
| Anthropic | `anthropic:claude-opus-4-7` |
| OpenAI | `openai:gpt-5.5` |
| Gemini | `google:gemini-3-flash-preview` |
| Tzafon | `tzafon:tzafon.northstar-cua-fast` |
| Yutori | `yutori:n1.5-latest` |

Set the matching provider key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`, `TZAFON_API_KEY`, or `YUTORI_API_KEY`) alongside `KERNEL_API_KEY`.
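
A missing key is easier to diagnose before the agent starts than partway through a run. The helper below is a hypothetical pre-flight check, not part of `@onkernel/cua-agent`: it splits a model ref on its first colon and reports which of the environment variables named above are unset. The prefix-to-key mapping is inferred from the table and the sentence above, so verify it against the library before relying on it.

```ts
// Provider prefix -> environment variable, as inferred from the table above.
const PROVIDER_KEYS = new Map<string, string>([
  ["anthropic", "ANTHROPIC_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
  ["google", "GOOGLE_API_KEY"],
  ["tzafon", "TZAFON_API_KEY"],
  ["yutori", "YUTORI_API_KEY"],
]);

// Split "provider:model" on the first colon only, so model names may contain
// further punctuation (for example "tzafon:tzafon.northstar-cua-fast").
function parseModelRef(ref: string): { provider: string; model: string } {
  const i = ref.indexOf(":");
  if (i <= 0 || i === ref.length - 1) {
    throw new Error(`Expected "provider:model", got "${ref}"`);
  }
  return { provider: ref.slice(0, i), model: ref.slice(i + 1) };
}

// Return the names of required variables that are unset or empty in `env`.
function missingKeys(
  ref: string,
  env: Record<string, string | undefined>,
): string[] {
  const { provider } = parseModelRef(ref);
  const providerKey = PROVIDER_KEYS.get(provider);
  if (providerKey === undefined) {
    throw new Error(`Unknown provider "${provider}"`);
  }
  return ["KERNEL_API_KEY", providerKey].filter((name) => !env[name]);
}
```

In a Node script you might call `missingKeys("openai:gpt-5.5", process.env)` and exit early if the returned array is not empty.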

## Benefits of using Kernel for computer use

- **No local browser management**: Run computer use automations without installing or maintaining browsers locally
- **Scalability**: Launch multiple browser sessions in parallel for concurrent AI agents
- **Stealth mode**: Built-in anti-detection features for reliable web interactions
- **Session state**: Maintain browser state across runs via [Profiles](/auth/profiles)

Profiles linked instead of Managed Auth

Low Severity

The new Session state benefit points readers to Profiles for maintaining browser state across runs. Integration benefits that cover session persistence or authenticated browsing are expected to highlight Managed Auth (e.g. /auth/overview) as the primary path, not Profiles alone.

Triggered by learned rule: Integration pages should highlight Managed Auth for authenticated browsing

Reviewed by Cursor Bugbot for commit fe1b639.

- **Live view**: Debug your agents with real-time browser viewing
- **Cloud infrastructure**: Run computationally intensive AI agents without local resource constraints

## Next steps

- Read the [Computer Controls API](/browsers/computer-controls) reference for the full set of mouse, keyboard, and screenshot actions
- Check out [live view](/browsers/live-view) for debugging your automations
- Learn about [stealth mode](/browsers/bot-detection/stealth) for avoiding detection
- Learn how to properly [terminate browser sessions](/browsers/termination)
- Learn how to [deploy](/apps/deploy) your computer use app to Kernel
23 changes: 23 additions & 0 deletions integrations/computer-use/tzafon.mdx
@@ -18,6 +18,29 @@ Choose `TypeScript` or `Python` as the programming language.

Then follow the [deploy](/apps/deploy) and [invoke](/apps/invoke) guides to deploy and run your Tzafon automation on Kernel's infrastructure.

## Build your own agent

For full control over the loop, drive Northstar CUA Fast from TypeScript with [`@onkernel/cua-agent`](/integrations/computer-use/overview#build-your-own-agent):

```ts
import Kernel from "@onkernel/sdk";
import { CuaAgent } from "@onkernel/cua-agent";

const client = new Kernel({ apiKey: process.env.KERNEL_API_KEY! });
const browser = await client.browsers.create({ stealth: true });

const agent = new CuaAgent({
  browser,
  client,
  initialState: {
    model: "tzafon:tzafon.northstar-cua-fast",
    systemPrompt: "You are a careful browser automation agent.",
  },
});

await agent.prompt("Open news.ycombinator.com and summarize the top story.");
```

## Benefits of using Kernel with Tzafon Northstar CUA Fast

- **No local browser management**: Run Northstar CUA Fast automations without installing or maintaining browsers locally
23 changes: 23 additions & 0 deletions integrations/computer-use/yutori.mdx
@@ -18,6 +18,29 @@ Choose `TypeScript` or `Python` as the programming language.

Then follow the [deploy](/apps/deploy) and [invoke](/apps/invoke) guides to deploy and run your Yutori automation on Kernel's infrastructure.

## Build your own agent

For full control over the loop, drive Navigator n1.5 from TypeScript with [`@onkernel/cua-agent`](/integrations/computer-use/overview#build-your-own-agent):

```ts
import Kernel from "@onkernel/sdk";
import { CuaAgent } from "@onkernel/cua-agent";

const client = new Kernel({ apiKey: process.env.KERNEL_API_KEY! });
const browser = await client.browsers.create({ stealth: true });

const agent = new CuaAgent({
  browser,
  client,
  initialState: {
    model: "yutori:n1.5-latest",
    systemPrompt: "You are a careful browser automation agent.",
  },
});

await agent.prompt("Open news.ycombinator.com and summarize the top story.");
```

## Benefits of using Kernel with Yutori n1.5

- **No local browser management**: Run n1.5 automations without installing or maintaining browsers locally