Forge - GPU Kernel Optimization

Created By
RightNow-AI4 months ago
Turn slow PyTorch into fast CUDA/Triton kernels. 32 parallel swarm agents optimize your code on real datacenter GPUs (B200, H200, H100, A100) with up to 14x speedup over torch.compile.
Overview

Forge MCP Server

Swarm agents that turn slow PyTorch into fast CUDA/Triton kernels, from any AI coding agent.

What it does

  • Optimize existing kernels - Submit PyTorch code, get back an optimized Triton/CUDA kernel
  • Generate new kernels - Describe an operation, get a production-ready optimized kernel
  • 32 parallel swarm agents - Coder+Judge pairs compete to find the fastest kernel
  • Real GPU benchmarking - Every kernel is tested on datacenter hardware (B200, H200, H100, A100, L40S, T4)
  • Up to 14x faster than torch.compile(mode='max-autotune')
  • One-click auth - Browser-based OAuth, no API keys needed

Quick Start

claude mcp add forge-mcp -- npx -y @rightnow/forge-mcp-server

Tools
┌────────────────┬───────────────────────────────────────────────┐
│      Tool      │                  Description                  │
├────────────────┼───────────────────────────────────────────────┤
│ forge_auth     │ Authenticate with Forge via browser           │
├────────────────┼───────────────────────────────────────────────┤
│ forge_optimize │ Optimize PyTorch code into fast GPU kernels   │
├────────────────┼───────────────────────────────────────────────┤
│ forge_generate │ Generate optimized kernels from a description │
├────────────────┼───────────────────────────────────────────────┤
│ forge_credits  │ Check credit balance                          │
├────────────────┼───────────────────────────────────────────────┤
│ forge_status   │ Check job status                              │
├────────────────┼───────────────────────────────────────────────┤
│ forge_cancel   │ Cancel a running job                          │
├────────────────┼───────────────────────────────────────────────┤
│ forge_sessions │ List past optimization sessions               │
└────────────────┴───────────────────────────────────────────────┘


Pricing

Pay-as-you-go. $15/credit, 25% off at 10+. Free trial included - optimize 1 kernel, no credit card required.

https://rightnowai.co - https://github.com/RightNow-AI/forge-mcp-server - https://www.npmjs.com/package/@rightnow/forge-mcp-server

Server Config

{
  "mcpServers": {
    "forge": {
      "command": "npx",
      "args": [
        "-y",
        "@rightnow/forge-mcp-server"
      ]
    }
  }
}
Project Info
Created At
4 months ago
Updated At
4 months ago
Author Name
RightNow-AI
Star
-
Language
-
License
-
Category

Recommend Servers

View All
//beforeyouship — LLM Cost Modeling From Your Editor
@Indiegoing

Query realistic LLM cost models without leaving your editor. beforeyouship models the **true monthly cost** of an LLM app architecture — retries, prompt caching, batch discounts, infra overhead, and 3×/10× growth — across GPT-5.x, Claude, Gemini, DeepSeek, and more. Not a token calculator: a planning tool for the design phase, before you commit to a stack. **No API key needed to try it** — demo mode covers the six free-tier models. A Pro key from [beforeyouship.dev](https://beforeyouship.dev) unlocks the full 18-model catalog. ## What you can ask - "How much will a RAG chatbot cost at 10,000 requests/day?" - "Compare Claude Haiku vs Gemini Flash pricing for my workload" - "What's the cheapest model for a multi-step agent at scale?" - "Show me current per-token prices for Anthropic models" ## Tools ### `estimate_cost` Full cost model for an architecture at a given usage level. Returns Naive / Realistic / Worst Case monthly cost per model, 3×/10× growth scenarios, and an opinionated recommendation with reasoning. ### `get_model_prices` Current per-1M-token pricing — input, output, cached input, batch — with context windows and staleness metadata. ### `list_archetypes` Seven preset architecture patterns (simple chatbot, chatbot with history, RAG pipeline, multi-model router, coding assistant, document processor, multi-step agent) used as starting points for estimates. ## Setup **Claude Code:** ​```bash claude mcp add --transport http beforeyouship https://beforeyouship.dev/api/mcp ​``` **Cursor / other clients** — add a remote server: ​```json { "mcpServers": { "beforeyouship": { "type": "streamable-http", "url": "https://beforeyouship.dev/api/mcp" } } } ​``` Add an `Authorization: Bearer bys_...` header with a Pro key for the full catalog. ## Try it > Estimate the monthly cost of a RAG pipeline at 10,000 requests/day

7 hours ago
Shippo
@Shippo

15 hours ago
Tavily Mcp
@tavily-ai

JavaScript
a year ago
Mnemom

8 hours ago