Trending Tools, Models & APIs — Cihangir Bozdogan
Top 30 · this week
Trending for engineers
Tools, models, APIs & resources gaining traction · infra, AI, backend, devops.
Tools · 12
- 01
OpenAI Codex CLI
agent-cli
OpenAI's official Rust-based coding agent CLI — 350+ stars added this week with the GPT-5.5 release.
Codex's repo is now near the top of the daily Rust trending list, mirroring OpenAI's push to make Codex a first-class developer surface alongside the GPT-5.5 launch. New automations, scheduled triggers, and a plugins/skills system shipped in the academy docs the same week. Zed Industries also pushed codex-acp, an ACP wrapper that lets Zed editors host the Codex agent directly.
- 02
Hermes Agent
agent-framework
NousResearch's open agent framework — 19,000+ stars added this week, the runaway viral repo of the cycle.
Hermes is Nous's bet on "the agent that grows with you": a long-running agent runtime with persistent memory, built around their Hermes model line but provider-agnostic. Velocity is the story: 19K stars in seven days puts it among the fastest-growing AI repos of the year. Worth watching even if you don't use Nous models, since the harness pieces (memory, tool selection, persistence) are the parts most teams keep rebuilding.
- 03
OpenAI Agents Python
agent-framework
OpenAI's official Python SDK for multi-agent orchestration — 3,300+ stars this week, GA for production use.
openai-agents-python is OpenAI's blessed alternative to LangChain-style frameworks: handoffs, sessions, tool-calling, and built-in tracing into the OpenAI dashboard. The repo's velocity jumped after Codex Automations and the GPT-5.5 launch made the agent path the default in OpenAI's developer story. It is lighter than LangGraph and built around the OpenAI runtime, though it supports any LiteLLM-compatible backend.
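The handoff idea is easy to picture without the SDK: a triage step routes each request to a specialist agent, while a session object accumulates shared history. A minimal plain-Python sketch of that shape — the names and keyword routing below are illustrative stand-ins, not the SDK's API, where an LLM would do the triage:

```python
# Sketch of the handoff + session pattern that agent SDKs formalize.
# Plain Python for illustration; a real triage agent would ask a model
# which specialist should take the turn.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    handles: set[str]  # topics this specialist accepts

    def run(self, message: str) -> str:
        return f"[{self.name}] handled: {message}"

@dataclass
class Session:
    # Shared conversation state that survives across handoffs.
    history: list[tuple[str, str]] = field(default_factory=list)

    def record(self, agent_name: str, result: str) -> None:
        self.history.append((agent_name, result))

def triage(message: str, specialists: list[Agent], fallback: Agent) -> Agent:
    # Keyword matching stands in for model-driven routing.
    for agent in specialists:
        if any(topic in message.lower() for topic in agent.handles):
            return agent
    return fallback

billing = Agent("billing", {"invoice", "refund"})
support = Agent("support", {"bug", "crash"})
general = Agent("general", set())
session = Session()

for msg in ["I need a refund", "the app crashed", "hello"]:
    agent = triage(msg, [billing, support], general)
    session.record(agent.name, agent.run(msg))
```

The SDK's value-add over this skeleton is exactly the parts that are hard to hand-roll: model-driven handoff decisions, persistence, and tracing.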
- 04
claude-context
code-context
Zilliz's MCP server for code-aware agent context — 2,800+ stars this week, plug-in for Claude Code.
claude-context indexes a repo into a vector store and exposes it to Claude Code (and other MCP clients) as a structured retrieval tool, so the agent stops re-reading files it has already seen. Zilliz built it on Milvus, but it works against any compatible vector backend. The traction this week reflects how much agent harnesses are now bottlenecked on retrieval, not raw model quality.
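The index-once, retrieve-per-query shape is worth seeing in miniature. In a real deployment the chunks are embedded into a vector store (Milvus in Zilliz's setup) and retrieval is exposed over MCP; in this sketch, token overlap stands in for embedding similarity purely to show the flow:

```python
# Sketch of the chunk -> index -> retrieve loop behind code-context tools.
# Token overlap substitutes for vector similarity; everything here is
# illustrative, not claude-context's implementation.

def chunk(path: str, text: str, size: int = 40) -> list[dict]:
    # Split a file into fixed-size word windows tagged with their source.
    words = text.split()
    return [
        {"path": path, "text": " ".join(words[i:i + size])}
        for i in range(0, len(words), size)
    ]

def build_index(files: dict[str, str]) -> list[dict]:
    index: list[dict] = []
    for path, text in files.items():
        index.extend(chunk(path, text))
    return index

def retrieve(index: list[dict], query: str, k: int = 2) -> list[dict]:
    # Score each chunk by word overlap with the query; return top-k.
    q = set(query.lower().split())
    scored = sorted(
        index,
        key=lambda c: len(q & set(c["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

files = {
    "auth.py": "def login(user, password): verify password hash and issue session token",
    "db.py": "def connect(url): open database connection pool with retries",
}
index = build_index(files)          # done once per repo
hits = retrieve(index, "how does password login work")
```

The payoff named in the blurb is visible here: the agent asks the index a question instead of re-reading `auth.py` and `db.py` on every turn.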
- 05
Sail
data-engine
LakeHQ's Rust-native Spark replacement — drop-in PySpark API, single binary, ~133 stars added today.
Sail is a Rust query engine that speaks the PySpark API and Spark Connect protocol, aimed at teams who want to keep their dataframe code but drop the JVM. It compiles to a single binary, claims 4–10x speedups on common workloads, and integrates with Iceberg, Delta, and Hudi. With Polars dominating single-node and DuckDB on analytical work, Sail is the most interesting attempt yet at the distributed-Spark slot.
- 06
TileLang
kernel-dsl
Python-embedded DSL for writing high-performance GPU kernels — used by DeepSeek's TileKernels release.
TileLang lets you write GPU kernels in tile-based Python and compile them to Hopper/Blackwell-grade CUDA. It jumped this week because DeepSeek published TileKernels, an open library of FlashAttention/MoE kernels written entirely in TileLang, alongside the V4 release. If you've been using Triton, this is worth tracking — TileLang gives you more explicit control over tile layout and pipeline scheduling.
- 07
FlashKDA
cuda-kernels
MoonshotAI's CUDA kernels for Kimi Delta Attention — high-performance attention primitives behind K2.6.
FlashKDA is the kernel implementation of the delta-attention variant Moonshot uses in K2.6. Released as a standalone repo, it lets other model developers reuse the kernels for similar architectures. ~390 stars in a week — small in absolute terms but unusually high for a low-level kernel project.
- 08
Roo Code
agent-ide
Open-source autonomous coding agent — fork of Cline with multi-mode prompting and richer context handling.
Roo Code is one of the more credible open coding agents, sitting between full IDE integrations like Cursor and CLI agents like Claude Code. It runs as a VS Code extension, lets users define mode-specific personas, and added support for the new agentic models this week. Velocity is steady rather than viral — the kind of repo that sticks.
- 09
Spinel
language-runtime
Matz's Ruby AOT native compiler — produces standalone binaries from Ruby source, no MRI dependency.
Spinel is Yukihiro Matsumoto's experimental AOT compiler for Ruby, which hit the HN front page this week with 328 points. It targets a subset of Ruby that compiles to a native binary without bundling the MRI interpreter. Still early, and not aimed at full Rails apps, but a notable signal about where Ruby's creator wants the language to go.
- 10
Raylib v6.0
graphics-framework
Major release of the C99 graphics library — six years in, with revamped renderer and platform layer.
Raylib 6.0 is the first major version bump since 2020. The release rewrites the platform layer (now backend-pluggable: GLFW, SDL3, Raylib-native) and modernizes the renderer for current GPUs. It remains one of the cleanest C99 codebases to study. Front-paged on HN with strong engagement.
- 11
Open WebUI
llm-frontend
Self-hosted ChatGPT-style frontend for Ollama and OpenAI-compatible APIs — 134K stars and accelerating.
Open WebUI continues to be the default self-hosted UI for local-LLM rigs, and it's added day-one support for Qwen3.6 and DeepSeek V4 served via Ollama. The project still lacks a truly clean MCP story, but for any team standing up a private Claude/ChatGPT alternative, it's the path of least resistance.
- 12
Langflow
agent-builder
Visual builder for agent workflows — Python-first, open-source, 147K stars, IBM-backed.
Langflow is the visual agent-workflow builder that survived the LangChain reshuffle. It still ships as a drop-in Python package, exposes an API for any flow you build, and added native support this week for OpenAI Codex Automations and DeepSeek V4 endpoints. Useful for teams that want a UI for non-engineers without giving up code-level control.
Models · 8
- 01
DeepSeek V4
frontier-llm
Open-weights reasoning and coding model with 1M-token context, near-frontier benchmarks at a fraction of API price.
DeepSeek's V4 release this week is the dominant model story of the cycle. Pro and Flash variants both ship MoE architectures with bitwise-deterministic kernels at temperature 0 — a first for open-weights at this scale. Practitioners on HN and r/LocalLLaMA report it matches or beats Claude/GPT-5 on long-context coding tasks while undercutting them on token cost. The model card and API docs are notably more readable than the equivalents from larger labs.
- 02
GPT-5.5
frontier-llm
OpenAI's flagship update — faster, cheaper, and pushed into the API and Codex on day one.
GPT-5.5 and GPT-5.5 Pro shipped together with a developer changelog and a system card. Early benchmarks put it at 82% on CyberGym, on par with Anthropic's gated Mythos preview, with the difference that GPT-5.5 is generally available. Rollout in ChatGPT and Codex is staged. Vercel's AI Gateway added it within hours alongside DeepSeek V4.
- 03
Qwen3.6-35B-A3B
moe-llm
Alibaba's MoE flagship — top of the Hugging Face trending board with ~1.5M weekly downloads.
Qwen3.6's 35B-A3B sparse MoE shipped with FP8, GGUF, and NVFP4 variants from the day of release, and within a week it was the most-downloaded image-text-to-text model on Hugging Face. Community KV-cache quantization tests show it holds quality at q4_0, putting flagship-grade behavior within reach of single-GPU rigs. The 27B dense sibling Qwen3.6-27B is the one cited as matching Sonnet 4.6 on planning tasks.
- 04
Kimi K2.6
agent-llm
Moonshot AI's K2 line refresh — agentic coding focus, image-text-to-text, ~290K weekly downloads.
Moonshot's K2.6 update lands as a multimodal agent model with strong tool-calling and code generation. The release pairs with Moonshot's FlashKDA repo (CUDA kernels for Kimi Delta Attention) which made GitHub trending the same week. Vercel added K2.6 to the AI Gateway on April 20.
- 05
Gemma 4 31B-it
open-llm
Google's open-weights workhorse — 31B instruction-tuned variant with 5.7M downloads, broad VLA tooling.
Gemma 4 has had the cleanest momentum of any Google open release in two years. The 31B-it variant tops the all-time-downloads board for the family, and Hugging Face shipped a Gemma 4 VLA demo on the Jetson Orin Nano Super this week — a useful signal that the model is being treated as a robotics/embedded option, not just a chat model. The smaller E4B tier is also widely re-tuned by the community.
- 06
GLM-5.1
open-llm
Zhipu's GLM update — text-generation, ~218K weekly downloads, strong showing on reasoning leaderboards.
GLM-5.1 from zai-org is the next iteration of the GLM line and trended above 1,500 likes on Hugging Face within days. It's gaining adoption on local-LLM rigs as a reasoning alternative to Qwen, and is one of the models reflected in the Qwopus-GLM merged checkpoints showing up in the trending list. Less hype than DeepSeek/Qwen but solid quality at the size.
- 07
MiniMax-M2.7
open-llm
MiniMax's M2 refresh — text-generation focus, ~477K weekly downloads, growing footprint outside China.
MiniMax-M2.7 landed quietly but is now firmly in the top trending tier on Hugging Face. The release continues MiniMax's bet on long-context, efficient inference, and several community variants, including distillations of Claude Opus reasoning traces, are showing up alongside it. Worth watching if DeepSeek's price pressure forces other Chinese labs to release frontier-class weights.
- 08
Hunyuan Hy3-preview
reasoning-llm
Tencent's 295B A21B reasoning + agent model — preview release, strong cost/perf for its size class.
Hy3-preview is a 295-billion parameter MoE with 21B active. Tencent positions it as a leading reasoning and agent model in its size class, with cost efficiency as the headline pitch. The HF repo and the GitHub mirror both gained traction this week as people tested it against DeepSeek V4 and Qwen 3.6.
APIs & Services · 7
- 01
Vercel AI Gateway
ai-gateway
Unified inference gateway — added GPT-5.5, DeepSeek V4, Kimi K2.6, and GPT Image 2 in five days.
Vercel's AI Gateway shipped four new model integrations between April 20 and 24: Kimi K2.6, GPT Image 2, DeepSeek V4, and GPT-5.5. The product is now a credible OpenRouter alternative for teams already on Vercel, with built-in observability, BYOK fallbacks, and per-request routing. Usage on the gateway has been doubling roughly every quarter since it went GA in August 2025.
- 02
DeepSeek API
model-api
DeepSeek's hosted V4 API — million-token context, deterministic kernels, headline-undercutting pricing.
DeepSeek opened the V4 API the same day the weights dropped. The headline numbers — million-token context, bit-exact determinism at temperature 0 — are unusual at this scale, and the docs are widely praised for being clearer than the equivalents at OpenAI or Google. Combined with the price, it's the API teams running long-context agents are testing this week.
- 03
OpenAI GPT-5.5 API
model-api
GPT-5.5 and GPT-5.5 Pro live in the OpenAI API, including via Codex and the Responses surface.
GPT-5.5 hit the API at launch (unusual for OpenAI), with both standard and Pro tiers, plus immediate Codex integration. The system card and changelog were published the same day, and OpenAI's docs are pushing the Responses API and tool-use surface as the recommended path forward. Anecdotally, latency is meaningfully better than GPT-5 at similar quality.
- 04
Google Cloud TPU v8 (Ironwood)
inference-cloud
Google's eighth-generation TPUs — "two chips for the agentic era", positioned for agent-style inference patterns.
Google announced its eighth-generation TPUs framed explicitly around agentic workloads — long sequential calls, tool use, and stateful sessions rather than one-shot inference. The pitch lands credibly because Gemini 3 already uses 5–10x fewer tokens per response than competing models, and Google attributes that efficiency to the chip pipeline. Available in Google Cloud first, broader Vertex availability rolling out.
- 05
Cloudflare Agents Week 2026
agentic-cloud
Cloudflare's annual Agents Week — durable agent runtimes, KV bindings, sandboxes, and bot/human routing.
Agents Week 2026 collected ~25 launches under the "agentic cloud" banner, covering Workers Agents primitives, sandboxes, and a more aggressive bot-management story ("moving past bots vs. humans"). The most operationally interesting piece was the panic-and-abort recovery system added to Rust Workers via wasm-bindgen, which makes long-running agent code on Workers genuinely production-shaped.
- 06
OpenAI Codex Automations
agent-platform
Codex tasks now run on schedules and triggers — recurring report and summary jobs without manual prompts.
OpenAI shipped Automations alongside GPT-5.5: scheduled and triggered Codex tasks that run unattended and write back to repos, docs, or webhooks. Combined with the new plugins/skills surface in Codex, this is OpenAI's serious answer to Anthropic's Claude Code skills ecosystem. Useful for cron-style team workflows that don't need a human in the loop.
- 07
OpenRouter
model-router
Multi-provider model gateway — every major release this week landed on OpenRouter within hours.
OpenRouter remains the cleanest "one API key for everything" option for hobbyists and prototypes — DeepSeek V4, Qwen 3.6, Kimi K2.6, GPT-5.5, and Gemma 4 are all routable behind a single OpenAI-compatible endpoint. The leaderboard at openrouter.ai/rankings is the most direct "what are devs actually calling right now" signal in the ecosystem.
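"One API key for everything" means one OpenAI-compatible request schema where only the model slug changes per provider. The sketch below only constructs the request rather than sending it, and the model slugs are illustrative; check openrouter.ai/models for the current catalog:

```python
# Building (not sending) an OpenAI-compatible chat request for OpenRouter.
# Swapping providers is just a different model slug; slugs are illustrative.
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def chat_request(model: str, prompt: str, api_key: str) -> tuple[dict, bytes]:
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return headers, body

# The same function covers every routed model; nothing else changes.
for slug in ("deepseek/deepseek-v4", "openai/gpt-5.5", "qwen/qwen3.6"):
    headers, body = chat_request(slug, "ping", api_key="sk-or-...")
```

To actually call the endpoint you would POST `body` with `headers` to `OPENROUTER_URL` using any HTTP client.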
Resources · 3
- 01
An update on recent Claude Code quality reports
postmortem
Anthropic's engineering postmortem on the Claude Code quality dip — what broke, why, how it shipped.
Anthropic published a detailed engineering writeup on the Claude Code quality regressions users had been reporting for weeks. The root cause: a March 26 change meant to clear stale thinking from idle sessions ended up clearing it every turn for the rest of the session, silently degrading reasoning quality across long runs. The post is the most-read engineering writeup of the week and a useful reference for anyone running stateful LLM systems — particularly the discussion of why their existing eval suite missed it.
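The failure shape generalizes to any stateful LLM system and is easy to reproduce in miniature: a cleanup written for idle sessions loses its guard and clears working state every turn, and a single-turn eval cannot tell the healthy and broken paths apart. A hedged sketch of the bug class (illustrative names, not Anthropic's code):

```python
# Sketch of the bug class the postmortem describes: idle-session cleanup
# that fires on every turn. All names and thresholds are illustrative.
import time

IDLE_THRESHOLD_S = 600

class Session:
    def __init__(self) -> None:
        self.thinking: list[str] = []      # accumulated reasoning state
        self.last_active = time.monotonic()

    def maybe_clear_stale(self, buggy: bool) -> None:
        idle = time.monotonic() - self.last_active > IDLE_THRESHOLD_S
        # The buggy path effectively drops the idle check, so "stale"
        # cleanup runs unconditionally and silently discards reasoning.
        if buggy or idle:
            self.thinking.clear()

    def turn(self, step: str, buggy: bool) -> None:
        self.maybe_clear_stale(buggy)
        self.thinking.append(step)
        self.last_active = time.monotonic()

def run(n_turns: int, buggy: bool) -> int:
    s = Session()
    for i in range(n_turns):
        s.turn(f"step {i}", buggy)
    return len(s.thinking)   # how much reasoning state survived the run

# A one-turn eval sees identical behavior; only a long run separates them.
assert run(1, buggy=False) == run(1, buggy=True) == 1
healthy, broken = run(20, buggy=False), run(20, buggy=True)
```

The short-run/long-run gap at the end is the point: an eval suite built from short sessions passes on both code paths, which is one plausible reading of why the regression went undetected.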
- 02
I am building a cloud
long-form
David Crawshaw on what's actually wrong with Cloud 1.0 — and why he's building something different at Tailscale.
Crawshaw (Tailscale cofounder) lays out what he sees as the fundamental ergonomic failures of the AWS/GCP/Azure model: defaults that punish small teams, IOPS budgets a tenth of a laptop's, and Kubernetes as "high-quality lipstick on a pig." Top of HN with 1,000+ points and an unusually high-quality comment thread. Worth reading even if you disagree with the conclusions — the framing of the problem is sharper than the average cloud-skeptic essay.
- 03
Sabotaging projects by overthinking, scope creep, and structural diffing
long-form
Kevin Lynagh on the failure modes of senior engineers — and why "better is good" beats "perfect later."
Lynagh names three reliable ways engineers sabotage their own projects: overthinking, scope creep, and obsessive structural diffing against an idealized version that doesn't exist yet. The piece is short, specific, and pulls from his consulting work. Comments on HN are unusually substantive — including a useful Obama quote ("better is good") that captures the post's thesis better than most of the writing about incremental delivery does.