15,000 Concurrent Connections on Next.js SSR: Judo Battle Case Without 503
Tournament final. The stream is live, social feeds spread the link, thousands of fans open standings, athlete profiles, and news at once. The site runs Next.js SSR — heavy: 24 JS chunks, dynamic content, Strapi CMS. CPU hits 100%, queues grow, users see a white screen or 502.
That was the starting point for the Judo Battle portal. The project KPI was strict: 15,000 concurrent connections without degradation. Below — what we built, stress test numbers, and a checklist for your next traffic spike.

Fig. 1. At the connection KPI, the hosting bandwidth became the bottleneck — not application CPU
Why SSR kills the server at peak
For marketing, the site “just needs to open.” For infrastructure, SSR means HTML generation per request: Node renders the page, pulls Strapi data, ships bundles. At 15,000 concurrent connections, even 8 CPU cores fail if every request hits the app.
Typical mistake: everything on one machine — Nginx, Next.js, Strapi, PostgreSQL. One service failure takes down all. Plus open SSH and public IPs between nodes — extra attack surface.
| Peak symptom | Business impact |
|---|---|
| 100% frontend CPU | Every visit costs more — ads and organic traffic wasted |
| 502/504 with “healthy” servers | Users leave for competitors after 3–5 seconds |
| Editor sessions break | Content does not update during the event — reputational hit |
| No headroom before ad spike | Ad budget burns on errors |
Architecture: three nodes instead of a monolith
We split the stack into three independent servers on the provider private network (172.16.0.0/28). Only the Proxy on port 443 is public.
1. Proxy — Nginx + Varnish (12 GB cache)
Custom VCL: static assets and JS chunks are cached for 1 year as immutable. SEO pages (news, athletes, clubs) are cached for 10 minutes, with a separate variant per language cookie. RSC, prefetch, API, and admin requests bypass the cache to preserve interactivity.
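The production VCL is not reproduced here. As a rough illustration, the same decision table can be sketched in TypeScript. The TTLs come from the text; the path prefixes (`/_next/static/`, `/news`, `/athletes`, `/clubs`, `/api/`, `/admin`), the header names used to detect RSC and prefetch requests, and the function name `classifyRequest` are assumptions made for this example.

```typescript
// Cache decision for one request, mirroring the rules described above.
type CachePolicy =
  | { action: "cache"; ttlSeconds: number; immutable: boolean; varyOnLanguageCookie: boolean }
  | { action: "bypass" };

interface ProxyRequest {
  path: string;                    // URL path without the query string
  headers: Record<string, string>; // header names already lower-cased
}

const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;
const TEN_MINUTES_SECONDS = 10 * 60;

// Assumed URL layout; adjust to the real routes of the site.
const STATIC_PREFIXES = ["/_next/static/"];
const BYPASS_PREFIXES = ["/api/", "/admin"];
const SEO_PREFIXES = ["/news", "/athletes", "/clubs"];

function classifyRequest(req: ProxyRequest): CachePolicy {
  // RSC payloads and router prefetches must reach Next.js untouched.
  if ("rsc" in req.headers || "next-router-prefetch" in req.headers) {
    return { action: "bypass" };
  }
  if (BYPASS_PREFIXES.some((p) => req.path.startsWith(p))) {
    return { action: "bypass" };
  }
  if (STATIC_PREFIXES.some((p) => req.path.startsWith(p))) {
    return { action: "cache", ttlSeconds: ONE_YEAR_SECONDS, immutable: true, varyOnLanguageCookie: false };
  }
  if (SEO_PREFIXES.some((p) => req.path.startsWith(p))) {
    return { action: "cache", ttlSeconds: TEN_MINUTES_SECONDS, immutable: false, varyOnLanguageCookie: true };
  }
  // Anything unrecognised goes to the backend rather than risk caching private data.
  return { action: "bypass" };
}
```

The bypass checks run first on purpose: an RSC request for `/news/...` shares its path with a cacheable SEO page, so the header test has to win over the path test.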
Grace mode: when the backend stalls or fails, Varnish serves stale pages for another hour. Users do not see 502/504 — critical on final day.
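Grace mode is easiest to reason about as a timeline. The sketch below is a simplified model, not Varnish code: it assumes the 10-minute TTL and 1-hour grace window from the text, and the names `CachedObject` and `decideDelivery` are made up for the example.

```typescript
// Simplified model of TTL + grace: what the cache does with an object of a given age.
type Delivery = "fresh" | "stale" | "fetch";

interface CachedObject {
  ageSeconds: number;   // time since the object was stored
  ttlSeconds: number;   // normal freshness lifetime
  graceSeconds: number; // extra window in which a stale copy may still be served
}

const SEO_TTL_SECONDS = 10 * 60; // 10 minutes, from the article
const GRACE_SECONDS = 60 * 60;   // 1 hour, from the article

function decideDelivery(obj: CachedObject): Delivery {
  // Inside the TTL the object is served as a normal cache hit.
  if (obj.ageSeconds < obj.ttlSeconds) return "fresh";
  // Past the TTL but inside grace: serve the stale copy while a refresh is attempted.
  if (obj.ageSeconds < obj.ttlSeconds + obj.graceSeconds) return "stale";
  // Past TTL + grace there is nothing safe to serve; the backend must answer.
  return "fetch";
}

const fifteenMinutesOld: CachedObject = {
  ageSeconds: 15 * 60,
  ttlSeconds: SEO_TTL_SECONDS,
  graceSeconds: GRACE_SECONDS,
};
const decision = decideDelivery(fifteenMinutesOld); // "stale"
```

With these numbers a page cached just before a backend outage stays available for up to 70 minutes: 10 minutes fresh, then 60 minutes stale.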
2. Frontend — Next.js in PM2 cluster (8 instances)
Cluster across all CPU cores. max_memory_restart: 1G — memory leaks do not hang a node forever. Bash cron CI/CD every 5 minutes: tar.gz backup → git pull → build → restart via pm2safe user. Log rotation at 5 MB — disk never fills.
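A PM2 process definition matching this description might look like the sketch below. The option names (`exec_mode`, `instances`, `max_memory_restart`) are standard PM2 ecosystem fields; the app name and the path to the Next.js CLI are assumptions. PM2 itself reads a JavaScript file such as `ecosystem.config.js`, so the typed object here would be exported from that file via `module.exports`.

```typescript
// Typed shape of one PM2 app entry (only the fields used in this example).
interface Pm2App {
  name: string;
  script: string;
  args: string;
  exec_mode: "cluster" | "fork";
  instances: number;
  max_memory_restart: string;
}

function frontendApp(instances: number): Pm2App {
  return {
    name: "next-frontend",                     // hypothetical app name
    script: "node_modules/next/dist/bin/next", // assumed path to the Next.js CLI
    args: "start",
    exec_mode: "cluster",                      // PM2 spreads connections across the workers
    instances,                                 // 8 in the article: one per CPU core
    max_memory_restart: "1G",                  // restart a worker that grows past 1 GB
  };
}

// The object PM2 expects as the ecosystem file's export.
const ecosystem = { apps: [frontendApp(8)] };
```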
3. Backend — Strapi × 4 (ports 1337–1340)
Admin panel pinned to 1337 — moderator sessions do not break under load balancing. Public API round-robins all instances. max_memory_restart: 2G, systemd resurrection after reboot.
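The routing rule (admin pinned, public API rotated) lives in the proxy configuration in practice. The TypeScript sketch below only illustrates the logic. The ports come from the text; the `/admin` path prefix and the function name `createStrapiRouter` are assumptions.

```typescript
const ADMIN_PORT = 1337;
const API_PORTS = [1337, 1338, 1339, 1340];

// Returns a function that maps a request path to the Strapi instance port.
function createStrapiRouter(): (path: string) => number {
  let next = 0; // index of the instance that receives the next public request
  return (path: string): number => {
    // Admin traffic always lands on the same instance, so sessions survive.
    if (path.startsWith("/admin")) return ADMIN_PORT;
    // Public API traffic rotates through every instance in turn.
    const port = API_PORTS[next % API_PORTS.length] ?? ADMIN_PORT;
    next += 1;
    return port;
  };
}

const route = createStrapiRouter();
const first = route("/api/athletes");  // 1337
const second = route("/api/news");     // 1338
const admin = route("/admin/login");   // 1337, does not advance the rotation
const third = route("/api/clubs");     // 1339
```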
Stress test numbers: KPI met, software idle
Validation with ApacheBench (ab): 15,000 concurrent connections, 100,000 requests against a heavy SSR site.
| Metric | Value |
|---|---|
| Proxy CPU (Nginx + Varnish) | ≤ 25% |
| Frontend / Backend CPU | 1–3% |
| Steady-state RAM | 2–6 GB per machine |
| Bottleneck | ~1 Gbps hosting link |
Takeaway: at target load the app servers were nearly idle because Varnish served almost everything from cache. The limit was the network link, not Node.js.
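A quick back-of-the-envelope calculation shows why the link saturates first. The sketch assumes all 15,000 connections transfer at the same time and ignores protocol overhead, so treat the result as an upper bound per connection; the function name is illustrative.

```typescript
const LINK_BITS_PER_SECOND = 1_000_000_000; // ~1 Gbps hosting link
const CONCURRENT_CONNECTIONS = 15_000;      // project KPI

// Even share of the link per connection, in two common units.
function perConnectionShare(linkBitsPerSecond: number, connections: number) {
  const bitsPerSecond = linkBitsPerSecond / connections;
  return {
    kilobitsPerSecond: bitsPerSecond / 1_000,
    kilobytesPerSecond: bitsPerSecond / 8 / 1_000,
  };
}

const share = perConnectionShare(LINK_BITS_PER_SECOND, CONCURRENT_CONNECTIONS);
console.log(share.kilobitsPerSecond.toFixed(1));  // "66.7"
console.log(share.kilobytesPerSecond.toFixed(1)); // "8.3"
```

Roughly 8 kB/s per connection is very little for a page that ships 24 JS chunks, which is consistent with the bandwidth row in the table above.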
When edge Varnish, when not
| Scenario | Recommendation |
|---|---|
| Public SEO pages, catalogs, news | Edge cache (Varnish/CDN), TTL 5–15 min |
| Account, cart, checkout | No cache or cookie-segmented cache |
| Next.js RSC / prefetch / API routes | Explicit VCL bypass — or hydration breaks |
| CMS admin | Sticky session on one instance |
Checklist before a tournament or ad peak
- ☐ Align traffic forecast with engineering — concurrent connections, not just “we expect hype”
- ☐ Run a stress test at 2–3× forecast
- ☐ Verify public pages serve from cache (`X-Cache: HIT` headers)
- ☐ Confirm dynamic requests (RSC, API, admin) bypass cache per VCL rules
- ☐ Enable grace mode / stale-while-revalidate — users must not see 502 on brief backend blips
- ☐ Alert on CPU, RAM, and 5xx before social media complains
- ☐ Check link bandwidth — software may be ready while the pipe is saturated
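The cache check from the list can be automated once response headers are collected from a sample of public URLs. A minimal sketch, assuming the proxy sets an `X-Cache` header whose value starts with `HIT` on cached responses; the helper names are hypothetical and the sampling itself is left out.

```typescript
type HeaderMap = Record<string, string>;

// True when the response carries an X-Cache header that reports a hit.
function isCacheHit(headers: HeaderMap): boolean {
  const entry = Object.entries(headers).find(([name]) => name.toLowerCase() === "x-cache");
  return entry !== undefined && entry[1].trim().toUpperCase().startsWith("HIT");
}

// Share of sampled responses that were served from cache (0 for an empty sample).
function hitRatio(samples: HeaderMap[]): number {
  if (samples.length === 0) return 0;
  return samples.filter((h) => isCacheHit(h)).length / samples.length;
}

const sample: HeaderMap[] = [
  { "X-Cache": "HIT" },
  { "x-cache": "MISS" },
  { "X-Cache": "HIT" },
  {}, // header missing: counted as not cached
];
const ratio = hitRatio(sample); // 0.5
```

A low ratio on pages that should be public usually means a cookie or query parameter is splitting the cache key, which is worth fixing before the stress test rather than after it.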
Bottom line
HighLoad for heavy Next.js is not “buy a bigger box.” It means offloading repeat traffic to an edge cache, isolating nodes, clustering the app, and proving the numbers with a stress test before 15,000 real users arrive. In Judo Battle, frontend and backend idled at peak while the proxy did the work.
NineLab designs and ships such stacks end to end: architecture, Varnish/Nginx, CI/CD, load runs. Full infrastructure breakdown — Judo Battle case study. Need an audit before peak — request load testing or talk to an engineer.
FAQ — HighLoad for Next.js and SSR
Concurrent connections and RPS are not the same metric. Concurrent connections are the TCP/TLS channels open at a given moment; RPS is HTTP requests per second. One connection can carry many requests (HTTP keep-alive, HTTP/2 multiplexing), so the two are not interchangeable.
A bigger server helps only up to a point: vertical scaling adds headroom, but SSR burns CPU on every request. Edge cache offloads 80–95% of traffic from the app, which is usually cheaper than scaling frontend power linearly for every tournament peak.
Kubernetes is not a prerequisite. For a 1–3 engineer team on fixed servers, PM2 + Nginx + Varnish is simpler to operate and has a lower TCO. K8s pays off with frequent releases across many services and a mature platform role, not because it is trendy.
Run the stress test 2–3 weeks before a marketing peak, ad campaign, or live final broadcast. A run at 2–3× forecast is cheaper than one hour of downtime at peak conversion.