CASE STUDY

HighLoad Infrastructure for Judo Battle Portal

Next.js SSR + Strapi at 15,000 concurrent connections: three-node architecture, Varnish cache, PM2 cluster, and autonomous CI/CD.

About the Project

A sports portal with a heavy frontend: Next.js SSR, 24 JS chunks, dynamic content, and Strapi CMS. The goal was to handle 15,000 concurrent connections during tournament peaks without user-facing degradation. We split the monolith into isolated nodes, added tiered Varnish caching, and validated the KPI with a stress test.

From Monolith to Isolated Architecture

Before

Nginx, Frontend, Backend, and DB ran on one server. Next.js SSR handled every request, so at 15,000 connections CPU hit 100%. Strapi ran single-threaded, admin sessions broke, and SSH was exposed to brute-force attempts.

After

Three independent nodes on the private network 172.16.0.0/28: Proxy (Nginx + Varnish, 12 GB cache), Frontend (PM2 cluster, 8 Next.js instances), and Backend (4 Strapi instances). Only port 443 is exposed to the outside world. Deploys, SSL renewal, and log rotation are automated.
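As a quick sanity check on the subnet size, the sketch below uses Python's standard `ipaddress` module to show how many hosts a /28 can hold. The per-node address assignment is hypothetical and only illustrates that three nodes fit with room to spare.

```python
import ipaddress

# The private subnet named in the case study.
net = ipaddress.ip_network("172.16.0.0/28")

# hosts() excludes the network and broadcast addresses.
usable_hosts = list(net.hosts())

# Hypothetical assignment, for illustration only.
nodes = {
    "proxy": usable_hosts[0],
    "frontend": usable_hosts[1],
    "backend": usable_hosts[2],
}

print(net.num_addresses)   # total addresses in the block
print(len(usable_hosts))   # addresses that can be given to machines
print(net.is_private)      # whether the range is private address space
for name, addr in nodes.items():
    print(f"{name}: {addr}")
```

A /28 leaves 14 usable addresses, so the three nodes occupy only a small part of the range.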

Solution Architecture

Proxy Server

Nginx + Varnish with custom VCL. Static assets are cached for 1 year; SEO pages for 10 minutes, with the language cookie taken into account. RSC, prefetch, and API requests bypass the cache. Grace mode serves stale pages for up to 1 hour when the backend fails.
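The caching rules above can be modelled as a small decision function. This is a sketch in Python, not the project's VCL: the path prefixes (`/_next/static/`, `/api/`, `/admin`) and the request headers used to detect RSC and prefetch traffic (`RSC`, `Next-Router-Prefetch`) are assumptions about how such requests might be recognised. Only the TTL and grace values come from the case study.

```python
from dataclasses import dataclass
from typing import Mapping

YEAR = 365 * 24 * 3600   # static assets: 1 year
TEN_MINUTES = 10 * 60    # SEO pages: 10 minutes
GRACE = 60 * 60          # stale pages may be served for up to 1 hour


@dataclass(frozen=True)
class Policy:
    cacheable: bool
    ttl: int = 0                    # seconds the object counts as fresh
    grace: int = 0                  # extra seconds it may be served stale
    vary_on_language: bool = False  # separate cache entry per language


def classify(path: str, headers: Mapping[str, str]) -> Policy:
    """Pick a cache policy for one request (matching rules are assumptions)."""
    names = {name.lower() for name in headers}
    # RSC payloads and router prefetches always go to the backend.
    if "rsc" in names or "next-router-prefetch" in names:
        return Policy(cacheable=False)
    # API and admin traffic is never cached.
    if path.startswith("/api/") or path.startswith("/admin"):
        return Policy(cacheable=False)
    # Hashed build output is safe to keep for a year.
    if path.startswith("/_next/static/"):
        return Policy(cacheable=True, ttl=YEAR)
    # Everything else is treated as an SEO page.
    return Policy(cacheable=True, ttl=TEN_MINUTES, grace=GRACE,
                  vary_on_language=True)


def may_serve_from_cache(policy: Policy, age: int, backend_healthy: bool) -> bool:
    """Return True if a cached copy of the given age (seconds) may be delivered."""
    if not policy.cacheable:
        return False
    if age <= policy.ttl:
        return True
    # Past the TTL: stale delivery only while the backend is failing.
    return not backend_healthy and age <= policy.ttl + policy.grace
```

The model is deliberately simplified. Varnish can also deliver a stale object while refreshing it in the background, and that path is left out here.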


Nginx, Varnish, Custom VCL, Let's Encrypt, 12 GB RAM cache

Frontend (Next.js)

PM2 runs in cluster mode across all CPU cores (8 instances). A cron-driven CI/CD script performs backup, git pull, build, and a safe restart via pm2safe. Instances restart at max_memory_restart: 1G, and logs rotate at 5 MB.
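The deploy sequence can be sketched as follows. The project uses a bash script and its own pm2safe wrapper; this Python version is only an illustration. The branch name, the backup path, and the use of `pm2 reload all` in place of pm2safe are assumptions. The command runner is injectable so the flow can be exercised without touching a real repository.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def sh(cmd: List[str]) -> str:
    """Run a command, raise on a non-zero exit code, return trimmed stdout."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def deploy_if_changed(run: Runner = sh, branch: str = "main") -> bool:
    """Deploy only when the remote branch has a commit we do not have yet."""
    run(["git", "fetch", "origin", branch])
    local = run(["git", "rev-parse", "HEAD"])
    remote = run(["git", "rev-parse", f"origin/{branch}"])
    if local == remote:
        return False  # nothing new, leave the running build alone

    # 1. Back up the current tree (hypothetical path, node_modules skipped).
    backup = f"/tmp/backup-{local[:8]}.tar.gz"
    run(["tar", "-czf", backup, "--exclude=node_modules", "."])
    # 2. Update the working tree, fast-forward only.
    run(["git", "merge", "--ff-only", f"origin/{branch}"])
    # 3. Install dependencies and build.
    run(["npm", "install"])
    run(["npm", "run", "build"])
    # 4. Reload the PM2 processes (stand-in for the pm2safe wrapper).
    run(["pm2", "reload", "all"])
    return True
```

Run from cron inside the project directory, `deploy_if_changed()` does nothing when no new commit exists. Because `sh` raises on the first failing command, a broken build stops the sequence before the reload step.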


Next.js SSR, PM2 Cluster, Bash CI/CD, ACL pm2safe, Auto log rotation

Backend (Strapi)

A cluster of 4 instances on ports 1337–1340 (STRAPI_WEB_CONCURRENCY: 2). The admin panel is pinned to port 1337 for stable sessions, while the public API is load-balanced across the instances. systemd restores the processes after a server reboot.
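The routing rule can be illustrated with a small model. In the project this is done by Nginx; the Python sketch below only shows the idea. It assumes the admin panel lives under `/admin` and that public API traffic is spread round-robin over all four ports, including 1337. Neither detail is confirmed by the case study.

```python
from itertools import cycle
from typing import Iterable

ADMIN_PORT = 1337
API_PORTS = (1337, 1338, 1339, 1340)


class Router:
    """Pins admin requests to one instance and rotates everything else."""

    def __init__(self, ports: Iterable[int] = API_PORTS) -> None:
        self._rotation = cycle(ports)

    def pick(self, path: str) -> int:
        # Admin traffic always reaches the same process, so a session
        # created there is never looked up on a different instance.
        if path.startswith("/admin"):
            return ADMIN_PORT
        # Public API requests take the next port in the rotation.
        return next(self._rotation)


router = Router()
print([router.pick("/api/articles") for _ in range(5)])
print(router.pick("/admin/login"))
```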


Strapi, PostgreSQL, PM2 × 4, Nginx LB, systemd

Key Solutions

  • Smart Varnish Caching

    The 24 JS chunks and static assets are cached as immutable for 1 year. SEO content (news, athletes, clubs) is cached for 10 minutes. Next.js dynamic requests and the admin panel bypass the cache to preserve interactivity and authentication.

  • Autonomous CI/CD

    A script polls GitHub every 5 minutes. New commit → tar.gz backup → npm install → build → safe PM2 restart. Updates roll out without user-facing downtime.

  • PM2 Clustering

    8 Next.js + 4 Strapi instances with auto-restart on memory leaks (limits of 1G / 2G). One failed instance doesn't stop the service; the others pick up its traffic immediately.

  • Closed Perimeter

    SSH (port 22) is closed by default on all servers. Access is possible only via the provider console or temporary scripts. The internal network is isolated; only Proxy is public.

  • Automated SSL Renewal

    Certbot runs from a systemd timer on Proxy (every Sunday). On Backend, a bash script stops Nginx cleanly before Certbot runs in standalone mode. RandomizedDelaySec staggers the start time to avoid request spikes against Let's Encrypt.

  • Grace Mode & Self-Healing

    When the backend fails or stalls, Varnish keeps serving cached pages for up to one more hour, so users don't see 502/504 errors. PM2 and systemd restore processes automatically.

  • Stress Test Results

    Apache Bench stress test: 15,000 concurrent connections, 100,000 requests on a heavy SSR site. Project KPI achieved with significant headroom.

    15,000+: concurrent connections without failure
    100,000: requests in a single run
    ≤ 25%: CPU on Proxy server at peak
    1–3%: CPU on Frontend/Backend (cache working)
    ≤ 6 GB: RAM per machine in steady state
    0: 502/504 errors for users (grace mode)


    Bottleneck: the limit was not the software but the physical bandwidth of the hosting (~1 Gbps). Once the channel was fully saturated, TLS errors and timeouts appeared, while CPU and RAM remained well within capacity.
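To put the bandwidth ceiling in perspective, the arithmetic below divides a 1 Gbps link across 15,000 connections. The average response size in the last step is a hypothetical figure chosen for illustration, and protocol overhead (TCP, TLS) is ignored, so real numbers would be somewhat lower.

```python
LINK_BITS_PER_S = 1_000_000_000  # ~1 Gbps, from the case study
CONNECTIONS = 15_000             # concurrent connections in the test

# Raw link capacity in bytes per second.
link_bytes_per_s = LINK_BITS_PER_S / 8

# Fair share of the link per connection when all are active at once.
per_conn_bytes_per_s = link_bytes_per_s / CONNECTIONS
per_conn_kbit_per_s = LINK_BITS_PER_S / CONNECTIONS / 1000


def max_responses_per_second(avg_response_bytes: float) -> float:
    """Upper bound on responses per second the link alone allows."""
    return link_bytes_per_s / avg_response_bytes


print(f"{link_bytes_per_s / 1e6:.0f} MB/s total")
print(f"{per_conn_bytes_per_s / 1e3:.1f} kB/s per connection")
print(f"{per_conn_kbit_per_s:.1f} kbit/s per connection")
# Hypothetical average response size of 100 kB.
print(f"{max_responses_per_second(100_000):.0f} responses/s")
```

With every connection active, each one gets only a few kilobytes per second. That is consistent with the link saturating well before CPU or RAM do.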
