CASE STUDY

HighLoad Infrastructure for Judo Battle Portal

Next.js SSR + Strapi at 15,000 concurrent connections: three-node architecture, Varnish cache, PM2 cluster, and autonomous CI/CD.

About the Project

A sports portal with a heavy frontend: Next.js SSR, 24 JS chunks, dynamic content, and Strapi CMS. The goal was to handle 15,000 concurrent connections during tournament peaks without user-facing degradation. We split the monolith into three isolated nodes, put a Varnish cache with per-content-type rules in front of them, and validated the KPI with a stress test.

From Monolith to Isolated Architecture

Before

Nginx, Frontend, Backend, and DB all ran on one server. Next.js SSR rendered every request, so at 15,000 connections CPU hit 100%. Strapi ran single-threaded, admin sessions broke, and SSH was open to brute-force attempts.

After

Three independent nodes on the private network 172.16.0.0/28: Proxy (Nginx + Varnish, 12 GB cache), Frontend (PM2 cluster, 8 Next.js instances), and Backend (4 Strapi instances). The outside world sees only port 443. Deploys, SSL renewal, and log rotation run automatically.

Solution Architecture

Proxy Server

Nginx + Varnish with custom VCL. Static assets are cached for 1 year; SEO pages are cached for 10 minutes, with the language cookie taken into account. RSC, prefetch, and API requests bypass the cache. Grace mode serves stale pages for up to 1 hour when the backend fails.

Stack: Nginx, Varnish, custom VCL, Let's Encrypt, 12 GB RAM cache

Frontend (Next.js)

PM2 cluster mode across all CPU cores (8 instances). A cron-driven CI/CD script handles backup, git pull, build, and a safe restart via pm2safe. Settings: max_memory_restart: 1G, log rotation at 5 MB.

Stack: Next.js SSR, PM2 cluster, Bash CI/CD, ACL pm2safe, automatic log rotation
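To make the frontend settings concrete, here is a minimal sketch that generates a PM2 process file in JSON form. The keys `exec_mode`, `instances`, and `max_memory_restart` are standard PM2 options; the process name `frontend` and the way Next.js is launched (the `next` binary with `start`) are assumptions, since the case study does not show the real config.

```python
import json


def frontend_app(instances: int = 8, memory_limit: str = "1G") -> dict:
    """Build one PM2 app entry for the Next.js frontend (illustrative values)."""
    return {
        "name": "frontend",  # hypothetical process name
        # Assumption: Next.js is started through its own binary so that
        # PM2 cluster mode can fork a Node.js script directly.
        "script": "node_modules/next/dist/bin/next",
        "args": "start",
        "exec_mode": "cluster",              # PM2 cluster mode
        "instances": instances,              # 8 instances, one per CPU core
        "max_memory_restart": memory_limit,  # restart a worker above 1G
    }


if __name__ == "__main__":
    # PM2 accepts a JSON process file with a top-level "apps" list.
    print(json.dumps({"apps": [frontend_app()]}, indent=2))
```

The 5 MB log rotation is not part of this file; it would be configured separately in whatever log rotation tool is used.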

Backend (Strapi)

A cluster of 4 instances on ports 1337–1340 (STRAPI_WEB_CONCURRENCY: 2). The admin panel is pinned to port 1337 for stable sessions; the public API is load-balanced across instances. systemd resurrects the processes after a server reboot.

Stack: Strapi, PostgreSQL, PM2 × 4, Nginx LB, systemd
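The routing rule for the backend (admin pinned, public API balanced) can be modelled in a few lines. This is only a sketch of the decision logic, not the Nginx configuration itself. It assumes the admin panel lives under `/admin` (the Strapi default) and that the public API is spread round-robin over all four ports, including 1337.

```python
from itertools import cycle
from typing import Callable

ADMIN_PORT = 1337
API_PORTS = [1337, 1338, 1339, 1340]


def make_router() -> Callable[[str], int]:
    """Return a function that maps a request path to a Strapi port."""
    round_robin = cycle(API_PORTS)

    def route(path: str) -> int:
        # Admin traffic always lands on the same instance, so the
        # session stays with the process that created it.
        if path == "/admin" or path.startswith("/admin/"):
            return ADMIN_PORT
        # Everything else is distributed evenly (assumed round-robin).
        return next(round_robin)

    return route


if __name__ == "__main__":
    route = make_router()
    for path in ["/admin/login", "/api/news", "/api/clubs", "/admin"]:
        print(path, "->", route(path))
```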

Key Solutions

Smart Varnish Caching

The 24 JS chunks and other static assets are treated as immutable and cached for 1 year. SEO content (news, athletes, clubs) is cached for 10 minutes. Next.js dynamic requests and the admin panel bypass the cache to preserve interactivity and auth.
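The caching rules above boil down to a small decision table. The sketch below expresses it as a Python function for illustration; the production logic lives in VCL, which is not shown in this case study. The path prefixes (`/_next/static/`, `/api/`, `/admin`) and the request headers used to detect RSC and prefetch traffic (`RSC`, `Next-Router-Prefetch`) are assumptions.

```python
from typing import Mapping, Optional

ONE_YEAR = 365 * 24 * 3600  # static assets
TEN_MINUTES = 10 * 60       # SEO pages


def cache_ttl(path: str, headers: Mapping[str, str]) -> Optional[int]:
    """Return the cache TTL in seconds, or None when the request bypasses cache."""
    names = {name.lower() for name in headers}

    # Dynamic Next.js traffic (assumed to be marked by these headers).
    if "rsc" in names or "next-router-prefetch" in names:
        return None
    # API and admin requests must always reach the backend.
    if path.startswith("/api/") or path == "/admin" or path.startswith("/admin/"):
        return None
    # Hashed build output (JS chunks, CSS) never changes under the same URL.
    if path.startswith("/_next/static/"):
        return ONE_YEAR
    # Everything else is treated as an SEO page (news, athletes, clubs).
    return TEN_MINUTES


if __name__ == "__main__":
    print(cache_ttl("/_next/static/chunks/main.js", {}))
    print(cache_ttl("/news/some-article", {}))
    print(cache_ttl("/news/some-article", {"RSC": "1"}))
```

The language cookie is left out here; in a real setup it would become part of the cache key so that each language gets its own cached copy.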

Autonomous CI/CD

A script polls GitHub every 5 minutes. On a new commit it runs: tar.gz backup → npm install → build → safe PM2 restart. Updates roll out without user-facing downtime.
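The deploy loop can be sketched as follows. The real pipeline is a Bash script that restarts through the project's own pm2safe wrapper, whose interface is not shown here; this Python version uses plain `pm2 reload` as a stand-in. The paths `/srv/frontend` and `/srv/backups`, the branch `main`, and the process name `frontend` are hypothetical.

```python
import subprocess
from typing import List


def current_shas(app_dir: str, branch: str = "main") -> tuple:
    """Fetch from origin and return (local HEAD, remote branch head)."""
    subprocess.run(["git", "-C", app_dir, "fetch", "origin", branch], check=True)

    def rev(ref: str) -> str:
        result = subprocess.run(["git", "-C", app_dir, "rev-parse", ref],
                                check=True, capture_output=True, text=True)
        return result.stdout.strip()

    return rev("HEAD"), rev(f"origin/{branch}")


def deploy_steps(local_sha: str, remote_sha: str,
                 app_dir: str = "/srv/frontend",
                 process: str = "frontend") -> List[List[str]]:
    """Return the commands for one deploy, or an empty list if nothing changed."""
    if local_sha == remote_sha:
        return []
    backup = f"/srv/backups/frontend-{local_sha[:7]}.tar.gz"
    return [
        ["tar", "-czf", backup, "-C", app_dir, "."],  # backup current build
        ["git", "-C", app_dir, "pull", "--ff-only"],  # fetch the new commit
        ["npm", "--prefix", app_dir, "install"],
        ["npm", "--prefix", app_dir, "run", "build"],
        ["pm2", "reload", process],                   # stand-in for pm2safe
    ]


def run(steps: List[List[str]]) -> None:
    """Run the steps in order and stop at the first failure."""
    for command in steps:
        subprocess.run(command, check=True)
```

A cron entry that fires every 5 minutes would call `current_shas`, pass the result to `deploy_steps`, and hand the list to `run`. Because `run` stops on the first failing command, a broken build never reaches the restart step.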

PM2 Clustering

8 Next.js and 4 Strapi instances with automatic restart on memory leaks (limits of 1G and 2G respectively). If one instance fails, the service keeps running: the others pick up its traffic.

Closed Perimeter

SSH (port 22) is closed by default on all servers. Access is possible only via the provider console or temporary scripts. The internal network is isolated; only the Proxy is public.

Automated SSL Renewal

On the Proxy, Certbot runs from a systemd timer every Sunday. On the Backend, a bash script stops Nginx cleanly so that Certbot can run in standalone mode. RandomizedDelaySec spreads renewals out to avoid hitting Let's Encrypt at peak times.

Grace Mode & Self-Healing

When the backend fails or stalls, Varnish keeps serving cached pages for up to another hour, so users don't see 502/504 errors. PM2 and systemd restore processes automatically.
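The grace behaviour can be illustrated with a simplified model. It is not how Varnish is implemented internally, and it ignores the case where Varnish serves a stale object while refreshing it in the background. The 10-minute TTL and 1-hour grace come from the figures above; the class and function names are made up for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedPage:
    stored_at: float      # time the page entered the cache, in seconds
    ttl: float = 600      # fresh for 10 minutes
    grace: float = 3600   # may be served stale for 1 more hour


def decide(page: Optional[CachedPage], now: float, backend_healthy: bool) -> str:
    """Return 'serve_fresh', 'serve_stale', or 'fetch' for one request."""
    if page is None:
        return "fetch"
    age = now - page.stored_at
    if age <= page.ttl:
        return "serve_fresh"
    # Expired, but the backend is down and we are still inside the
    # grace window: serve the old copy instead of an error page.
    if not backend_healthy and age <= page.ttl + page.grace:
        return "serve_stale"
    return "fetch"


if __name__ == "__main__":
    page = CachedPage(stored_at=0)
    print(decide(page, now=300, backend_healthy=True))    # serve_fresh
    print(decide(page, now=1200, backend_healthy=False))  # serve_stale
    print(decide(page, now=5000, backend_healthy=False))  # fetch
```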

Stress Test Results

ApacheBench stress test: 15,000 concurrent connections and 100,000 requests against a heavy SSR site. The project KPI was met with significant headroom.

15,000+ concurrent connections without failure

100,000 requests in a single run

≤ 25% CPU on the Proxy server at peak

1–3% CPU on Frontend/Backend (cache working)

≤ 6 GB RAM per machine in steady state

0 502/504 errors for users (grace mode)

Bottleneck: the limit was not software but the physical bandwidth of the hosting channel (~1 Gbps). Once the channel was fully saturated, TLS errors and timeouts began, while CPU and RAM remained well within capacity.
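A quick back-of-the-envelope calculation shows why the channel saturates before the CPU does. The 100 KB average response size below is a hypothetical figure chosen for illustration, not a measurement from this project, and the estimate ignores TLS and TCP overhead.

```python
def max_responses_per_second(link_bits_per_second: float,
                             response_bytes: float) -> float:
    """Upper bound on responses per second that fit through the link."""
    bytes_per_second = link_bits_per_second / 8
    return bytes_per_second / response_bytes


if __name__ == "__main__":
    # 1 Gbps link, assumed 100 KB average response.
    print(max_responses_per_second(1_000_000_000, 100_000))  # 1250.0
```

Under that assumption a 1 Gbps channel tops out at roughly 1,250 responses per second regardless of how much CPU is idle, which is consistent with the proxy staying at or below 25% CPU during the test.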
