Harvard's CS 75 Scalability
Notes on Harvard's CS 75 lecture 9 on Scalability (Summer 2012).
Aug 17, 2025
Some hosts may block IP addresses in certain regions or countries.
SFTP vs. FTP: SFTP encrypts all traffic, including credentials. FTP sends usernames and passwords in the clear, which is a critical security risk.
“Unlimited storage” hosting plans usually oversell resources; you are sharing one machine with hundreds of other users.
VPS (Virtual Private Server): Still shares hardware, but you get your own OS instance; only you and the host's administrators can access your files.
For maximum privacy and control, you may need to run your own servers.
AWS EC2: Example of an IaaS option offering virtualized servers.
Vertical scaling: add more RAM, CPUs, and disks to a single server.
Ceiling limits:
Cost and hardware constraints.
The state of the art: no single machine has unlimited resources.
Horizontal scaling: add multiple (often cheaper) servers.
Distributes load and avoids the ceiling of a single machine.
Common in modern system design.
Load balancer: distributes inbound requests across multiple servers.
Client sees only the load balancer’s public IP.
Can be implemented by:
DNS Round Robin: simple but limited (cached DNS answers keep sending a client to the same server, and load can end up uneven).
Dynamic load awareness: routing based on each server's current load.
Dedicated resources: for example, separate servers for static content (images, video).
Load balancing tech:
Software: ELB (AWS), HAProxy, LVS.
Hardware: Barracuda, Cisco, Citrix, F5 (expensive).
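The difference between blind rotation and load-aware routing can be sketched in a few lines. This is a toy illustration, not how ELB or HAProxy are implemented; the class names and the `web1`/`web2`/`web3` backend names are made up for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation, ignoring their load."""

    def __init__(self, backends):
        self._rotation = cycle(backends)

    def pick(self):
        return next(self._rotation)


class LeastConnectionsBalancer:
    """Picks the backend that currently has the fewest open connections."""

    def __init__(self, backends):
        self.active = {name: 0 for name in backends}

    def pick(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call this when the backend finishes serving a request.
        self.active[backend] -= 1


rr = RoundRobinBalancer(["web1", "web2", "web3"])
rr_picks = [rr.pick() for _ in range(4)]  # rotation wraps back to web1

lc = LeastConnectionsBalancer(["web1", "web2"])
first, second = lc.pick(), lc.pick()  # one open connection on each backend
lc.release(first)                     # the first backend finishes its request
third = lc.pick()                     # so it is the least loaded again
```

Round robin is what DNS-based balancing approximates; the least-connections variant needs the balancer to track state, which is why it is usually done by a dedicated load balancer rather than DNS.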
With multiple servers, a session stored locally on one server is lost to the user as soon as a request lands on a different server.
Solutions:
Centralized session storage (file server, database, or NFS).
Sticky sessions (session affinity).
Ensures requests from the same user hit the same server.
Approaches:
Cookies with server IDs: brittle, because if a server dies the cookie may keep sending the user to a dead server.
Load balancer-managed mapping: better, because the load balancer puts a random ID in the cookie and keeps the ID-to-server mapping itself, so it can remap the user if that server dies.
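A minimal sketch of load balancer-managed stickiness, assuming the balancer issues an opaque random cookie and keeps the cookie-to-server table in memory. `StickyBalancer`, `route`, and `mark_down` are hypothetical names for this illustration, not the API of any real load balancer.

```python
import secrets


class StickyBalancer:
    def __init__(self, backends):
        self.healthy = list(backends)
        self.assignments = {}  # opaque cookie value -> backend name
        self._next = 0

    def _choose_backend(self):
        # Plain rotation over the backends that are still alive.
        backend = self.healthy[self._next % len(self.healthy)]
        self._next += 1
        return backend

    def route(self, cookie=None):
        """Return (cookie, backend) for one incoming request."""
        backend = self.assignments.get(cookie)
        if backend not in self.healthy:
            # New client, unknown cookie, or the pinned server has died.
            if cookie not in self.assignments:
                cookie = secrets.token_hex(8)  # reveals nothing about servers
            backend = self._choose_backend()
            self.assignments[cookie] = backend
        return cookie, backend

    def mark_down(self, backend):
        self.healthy.remove(backend)


lb = StickyBalancer(["web1", "web2"])
cookie, first = lb.route()           # first visit: cookie issued, server pinned
_, again = lb.route(cookie)          # later visits go to the same server
lb.mark_down(first)                  # the pinned server dies
_, after_failure = lb.route(cookie)  # same cookie, re-pinned to a live server
```

Re-pinning keeps the user reaching a live server, but any session data that lived only on the dead server is still gone, which is the argument for centralized session storage.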
RAID applies to disks within a single server, not replication across servers.
Depending on the level, RAID improves redundancy, performance, or both:
RAID 0: Striping (fast, no redundancy).
RAID 1: Mirroring (redundancy, but usable capacity is half the raw disk space).
RAID 5/6: Parity-based redundancy (RAID 5 survives one disk failure, RAID 6 survives two).
RAID 10: Striping plus mirroring.
Reduces downtime risk from disk failures.
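Parity-based redundancy is easier to see with a worked example. The sketch below uses XOR parity over one stripe, which is the idea behind RAID 5; it is simplified in that real RAID 5 rotates the parity block across all disks, and the block contents here are invented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_blocks = [b"AAAA", b"BBBB", b"CCCC"]  # one stripe across three data disks
parity = xor_blocks(data_blocks)           # parity block kept on a fourth disk

# Simulate losing the second disk, then rebuild it from the survivors.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)            # equals the lost block, b"BBBB"
```

Because XOR is its own inverse, any single missing block can be recovered from the rest, which is why one parity block tolerates exactly one disk failure.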
Shared storage tech:
Fibre Channel and iSCSI for high-speed SANs.
NFS for shared filesystems.
Databases (MySQL) for session storage.
Primary-replica replication: the primary handles writes, and replicas keep synchronized copies.
Replicas improve read scalability.
If the primary fails, a replica can be promoted in its place, but the failover involves some downtime.
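The read/write split can be sketched with in-memory dictionaries standing in for database servers. `ReplicatedDatabase` is a hypothetical name, and the replication here is synchronous purely to keep the example short; real replicas usually apply changes asynchronously and can lag behind the primary.

```python
from itertools import cycle


class ReplicatedDatabase:
    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._read_rotation = cycle(range(replica_count))

    def write(self, key, value):
        # All writes go to the primary, which then feeds the replicas.
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads are spread across replicas, so adding replicas adds read capacity.
        index = next(self._read_rotation)
        return index, self.replicas[index].get(key)


db = ReplicatedDatabase(replica_count=2)
db.write("user:1", "alice")
reads = [db.read("user:1") for _ in range(3)]  # served by replica 0, 1, then 0
```

Note that the primary is still a single point of failure for writes, which is the downtime the failover step has to cover.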
Multi-primary replication: writes are allowed on multiple primaries.
Provides higher availability and redundancy.
Complexity comes from conflict resolution.
Load balancers themselves can be made redundant, as active-passive or active-active pairs.
In an active-passive pair, the passive balancer can auto-promote itself if the active one fails.
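One common way for the passive balancer to notice a failure is a heartbeat: the active node sends a periodic signal, and the standby takes over when the signal stops. The sketch below assumes that mechanism; the class name, the timeout value, and the explicit `now` timestamps are all choices made for the illustration.

```python
class PassiveBalancer:
    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_heartbeat = 0.0
        self.role = "passive"

    def on_heartbeat(self, now):
        # Called whenever a heartbeat arrives from the active balancer.
        self.last_heartbeat = now

    def check(self, now):
        # Promote ourselves once the active balancer has been silent too long.
        if self.role == "passive" and now - self.last_heartbeat > self.timeout:
            self.role = "active"  # in practice: also take over the public IP
        return self.role


standby = PassiveBalancer(timeout_seconds=3.0)
standby.on_heartbeat(now=10.0)
role_while_alive = standby.check(now=12.0)     # 2 s of silence: still passive
role_after_silence = standby.check(now=14.0)   # 4 s of silence: promoted
```

Passing timestamps in explicitly keeps the example deterministic; a real implementation would read a monotonic clock.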
Partitioning: split data across servers based on rules (for example, A-M vs. N-Z users).
Enables horizontal database scaling.
Catch: cross-partition operations (for example, a user pokes someone whose data lives on a different server) become more complex.
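The A-M vs. N-Z rule can be written as a tiny routing function, which also makes the cross-partition catch visible. The shard names `db-a-m` and `db-n-z` are hypothetical, and sending anything that does not start with A-M to the second shard is an arbitrary choice for this sketch.

```python
def shard_for(username):
    """Route a user to a database shard by the first letter of the name."""
    first = username[:1].upper()
    # Names that do not start with A-M (including digits) fall through to N-Z.
    return "db-a-m" if "A" <= first <= "M" else "db-n-z"


def shards_touched_by_poke(sender, recipient):
    """A poke must read or write both users' rows, wherever they live."""
    return {shard_for(sender), shard_for(recipient)}


same_shard = shards_touched_by_poke("alice", "bob")   # one shard involved
cross_shard = shards_touched_by_poke("alice", "zoe")  # two shards involved
```

When two shards are involved, a single local transaction no longer covers the operation, which is where the extra complexity comes from.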
Static HTML vs. dynamic DB-driven pages: Static is fast to serve but harder to update.
MySQL Query Cache: Stores results of identical SQL queries; cached results for a table are invalidated whenever that table changes. (The query cache was removed in MySQL 8.0.)
Memcached: Distributed in-memory key-value store.
PHP Acceleration: Keeps compiled opcodes in memory, avoiding re-parsing scripts on each request.
Reduces database load.
Speeds up repeated queries or frequently accessed content.
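A common way to put a key-value cache in front of a database is the cache-aside pattern: check the cache first, fall back to the database on a miss, and drop the cached entry when the data changes. In this sketch plain dictionaries stand in for both Memcached and MySQL, and `CachedUserStore` is a hypothetical name.

```python
class CachedUserStore:
    def __init__(self, database):
        self.database = database  # stand-in for MySQL
        self.cache = {}           # stand-in for Memcached
        self.db_reads = 0         # counts how often the database is hit

    def get(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]      # cache hit: no database work
        self.db_reads += 1
        value = self.database.get(user_id)  # cache miss: query the database
        self.cache[user_id] = value
        return value

    def update(self, user_id, value):
        self.database[user_id] = value
        self.cache.pop(user_id, None)       # invalidate the stale entry


store = CachedUserStore({1: "alice"})
store.get(1)
store.get(1)                        # second read is served from the cache
reads_before_update = store.db_reads
store.update(1, "alicia")
fresh = store.get(1)                # miss after invalidation, re-read from DB
```

A real cache also needs an eviction policy and expiry times, since memory is finite; both are omitted here.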
Multiple geographically distributed data centers mitigate outages and disasters.
DNS directs traffic to the nearest or healthiest data center.
AWS Availability Zones: physically separate buildings or clusters within a region, isolated from each other’s failures.
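The routing decision a DNS service makes here can be sketched as "prefer a healthy data center in the client's region, otherwise any healthy one". The data center names, regions, and health flags below are invented, and real DNS-based routing also has to contend with cached answers and TTLs.

```python
def pick_data_center(centers, client_region):
    """Return the name of the data center a client should be sent to."""
    healthy = [c for c in centers if c["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy data center available")
    local = [c for c in healthy if c["region"] == client_region]
    # Fall back to any healthy center when the local one is down.
    return (local or healthy)[0]["name"]


centers = [
    {"name": "dc-east", "region": "us-east", "healthy": False},
    {"name": "dc-west", "region": "us-west", "healthy": True},
    {"name": "dc-eu", "region": "eu", "healthy": True},
]
for_eu_client = pick_data_center(centers, "eu")         # local and healthy
for_east_client = pick_data_center(centers, "us-east")  # local is down
```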
Firewalls restrict unnecessary ports (least privilege principle).
Allow only HTTP (port 80) and HTTPS (port 443) from the internet to the web tier; open database ports (for example, 3306 for MySQL) only between the web and database tiers.
Helps contain breaches and reduce attack surface.
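Least privilege between tiers amounts to a default-deny allowlist. The sketch below models that as a lookup table; the tier names are hypothetical, and the ports are the conventional defaults for HTTP, HTTPS, and MySQL rather than anything specific to a particular deployment.

```python
# (source tier, destination tier) -> ports that may be opened
ALLOWED = {
    ("internet", "web"): {80, 443},  # HTTP and HTTPS only
    ("web", "db"): {3306},           # MySQL, reachable from the web tier only
}


def is_allowed(source, destination, port):
    """Default deny: anything not explicitly listed is blocked."""
    return port in ALLOWED.get((source, destination), set())


web_ok = is_allowed("internet", "web", 443)     # public HTTPS is allowed
db_direct = is_allowed("internet", "db", 3306)  # DB is not exposed publicly
db_from_web = is_allowed("web", "db", 3306)     # web tier may query the DB
```

With these rules, a compromised web server can still reach the database port, but an outside attacker cannot reach it directly.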
        ┌────────────┴────────────┐
┌───────────────┐         ┌───────────────┐
│   Web + DB    │   ...   │   Web + DB    │
└───────────────┘         └───────────────┘
      ┌────────────┴───────────┐
┌─────────────┐        ┌─────────────┐
│ Web Servers │  ...   │ Web Servers │
└─────┬───────┘        └───────┬─────┘
      │ Shared Session + Cache │
      └───────────┬────────────┘
┌───────────────┐     ┌───────────────┐
│  Replica DB   │ ... │  Replica DB   │
└───────────────┘     └───────────────┘
┌───────────────┐         ┌───────────────┐
│ Data Center 1 │         │ Data Center 2 │
└───────────────┘         └───────────────┘
        └──────────DNS────────────┘
        ┌────────────┘ └─────────────┐
┌───────▼─────────┐        ┌─────────▼───────┐
│   Web Servers   │        │   Web Servers   │
│ (App / PHP/etc) │  ...   │ (App / PHP/etc) │
└───────┬─────────┘        └─────────┬───────┘
        │  (Shared Session / Cache)  │
        └──────────┴─────────────────┘
         [ Firewalls between tiers ]
Thank you for reading ❤️.
Last updated on Aug 18, 2025