MashupScalabilty
From MashupCamp
Fundamentals
First question: is it better to have one $10,000 server or ten $1,000 servers?
The session's answer was ten $1,000 servers. So when you are designing a mashup, you should try to make it scale horizontally, by adding machines rather than buying a bigger one.
- Performance relates to how fast a page loads.
- Scalability is the ability to keep that performance while increasing the number of users/page-loads.
Important things to consider include:
- Load balancing
- Caching
- Database scalability
Caching
- Caching is king on the server side.
- Use memory rather than a file cache; memcached is a good tool.
- For PHP, you can use an accelerator such as APC, which keeps compiled bytecode in RAM and is open source.
- The number of cache hits and misses is important to monitor.
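The points above about in-memory caching and hit/miss monitoring can be sketched as follows. This is a minimal illustration only: a plain Python dict stands in for memcached, and `CountingCache`, `get_or_compute` and `render_page` are hypothetical names, not part of any real library.

```python
import time


class CountingCache:
    """In-memory cache that counts hits and misses (a stand-in for memcached)."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl_seconds = ttl_seconds
        self._store = {}  # key -> (expiry_timestamp, value)
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss."""
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = compute()
        self._store[key] = (now + self.ttl_seconds, value)
        return value

    def hit_ratio(self):
        """Fraction of lookups served from the cache."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def render_page(page_id):
    # Hypothetical expensive step, e.g. calling the remote APIs behind a mashup.
    return f"<html>page {page_id}</html>"


cache = CountingCache(ttl_seconds=30.0)
for page_id in [1, 2, 1, 1, 3, 2]:
    cache.get_or_compute(page_id, lambda: render_page(page_id))

print(cache.hits, cache.misses, cache.hit_ratio())
```

A falling hit ratio is usually the first sign that the cache is too small or that entries expire too quickly.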
Apache
- With just 4 machines or so, Apache mod_rewrite could be used to distribute load.
- Using a statically compiled build of Apache rather than one that dynamically loads modules can make a big difference.
- It is also possible to use lightweight alternatives such as lighttpd.
- Or you could use Mongrel for Ruby on Rails.
- There may even be a trend away from Apache because of its scalability limits.
- Consolidate log files.
- Reduce shared state by always sending a given user to the same server.
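One way to send a given user to the same server every time is to hash the user ID onto the server list. The sketch below is an assumption about how this could look, not something presented in the session; the server names and `server_for_user` are hypothetical.

```python
import hashlib

SERVERS = ["web1", "web2", "web3", "web4"]  # hypothetical server names


def server_for_user(user_id, servers=SERVERS):
    """Map a user ID to one server, deterministically."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then wrap onto the list.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# The same user always lands on the same server.
print(server_for_user("alice") == server_for_user("alice"))
```

Note that with this simple modulo scheme, changing the number of servers remaps most users, so any per-server state is lost when the pool is resized.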
Cookies
- PHP sessions are cookie-based, which implies either keeping a user on the same server or storing session data in a central database.
- Cookies are like a free cache.
- Yahoo! is maxing out the amount of information that can be stored in cookies.
- Tomcat and Cisco load balancers write the server in use into the cookie.
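The idea of writing the server in use into a cookie can be sketched with Python's standard `http.cookies` module. This is only an illustration of the general technique, not how Tomcat or Cisco hardware implements it; the cookie name `backend`, the server names and `pick_backend` are all hypothetical.

```python
import itertools
from http.cookies import SimpleCookie

SERVERS = ["app1", "app2", "app3"]  # hypothetical backend names
COOKIE_NAME = "backend"             # hypothetical cookie name
_round_robin = itertools.cycle(SERVERS)


def pick_backend(cookie_header):
    """Return (server, set_cookie_value_or_None) for a request's Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    if morsel is not None and morsel.value in SERVERS:
        # Returning visitor: honour the server recorded in the cookie.
        return morsel.value, None
    # New visitor (or unknown server): assign one and ask the client to remember it.
    server = next(_round_robin)
    out = SimpleCookie()
    out[COOKIE_NAME] = server
    out[COOKIE_NAME]["path"] = "/"
    return server, out[COOKIE_NAME].OutputString()


first, set_cookie = pick_backend("")                       # new visitor
again, no_cookie = pick_backend(f"{COOKIE_NAME}={first}")  # returning visitor
print(first, again, set_cookie)
```

Because the client carries the routing decision, the load balancer itself does not need to keep a per-user table.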
MySQL
- MySQL was not built to scale. On a small to medium scale, replication is an option, but at some point the replication traffic overtakes the actual data-query traffic.
- Other options include using a multi-master setup or sharding.
- LiveJournal solves some issues by pointing to the specific database cluster used.
- Prepared queries are not very scalable, as there is no advantage gained from the query cache.
- 700 simple queries per second isn't much of a problem. However, complex queries are a lot more difficult.
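Sharding with a lookup that points to the specific database cluster can be sketched as a small directory. This is a simplified illustration under stated assumptions, not LiveJournal's actual implementation: the cluster host names, `ShardDirectory` and its methods are hypothetical, and in practice the mapping would live in a small global table rather than in process memory.

```python
CLUSTERS = {  # hypothetical cluster IDs and host names
    1: "db-cluster-1.example.internal",
    2: "db-cluster-2.example.internal",
}


class ShardDirectory:
    """Maps each user to the database cluster that holds that user's data."""

    def __init__(self, clusters):
        self._clusters = dict(clusters)
        self._user_to_cluster = {}  # user_id -> cluster_id

    def assign(self, user_id):
        """Place a new user on the cluster that currently holds the fewest users."""
        if user_id in self._user_to_cluster:
            return self._user_to_cluster[user_id]
        counts = {cid: 0 for cid in self._clusters}
        for cid in self._user_to_cluster.values():
            counts[cid] += 1
        # Break ties by the lowest cluster ID so placement is deterministic.
        cluster_id = min(counts, key=lambda cid: (counts[cid], cid))
        self._user_to_cluster[user_id] = cluster_id
        return cluster_id

    def host_for(self, user_id):
        """Return the host to query for this user's data."""
        return self._clusters[self._user_to_cluster[user_id]]


directory = ShardDirectory(CLUSTERS)
for name in ["alice", "bob", "carol"]:
    directory.assign(name)

print(directory.host_for("bob"))
```

Each query for a user's data then goes only to that user's cluster, so adding clusters adds capacity without every server having to replay every write.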
Misc
- Twitter is running on something like 130 instances of Mongrel across 10 machines.
- If using encryption/compression hardware, it is often best to put it in the load balancer. This makes debugging easier.
