MashupScalabilty

From MashupCamp

Fundamentals

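First question: is it better to have 1 x $10,000 server or 10 x $1,000 servers?

The answer given was 10 x $1,000 servers, because capacity can then grow by adding machines. So when designing a mashup, aim to scale it horizontally (across more machines) rather than vertically (onto one bigger machine).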

  • Performance relates to how fast a page loads.
  • Scalability is the ability to keep that performance while increasing the number of users/page-loads.

Important things to consider include:

  • Load balancing
  • Caching
  • Database scalability

Caching

  • Server-side, caching is king.
  • Prefer a memory cache to a file cache; memcached is a good tool.
  • For PHP, an accelerator such as APC caches compiled bytecode in RAM and is open source.
  • Monitor the number of cache hits and misses.
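
The hit/miss monitoring mentioned above can be sketched in a few lines. This is only an illustration: the `CountingCache` class and `slow_lookup` function are hypothetical names, and a plain dictionary stands in for memcached so that the example runs without a server.

```python
class CountingCache:
    """In-memory cache that counts hits and misses (a stand-in for memcached)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) only on a miss."""
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = loader(key)
        self._store[key] = value
        return value

    def hit_ratio(self):
        """Fraction of lookups served from the cache (0.0 if none yet)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def slow_lookup(key):
    # Hypothetical stand-in for a database query or remote API call.
    return key.upper()


cache = CountingCache()
for key in ["a", "b", "a", "a"]:
    cache.get_or_load(key, slow_lookup)

print(cache.hits, cache.misses, cache.hit_ratio())
```

A falling hit ratio is usually the first sign that the cache is too small or that keys are expiring too quickly.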

Apache

  • With just 4 machines or so, Apache mod_rewrite can be used to distribute load.
  • A statically compiled Apache, rather than one that loads modules dynamically, can make a big difference.
  • Lightweight servers such as lighttpd are also an option.
  • For Ruby on Rails, you could use Mongrel.
  • There may even be a trend away from Apache because of concerns about how it scales.
  • Consolidate log files.
  • Reduce shared state by sending each user to just one server.

Cookies

  • PHP sessions are cookie-based: the cookie only identifies the session, so a user must either stay on the same server or the session data must sit in a central database.
  • Cookies are like a free cache
  • Yahoo! are maxing out the information that can be stored in cookies
  • Tomcat and the Cisco load balancer write the server in use into the cookie.
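
Writing the server name into a cookie, as described above, might look roughly like the following. This is a sketch under assumptions, not how Tomcat or any Cisco product implements it: the server names, the `backend` cookie name and the `pick_server` function are all made up for illustration, and cookies are modelled as a plain dictionary.

```python
import itertools

SERVERS = ["app1", "app2", "app3"]  # hypothetical backend names
_round_robin = itertools.cycle(SERVERS)


def pick_server(cookies):
    """Return (server, cookies), honouring an existing 'backend' cookie."""
    server = cookies.get("backend")
    if server not in SERVERS:
        # New visitor, or the pinned server no longer exists:
        # assign the next server in rotation and record it in the cookie.
        server = next(_round_robin)
        cookies = dict(cookies, backend=server)
    return server, cookies
```

A first request with no cookie is assigned a server in rotation; every later request that carries the returned cookie goes back to the same server, so session state never has to be shared between machines.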

MySQL

  • MySQL was not built to scale. At small to medium scale, replication is an option, but at some point the replication traffic overtakes the actual query traffic.
  • Other options include a multi-master setup or sharding.
  • LiveJournal solves some issues by pointing each request to the specific database cluster that holds the data.
  • Prepared queries are not very scalable, as they gain no advantage from the query cache.
  • 700 simple queries per second isn't much of a problem. Complex queries, however, are a lot more difficult.
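
The idea of pointing to a specific database cluster can be illustrated with a small lookup directory. This is a minimal sketch assuming data is partitioned per user; the cluster names, the `user_cluster` dictionary and the `cluster_for` function are hypothetical and are not taken from LiveJournal's code.

```python
import zlib

CLUSTERS = ["db-cluster-0", "db-cluster-1", "db-cluster-2"]  # hypothetical names

# Directory of explicit assignments. In a real deployment this would be a
# small global table, not a Python dict (an assumption for illustration).
user_cluster = {}


def cluster_for(user):
    """Return the cluster holding this user's rows, assigning one if needed."""
    if user not in user_cluster:
        # crc32 gives a stable hash across runs, unlike Python's built-in hash().
        index = zlib.crc32(user.encode("utf-8")) % len(CLUSTERS)
        user_cluster[user] = CLUSTERS[index]
    return user_cluster[user]
```

Because the directory stores the assignment rather than recomputing it, a user can be moved to another cluster by updating one entry, without rehashing everyone else.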

Misc

  • Twitter is running on something like 130 instances of Mongrel over 10 machines.
  • If using encryption/compression hardware, it is often best to place it at the load balancer. This makes debugging easier.