UMBC ebiquity research group Building intelligent systems in open, heterogeneous, dynamic, distributed environments
17 May 2008, 00:09:19 EDT  
How YouTube scales MySQL for its large databases

By Tim Finin on Friday, December 28th, 2007 at 10:04 am.

Like most research labs, we rely on MySQL whenever we need a database. And, as in most labs (I’m guessing here), it’s common to overhear something like the following in ours: “We really need to replace MySQL with Oracle or DB2 in X so it can handle the load.” But we never get around to it.

Maybe we don’t have to. Check out Scaling MySQL at YouTube, a keynote talk given by YouTube DBA Paul Tuckfield at the 2007 MySQL Conference and put online by Conversationnetwork.org.

“In mid 2006, YouTube served approximately 100 million videos in a single day. To maintain a website of that scale, one would imagine YouTube has hundreds of DBAs. But in fact, there are just three people that make it all work. Paul Tuckfield, the MySQL DBA at YouTube shares horror stories about scalability at YouTube and how he coped with them to keep the show going everyday, while learning important lessons along the way. … According to him, the three important reasons for YouTube’s scalability are Python, Memcache and MySQL replication, the last having the most impact. Most people think that the answer to scalability is in upgrading hardware and CPU power. Adding CPUs doesn’t work on its own; wisdom is in getting the maximum amount of RAM for the CPU and then fine tuning.” (src)
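As a rough illustration of how the three pieces named in the quote can fit together, here is a minimal Python sketch of read/write splitting over replicas with a cache in front. It is not YouTube's code: the `ReplicatedStore` class and its methods are hypothetical, plain dictionaries stand in for the MySQL primary, the read replicas, and Memcache, and replication is copied synchronously here, whereas real MySQL replication is asynchronous and can lag.

```python
import itertools


class ReplicatedStore:
    """Toy model: one primary, several read replicas, one cache (all dicts)."""

    def __init__(self, replica_count=2):
        self.primary = {}                                    # stand-in for the MySQL primary
        self.replicas = [{} for _ in range(replica_count)]   # stand-ins for read replicas
        self.cache = {}                                      # stand-in for Memcache
        self._next_replica = itertools.cycle(range(replica_count))
        self.replica_reads = 0                               # counts reads that reached a replica

    def write(self, key, value):
        # All writes go to the primary.
        self.primary[key] = value
        # Assumption: replicate immediately; real replication is asynchronous.
        for replica in self.replicas:
            replica[key] = value
        # Drop any cached copy so the next read sees the new value.
        self.cache.pop(key, None)

    def read(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Otherwise spread reads across the replicas in round-robin order.
        replica = self.replicas[next(self._next_replica)]
        self.replica_reads += 1
        value = replica.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = ReplicatedStore(replica_count=2)
store.write("video:1", "first title")
store.read("video:1")   # cache miss, served by a replica
store.read("video:1")   # cache hit, no database read
```

The point of the sketch is only that reads, which dominate on a site like this, never touch the primary: they are absorbed by the cache first and by the replicas second, so read capacity grows by adding replicas and RAM rather than by buying a bigger database server.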


4 Responses to “How YouTube scales MySQL for its large databases”

  1. Bruce Says:

    Hi,

    Have a look at http://highscalability.com/ for more examples of LAMP, Java, and other architectures, and for why you don’t need to go down the DB2 or Oracle route to get high-load sites off the ground. If Flickr and others don’t use them, why should we?

  2. James Says:

    Because, you know, the choice is always between the free, feature-poor, fast at ’select * from table’ DBMS and the expensive, scalable, feature-rich DBMS. There couldn’t possibly be a free, feature-rich, scalable DBMS out there that you could use. Of course not.

  3. Aaron Trevena Says:

    @James,

    It’s a case of picking the right tool for the job. I’ve worked on two high-availability sites for different clients in different markets this year: aviation briefings for airlines, and online classifieds. One required Postgres, one required MySQL.

    Horses for courses, quite simply. For this kind of task MySQL beats Postgres in terms of ease of scaling, query caching, etc. When you’re dealing with very high traffic, you’re better off breaking up and simplifying your schema to get the most speed than using the “more powerful” RDBMS.

  4. Floyd Price Says:

    It’s good to see a website with this much traffic sticking with MySQL :-)
