Beatport

As an application developer at the world's largest online retailer of electronic dance music, I worked closely with a large team of highly talented engineers. With millions of monthly unique visitors and petabytes of storage, Beatport's online music store is a large-scale operation with plenty of real-world challenges.

Beatport DJs

In September 2011, I joined the newly formed DJs team in Beatport's new satellite office in San Francisco. Our five-person team came up to speed quickly, coordinating remotely with the 25-person engineering department in Denver, CO. Our purpose was to expand the company's image from its basic music sales roots into an engaging destination for DJs and their fans.

  • Basic Functionality

    DJs can create a profile with a vanity URL to showcase their work.

  • Charts

    Prior to DJs, all charts on Beatport were created by hand by the content operations team. We built tools that let DJs build their own charts, giving them a way to interact with Beatport that was previously reserved for a select group of featured artists. In just a few months, the number of charts in the system doubled its five-year running total.

  • Videos

    A video module was created to allow DJs to post YouTube videos to their profile, making use of data from the YouTube API.

  • SoundCloud

    A similar module let DJs link their DJ profile with their SoundCloud account and used SoundCloud's API to automatically display their newest SoundCloud tracks on the profile.

  • Events

    An events system was created to allow DJs to publish information about their upcoming events.

Beatport Mixes

In April 2012, I relocated to Beatport HQ in Denver to join the new three-member Mixes team. In a two-week sprint, we worked nights and weekends in close coordination with the Data Services and Infrastructure teams to create a fully functioning alpha version of the site: the world's first legitimate marketplace for DJ mixtapes.

  • Basic Functionality

    DJs can create a Mix, upload a 500 MB MP3 from their browser, associate Beatport tracks with start times, reorder tracks, associate the mix with an event, preview the re-encoded mix in their browser moments after upload, and publish the mix for sale on Beatport.

  • Top 100

    In close coordination with Data Services, a system was created for collecting and publishing historically accessible top 100 charts for all mixes. Nightly ETL scripts process order information in the data warehouse and publish the results to a separate historical top 100 schema accessible from the web tier.
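
    The nightly step described above can be sketched roughly as follows. This is a minimal illustration in Python using an in-memory SQLite database, not the production ETL code: the table names (orders, top_100), their columns, and the function name publish_top_100 are all hypothetical.

```python
import sqlite3


def publish_top_100(conn, chart_date, limit=100):
    """Rank mixes by units sold on chart_date and store a dated snapshot.

    Table and column names are hypothetical stand-ins for the real schemas.
    """
    rows = conn.execute(
        "SELECT mix_id, COUNT(*) AS units FROM orders "
        "WHERE order_date = ? GROUP BY mix_id "
        "ORDER BY units DESC, mix_id ASC LIMIT ?",
        (chart_date, limit),
    ).fetchall()
    # Remove any earlier snapshot for this date so a rerun does not duplicate rows.
    conn.execute("DELETE FROM top_100 WHERE chart_date = ?", (chart_date,))
    conn.executemany(
        "INSERT INTO top_100 (chart_date, position, mix_id, units) "
        "VALUES (?, ?, ?, ?)",
        [
            (chart_date, position, mix_id, units)
            for position, (mix_id, units) in enumerate(rows, start=1)
        ],
    )
    conn.commit()


def demo():
    """Build a tiny data set and return the published chart for one day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (mix_id INTEGER, order_date TEXT)")
    conn.execute(
        "CREATE TABLE top_100 "
        "(chart_date TEXT, position INTEGER, mix_id INTEGER, units INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [
            (2, "2012-06-01"), (2, "2012-06-01"), (2, "2012-06-01"),
            (1, "2012-06-01"), (1, "2012-06-01"),
            (3, "2012-06-01"),
            (3, "2012-06-02"),
        ],
    )
    publish_top_100(conn, "2012-06-01")
    publish_top_100(conn, "2012-06-01")  # rerun to show the snapshot is replaced
    return conn.execute(
        "SELECT position, mix_id, units FROM top_100 "
        "WHERE chart_date = ? ORDER BY position",
        ("2012-06-01",),
    ).fetchall()
```

    Keeping one snapshot per date is what makes the charts historically accessible: the web tier only needs to read rows for a given chart_date.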

Beatport Music Store

As a member of the engineering department, I went beyond my assigned duties and spearheaded several efforts to improve the performance and efficiency of all Beatport properties.

  • Statsd

    In coordination with the infrastructure department, I helped implement a system for feeding an array of application performance metrics to statsd from the web tier. Below are a few examples of the type of information made visible by the system.

    Api_Count         - Average number of internal API calls required to fulfill a request
    Api_Time          - Average time spent waiting for internal API responses
    Peak_Memory_Usage - Average peak memory usage
    Query_Count       - Average number of SQL queries required to fulfill a request
    Query_Time        - Average time spent waiting for SQL query results
    Response_Size     - Average HTTP response body size
    Solr_Count        - Average number of Solr queries required to fulfill a request
    Solr_Time         - Average time spent waiting for Solr query responses
    View_Render_Time  - Average time spent rendering the view
    Zend_Request      - Average response time
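
    As a rough illustration of how per-request values like these can reach statsd, here is a minimal Python sketch. The web tier itself was PHP, so this is not the code we ran. It assumes every value is sent as a statsd timer (wire format <bucket>:<value>|ms over UDP) so that statsd computes the averages. The helper names and the host and port defaults are assumptions; 8125 is the conventional statsd port.

```python
import socket


def format_timer(name, value):
    """Format one sample in the statsd timer wire format: <bucket>:<value>|ms."""
    return f"{name}:{value}|ms"


def send_metrics(metrics, host="127.0.0.1", port=8125):
    """Send each metric as its own UDP datagram.

    UDP is fire-and-forget: no reply is expected, so a slow or missing
    statsd daemon does not hold up the request being measured.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for name, value in metrics.items():
            payload = format_timer(name, value).encode("ascii")
            sock.sendto(payload, (host, port))
    finally:
        sock.close()
```

    A request handler would collect its counters while it runs and call send_metrics({"Query_Count": 12, "Query_Time": 48}) once, at the end of the request.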
          
  • Timers and Projects

    Ten timers were created, and each timer is split by project.

  • Endpoints

    Each timer for each project is further split by endpoint, named {{controller}}-{{action}}.
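
    One way to picture this hierarchy is as a dotted metric key. The Python sketch below is illustrative only: the segment order (timer, then project, then endpoint), the sanitizing rule, and the function names are assumptions, not the actual naming scheme.

```python
import re

# Graphite treats "." as a hierarchy separator, so anything outside this
# conservative character set is replaced before the segments are joined.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


def sanitize(segment):
    """Replace characters that could break the dotted metric hierarchy."""
    return _UNSAFE.sub("_", segment)


def metric_key(timer, project, controller, action):
    """Build a key of the form <timer>.<project>.<controller>-<action>."""
    endpoint = f"{controller}-{action}"
    return ".".join(sanitize(part) for part in (timer, project, endpoint))
```

    For example, metric_key("Query_Time", "djs", "profile", "view") gives "Query_Time.djs.profile-view", so one timer can be graphed per project or per endpoint.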

  • HTTP Caching

    Here we can see a drastic reduction in request volume for key high-traffic endpoints as a result of aggressive HTTP caching.

  • Caching Twig

    Here we clearly see a drastic reduction in application response time after properly configuring the Twig view cache.

  • Session DB Perf.

    This graph helped alert us to a growing performance problem with the session database. Switching the storage engine from MyISAM to InnoDB resolved table-lock contention on a table with a high write rate (WPS).

  • Defect Detection

    A defect introduced on Aug 14th is clearly visible in the response time graph. Since it did not directly affect availability, it was fixed with the next release.

  • View Render Time

    By removing incidental Doctrine queries from the view layer, the view rendering time is drastically reduced.

  • Memory Usage

    By removing unnecessary heavy components such as Doctrine, the memory footprint of many internal APIs is improved.

  • API Evolution

    This graph illustrates the point at which we switched many of our internal apps from version 2 to version 3 of our most highly trafficked API.

  • High Level Overview

    A high-level view shows a decrease in response time and variability across the board over a six-month period.

  • DB Query Volume

    A department initiative to cache internal API responses and reduce the query volume from web applications brought a huge reduction in QPS to the database tier.
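
    The idea behind caching internal API responses can be sketched with a small time-to-live cache. This Python example is a simplified stand-in: it keeps entries in an in-process dictionary, whereas the text above does not say where the responses were actually cached. The class name, method name, and TTL value are hypothetical.

```python
import time


class TTLCache:
    """Cache values for a fixed number of seconds, keyed by request."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() only on a miss or expiry."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch()
        self._store[key] = (now + self.ttl, value)
        return value
```

    Every hit served from the cache is an API call, and the SQL queries behind it, that never reaches the database tier.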

  • More HTTP Caching

    More aggressive HTTP caching for logged-out users reduces request volume to the accounts project by 40%.
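
    A minimal sketch of the underlying decision, in Python for illustration: responses for logged-out users are marked as publicly cacheable, while responses for logged-in users are kept out of shared caches. The function name, the max-age value, and the exact directives are assumptions, not the headers we shipped.

```python
def cache_headers(is_logged_in, max_age=300):
    """Choose a Cache-Control header based on whether the user is logged in."""
    if is_logged_in:
        # Personalized pages must not be stored by shared caches.
        return {"Cache-Control": "private, no-store"}
    # Anonymous pages are identical for everyone, so a shared cache may reuse them.
    return {"Cache-Control": f"public, max-age={max_age}"}
```

    With a caching proxy in front of the application, each anonymous response reused from the cache is a request the application never has to handle.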

  • Statsd for Retrospectives

    Below is a graph from a bumpy release, followed by the notes presented in the release retrospective. The notes highlight what went wrong and when, as a basis for discussing how to mitigate downtime in future releases.

  • 14:30 - release begins

  • 14:42 - bad DB query in catalog API causes mysql connection saturation, site becomes unresponsive

  • 14:55 - DJs is put into maintenance mode to eliminate calls to bad DB query

  • 14:56 - catalog stats come online for the first time

  • 14:58 - double page load effects begin

  • 15:20 - static version hash updated (no change)

  • 15:22 - static assets are properly redeployed

  • 16:00 - prod patch related to identity API is deployed

  • 16:30 - catalog deployed, varnish cleared, double page load effects eliminated

  • 16:45 - DJs taken out of maintenance mode, all green

  • Application Performance Dashboard

    Statsd is a wonderful tool, but it can sometimes be difficult to see trends across large numbers of metrics without a lot of digging around. To solve this problem, I reverse-engineered Graphite's image rendering API to create a dashboard giving a higher-level comparative overview of system performance.
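
    For a sense of what such a dashboard works with, here is a minimal Python sketch that builds a Graphite render URL and averages a returned series. It is not the dashboard's code. The target, from, until, and format parameters are part of Graphite's render API; the base URL, the metric path, and the helper names are hypothetical.

```python
from urllib.parse import urlencode


def render_url(base_url, target, from_time, until_time="now", fmt="json"):
    """Build a Graphite render URL for one target over one time window."""
    query = urlencode(
        {"target": target, "from": from_time, "until": until_time, "format": fmt}
    )
    return f"{base_url.rstrip('/')}/render?{query}"


def mean_of(datapoints):
    """Average a Graphite JSON series of [value, timestamp] pairs.

    Graphite reports intervals with no data as null (None in Python), so
    those points are skipped. Returns None when no values remain.
    """
    values = [value for value, _timestamp in datapoints if value is not None]
    return sum(values) / len(values) if values else None
```

    Reducing each series to a single number like this is what allows many metrics to be compared side by side instead of opening one graph at a time.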

  • Stage vs Prod

    Here we have stage on the left and prod on the right. Metrics such as req/s are less useful here, but API and SQL counts can be usefully compared across environments.

  • Prod vs Prod

    Comparing different time frames from the same environment is much more useful, for instance before and after a major release.
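
    A before/after comparison of this kind comes down to a percent change per metric. The Python sketch below shows the arithmetic under simple assumptions: each window has already been reduced to one average per metric, and the function names are hypothetical.

```python
def percent_change(before, after):
    """Return the change from before to after as a percentage of before."""
    if before == 0:
        return None  # undefined when there is no baseline to compare against
    return (after - before) / before * 100.0


def compare(before, after):
    """Compare two {metric: average} windows on the metrics they share."""
    return {
        name: percent_change(before[name], after[name])
        for name in before
        if name in after
    }
```

    A negative value means the metric dropped after the release, which is the desired direction for counts and timings.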

  • Quick Access

    All data points are clickable, bringing up the corresponding graph.

  • End Result

    The end result of these incremental performance enhancements is clearly visible in the web tier CPU usage. By bucking the trend and reducing CPU load on the web tier by 50%, we were able to devise a new split-pool deployment strategy that greatly reduced both production downtime and the need for additional application servers.

Technology Department (38)
  • 3 product experts
  • 6 application developers (including me)
  • 4 UI developers
  • 3 database developers
  • 4 data services engineers
  • 4 QA engineers
  • 6 infrastructure engineers
  • 3 DevOps engineers
  • 4 directors
  • 1 CTO
Technologies
  • JS (jQuery)
  • CSS
  • HTML
  • PHP
  • Zend
  • MySQL
  • Varnish
  • Puppet
  • Statsd
Duration
Sept 2011 - Apr 2013 (18 months)