Posts Tagged ‘DIWD’

Do It With Drupal: Drupal Under Pressure: Performance and Scalability

Friday, December 11th, 2009
  • Browser | Apache | PHP | -SQL Queries | MySQL
  • Common pattern for optimization: inspect each layer, add little buckets of caches everywhere
  • “Fast track” through the different layers to get out requests more efficiently
  • On browser side: Mod Expires, sends a message to the browser and says “I’ve got this info, you’ve already looked at it, we’re good”
  • Firebug will show you all the individual requests- says how many kb it takes to download (if you only have to download a little bit when you refresh, that’s good)
  • CDN – Content Delivery Networks and reverse proxy caches: any stuff that hasn’t changed, you don’t have to ask your internal infrastructure to handle that (hand it off to geolocated servers optimized to quickly serve out that info)
  • Proxy cache can be in front of your infrastructure (offload things Drupal would keep doing over and over)
  • PHP level: OpCode cache
  • MySQL level: query cache – takes all the read queries (most of the select statements) and stores the results in memory
  • Query cache, OpCode cache: half hour or less, significant improvements
  • Proxy caches and CDNs are a bit larger of a task
  • Component between database and PHP: MemCache – clone of some of Drupal’s tables
  • MemCache: take all the cached tables, hold it in memory
  • MemCache also used for sessions – if your sessions table is locking up, your site is about to implode
  • MemCache also used to speed up path aliasing stuff
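The “little buckets of caches” pattern above boils down to one move: check the bucket first, and only do the expensive work on a miss. A minimal sketch in JavaScript, assuming a hypothetical createTtlCache helper (it is not part of Drupal, MemCache, or any tool named in the talk):

```javascript
// Hypothetical helper: an in-memory cache whose entries expire after ttlMs.
// The clock is injectable so the expiry logic can be tested.
function createTtlCache(ttlMs, now = Date.now) {
  const store = new Map();
  return {
    get(key, compute) {
      const hit = store.get(key);
      if (hit && hit.expires > now()) {
        return hit.value; // fast track: skip the expensive work
      }
      const value = compute(); // slow path: do the work once...
      store.set(key, { value, expires: now() + ttlMs }); // ...and remember it
      return value;
    },
  };
}

// Usage: two requests for the same page within a minute, one render.
const pageCache = createTtlCache(60 * 1000);
let renders = 0;
const renderFrontPage = () => {
  renders += 1;
  return '<html>front page</html>';
};
pageCache.get('front-page', renderFrontPage);
pageCache.get('front-page', renderFrontPage);
console.log(renders); // 1
```

Each layer in the stack (browser, proxy, opcode cache, query cache, MemCache) applies the same idea to a different kind of work.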

Apache Requirements

  • Apache 1.3.x or 2.x, ability to read .htaccess files, AllowOverride All
  • If we take the information in .htaccess and put it in the main Apache config file, it’s faster (though it might not be a huge bump in performance), and we can turn off Apache’s dynamic per-directory configuration
  • mod_rewrite (clean URLs), mod_php (Apache integration), mod_expires
  • MaxClients- number of connections you can have to Apache at once; if you set it too high for your server, you’ll run out of memory
  • RAM / AvgApache mem size = # max clients
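The MaxClients rule of thumb above is simple division. A quick sketch, where the function name and the sample numbers (2048 MB left for Apache, 40 MB per process) are made up for illustration; measure your own server before picking a value:

```javascript
// RAM available to Apache / average Apache process size = MaxClients ceiling.
function estimateMaxClients(availableRamMb, avgApacheProcessMb) {
  if (!(availableRamMb > 0) || !(avgApacheProcessMb > 0)) {
    throw new RangeError('both sizes must be positive numbers');
  }
  // Round down: a fractional client would push the server into swap.
  return Math.floor(availableRamMb / avgApacheProcessMb);
}

console.log(estimateMaxClients(2048, 40)); // 51
```

Remember that "available" means what is left after MySQL, MemCache, and the OS have taken their share, not the total RAM in the box.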

mod_expires

  • ExpiresDefault A1209600 (AKA “two weeks”)
  • ExpiresByType text/html A1 (all images, CSS, javascript: they get cached for two weeks, except the text/html)
  • We can’t cache html in Drupal because that’s dynamic
  • This is telling Apache to send the headers to the browser that tell the browser it’s ok to cache it
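To unpack A1209600: the “A” means “counted from the time of access” and the number is seconds. The sketch below does the arithmetic and builds headers roughly like the ones mod_expires sends (an Expires date plus a Cache-Control max-age). The function name is hypothetical and this is an illustration of the idea, not Apache's code:

```javascript
// Two weeks expressed in seconds, as used in "ExpiresDefault A1209600".
const TWO_WEEKS_SECONDS = 14 * 24 * 60 * 60; // 1209600

// Hypothetical helper: the caching headers for a response served at accessTime.
function expiryHeaders(accessTime, seconds) {
  return {
    'Cache-Control': 'max-age=' + seconds,
    Expires: new Date(accessTime.getTime() + seconds * 1000).toUTCString(),
  };
}

// An image requested at noon UTC on December 11th, 2009:
const requestTime = new Date(Date.UTC(2009, 11, 11, 12, 0, 0));
const headers = expiryHeaders(requestTime, TWO_WEEKS_SECONDS);
console.log(headers['Cache-Control']); // max-age=1209600
console.log(headers.Expires); // Fri, 25 Dec 2009 12:00:00 GMT
```

With A1 for text/html, the same calculation gives a one-second lifetime, which is effectively “don't cache this”.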

KeepAlive

  • There’s overhead to opening TCP/IP connections
  • “We can have a conversation this long” – Apache and browser can keep a conversation going long enough to download an entire page
  • KeepAliveTimeout 2 (but you can monitor Apache threads to determine when a process turns into a wait process, refine it)
  • Resources: linuxgazette.net/123/vishnu.html

PHP requirements

  • 5.2.x, XML extension, GD image library, Curl support, register_globals:off, safe_mode:off
  • PHP Opcode Cache: removes “compile to operation codes” steps – go right from parse PHP to execute
  • APC: http://pecl.php.net/package/APC
  • php.ini: max_execution_time = 60, memory_limit = 96M
  • If you’re uploading big things, you might need more; if you’re doing image handling/image manipulating (image cache to dynamically create image derivatives) may need to increase memory
  • Opcode cache is going to increase size of each Apache process? Or maybe not? (Debate ensues)
  • In any case, check and see if Apache is holding onto more memory
  • Use PHP best practice (don’t count things over and over – store that count and then move on)

True or False?

  • The more modules you enable, the slower your site becomes (TRUE!)
    • Sometimes you may not need a module for that – 5 lines of code and it’s done (don’t need a birthday module with candles, etc if you just need the number)
    • “Do I really need to enable this module?”
  • When my site is getting hammered, I should increase the MaxClients option to handle more traffic (FALSE!)
    • You’ll run out of memory, start swapping, and die
  • echo() is faster than print() (WHO CARES?)
    • This is taking things a little too far

Database server

  • MySQL 5.0.x or 5.1.33 or higher (there’s some problems before 5.1.33 with CCK)
  • MyISAM by default
  • In Drupal 7, there are changes – MyISAM locks the entire table from writing when one thing is getting written somewhere; the access column, user table, session table is getting written to on every page request – this can cause problems
  • Drupal 7 uses InnoDB – row-level locking, transactions, foreign key support, more robustness (less likely to get corrupted tables)
  • If you have a table that’s primarily read, MyISAM is a little faster
  • Query caching – specify query_cache_size (64M?), max_allowed_packet (16M?)
  • Is query cache size relative to table size? – yes, basically a bucket for read queries; how many result sets do you want to store in query cache

Query optimization

  • Find a slow query (can look at slow query log in MySQL), debug the query using EXPLAIN, it shows what’s getting joined together and all sorts of other details; save the query, save the world
  • log-slow-queries = /var/log/slow_query.log
  • long_query_time = 5 (5 seconds)
  • #log-queries-not-using-indexes: little ones that get run a ton, if you tweak that, you’ll optimize the site (voting API, casting a vote)
  • Add an index to reduce the number of rows it has to look through (tradeoff: it adds a little bit of time before a write can happen)

Drupal

  • Use Pressflow: same APIs as Drupal core but supports MySQL replication, reverse proxy caching, PHP 5 optimizations
  • pressflow.org
  • Almost all Pressflow changes make it back to core Drupal for the next release
  • Cron is serious business – run it
  • Drupal performance screen (/admin/settings/performance)
  • We can’t cache HTML like we can cache other things… but there’s a way to do it
  • Page caching is disabled by default; in normal mode, a page viewed by an anonymous user is stored in the database and served from there on later requests
  • Aggressive cache bypasses some of the normal startup-y kind of things
  • Aggressive cache lets you know if there’s any modules that might be affected by enabling aggressive caching (such as Devel module)
  • MTV runs on 4 web servers and a database server – and has a TON of caching/CDN
  • CDN is great for a huge spike in traffic
  • If you don’t have $$ for a CDN, use a reverse proxy like Varnish: don’t ask Drupal to keep generating stuff for anonymous traffic
  • Block caching is good
  • Optimize CSS = aggregate and merge (20 requests for CSS files can go to 2)
  • JSAggregator does compression for javascript (but be sure that you’ve got all the right semicolons)

Tools of the trade

  • Reverse proxy caches: like your own mini mini CDN; Varnish (varnish-cache.com)
  • Set time to live for your content – this leads to regulated traffic off the originating server
  • whitehouse.gov is being served all through Akamai; when you do a search, or post something you start to hit the original Drupal
  • Apache Benchmark – impact of your code on your site
  • It’s built-in with Apache (ab from command line)
  • ab -n 10 -c 10 http://www.example.com/ (10 requests, 10 at a time)
  • You get back a number (requests per second your site can handle)
  • More complicated for authenticated users; first, turn off all caching (for worst case scenario), look at the cookie and get the session ID, and do: ab -n 10 -c 10 -C PHPSESSID=[whatever it is] http://www.example.com/

devel module

  • Not suggested for a production site; Masquerade module is for switching users on a live site
  • Print out database queries for each page
  • Switch users
  • View session information
  • dsm()
  • db_queryd()
  • timer_start(), timer_stop()

MySQL Tuning Scripts

  • blog.mysqltuner.com
  • www.maatkit.org – makes human-friendly reports from slow query report

Kinds of scalability

  • Scalability – how long can you survive the load
  • Scaling for viral widgets: there, the mantra isn’t “protect the database”, it’s “protect the web servers” – get more web servers
  • Spike in anonymous user traffic (getting Slashdotted): site is a place for authenticated users, offload anonymous user traffic
  • Tons of authenticated users: 100k employees logging into an infrastructure from 9 to 5 – big, beefy servers in a hosting location

Where do you start?

  • Do the quick wins first
  • Save time for load testing
  • RAM is cheap, MemCache is a nice solution
  • If you get a warning about upcoming spikes in traffic, that triggers reverse proxy cache, CDN
  • Work with hosting companies that know their infrastructure; build a relationship with them early on to have these kinds of conversations
  • Some crashes are just a misunderstanding about what Drupal needs (going from a static site to Drupal without making changes)

When your server’s on fire

  • Always have breathing room if you can
  • If you’ve done MemCache, query caching, gone through all of that… add another box
  • Add another virtual server
  • Scalability = redundancy; back yourself up
  • If the site goes down, will you lose money? If yes, invest in infrastructure

Do It With Drupal: Drupal Under Fire: Website Security

Friday, December 11th, 2009
  • Your site is vulnerable (really, it is)
  • GVS offers security review service for Drupal
  • Bad things: abusing resources, stealing data, altering data
    • Abusing resources: DDOS (extorting money from site owner), using open relay in a mail sending module for spam
    • Stealing data: from users (their passwords, e-mail address)
    • Altering data: defacement
  • You don’t hear about security vulnerabilities much; Drupal core mentions vulnerabilities (and updates) but not so much for modules
  • Worry in a prioritized way
  • Choose your strategy: stay ahead of the pack, or protect valuable assets?
  • Attacks focus on sites that are out of date
  • Know about releases, have a method to update your site, do it
  • Look into Aegir if you’re running multiple sites

Configuration

  • Available updates- settings, e-mail notifications when modules you use are updated
  • Security review module: drupal.org/project/security_review
  • Will show if your site is under attack with a SQL injection
  • Part of security review: check off which roles are considered “trusted” – trust-checks and points out which permissions are bad to give to untrusted users
  • Can skip some of the checks so they don’t nag you (if this is on a dev server, and it’s not relevant)
  • There is a hook to be able to run additional checks, but not sure whether modules should be able to declare their things (do we trust module developers to come up with the right set of rules?)
  • If there’s something that can take an action on your site, accessible via a link – that could be a vulnerability (i.e. the “turn off this check” feature of the security module)
  • Run it before you launch, after you make big config changes; could do it as a periodic check and e-mail the report. Is ok to have always-on for a live site though

Vulnerabilities by type

  • Announcements from drupal.org; most sites have custom modules, almost always have custom themes
  • Analysis of one site: 3-4 vulnerabilities in Drupal core, 20 in contrib modules, 100 vulnerabilities in custom theme/modules (no one else is reviewing that stuff except for you)
  • XSS (cross-site scripting) – one of the hardest to fix
  • Access bypass – good ways to fix this
  • Cross-site request forgeries (CSRF)
  • SQL injection – easy to protect against, only getting easier

XSS

  • Anything you can do XSS can do (better)
  • XSS can change password for user 1
  • Most people don’t know they’ve been a victim of XSS; it’s in your browser, browser just executes javascript, don’t know until you try to log back in
  • XSS tools exist to probe your network – even if your Drupal is on the intranet
  • Automated tools are a great way to get started, but not all that valuable in actually identifying things (false positives, false negatives)

Insecure configuration of input formats

  • Input formats and filters are confusing – people do what they need, and forget about it, and open themselves up to XSS
  • Anonymous users: shouldn’t be allowed to have more than one input format
  • Filtered HTML is the right thing for untrusted roles
  • To this day, WYSIWYG modules say “give everyone access to full HTML and tinymce will just work” – NO! DON’T DO THAT!
  • Defaults are good: filtered HTML is a good thing
  • It’s tempting to add images, spans, divs, etc – but different browsers have different vulnerabilities that way
  • There’s a page on drupal.org that talks about what’s safe to put in (there’s some gray area – depends on your users and their browsers)
  • Weights: HTML corrector needs to go last

XSS for Themers/Coders (and reviewers)

  • Browscap module: analyzes user agents for people who go to the site
  • Firefox extension (default user agent) – used to be for Firefox to pretend to be IE, but now people use it for other things
  • Hackers can take normal user agent and replace it with jQuery that will be sent as the user agent – PWNED
  • Is there a module that will strip out javascript from the input box? – Filtered HTML does that
  • You can strip the script, or you can escape it (so it shows up as harmless text)
  • Filtered HTML also gets rid of attributes
  • There’s a module that says which attributes can come through on which tags – well, the admin screen for it is huge, the input format area is a problem because it’s confusing, so do you want to add an even more confusing module?
  • Themers: Read tpl.php and default implementations; rely on your module developer for variables that are ready to be printed (hook pre-process)
  • Developers: where does the text come from, is there a way a user can change it, in what context is it being used?
  • More is from the user than you think (user agents are from the user)
  • Filtered HTML makes things safe for the browser context
  • When data leaves Drupal and goes into MySQL – need to escape the data to make it safe for putting into the database
  • Contexts: mail (some clients sorta support javascript, need to specify plaintext), database, web, server
  • Take an hour: http://acko.net/blog/safe-string-theory-for-the-web
  • Drupal philosophy: make things secure by default
  • Escape variables using the check_plain() function
  • If your site is translatable, it’s also probably secure
  • If you’re using the API properly, you probably don’t need to worry about security (but it takes a while to learn how to use the API properly)
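The “escape it so it shows up as harmless text” idea above can be sketched in a few lines. This is a JavaScript approximation of what an escaping function like Drupal's check_plain() does (Drupal's real one is PHP); the checkPlain name and the sample user agent string are assumptions for illustration:

```javascript
// Replace the characters that are special in HTML with entities, so the
// browser displays the text instead of interpreting it as markup.
function checkPlain(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first, or it would re-escape the entities below
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// A "user agent" that is really a script, as in the Browscap example above:
const hostileUserAgent = '<script>alert("pwned")</script>';
console.log(checkPlain(hostileUserAgent));
// &lt;script&gt;alert(&quot;pwned&quot;)&lt;/script&gt;
```

This makes a string safe for the HTML context only; the database and mail contexts mentioned above each need their own kind of escaping.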

Cross Site Request Forgeries (CSRF)

  • Taking an action without confirming the intent of that action
  • User Protect module – makes it harder to delete user 1; protects anonymous user, user 1, can add other users
  • Drupal’s form API has protection from this – using links doesn’t
  • An anonymous user can insert an “image” (the browser goes to look for it, and if that “image” is the link for User Protect that deletes the protection for user 1, that’s bad)
  • In the case of User Protect, there’s now a confirmation form – browser would just fetch confirmation form and throw that away– requirement that you have to click on “submit” button would save you from anything bad happening
  • If you really want to use links like User Protect does, create a token based on something unique to the site, the user, and the action (and validate the token when the action is requested)
  • User session ID (unique key private to site, generated randomly at login) + form ID
  • When the action is submitted, Drupal will validate that it’s still there
  • Is it possible to give permissions to manage everything EXCEPT user 1? – that’s what User Protect does
  • Or, just use the form API – it includes this protection by default
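For the “token based on the site, the user, and the action” approach above, here is a rough sketch using Node's built-in crypto module. It swaps in an HMAC-SHA256 for whatever hashing Drupal itself uses, and the function names, key, session ID, and action path are all hypothetical. On a real Drupal site, prefer the form API or Drupal's own token functions:

```javascript
const nodeCrypto = require('crypto');

// Build a token tied to the site (private key), the user (session ID),
// and the specific action being requested.
function makeActionToken(sitePrivateKey, sessionId, action) {
  return nodeCrypto
    .createHmac('sha256', sitePrivateKey)
    .update(sessionId + ':' + action)
    .digest('hex');
}

// Recompute the token on the server and compare it in constant time.
function isValidActionToken(token, sitePrivateKey, sessionId, action) {
  const expected = Buffer.from(makeActionToken(sitePrivateKey, sessionId, action), 'hex');
  const given = Buffer.from(String(token), 'hex');
  return given.length === expected.length && nodeCrypto.timingSafeEqual(given, expected);
}

// Hypothetical values for illustration only.
const siteKey = 'site-private-key';
const sessionId = 'abc123';
const linkToken = makeActionToken(siteKey, sessionId, 'userprotect/delete/1');

console.log(isValidActionToken(linkToken, siteKey, sessionId, 'userprotect/delete/1')); // true
console.log(isValidActionToken(linkToken, siteKey, 'someone-else', 'userprotect/delete/1')); // false
```

An attacker's fake “image” link cannot carry a valid token, because the attacker knows neither the site's private key nor the victim's session ID.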

Security and usability

  • Confirmation forms suck
  • BUT, truly destructive actions should be hard to do
  • Don’t delete, archive and provide undo
  • Choose links or forms for usability, not security

Resources

  • drupal.org/security-team
  • drupal.org/security
  • drupal.org/writing-secure-code
  • drupal.org/security/secure-configuration
  • heine.familiedeelstra.com
  • crackingdrupal.com
  • crackingdrupal.com/node/34 – XSS Cheat Sheet
  • crackingdrupal.com/node/48 – CSRF

Questions

  • Rainbow tables – MD5 values for every possible password up to 6 characters
  • crackingdrupal.com – has resources including list of security modules (Salt module has salting of passwords)
  • Any way to hide you’re running Drupal? – data in the CSS files, standard Drupal jQuery, a few files in the root directory, expiration date for anonymous is Dries’s birthday; there’s all sorts of things that fingerprint a Drupal site, trying to hide you’re running Drupal takes more time than it’s worth if you just keep up with updates
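On the rainbow table question: a precomputed table only helps an attacker when the same password always produces the same hash. A small sketch of why salting breaks that, using Node's built-in crypto module. The function names are made up, SHA-256 stands in for the MD5 mentioned above, and this is not how the Salt module or Drupal core is implemented:

```javascript
const nodeCrypto = require('crypto');

// Hash the salt together with the password, so identical passwords
// stored with different salts produce different hashes.
function hashPassword(password, salt) {
  return nodeCrypto.createHash('sha256').update(salt + password).digest('hex');
}

// A fresh random salt per user, stored alongside that user's hash.
function newSalt() {
  return nodeCrypto.randomBytes(16).toString('hex');
}

// Without a salt, two users with the same password share one hash,
// so a single table lookup cracks both.
console.log(hashPassword('secret', '') === hashPassword('secret', '')); // true

// With per-user salts, the stored hashes no longer match each other
// (or any entry in a table built from unsalted hashes).
const aliceHash = hashPassword('secret', newSalt());
const bobHash = hashPassword('secret', newSalt());
console.log(aliceHash === bobHash);
```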

Do It With Drupal: Drupal In The Cloud

Friday, December 11th, 2009

Josh Koenig
drupal.org/user/3313
josh – at – chapterthree.com
getpantheon.com

About the cloud

  • “Cloud” as new model for hosting
  • Traditional hosting = real estate (rack space)
  • Most real estate customers are renters, few love their landlord – landlords sometimes cut corners and do the bare minimum to keep you happy… but you need this
  • Owning comes with lots of responsibilities and hidden costs
  • Large scale projects are expensive, slow, and prone to setbacks
  • “The Cloud” = hosting as an API: on-demand availability
  • Hourly pricing
  • Reliable, reusable start-states: people make mistakes vs. programs that do things and you know exactly what they’re going to give you
  • You can say: I want a new server, here’s the distro, here’s the information, here’s the configuration – and I want five of them
  • The cloud = less waste, more freedom, flexibility… but not a silver bullet
  • Performance can vary (don’t use it for scientifically accurate benchmarks)
  • Abstractions aren’t the same as the real thing (not the same as physical servers – but for what it’s worth this hasn’t been a problem for Drupal)
  • New tricks to learn – power of API
  • The Cloud is Drupal’s destiny – increasing Drupal’s reach; you can start with pennies, scale to millions
  • Create products cheaply
  • Grow organically, but still grow fast

Launch a server in the cloud

  • ElasticFox – Amazon control panel for Firefox
  • Amazon just added locations for US west coast
  • Pantheon project: create images for cloud services that are targeted towards Drupal
  • Three images: high performance production hosting image (all the tricks already done), another for an Aegir, another for a continuous integration environment for Drupal
  • Grand vision for world-class Drupal infrastructure for pennies an hour
  • High performance production has the most work since people have been the most interested
  • Ubuntu 9.04 base config, whole LAMP stack, Pressflow pre-installed, memcached, APC, all of it is already there
  • Can monitor processes, do everything you like to do as root
  • v0.8.1 beta – but people are using it in production (in spite of disclaimer)

Who are the cloud providers

  • AWS: most mature, a lot of features, still moving quickly, added a load balancer earlier in the year; they’re a utility, not interested in your particular use case; they don’t tell people what they’re working on or how it works
  • AWS has infrastructure for giving away free images – most don’t
  • Rackspace – has Rackspace Cloud Sites (you don’t get root, you put your Drupal in there, they scale it for you with mixed results); scaling any particular site requires deep knowledge of it; Rackspace Cloud Servers is better (Slicehost is built on top of Rackspace Cloud Servers)
  • Rackspace is looking to break into the space; willing to do deals, talk to you, etc
  • Voxel: smaller/smarter, also in Asia; cloud product just emerging from beta, but it’s good – also lets you intermingle cloud and physical infrastructure
  • And more every day!
  • VPS is becoming quite cloudy (linode.com, slicehost, vps.net)
  • Custom/managed cloud services (security, regulatory compliance issues – people will build a cloud for you: Eucalyptus, Neospire, others)
  • Cloud value-adders: Rightscale, Scalr – cloud/cluster management services
  • Cloudkick – cross-cloud services, managing different cloud providers (want to be able to move servers from one service to another); it’s free; open-source LibCloud project to prevent people from getting locked into one provider
  • Cloud tools for Drupal – getpantheon.com

Questions

  • How do you do a cost-analysis? You probably won’t see the financial benefits right away if you’re going to leave it on all the time. The benefits come from scaling with changing use patterns, adding/removing instances as needed.
  • Cost/benefit comes in disk speed performance – most cloud providers have poorer I/O performance than a physical server
  • How do you solve that problem for Drupal? – All performance/scalability work is about making Drupal do less work
  • Oriented around Drupal doing only what it needs to, and not bogging it down with things like showing the user the same page he saw a minute ago
  • Database replication for read-only queries
  • Use other tools that are better at repeated-action type jobs for those things

What is it good for

  • Testing/continuous integration
  • testing.drupal.org (Drupal testing Drupal) – not in the cloud, but will soon release cloud image of it
  • People can spin these up if Drupal finds itself in a testing bottleneck, just for the day
  • Development infrastructure: new server for each site
  • Putting things like version control (unfuddle, beanstalk)
  • Products and services: Lefora (forums), crowdfactory, olark (start with pennies, scale to millions)
  • Database layer for Drupal can be a choke point – you can duplicate it
  • High availability production hosting: Acquia is on EC2
  • Most cloud infrastructure isn’t cheap at this level (running many servers, keeping them always on-line), if you’re really big you’ll find yourself at the top end going to traditional managed hosting because there’s some levels of performance that are capped by the virtualization layer
  • Control costs for traffic patterns – geographically centralized audience for most people
  • Turning things on and off to deal with daily peaks – two more servers only on during the day
  • Instances fail, though not much more often than real servers (and remember that instances exist on real servers that do break)
  • Performance can be impacted by other local activity
  • Virtual disks tend to have relatively poor I/O performance
  • Accept the inevitability of failure, embrace the paradigm of “rapid recovery”, develop architecture with modular, replaceable parts (images for each server), minimize disk/CPU utilization for menial tasks
  • “RAM is cheap” – the more you can push to things that read/write out of memory, the better

Production hosting in the cloud

  • Monitor your load – you have to look more carefully than just hits
  • Spin up more instances (scale horizontally) as you need more power
    • How does this work?
    • Could be manual process (“we need a server, let’s do it”) – does need some manual intervention somewhere, though in theory you could script it
    • Amazon offers an auto-scaling feature (when we need more, add servers, up to X number of servers – Amazon AutoScale)
    • AutoScale is simple (doesn’t cost anything, too)
    • How does this work? How do the pieces work together?
    • You need to have an image with all the pieces needed at the system level; you should use version control and have a boot script as part of the image (when the image starts, the script checks out the current code base from the database and all the necessary connections), then AutoScale makes the pieces aware of what’s out there
    • You can also do load balancing more manually
    • Role of sysadmin is changing – new set of things where now you don’t have to worry about hard drives, but scaling up/down, saving money
    • When you’re doing horizontal scaling, you trigger your image to be built, it checks out the code; Amazon also offers virtual drive service (if you’re working with an application with a lot of data in file system) – can connect that data quickly
    • Bake in as much as you can to the image, then have automatic processes that fire that get the latest information, check it into infrastructure, start distributing load there
  • Add layers (scale vertically) when bottlenecks emerge
  • Create images for each layer in your infrastructure
  • Use best practices to keep things speedy
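The “when we need more, add servers, up to X” rule above is easy to express as a policy function. This is only a sketch of the decision, with a hypothetical function name and made-up capacity numbers; it does not call or imitate the Amazon AutoScale API:

```javascript
// Decide how many web servers should be running for the current load,
// never dropping below a floor or exceeding a (budget) ceiling.
function desiredServerCount(requestsPerSecond, capacityPerServer, minServers, maxServers) {
  const needed = Math.ceil(requestsPerSecond / capacityPerServer);
  return Math.min(maxServers, Math.max(minServers, needed));
}

// Assume each server comfortably handles 100 requests/second,
// with a floor of 2 servers and a ceiling of 6:
console.log(desiredServerCount(50, 100, 2, 6)); // 2 (quiet night, stay at the floor)
console.log(desiredServerCount(450, 100, 2, 6)); // 5 (daytime peak)
console.log(desiredServerCount(5000, 100, 2, 6)); // 6 (spike, capped at the ceiling)
```

The ceiling matters as much as the floor: without it, a traffic spike (or a bug) turns into an unbounded hourly bill.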

About best practices

  • Front-side caching: use Pressflow with Varnish and/or nginx (Drupal 7 will support some of this natively)
  • Drupal is slow: complex, wonderful, brainy tool – if you’re looking at the same thing over and over again, go get a tool that does only that, and quickly
  • Use APC and/or Memcached to minimize queries to the database and to eliminate costly unserialize() calls
  • Drupal’s native caches are good, but it does it in the database (this isn’t the highest performance option, serializing/unserializing big arrays/objects)
  • Architect for vertical scaling by utilizing all service layers, even if it’s one box
  • This is what “Mercury” is about
  • CREAM: Cache rules everything around me

Mercury

  • Freely available on Amazon, as VMWare image, in as many ways as we can
  • Also on-demand as a service
  • “Drupal hosting, 200 times faster”
  • Standardized high-performance stack: single server image with everything you want for cluster infrastructure
  • Features: Varnish, HTTP/PHP, APC Cache, Apache Solr, MySQL
  • Make Drupal run fast, hold up under large traffic spikes
  • From one box to cluster
  • If you’re running all four layers and are still falling down, you’re either doing something horribly right (Twitter) or horribly wrong (all code embedded in php content nodes)

Questions

  • Mercury: going to implement configuration management system (BCFG2, probably)
  • Mercury/Pantheon – not Amazon-centric, can roll the stack out anywhere (physical hardware, whatever)
  • You’d probably make your own variant image, and sync as necessary using the configuration management system
  • If you haven’t customized things heavily, you can take the latest version of Mercury, re-apply changes, and you’re done (if you don’t want to use the config management)
  • You can keep old images around for pennies a month

Do It With Drupal: The Power of Features

Thursday, December 10th, 2009

See also Features on drupal.org

  • Jeff Miccolis & Eric Gundersen – Development Seed, building a lot of products (things like Open Atrium)
  • Drupal is very configurable – but that’s also a weakness: no distinction between what’s configuration (views settings) and what’s content
  • Workflow problem: when you build a site, you build in a dev environment, but client/boss wants to see what it looks like before it goes live
    • So, you stage it somewhere, then move it over
    • Development: where the action happens (possibly your laptop)
    • Staging: where it’s reviewed (much closer to where it’s going to live)
    • Production: where it’s live. (developing on the live site is always a bad idea)
  • Three people working on a project that needs to go live
    • Musician, developer, themer
    • Round 1 goes great – everyone works together and the site goes live
    • Round two is a PITA: new views built on dev, rebuild on staging, rebuild on dev, rebuild on staging, over and over, rebuild on prod
    • Extensive note taking, prone to human error, loads of repeated tasks
  • The solution? Make a distinction between config and content – views and settings are heavily and clearly distinguished from the actual content – then write this configuration to code and get it out of the database
  • You can do version control with your config – this lets you track changes
  • Node types, CCK fields, menu, blocks, views – these are config
  • You can say “these components taken together define a feature” – something the site does
  • “Features” module – Feature = Drupal parts that do something specific (Views, ImageCache presets, content types, fields, etc.)
  • Features = Drupal module that allows for the capture of configuration into code
  • (Sorry about the name; the Feature module makes Feature modules which have things)
  • Feature modules have Core exportables: content types, permissions, input filters, menu items
  • Contrib support: contexts, views, ImageCache, Ctools (panels, feeds, etc.)
  • Features is a system to capture the various components that describe how your site behaves
  • Features should be used throughout the development process – you can take a live site and capture existing features, but it requires you to change your thinking about how users interact with the site
  • Concepting what’s part of which feature, what’s shared, etc. gives you stronger features

Making Features

  • Create a Feature: you can add components, cycle through various elements, click the ones you want in your module
  • Features come as a nice tarball – turn it on in your website, you get all the stuff that comes with it
  • But then people start changing the view – you can see the status in the Features module (has it been changed?)
  • If something has been changed, it’ll show you what
  • “Recreate” button will give you another tarball, with the current state of things

Create, Update, Revert

  • Drush commands – features, features export, features update, features revert
  • Views changes are made only once, each change has a commit log, if you check it into SVN like you should
  • If you move your development to a real dev environment, and leave the staging site as a staging site (that you can show clients, etc without worrying it broke in the last five minutes) this is good

Distributing Features

  • Are your features appropriate for drupal.org?
  • Is the configuration an IP issue?
  • How can I get that nifty update status thing behind the firewall?
  • If you can’t/don’t want to send it to drupal.org, but want to manage it internally over time: Features server
  • code.developmentseed.org/featureserver
  • Create projects, make new releases, subscribe to updates, etc
  • For automatic packaging, try the Project module
  • Feature server is much simpler, lets you get off the ground fast
  • Based on implicit standards: update status xml, exportables, drush make

Do It With Drupal: jQuery

Thursday, December 10th, 2009

See slides here.

  • What’s jQuery: javascript library, circumvents browser incompatibility
  • Known for things like opacity, AJAX requests that work across all browsers
  • Visual effects and “wow factor”
  • Reducing javascript code: getting elements by class name – 15 lines of javascript, 1-line snippet of jQuery $(".classname");

Selectors

  • jQuery selectors: CSS selectors by class (.), by ID (#), child elements/multiple elements ($('.sidebar a, .content a');), CSS3 select by attribute: $('input[type=text]'); also regular expressions on properties – see docs.jquery.com/Selectors

Effects

  • $('h1').hide(); – hides all h1’s
  • can also do .hide('slow'); or .hide(3000) – milliseconds
  • .fadeOut('slow'), .slideUp('slow') – more at docs.jquery.com/Effects

Events

  • Trigger an action in response to user (click, change, toggle, hover)
  • Toggle: two function names, the first time you click it does A, and the second time you click it does B
  • You can do it inline (not suggested)
  • return false is useful if you want the hand icon to show up to click, but not take the person anywhere (ie AJAX toggle; degrades gracefully if javascript is turned off)
  • Declared: gives a name to the function
  • Take all your PHP knowledge, apply it to Javascript really easily – but there’s lots of things that Javascript can do that PHP can’t
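The two-function toggle described above can be sketched without any library. Note that the two-handler form of jQuery’s .toggle() was later deprecated (jQuery 1.8) and removed (jQuery 1.9), so on newer versions something like this hypothetical makeToggle helper does the same job:

```javascript
// Returns one handler that alternates between two functions:
// first call runs `first`, second call runs `second`, then repeats.
function makeToggle(first, second) {
  let useFirst = true;
  return function (event) {
    const handler = useFirst ? first : second;
    useFirst = !useFirst;
    // Preserve `this` (the clicked element) and pass the event through.
    return handler.call(this, event);
  };
}

// Declared (named) functions instead of inline code:
const log = [];
function showDetails() { log.push('show'); return false; }
function hideDetails() { log.push('hide'); return false; }

const onClick = makeToggle(showDetails, hideDetails);
onClick();
onClick();
onClick();
console.log(log.join(', ')); // show, hide, show

// In a page with jQuery loaded, it would be wired up along these lines
// (the selector is made up): $('a.details').click(makeToggle(showDetails, hideDetails));
```

Both handlers return false, which is what keeps the link from navigating while still showing the hand cursor.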

    Libraries and modules

    • Lots and lots of “wrapper” modules – so many nice jQuery plugins (equivalent of Drupal modules); make a module for every jQuery plugin (jCarousel, Lightbox, jqModal, Juitter, hoverIntent etc.) – there’s a whole module for just one function
    • People want to have dependencies – jCarouselViews module relies on jCarousel, even though there’s only one function there
    • You can end up with multiple copies of jQuery plugins all over the place
    • “We need an API to handle all these javascript libraries!” – so now we have wrapper-wrapper modules! (Plugins, JQP, jQ, jQuery UI the module)
    • Competing ways of adding multiple javascript libraries – different modules require different things
    • jQuery UI isn’t as bad as the other ones – so many modules need it
    • In short, some modules have great implementation (jCarouselViews); a lot more modules do things in a totally unnecessary way – not worth having a module if it only does one thing and you only need it in one place

    Adding JS in a theme

    • “It’s not hacking, it’s theming!”
    • Open up info file, add some scripts (scripts[] = utilities.js)
    • Name is always relative to the root of your theme
    • After you do that, clear Drupal caches

    drupal_add_js()

    • What if you only need it in certain places?
    • Put this function in template.php; same thing that the .info file is doing for you
    • Don’t just put it in page.tpl.php
    • Can aggregate and compress your CSS files – a lot less work for the web server
    • Javascript can be all aggregated together
    • Hosting your javascript elsewhere – well, it’s only 15k, it’ll slow down performance, and it’ll update to the latest version (but this might break your site)
    • If you add any javascript using drupal_add_js(), it’ll add jquery and drupal.js – provides some extra functionality, utility functions, global variables
    • drupal_add_js($data, $type); – $data is path to js, $type = ‘module’ (default) or ‘theme’; theme stuff always goes last because it’s assumed to be more important
    • Pass PHP variables to JavaScript: drupal_add_js($data, 'setting'), where $data is an array of strings
    • $(document).ready() – execute js when the page has finished loading
    • Prevents content hang-ups, degrades gracefully; replaces the onload attribute on the body tag
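
On the JavaScript side, settings passed with drupal_add_js($data, 'setting') show up under Drupal.settings, and Drupal 6 behaviors are functions that receive the piece of the page being processed. A rough sketch follows; the "utilities" namespace, the "greeting" key and the behavior name are assumptions made up for this example, and the first line is only a fallback so the snippet loads outside a Drupal page.

```javascript
// Sketch only. On a real Drupal 6 page, drupal.js already defines Drupal;
// this fallback lets the snippet run on its own.
var Drupal = Drupal || { settings: {}, behaviors: {} };

// Hypothetical setting, as if PHP had called something like:
//   drupal_add_js(array('utilities' => array('greeting' => 'Hello')), 'setting');
Drupal.settings.utilities = Drupal.settings.utilities || { greeting: 'Hello' };

// A Drupal 6 behavior: called with a context on page load and again
// for content added later (for example by an AJAX request).
Drupal.behaviors.utilitiesGreeting = function (context) {
  var greeting = Drupal.settings.utilities.greeting;
  if (typeof jQuery !== 'undefined') {
    // Only touch elements inside the context that was handed to us.
    jQuery('h1', context).attr('title', greeting);
  }
  return greeting; // returned only to make the sketch easy to check
};
```

Putting the code in a behavior rather than directly in $(document).ready() is what lets it run again on content that arrives after the initial page load.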

    References

    • visualjquery.com – whole library referenced on a single page
    • docs.jquery.com – official documentation
    • Drupal 6 uses jQuery 1.2.6 – later jQuery releases had an API change, and Drupal doesn’t break APIs within a major version

    jGarland

    • Can declare a base theme in your info file; it uses the stuff from the base theme when you haven’t written something custom
    • Adding a bunch of javascript (jquery.countdown.js, cufon-yui.js – the new, hot thing in the world of fonts)
    • Cufon is awesome
    • Behaviors works on AJAX requests as well as page load

    Do It With Drupal: Geolocation

    Thursday, December 10th, 2009
    • You can use full html to input a map straight from Google
    • Geo, Geocode, OpenLayers modules
    • Standards compliance
    • Example functions: within, touches, crosses – also, distance, area, perimeter, and others!
    • PostGIS is a common way of doing it, but you can also “do it in Drupal”
    • User friendliness: you shouldn’t have to be a cartographer!
    • “Geo-spatial data sets”, shape files, projections, coordinates – this stuff can get messy fast

    Collecting data

    • Run everything through CCK
    • Geo Field
    • Geocode

    Lines

    • Can import a line (big mess of points)
    • OpenLayers – takes your data and turns it into awesome, pretty, rendered maps

    Data

    • Stuff you can geocode: image fields (exif), file fields (gpx track logs)
    • Demo site: http://geoblog.geojune.org/
    • Collecting tracks: dedicated GPS, gaiagps.com (iPhone), emacberry.com/gpslogger.html (Blackberry)
    • Data sources: data.gov, zillow.com, DataFinder, etc.

    Questions

    • Compatibility with other modules? – A lot of work is monolithic, people don’t work together; there may be a CCK thing for Mapstraction
    • GMapEasy module? No idea…
    • Before, people were doing Location, but that was overwhelming; then people did a simple lat/lon field; the views code was starting to get messy
    • You shouldn’t write a module that has to hunt around and look for things – try to get all the people who work on geo stuff to express their data in a consistent way (rather than making a module check all the possible ways to store data)
    • How many ways can you pull in data? – Geocode module is open, just tell it what kind of data you want to give it (“I’m going to give you an IP address”); no support for adding Geo stuff to users, but it’s an easy API change
    • In Drupal 7, when you can add CCK fields to users, the geo field becomes available
    • Any way to define/create your own graphics? Mapping Middle Earth? – No technical limitation, Geo module doesn’t care what you put in there; Geo just stores points/lines/polygons/3D, etc.
    • OpenLayers – finding graphical tile sets for your map of Middle Earth, that’s a whole secondary technical problem, even if you’ve got the right coordinates in there; there’s pieces on the display side that need to be there
    • There’s a filter for “less than/equal to X miles of this point” in Views – this saves you from math
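
For a sense of the math that the "within X miles of this point" filter saves you from, here is a small sketch using the haversine formula. This is not the Geo module's or Views' actual code; the function names and the Earth radius constant are choices made for the example.

```javascript
// Sketch only: great-circle distance via the haversine formula.
var EARTH_RADIUS_MILES = 3958.8; // mean Earth radius, an approximation

function toRadians(degrees) {
  return degrees * Math.PI / 180;
}

// Distance in miles between two {lat, lon} points given in degrees.
function distanceMiles(a, b) {
  var dLat = toRadians(b.lat - a.lat);
  var dLon = toRadians(b.lon - a.lon);
  var sinLat = Math.sin(dLat / 2);
  var sinLon = Math.sin(dLon / 2);
  var h = sinLat * sinLat +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * sinLon * sinLon;
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(h));
}

// Keep only the points that are less than or equal to `miles` from `center`.
function withinMiles(points, center, miles) {
  return points.filter(function (point) {
    return distanceMiles(center, point) <= miles;
  });
}
```

A call such as withinMiles(listOfPoints, { lat: 0, lon: 0 }, 100) would return the subset of points within 100 miles of the given center.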

    Modules

    • Geo – storing your data
    • Gathering your data – geocode, postal
    • Showing it – OpenLayers

    Do It With Drupal: Fantasy Sites- Stack Overflow

    Wednesday, December 9th, 2009

    About Stack Overflow

    • Zero barriers to entry
    • Reward good content by putting the best answers first
    • Give people karma
    • Destroy Experts’ Exchange and answers behind a paywall
    • Incredibly active, has sister sites superuser, serverfault – people collaboratively build great answers to pressing questions
    • Spawning clones – can license software behind Stack Overflow
    • “I could do that” tinyurl.com/bitquabit-so, tinyurl.com/mythical-weekend – this ignores how much “soft work” went into it, how the community would work, etc.
    • 24 hours of actual site-building behind this

    Behind the site

    • You’ve got questions, people have answers, people can vote up/down, people can favorite, community moderation, collaborative editing – every question/answer can turn into a wiki page so people can edit/improve/tweak/correct content
    • Lots of views of content – tagging, rich user profiles, badges, “karma crap” to get people hooked on contributing
    • Mapping out architecture of site and how things are presented:
      • Current active list of questions – shows you votes, answers, how many views, tags
      • Can sort by “karma bounty” (give up 100 points of my karma to person with best answer)
      • Can sort by hot questions, current week, current month
      • Newest, featured, highest vote-earning
      • Tag cloud view of entire site
      • View of all users and activity level
      • Badges: all the different awards people have earned
      • View of unanswered questions
    • “Ask a question” form
    • Moderation tools – editing and flagging, and post an answer right below the question
    • A lot of rich functionality, but totally dedicated to its core goal of Q&A

    Drupal version

    • www.array-shift.com
    • Can flag taxonomy tags that are interesting, and just see related questions
    • Node add form: done some work to streamline
    • BU Editor plugin – not WYSIWYG, but a tag helper – it puts the tags there, provides buttons
    • Uses markdown, not HTML
    • Markdown module in Drupal just provides an input filter
    • BU Editor plugin, Markdown manager, Markdown = rough analog to Stack Overflow
    • New module “Active Tags” – lets you accumulate tags as little flagged items rather than having them be in a list; just click a little X and it goes away – pure client-side stuff, nicer way of presenting the tag lists
    • Turned it on, added some extra CSS to put nice boxes around it, that’s it
    • “Wikify” module lets you invert normal node access – like “Private” (checkbox for ‘only people in specific roles can see this’), “Wikify” has the same thing, but the checkbox is for editing
    • “Flag” module used for star, “user points” module awards karma points when something is starred
    • (array-shift.com has major CSS problems in Safari)
    • 100 lines of code that intercept Drupal events and award karma; could use “rules” module but it was easier to just do what I needed for this exercise
    • “user points” automatically assigns roles when people pass karma thresholds
    • “flag” module lets you set up arbitrary toggle-able flags for things, even supports “when more than 10 people flag something, do X”
    • Flagging something as offensive takes karma away (that’s why there’s a confirmation page to avoid mistakes), 10 offensive flags unpublish questions
    • Module called “flag term” – taxonomy terms, that’s how you track topics you care about
    • Pure theming differentiates word “flag” and image of a star
    • List of answers uses “node comments” module – stands in for built-in comments, has a content type “comment”
    • Can have a view that shows the comments, where the arrangement is based on the rating
    • Comments on comments wasn’t implemented (Stack Overflow has in-line meta-discussion, we didn’t have time to do this)
    • Node Comment lets you use normal Drupal comments on things too
    • User badges module exists for Drupal, but doesn’t have enough API support to configure without a lot of work (Anyone want to rewrite User Badges from scratch?)
    • Tabs for “newest”, “hot”, etc. – each of these is a display on one view, set up with tabs
    • Tags view is a view of taxonomy tags – sort by popularity/name
    • Users – uses Gravatar module to pull in global avatar; set up user pictures like normal for Drupal, but Gravatar sits in the middle; generates unique geometric icons if you don’t specify your own picture
    • “Type to find users” – exposed filter
    • Node Form Settings module – lets you do things like hide the revisions field, hide name of the title – exercise control over chunks of the node form
    • Similar By Terms module used for “similar questions” – other questions tagged with the same tags
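
The karma rules described in this list (stars award points, offensive flags take points away, 10 offensive flags unpublish a question, roles follow karma thresholds) can be sketched roughly as below. This is not the site's custom module, which is PHP hooked into Drupal events; the point values, thresholds, role names, and the choice to deduct karma from the question's author are all assumptions for illustration.

```javascript
// Sketch only: every number except OFFENSIVE_LIMIT is an assumed value.
var OFFENSIVE_LIMIT = 10;        // from the notes: 10 offensive flags unpublish
var KARMA_FOR_STAR = 5;          // assumed
var KARMA_FOR_OFFENSIVE = -2;    // assumed; applied to the author here
var ROLE_THRESHOLDS = [          // assumed thresholds and role names
  { points: 100, role: 'editor' },
  { points: 500, role: 'moderator' }
];

// Roles a user has earned at a given karma total.
function rolesFor(points) {
  return ROLE_THRESHOLDS.filter(function (t) {
    return points >= t.points;
  }).map(function (t) {
    return t.role;
  });
}

// React to a flag being set on a question.
function applyFlag(question, author, flagName) {
  if (flagName === 'favorite') {
    author.karma += KARMA_FOR_STAR;
  } else if (flagName === 'offensive') {
    author.karma += KARMA_FOR_OFFENSIVE;
    question.offensiveFlags += 1;
    if (question.offensiveFlags >= OFFENSIVE_LIMIT) {
      question.published = false;
    }
  }
  author.roles = rolesFor(author.karma);
}
```

Keeping the thresholds in one table mirrors the way the notes describe roles being assigned automatically once a karma threshold is passed.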

    Stepping back for a moment…

    • This thing has Q&A, voting (vote up/down module), karma (user points module), moderation (simple tools for flagging), ability to track interesting taxonomy terms (flag module), community editing (wikify – people with permissions can say “this is an article everyone should update to consolidate discussion”), a bunch of views that slice and dice content
    • Drupal didn’t need a lot to do that basic functionality
    • What it doesn’t have: meta-comments (this could be done, just didn’t have time), in-line editing and AJAXy goodness (when you hit “edit”, you go to the Drupal edit page; in Stack Overflow the body turns editable), karma bounties (could be written), user badges/awards (user badges module is pretty rough, doesn’t work so well beyond use case), user profiles are unthemed (Stack Overflow can pull in OpenID profiles, what you’ve voted on, chart of karma history, etc – there’s tools for each of that; user points history module that will generate chart; views attach module – stack views onto user profile)
    • Lots of polish isn’t there
    • Queued messages – if you’re not on the site when you’ve earned a new badge, it’ll prepare a message for you when you come back (maybe Activity module?)
    • TOTALLY missing: performance tuning, no community around it (Stack Overflow has a great community – that’s just as much work, and you can’t install it)
    • Doesn’t have a theme that can be distributed. Ever.

    Under the hood

    • 20 contrib modules, 2 custom modules (one is just exported flags/views), 1 theme, lots of config work
    • 6 views with sub-tabs
    • 5 flags (just set up in flag modules user interface: favorite, wikify, offensive, interesting/ignored for taxonomy terms)
    • 3 behaviors: posting/editing, evaluating, filtering, suite of modules – vote up/down, voting API, user points; active tags, BU editor, markdown, markdown editor, node form settings
    • Did not theme node form – CSS + those modules
    • pathauto, token – clean URLs
    • CCK isn’t even installed
    • 2 custom modules: export (held exported versions of those views and flags), tweaks (intercepting voting API/posting hooks to give karma)- could’ve used Rules module to do this
    • Theme has page templates
    • Views have unformatted view, row-style template (has title, number of votes, times node has been viewed, listing of tags) – give me an array, I’ll write markup
    • Theming views was easy
    • Custom node templates for question and answer nodes to position things correctly
    • Flag module templates used to override things and get the little star
    • Pre-process hooks to pull in user karma points, but no crazy theming hacks
    • 30 lines of PHP in a template file
    • No overridden theme functions (user name, breadcrumbs, none of that)
    • Extra credit: just learned about a module called “inline registration” – if user is anonymous, can enter desired username and e-mail at the top of the node form; when they submit, it’ll create user account, node, and assign node to user account in one step
    • Live preview of node editing can be done with “Live” module (used at groups.drupal.org)
    • blittr vs arrayshift
      • Analysis and evaluation of what the sites do, and what modules there are – spent more time on Array Shift doing that evaluation, bigger site, more ways of looking at that data than what Twitter provides
      • 10 hours on that stuff
      • Configuring the site – about 4 hours of going through and clicking on stuff in Twitter site, 3 hours for Array Shift
      • After analyzing what Stack Overflow looks like, how it works – implementing it in Drupal with user points, flags, views was pretty straightforward
      • Building the views took less time than doing it for the Twitter module – very clean mapping between the tabs and what they connect to
      • UX mapped itself well to building some views
      • Time for the more complex parts (custom code) kept going up for Twitter clone (7-ish hours) – kept going down for Stack Overflow (2 hours)
      • Theming – tricky any way you look at it; 9-ish hours for Twitter, 13-ish for Stack Overflow
      • Translating the info Drupal provides into the right markup for the theme
      • This time is based on starting from a design in HTML + CSS (coming up with an idea would take a lot longer)
    • “Magic” category – took 11+ hours, creating the whole install profile
    • Install profiles – use Aegir to install a copy of this on a sub-domain
    • Writing the install profile took as long as writing the theme – now can spawn infinite copies of this website
    • drupal.org/project/arrayshift – last pieces will be in place soon
    • BUT there will not be a pretty theme with it

    Do It With Drupal: New York Senate

    Wednesday, December 9th, 2009

    Background

    • Transforming an anachronistic organization with Drupal
    • In control of Republican party for 44 years
    • Never had a CIO before January 2009 – focused on internal enterprise IT before
    • People were cutting out and pasting articles from papers, scanning them, printing them, and distributing these reams of paper to offices every day – 1.5 million/year
    • CRM (constituent relationship management) – command-line type system
    • Intranet 1.0 – publishing info, no collaboration
    • Desktop PCs
    • Email 1.0 – intranet only, can’t work from home
    • Managing our own data center – not a core competency, but we do a reasonable job

    NYSenate CIO Mission

    • Transparency
    • Efficiency – more effective, less cost
    • Participation – give people a participatory role in government
    • Modeling ‘best tech practices’ for legislative bodies
    • Organize/share data internally/externally, improve internal/external communications

    Site dissection

    • No staff with web development experience in January; started out w/ consulting firm
    • Built by April, launched in May
    • Had to train hundreds of staff people to use it as content creators
    • RSS feeds, Twitter, Facebook
    • Popular/e-mailed/commented content, events, press releases/blogs/news clips
    • Almost 100 sites in one: 62 mini-sites for senators, 40-ish mini-sites for committees, issues/initiatives, legislation, open senate, about, photos & videos, newsroom
    • Previously, used proprietary CMS and external vendor – one party got better sites than the other, even with taxpayer dollars covering everything
    • Senator directory – shows RSS/Twitter/Facebook (when available – been actively promoting this)
    • Senator pages: they stand on their own, all the info about the senator, he can post news releases/blog, news clips related to him, videos, RSS/Twitter/Facebook
    • Senators can create stories with visuals for their pages
    • Committees – each has its own stand-alone mini-site, with chairs, sign-up for newsletters, updates, video archive of meetings (will be live streams in January)
    • Submitting testimony on-line available in January
    • Issues & initiatives – marriage equality (aggregated all content from site), PSA (information about the census)
    • OpenLegislation: information should be freely available, searchable, sortable, permalinks
    • Open Senate initiative: OpenData (administrative info, how much who gets paid, what gets spent on what, etc.)
    • Data available in different formats – PDF, CSV, TXT, XLS, DOC
    • Contact forms for senators individually and for the site in general (press inquiry, webmaster)
    • Photos and videos – recording and, soon, livestreaming everything
    • Also available on YouTube; audio available on iTunes
    • Working on adding automated transcription
    • Blogger who works in the “newsroom” to create web-friendly content/press releases for the site

    Modules

    • 131 modules + core required: activism, petition, administration, gmap/location modules, content templates, interrelated date & calendar, imageAPI/imagecache, and more!
    • Views: home page image carousel, event calendars, video/photo galleries, press releases, petitions, senators’ pages
    • CCK: constituent stories, senate districts, events, expenditure reports, photos, polls, press releases, video, senator, committee
    • 19 custom modules – custom views/blocks for the most part, permissioning system for Office and Web Editors
    • Upcoming: distributed authentication, ideas crowdsourcing, unified commenting
    • Working on implementing SOLR search – Acquia is now hosting our site as of today, we’ve so far been using native Drupal search
    • Embedded Media Field for video

    Integration with other applications, social web

    • 15,000 viewers on livestream.com for marriage equality debate
    • Social bookmarking for all content on the site
    • Some senators are using Facebook well and having open discussions with their constituents
    • nysenate.gov was a re-branding, now we use “nysenate” for everything
    • API so developers can take any of our open data and do things with it
    • Haven’t made a final call about whether to keep using Disqus (external product) for commenting, or use Drupal’s native commenting (there’s a lot of configuring to do to get the seamless experience we want)
    • Sign up for updates about anything on the website; integrating w/ Bronto for e-mail blasts
    • Voting content up and down – needs to be elegant and incredibly easy, using a 3rd party solution right now and themed it like the main site

    Everything else

    • New hosting – don’t have the resources to host something like this; now moved to Acquia
    • New domain name – wanted .gov to force the issue of what you can/can’t say (previously, it’d been used to say partisan, sometimes nasty things)
    • New policies (content creation, copyright, privacy, TOS, release of data, permissions)
    • New processes (requirements gathering, quality assurance – people who had previously done phone service or legacy systems, content creation workflows)
    • New talent (previously didn’t have any web developers in-house, consulting contracts, staff)
    • New tools (videoconferencing, IRC Chat, Central Desktop- lightweight project management, Redmine- bug/feature tracking, ticketing tasks)
    • New training materials
    • New communications/PR

    Guidelines & miscellanea

    • No political or campaign information – conveniently, with .gov we’re not allowed to
    • Copyright policy – states can assert copyright if they want, but we went for CC BY-NC-ND for most things
    • Privacy policy – mirrored White House
    • Terms of participation – also mirrored White House
    • Post all code to Github
    • Use Daylife.com for replacement to paper clipping system
    • Hope that other legislative bodies will be able to reuse code
    • Had an Unconference (CapitolCamp) to hear what people think – some people were excited to pitch in, do things with API

    Questions & feedback

    • Node Bulk Operations could be helpful
    • Had to take screenshots for a while to allow very non-tech-savvy senior people to see private things without the risk of them doing anything wrong with it (finding a better way for this)
    • Feedback from senators has been all over the map – actually the inverse of expected, where more Republicans were early adopters even when they weren’t saying nice things about it in public
    • More Republicans were effective using Twitter and Facebook, more internally organized to identify opportunities and make the most of them collectively
    • Senators are learning that by making content easy for others to see and share, related content gets more views too
    • Google Analytics stats available for all senators; special reports around particular events
    • 1.5 mil page views a month, on a big day, 50,000 unique views (marriage equality)
    • 40-50 comments on a hot bill
    • Not massive, shouldn’t cause major performance headaches, but we had to do this in such a rush that we have a lot of refactoring to do to make sure it holds up okay under stress
    • If there’s something broken, blogs publish screenshots – we have to be very vigilant
    • Want to make custom modules available; just haven’t had the bandwidth, just have a code drop on github for now
    • Building relationships with CIOs of various state agencies – some of them have a lot more developers
    • PDFs have been the traditional publication format, including scanned documents; we’ve maintained that format for most data to accommodate the “I want to download and print” crowd – only last week got wifi in capitol building
    • For born-digital content, making it available as feeds in ways that will make it easier for people to use
    • More and more federal work being done in Drupal (whitehouse.gov); a couple state entities have put up rudimentary sites (liquor authority for state of New York)
    • Contacted mostly about policy issues for other states – comment moderating, copyright
    • Big national open data initiatives – community of practice around government transparency
    • Haven’t sat down with whitehouse.gov Drupal developers to talk about roadmaps yet – we feel overwhelmingly busy right now
    • Third party to compare roadmaps, sort out implications for working together? It’s a major undertaking
    • Sunlight Foundation – encourages getting data out in mashable form; they give us feedback
    • Some senators have gamed the system by getting people to e-mail things they post so it gets on the “most e-mailed” list – this upsets other senators

    @ahoppin
    @NYSenateCIO
    NYSenate.gov/department/cio
    Hoppin – at – Senate.State.NY.US

    Do It With Drupal: Anatomy of a Distribution: Open Atrium

    Wednesday, December 9th, 2009
    • Open Atrium is a “team portal in a box” (AKA Basecamp alternative)
    • Can be behind a firewall, is free, openatrium.com
    • Putting people in different groups
    • Comes with six features:
      • Blog: turned on/off on a group-by-group basis
      • Wiki
      • Calendar- iCal feeds too
      • Shoutbox – like private Twitter
      • Case Tracker – ticketing system
      • Group dashboard
    • 75,000 downloads since July 17
    • translate.openatrium.com – 31+ languages, translated to various extents; get updates that don’t overwrite your custom updates

    What are people doing with it

    • Basic project management tool set
    • Sprite-based theme (5.5 kb, 13.7 kb)
    • Tailoring the system to your own needs
    • Drupal Core, modules, plus Features module power Open Atrium
    • People can customize their own dashboard
    • Cross-posting to different groups disabled; also, Organic Group configuration much more simple (clear distinction between public and private)

    Migrating into Open Atrium

    • It’s just a Drupal site, so in theory you can turn on the Open Atrium modules around your existing site (but this isn’t suggested) – use some other way (Feeds module?) to aggregate existing content and put it into the new framework
    • Migration is a solvable problem, but probably not in a generic way useful for the core project

    Extended features

    • Project status – time tracking and approval flow for a web shop
    • World Bank did a highly customized version; integration with Lotus Notes – their own intranet behind a firewall; faceted search across their pre-existing staff directory; extended events system to help with scheduling
    • Some custom coding went into the World Bank site, but a lot of what goes into it comes from configuring existing modules

    How we use it

    • Over 50% tickets
    • Use blog instead of e-mail for the most part

    Atrium’s rules

    • Works out of the box
    • At least as simple as running straight from drupal.org
    • Once you install it, it’s clear what the next step is – unlike Drupal, where you install it and wonder “what now?”
    • Works with Aegir
    • Doesn’t hack core or contrib (except occasionally- there’s a hack to Views that makes it translatable)
    • Doesn’t do everything – does a few things that are widely useful for intranets, and you can extend it

    Things we’ll never do

    • Add a WYSIWYG; BUT, you can do that
    • Add CVS integration (but see features.blackstormsstudios.com)
    • Add Alfresco integration – but someone else has tried this
    • Investing some time in Google Docs integration
    • Won’t ever clone Basecamp – but someone wrote a theme that looks a lot like it (drupal.org/project/atrium_simple)
    • Add Sharepoint integration to base package

    Things we will do

    • Clearer branding- Drupalisms & Atriumisms beware!
    • Drag and drop dashboards (vimeo.com/7643255)
    • Better admin experience (drupal.org/project/admin)
    • Pluggable search
    • Improved l10n support- Drupal only supports one language at a time, we want to fix this
    • Rewriting core functionality – upgrading to Context and Spaces, when we say “beta”, we mean it
    • Rework the “user space”
    • A calendar with a user story
    • Rewrite Case Tracker – this powers the to-do system, people want to customize the states cases can be in, kinds of cases, etc. (github.com/miccolis/casetracker)
    • This is going to be painful, we’ll provide upgrade paths
    • Move to drush make (drupal.org/project/drush_make)
    • New on drupal.org: install profiles: lists of things that, all together, make a site

    Do It With Drupal: The Economist

    Wednesday, December 9th, 2009

    Rob robpurdie@economist.com – Scrum Practice Leader
    twitter.com/robpurdie
    facebook.com/robpurdie

    Overview

    • Moving incrementally and iteratively to Drupal- making improvements as you move bit by bit
    • User comments and recommendations served from Drupal, along with comment history pages, article comments pages
    • Syncing data to Drupal every 5 minutes– all content and comments
    • Soon, article pages served from Drupal– running into a few performance problems
    • Next: channel pages served from Drupal, third-party services, registration
    • We benefit from Drupal sooner by taking this approach; rather than building the whole site in the background and not benefitting until the end, this way we benefit from improved functionality sooner
    • “The Economist is so old that the guy who started it had to be painted rather than photographed”

    The old way

    • 20-30 mil page views, 3-4 million unique visitors per month – lots of performance and scalability issues
    • Want to build the foremost destination online for analyzing and debating global agenda; want to bring visitors into that debate; current system isn’t enough to support this vision, that’s why they moved to Drupal particularly for comments
    • Increase publishing volume with user-generated content (more content w/o more costs)
    • The old way: custom CMS built on proprietary stack (MS, ColdFusion, Oracle)
    • Blogs were originally MovableType, now are all Drupal
    • Broken waterfall processes meant frequent fire-fighting
    • Needed to be more responsive to change, deliver business value sooner (projects take a long time to deliver value to organization), more sustainable, happier
    • Making these changes incrementally and iteratively; “perfect is the enemy of better”

    Why Drupal?

    • Looked at OpenCMS, Alfresco, Joomla, met with other newspapers, considered building a custom system, buying a proprietary system, or going open source
    • Drupal as strategic fit: community and content publishing, robust development framework, development language, free software
    • Strength of Drupal community
    • Selling Drupal internally was a challenge: no suit-wearing Drupal sales force
    • Attended DrupalCon Boston 2008, networking within community, engaging w/ Lullabot for workshops and training
    • Proof-of-concept to reproduce article page in Drupal; how to use CCK fields to make a rich article content type

    Using Scrum

    • 3 million registered users, articles – data migration is daunting
    • Manage the move using Scrum – selling it was easy with charts (developing business value sooner and throughout, management can see progress throughout, shining a spotlight on issues/dysfunction and attacking them along the way – risk decreases a lot faster)
    • Take requirements, prioritize based on business value: which are the most important to organization, do those first
    • Trained management team in Scrum, development team in Drupal, then started sprinting with help from consultants (2-week sprints, delivering something of value at the end)
    • “Maybe not the largest Drupal project, but the most expensive” – lots of consultants

    Integrating CMS’s

    • Proxy approach: Drupal sends JSON over HTTP back and forth with existing ColdFusion system
    • Using native Drupal comments; comments have to be attached to nodes – there has to be a node for every piece of content on the legacy system
    • Create nodes on the fly for every ColdFusion request that comes in
    • Notion of proxy nodes is a pattern that comes up during integration of Drupal with other systems
    • Voting API votes used for recommends; these are also attached to proxy nodes
    • Started with proxy approach only; then moved to doing some with subdomain approach – hope to be doing neither soon after moving entirely to Drupal
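
The proxy-node pattern above can be sketched roughly like this. It is not The Economist's implementation (which is Drupal/PHP talking to ColdFusion); the request shape, field names and node IDs are assumptions. The point is only that a stand-in node is created the first time a legacy article is mentioned, and comments and recommends then attach to that node.

```javascript
// Sketch only: an in-memory stand-in for Drupal's node storage.
var nodesByLegacyId = {};  // legacy article ID -> proxy node
var nextNid = 1;

// Return the proxy node for a legacy article, creating it on the fly.
function getProxyNode(legacyId) {
  var key = String(legacyId);
  if (!Object.prototype.hasOwnProperty.call(nodesByLegacyId, key)) {
    nodesByLegacyId[key] = { nid: nextNid, legacyId: key, comments: [], recommends: 0 };
    nextNid += 1;
  }
  return nodesByLegacyId[key];
}

// Handle one request from the legacy system and answer with JSON.
// The request fields (articleId, action, body) are hypothetical.
function handleLegacyRequest(request) {
  var node = getProxyNode(request.articleId);
  if (request.action === 'comment') {
    node.comments.push(request.body);
  } else if (request.action === 'recommend') {
    node.recommends += 1;
  }
  return JSON.stringify({
    nid: node.nid,
    comments: node.comments.length,
    recommends: node.recommends
  });
}
```

Because the lookup is keyed on the legacy ID, repeated requests for the same article always land on the same node, which is what lets native comments and votes accumulate in one place.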

    Migrating data

    • Migrating and syncing data every 5 minutes – don’t wait until the end to figure out that piece
    • Table Wizard and Migrate modules
    • Table Wizard writes Views integration for MySQL tables
    • Migrate lets you migrate certain views, push into Drupal as nodes/users/taxonomy terms/etc
    • Client is involved in how legacy data gets organized in Drupal
    • Sat down with client to browse through content and decide what data needs to be moved and what it means
    • Migrate keeps track of everything you’ve done, gives you a dashboard, tells you how far along you are – keeps a mapping table, legacy ID, you can check and see what came across and fix things; does your bookkeeping for you
    • Drupal expects to have all the info it needs in its database; something getting published in Oracle needs to be in Drupal promptly – synchronization
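
The bookkeeping idea behind the mapping table can be sketched as below. This is not the Migrate module's API: `sync`, the revision numbers, and the dict-based "tables" are assumptions made for illustration. The sketch shows why a legacy-ID mapping makes a repeated sync (for example every 5 minutes) safe to rerun.

```python
def sync(legacy_rows, drupal_nodes, id_map):
    """Copy new or changed legacy rows into drupal_nodes; return (created, updated).

    legacy_rows:  dict of legacy ID -> (revision, title)
    drupal_nodes: dict of node ID -> title (stand-in for Drupal's node storage)
    id_map:       dict of legacy ID -> (node ID, last imported revision)
    """
    created = updated = 0
    for legacy_id, (revision, title) in legacy_rows.items():
        if legacy_id not in id_map:
            # First time we see this legacy item: create a node and record the mapping.
            nid = max(drupal_nodes, default=0) + 1
            drupal_nodes[nid] = title
            id_map[legacy_id] = (nid, revision)
            created += 1
        else:
            nid, seen_revision = id_map[legacy_id]
            if revision > seen_revision:
                # Already imported, but the legacy copy changed since the last run.
                drupal_nodes[nid] = title
                id_map[legacy_id] = (nid, revision)
                updated += 1
    return created, updated
```

Because the mapping records what has already come across, running the same sync twice creates nothing new, and `len(id_map) / len(legacy_rows)` gives a crude "how far along are we" figure of the kind a dashboard would show.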

    Questions

    • How did you decide what to put into Drupal first?
      • Business value: comments, user profiles, recommends
    • How many Drupal servers does it take to scale that big?
      • Not entirely sure how many servers we have; let’s say +/- 12
      • Master MySQL server, a few slave MySQL servers – more important aspects have to do with Pressflow
      • Pressflow = high performance variant of Drupal 6, completely API compatible with Drupal, but it takes some patches that are in Drupal 7 and backports them to Drupal 6
      • Use Varnish’s full capability; Varnish = reverse proxy server, takes load off Drupal/PHP/MySQL
    • How do you stop people from trying to shove their emergencies into Scrum process?
      • Don’t want people going directly to the team like they traditionally do
      • Scrum roles: team, Scrum Master, product owner – the product owner represents the customer/client, has to have power to make decisions on behalf of the organization, and is responsible for managing stakeholders
      • Product owner comes to team w/ prioritized list of features for next sprint
      • Had two teams in New York and one team in London all doing 3-week iterations in parallel
      • Split up site into component parts: profiles, article pages, channel pages, had three product owners who had to manage stakeholders
      • Works reasonably well; now we’re doing two teams, one system that shows what all teams will do; someone has to keep “product backlog” in order, stopping people from shoving in their “one little thing”

    Features

    • Base theme is 960 px grid – laying out themes as a series of columns, all sections have to fit into the grid
    • Selenium for “user journey” testing; building environments to help manage configurations
    • Continuous integration using Hudson – needed a shared place where user tests could run
    • Set of servers running on Amazon; Hudson sets off user tests every time there’s a commit to the SVN repository
    • Apache SOLR search hosted by Acquia – 100,000 articles that have to be available through site search
    • People were unhappy with relevance of matches in old site search
    • Acquia’s hosted search service: really fast, good results
    • Apache SOLR: can start filtering results further and further – faceting
    • “How do I get SOLR running on my website?” – can self-host, but we went with Acquia
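
Faceting, as mentioned above, amounts to counting how many results fall under each value of a field and letting the user narrow the result set by picking one. The sketch below shows the idea in plain Python over a list of dicts. It does not call Solr, and `facet_counts`, `apply_filters`, and the `section`/`year` fields are made up for illustration.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents carry each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


def apply_filters(docs, filters):
    """Keep only documents matching every chosen facet value."""
    return [
        doc for doc in docs
        if all(doc.get(name) == value for name, value in filters.items())
    ]


results = [
    {"title": "Budget vote", "section": "politics", "year": 2009},
    {"title": "Bank results", "section": "finance", "year": 2009},
    {"title": "Election recap", "section": "politics", "year": 2008},
]

# Each filter narrows the results; facet counts are recomputed on what is left.
narrowed = apply_filters(results, {"section": "politics"})
year_facets = facet_counts(narrowed, "year")
```

A search server does this counting inside the index rather than over a Python list, which is what keeps it fast at the scale of a full article archive.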

    Questions

    • Other tools for managing people/process?
      • In Scrum, less about resource management – we just want dedicated co-located teams, don’t worry about availability because of multiple projects: single focus
      • Redundancy of function – generalizing specialists, specialists can create bottlenecks/risks
      • “How many people need to be hit by a bus before your project fails?”
      • agilemanifesto.org
      • Use Google Docs a lot – project backlogs are all spreadsheets, a big wiki, project dashboards that “radiate information to the rest of the organization”
      • Focus is on people, not tools
      • Test-driven development, writing tests first can sometimes be hard with Drupal

    Impediments to progress

    • Previous processes/structure/culture: command and control – hard habit to break
    • Project manager telling people what to do and when to do it by – this is bad management; it has an impact on people
    • We want self-organizing teams
    • Previously, black box development: low visibility during the project process
    • For Scrum, everything needs to be transparent, frequently inspect outcomes, adapt as we go – can’t have a postmortem after everything’s done, need to do that every day
    • Hero developers who go off and solve problems heroically aren’t compatible with Scrum
    • Previously, developmental silos – departments based on function, these have been removed, but people still want to exist within their old silos
    • People want to work on multiple projects like they used to, rather than working on a single project in a dedicated manner
    • Previously, traditional line management: where you stack up in the line doesn’t matter now, this was a big change
    • Engineering practices (specifically quality) – big issue; Scrum is a wrapper for your existing engineering practices, doesn’t say anything about testing
    • Scrum assumes your engineering practices are great, or you’ll make them great quickly
    • You can say “we’re going to do Scrum” but old habits die hard – focusing on what “done” means and providing a deliverable at the end of each sprint, have to deliver quality too – have to go live successfully
    • Want to deliver “potentially shippable code” at the end of each sprint – have to have a testing environment that’s representative of live environment; been bitten by differences in configuration
    • Everything has to be identical in the test environment (just with a scaled down number of servers) – same data center, same network issues, etc
    • Hard to bite the bullet on the costs involved in building a testing environment, but it’s important
    • Hard to simulate kinds of traffic you get in production – plus, have to keep track of session cookies
    • Form fields can hurt you – replaying post requests
    • Cron jobs that run all the time – cron jobs can stack up and site starts to decay
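
One common guard against cron runs stacking up is a lock that makes a new run skip itself while the previous one is still going. The sketch below uses an exclusively created lock file. It is a generic illustration rather than Drupal's own cron handling: `single_run_lock` and `run_cron` are hypothetical names, and a real deployment would also need to deal with stale locks left behind by a crashed run.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_run_lock(path):
    """Yield True if this run acquired the lock, False if another run holds it."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False
        return
    try:
        os.write(fd, str(os.getpid()).encode())
        yield True
    finally:
        os.close(fd)
        os.remove(path)


def run_cron(job, lock_path):
    """Run job unless a previous run is still in progress."""
    with single_run_lock(lock_path) as acquired:
        if not acquired:
            return "skipped"
        job()
        return "ran"
```

With this in place, a slow run causes later runs to be skipped instead of piling up behind it.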

    Questions

    • Migration of real-time data: code changes are easier to migrate than content changes, what’s the process for moving bits of content from development to production?
      • When there’s content you need to work on for a while before it goes live, work on the live servers but make sure end-users can’t see it
      • Can use the unpublished flag on a Drupal node to do that; use Views to list everything unpublished in, say, the sports category
      • For a small team, that’s a reasonable solution
      • For bigger organizations with a lot of people working together, use “Workflow” module – nodes step through a series of states
      • If it’s a business requirement that content has to start off on staging servers and only then push to live, use module “Deploy” – push-button way to push nodes and their dependencies (users, taxonomy terms, etc.) to another environment
    • Technical reason for using external searching – why use SOLR at all? What about Drupal search?
      • Drupal 6’s built-in search is better than the search in previous versions, but falls apart at a certain scale
      • Slow queries, sub-optimal results
      • A lot of non-Drupal people have worked on Apache SOLR, Drupal has integrated it well
      • Self-hosting, or with Acquia – if you have the talent to run Java apps in your data center and keep it running, self-hosting is a great idea; will reduce latency
      • Most of us are struggling to keep PHP/MySQL up as it is, this is where Acquia comes in
      • Acquia service is pretty much plug-and-play
      • Built-in search doesn’t come with facets; can add on facets with the “Faceted Search” module
      • SOLR is an enterprise search system; used by Netflix, Expedia, etc.
    • Could you use Views instead of facets?
      • There’s a lot of overlap there, and different possible approaches.
      • Full-text searches need SOLR rather than Views
    • Some of the wins you’ve had with Scrum/Drupal, and some weaknesses
      • Wins by development teams – prefer this way of working, where business people are only concerned with relative priority of requirements, have no say in how long it takes to implement
      • Product owners prioritize “stories”, developers size those stories relative to each other, rather than in hours of effort
      • Stops the cycle of cutting corners on quality in order to get it done in a shorter timeframe
      • Can’t get productivity gains w/o changing the way you work
      • Product owners need to be involved, can’t change requirements mid-sprint
      • Have “working agreements” – a kind of social contract
      • Scrum isn’t a prescription – you can pick and choose the parts that you want that meet your organization’s needs
      • The specific processes layered on top of the simple framework (transparency, working together, adapting to test results) can vary
    • When will the Economist be fully on Drupal?
      • Description says “this month” – that was the plan
      • People paying the bills get to make decisions; is it most important for us to go all-Drupal ASAP, or extend functionality of site to be competitive?
      • Recent decision was for the latter
      • Don’t know when