A bit more about Taxonomy and Recirculation

The taxonomy’s main job is to help users explore the site, and it can only do that if the tags and categories are used consistently. We wanted a system that was easy for editors to manage, with clear guidelines for how to apply tags and categories to posts. When we looked at the original taxonomy on The Toast, we found, well, the exact opposite of what we wanted.

The Old Toast: Like your closet, but with more piles of random stuff

Peeking inside The Toast uncovered a big ol’ stew of tags. When we started this process, The Toast had around 4,500 posts sorted into 60 categories. We discovered 8,182 unique tags, 6,152 of which were applied only to a single post. More than 400 tags weren’t used at all.
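For the curious, here is a minimal sketch of how an audit like this could be tallied. It is not the script we ran; the function name, the data shapes, and the toy tags are all made up for illustration.

```python
from collections import Counter

def audit_tags(all_tags, post_tags):
    """Summarize tag usage.

    all_tags: every tag name that exists in the system (hypothetical input).
    post_tags: mapping of post id -> list of tag names on that post.
    """
    # Count each tag at most once per post.
    usage = Counter(tag for tags in post_tags.values() for tag in set(tags))
    single_use = sorted(t for t in all_tags if usage[t] == 1)
    unused = sorted(t for t in all_tags if usage[t] == 0)
    return {"unique": len(set(all_tags)), "single_use": single_use, "unused": unused}

# Toy data, invented for this example.
all_tags = ["shakespeare", "buddhism", "truckin'", "orphan-tag"]
post_tags = {
    1: ["shakespeare", "truckin'"],
    2: ["shakespeare", "buddhism"],
    3: ["shakespeare"],
}
report = audit_tags(all_tags, post_tags)
print(report)
```

Run against a real export, the single-use and unused lists are the ones that make you sigh.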

The complete tags list, as you can imagine, was a hot mess. We audited the tags and found they were being used in three distinct ways.

Some tags were topical:

Some grouped posts into recurring series:

Some were funny:

The tags list also (unnecessarily) included writer bylines, thanks to a WordPress plug-in that duplicated these names.

The category list was a bit cleaner—all of them were intended to be functional rather than funny—but most category terms also existed in the tags list, and it wasn’t clear why an editor would choose one over the other.

The New Toast: Your closet after a visit to the Container Store

We identified five distinct ways that posts could be sorted, each with its own purpose and rules:

Categories and tags have no overlap. A term can exist in one list or the other, but not both. Categories are the primary grouping of the post, and the terms are quite broad: “Food”, “Race”, “Art”. If taxonomy is branding, then these top-level categories convey the major themes that make up The Toast. The category is relatively prominent in the homepage and article page metadata.
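The no-overlap rule is easy to check mechanically. The sketch below assumes terms are compared case-insensitively, which is our assumption for the example and not something the CMS enforces; `find_overlap` is a hypothetical helper.

```python
def find_overlap(categories, tags):
    """Return terms that appear in both lists, compared case-insensitively."""
    category_keys = {c.strip().lower() for c in categories}
    tag_keys = {t.strip().lower() for t in tags}
    return sorted(category_keys & tag_keys)

# Category names come from the text; the tag list is invented.
categories = ["Food", "Race", "Art"]
tags = ["buddhism", "shakespeare", "food"]
overlap = find_overlap(categories, tags)
print(overlap)
```

Anything this returns is a term an editor has to assign to one list or the other.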

Tags are much more topical. Topical tags are displayed on the front-end, but their real purpose is to drive the “recirc” modules that help users explore the site. (More on that in a second.) Keeping these tags functional means that we can automagically show more posts about “Buddhism” or “Shakespeare” as long as everything is tagged consistently.

Fake tags are actually fake. The funny tags (like truckin’ and the continuation thereof) are a vital and hilarious part of The Toast experience, but the little information architect in our hearts wept whenever a user clicked through to the archive page for one of the 6,152 tags that only had a single post in them. On the new site, those “tags” are still presented on the front-end, but on the back-end they’re just a plain text field. (The fake tags link out to a Google Search, which we think is hilarious. We’re fun at parties.) We kept the funny and the functional, but gave them each their own field so they could be used differently. Deep breaths, taxonomists. It’s all going to be OK.

Series have their own taxonomy list. The series are a major draw, and a huge source of multiple page views: it’s hard to read something like Mallory’s Two Monks Inventing Bestiaries and not immediately want more in the same vein. The next thing a user wants to see is probably not another article related to “animals”, but more inventions from the monks: perhaps maps, or dinner parties. By separating the series into their own taxonomy, rather than grouping them under Tags or Categories, we were able to build recirc modules that give preference to series-siblings over topic-siblings.

Authors are managed only as people, not tags. WordPress has a built-in way to create authors, with a byline and a gravatar. But the old taxonomy included many author names as tags, too. This was unnecessary, and we are all about avoiding unnecessary work. In the new system it’ll be easy to see more articles by a given author, so you can catch up on the back catalog written by your favorites.

Migration: Like cleaning out your closet, but with more robots

Migrating from the old taxonomy to our new and shiny five-part taxonomy required some human effort—with a lot of help from automated scripts. It wasn’t feasible to manually re-tag every single post and launch a new site during this century. But the new taxonomy wouldn’t work unless the existing posts were converted to the new system.

We started by exporting the full list of all tags on the site to a spreadsheet. We sorted and grouped by the number of posts in each entry, so tags with more than five posts could be handled first, leaving tags with only one post for later. We tasked Nicole and Mallory with recategorizing the list, which was the best client trolling we’ve ever done. They sorted each entry into:

This sorting of tags into their new buckets could only be done by real people who were familiar with all the content in question. The work after that could be automated. We also hope that this process will prophylactically prevent their tags from getting out of control in the future.

Once Nicole’s endless toil was complete, Eaton coded us up some scripts. The database structures around taxonomy in WordPress are blessedly simple, and he was able to write SQL scripts to do the actual conversion of one type to the other. This process went remarkably smoothly and required only a tiny bit of futzing once it was complete.
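To give a flavor of why the conversion is simple, here is a sketch (not Eaton's actual scripts) run against a stripped-down copy of two WordPress tables in an in-memory SQLite database. The table and column names follow WordPress's `wp_terms` and `wp_term_taxonomy`, but the real tables have more columns, and the `series` taxonomy name and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_terms (term_id INTEGER PRIMARY KEY, name TEXT, slug TEXT);
    CREATE TABLE wp_term_taxonomy (
        term_taxonomy_id INTEGER PRIMARY KEY,
        term_id INTEGER,
        taxonomy TEXT,
        count INTEGER
    );
    INSERT INTO wp_terms VALUES (1, 'Two Monks', 'two-monks'), (2, 'Shakespeare', 'shakespeare');
    INSERT INTO wp_term_taxonomy VALUES (10, 1, 'post_tag', 12), (11, 2, 'post_tag', 30);
""")

# Slugs the editors marked as a series in the spreadsheet (made-up example).
series_slugs = ["two-monks"]
placeholders = ",".join("?" for _ in series_slugs)

# Converting a term is a one-column change: flip its taxonomy from tag to series.
conn.execute(
    f"""
    UPDATE wp_term_taxonomy
       SET taxonomy = 'series'
     WHERE taxonomy = 'post_tag'
       AND term_id IN (SELECT term_id FROM wp_terms WHERE slug IN ({placeholders}))
    """,
    series_slugs,
)

rows = conn.execute(
    "SELECT t.slug, tt.taxonomy FROM wp_terms t "
    "JOIN wp_term_taxonomy tt ON tt.term_id = t.term_id ORDER BY t.slug"
).fetchall()
print(rows)
```

In WordPress, posts are linked to terms through the `term_taxonomy_id`, so in a sketch like this the posts stay attached to the term while its type changes underneath them.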

There are still lots of opportunities for cleanup in the tags! “Drugs” can merge with “addiction”! “Othello” and “king lear” and “as you like it” can move into “shakespeare”! It’s an ongoing process. Fortunately, the tag management interface handles this merging through a simple GUI, so The Toast editors won’t need to rely on spreadsheets and SQL in the future. The WordPress plugin Term Management Tools also allows editors to convert terms from one taxonomy to another. In the future, if a particular tag grows large enough (the current bar is about 75 entries), it can be promoted to Category status. Editors can also convert a set of posts to a Series quite easily.
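The editors do all of this through the GUI, but the underlying idea can be sketched in a few lines. Both functions below are hypothetical illustrations, not part of the plugin, and the sample posts are invented.

```python
PROMOTION_THRESHOLD = 75  # "about 75 entries", per the text above

def merge_tags(post_tags, merges):
    """Rewrite each post's tags using an {old_tag: new_tag} map, dropping duplicates."""
    merged = {}
    for post_id, tags in post_tags.items():
        kept = []
        for tag in tags:
            target = merges.get(tag, tag)
            if target not in kept:
                kept.append(target)
        merged[post_id] = kept
    return merged

def promotion_candidates(post_tags, threshold=PROMOTION_THRESHOLD):
    """Tags used on at least `threshold` posts, i.e. candidates for Category status."""
    counts = {}
    for tags in post_tags.values():
        for tag in tags:
            counts[tag] = counts.get(tag, 0) + 1
    return sorted(tag for tag, n in counts.items() if n >= threshold)

merges = {"othello": "shakespeare", "king lear": "shakespeare", "as you like it": "shakespeare"}
post_tags = {1: ["othello", "shakespeare"], 2: ["king lear"], 3: ["buddhism"]}

merged = merge_tags(post_tags, merges)
print(merged)
# A tiny threshold so the toy data has something to promote.
print(promotion_candidates(merged, threshold=2))
```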

Recircs: Your closet if it went on forever, like to Narnia

Eagle-eyed Toast readers might notice there’s no top-level navigation that segments content into “channels” or “sections.” For a blog-style site with very frequent posting, a top-level nav doesn’t provide much value to readers. Rather than putting our energy into creating (and arguing over) the categories for a navigation bar, we put our energy into a bottom-up system to support related content.

Our goal for this redesign was that readers would easily be able to waste, er, enjoy a whole afternoon browsing The Toast. When you finish a post or scan the main feed, we want to offer you more interesting or relevant posts, so that you keep clicking, always clicking, ignoring your needs for food, sleep, and human interaction. Publishers call this “recirculation” and we refer to the related content modules as “recircs.” Coming up with the rules for when to show which type of list (more from this author, or in this category, or this series) was an exercise at the intersection of information architecture idealism and the gritty reality of the CMS’s capabilities.

Popularity recircs

The two most straightforward recircs, Most Popular and Active Now, appear on the homepage and on article pages. As you can probably guess from the names, these are based on popularity rather than taxonomy.

Active Now: The Toast attracts the best commenters on the web. The comments on most sites are a cesspool, but Toast readers are a witty and wise community. We want to highlight posts with recent comments, so we show the six posts that have been commented on most recently. (We also highlight very active posts in the article metadata with color and animation.)

Most Popular: This shows the six most popular posts from the past week, but defined by number of comments rather than by number of views. Why such contradictory logic? Apparently counting the number of hits on an article is The Hardest Problem in Computer Science, WordPress edition. We protect site performance by basing this on comments. (There’s no duplication between these two recircs, so an active article won’t show up in popular.)
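Both popularity recircs can be derived from nothing but comment timestamps. The sketch below is our own illustration of the rules as described, not the production code: the function names, the `(post_id, timestamp)` input shape, and the sample data are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

def active_now(comments, limit=6):
    """Posts ordered by their most recent comment, newest first."""
    latest = {}
    for post_id, when in comments:
        if post_id not in latest or when > latest[post_id]:
            latest[post_id] = when
    return sorted(latest, key=lambda p: latest[p], reverse=True)[:limit]

def most_popular(comments, now, exclude=(), limit=6, window=timedelta(days=7)):
    """Posts with the most comments inside the window, skipping excluded posts."""
    counts = Counter(
        post_id for post_id, when in comments
        if now - when <= window and post_id not in exclude
    )
    return [post_id for post_id, _ in counts.most_common(limit)]

now = datetime(2015, 1, 8)  # arbitrary date for the example
comments = [
    ("a", now - timedelta(hours=1)),
    ("b", now - timedelta(hours=2)),
    ("c", now - timedelta(days=3)),
    ("c", now - timedelta(days=4)),
    ("d", now - timedelta(days=5)),
    ("e", now - timedelta(days=10)),  # outside the one-week window
    ("e", now - timedelta(days=11)),
]

active = active_now(comments, limit=2)
popular = most_popular(comments, now, exclude=active, limit=2)
print(active, popular)
```

Passing the Active Now result as `exclude` is one way to get the no-duplication behavior: an active post never shows up in popular.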

Taxonomy recircs

The recircs that draw from the taxonomy rely on a bit more logic. In the main feed we show recircs attached to a recent post. The robots scan down the feed, looking for a story that matches:

If we find a match, we skip down a few notches, then try again. If we don’t, we try the next post. We also use this same logic to show related content at the bottom of article pages. (If this isn’t nearly enough information on the logic, Eaton explains how we make this happen in his post on the WordPress back-end.)
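The scan-and-skip behavior can be sketched as a simple loop. This is an illustration only: the matching rules are passed in as a function, the `gap` size stands in for "a few notches" and is our guess, and the feed data is invented.

```python
def place_recircs(feed, find_match, gap=3):
    """Walk the feed top to bottom and decide which posts get a recirc attached.

    find_match(post) returns a recirc (any truthy value) or None.
    After a hit, the next `gap` posts are skipped before trying again.
    """
    placements = {}
    i = 0
    while i < len(feed):
        recirc = find_match(feed[i])
        if recirc:
            placements[feed[i]] = recirc
            i += gap + 1  # skip down a few notches
        else:
            i += 1        # no match, try the next post
    return placements

feed = ["p1", "p2", "p3", "p4", "p5", "p6"]
matchable = {"p2": "series", "p3": "author", "p6": "tags"}

placements = place_recircs(feed, matchable.get, gap=2)
print(placements)
```

Note that p3 could have matched but gets skipped, because it sits too close to the recirc already attached to p2.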

More in Series: The logic tests for this first because “articles within a series” is the strongest signal we have for related-ness. A series must contain at least five posts, so if we match on a series we should always be able to fill up a recirc with 100% related content goodness.

More by Author: We want to promote writers who contribute regularly. At the end of article pages, we always show a “more by author” block if we can match four previous posts. (Clicking the author name will show that author’s entire archive—even if it’s only one entry.) In the main feed, we also try to match four previous posts. You might notice that Mallory, Nicole, and Nikki appear rather often in the main feed recircs—we tried to suppress Toast editors but found it was too much of a drag on performance.

More from Tags: Remember how we made Nicole and Mallory manually sort all their tags? Guess we’d better do something with all those tags to promote related content. Each story in the block of four matches one or more tags with the current entry. The tags match independently, so not every post in the recirc matches the same tag. Why don’t we tell you which tags we’re matching? The robots are vexingly slow when we do that, so please enjoy these delightfully relevant or bewilderingly head-scratching related articles!

More in Category: As a last resort, if we can’t match against series, author, or tag, we’ll show more posts within the top-level category. Because categories are broader than tags, we will always have a match here. We suppress the “Meta” category from ever appearing in recircs; otherwise we’d show way too many links to old Link Roundups and Open Threads.
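Taken together, the four rules form a fallback chain: series, then author, then tags, then category. Here is a minimal sketch of that chain under some loud assumptions: the `Post` shape is hypothetical, the archive is assumed to be sorted newest first, and we assume every rule has to fill a whole block of four before it wins (the text only says so explicitly for author).

```python
from dataclasses import dataclass
from typing import Optional

BLOCK_SIZE = 4
SUPPRESSED_CATEGORIES = {"Meta"}

@dataclass(frozen=True)
class Post:
    id: int
    author: str
    category: str
    tags: frozenset = frozenset()
    series: Optional[str] = None

def pick_recirc(current, archive):
    """Return (label, posts) from the first rule that can fill a whole block."""
    others = [p for p in archive if p.id != current.id]
    rules = [
        ("More in Series", lambda p: current.series is not None and p.series == current.series),
        ("More by Author", lambda p: p.author == current.author),
        # Tags match independently: any shared tag counts.
        ("More from Tags", lambda p: bool(p.tags & current.tags)),
        ("More in Category", lambda p: p.category == current.category
                                       and p.category not in SUPPRESSED_CATEGORIES),
    ]
    for label, matches in rules:
        hits = [p for p in others if matches(p)][:BLOCK_SIZE]
        if len(hits) == BLOCK_SIZE:
            return label, hits
    return None, []

# Invented data: four posts by different writers that share one tag with the current post.
current = Post(1, "mallory", "Books", frozenset({"shakespeare", "othello"}))
archive = [Post(i, f"writer{i}", "Art", frozenset({"shakespeare"})) for i in range(2, 6)]

label, posts = pick_recirc(current, archive)
print(label, [p.id for p in posts])
```

Here the series and author rules come up empty, so the chain falls through to tags.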

Finally, to avoid having the same story show up in multiple places, we built in some logic to ensure we don’t match articles that are already on the page. So every position on the page should show a different story—some recent, some popular, some active, and some related.
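One simple way to get this behavior is to fill the modules in order while remembering everything already placed. The sketch below is an assumption about how such a pass might look, not a description of the real implementation; the module names come from the text, and the post ids are invented.

```python
def dedupe_modules(modules):
    """Fill modules in page order, never reusing a post that is already on the page.

    modules: ordered list of (name, candidate post ids, slots to fill).
    """
    seen = set()
    page = {}
    for name, candidates, slots in modules:
        picked = []
        for post_id in candidates:
            if post_id in seen:
                continue  # already shown somewhere above
            picked.append(post_id)
            seen.add(post_id)
            if len(picked) == slots:
                break
        page[name] = picked
    return page

modules = [
    ("Active Now", [1, 2, 3], 2),
    ("Most Popular", [2, 4, 5], 2),
    ("More from Tags", [1, 4, 6, 7], 2),
]
page = dedupe_modules(modules)
print(page)
```

Modules earlier on the page get first pick, so the order of the list doubles as a priority order.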

Phew! That’s a lot of effort and logic put into the taxonomy and recirc modules, but we think it was worth it. The content on The Toast is top-notch, and making it easier for users to find more of the pieces they love was one of our primary goals for the project. Taxonomy can be used to create amazing user experiences, but the groupings and terms have to be planned and maintained by a team of smart humans in order for the code to have a solid data foundation to pull from.