Business Reporting: Folksonomies v. taxonomy

folksonomies + controlled vocabularies

Posted by Clay Shirky

There’s a post by Louis Rosenfeld on the downsides of folksonomies, and speculation about what might happen if they are paired with controlled vocabularies.

…it’s easy to say that the social networkers have figured out what the librarians haven’t: a way to make metadata work in widely distributed and heretofore disconnected content collections.

Easy, but wrong: folksonomies are clearly compelling, supporting a serendipitous form of browsing that can be quite useful. But they don’t support searching and other types of browsing nearly as well as tags from controlled vocabularies applied by professionals. Folksonomies aren’t likely to organically arrive at preferred terms for concepts, or even evolve synonymous clusters. They’re highly unlikely to develop beyond flat lists and accrue the broader and narrower term relationships that we see in thesauri.

I also wonder how well Flickr, del.icio.us, and other folksonomy-dependent sites will scale as content volume gets out of hand.

This is another one of those Wikipedia cases — the only thing Rosenfeld is saying that’s actually wrong is that ‘lack of development’ bit — del.icio.us is less than a year old and spawning novel work like crazy, so predicting that the thing has run out of steam when people are still freaking out about Flickr seems like a fatally premature prediction.

The bigger problem with Rosenfeld’s analysis is its TOTAL LACK OF ECONOMIC SENSE. We need a word for the class of comparisons that assumes that the status quo is cost-free, so that all new work, when it can be shown to have disadvantages to the status quo, is also assumed to be inferior to the status quo.

The advantage of folksonomies isn’t that they’re better than controlled vocabularies, it’s that they’re better than nothing, because controlled vocabularies are not extensible to the majority of cases where tagging is needed. Building, maintaining, and enforcing a controlled vocabulary is, relative to folksonomies, enormously expensive, both in development time and in the cost to the user, especially the amateur user, of using the system.

Furthermore, users pollute controlled vocabularies, either because they misapply the words, or stretch them to uses the designers never imagined, or because the designers say “Oh, let’s throw in an ‘Other’ category, as a fail-safe” which then balloons so far out of control that most of what gets filed gets filed in the junk drawer. Usenet blew up in exactly this fashion, where the 7 top-level controlled categories were extended to include an 8th, the ‘alt.’ hierarchy, which exploded and came to dwarf the entire, sanctioned corpus of groups.

The cost of finding your way through 60K photos tagged ‘summer’, when you can use other latent characteristics like ‘who posted it?’ and ‘when did they post it?’, is nothing compared to the cost of trying to design a controlled vocabulary and then force users to apply it evenly and universally.

This is something the ‘well-designed metadata’ crowd has never understood — just because it’s better to have well-designed metadata along one axis does not mean that it is better along all axes, and the axis of cost, in particular, will trump any other advantage as it grows larger. And the cost of tagging large systems rigorously is crippling, so fantasies of using controlled metadata in environments like Flickr are really fantasies of users suddenly deciding to become disciples of information architecture.

This is exactly, eerily, as stupid as graphic designers thinking in the late 90s that all users would want professional but personalized designs for their websites, a fallacy I was calling “Self-actualization by font.” Then the weblog came along and showed us that most design questions agonized over by the pros are moot for most users.

Any comparison of the advantages of folksonomies vs. other, more rigorous forms of categorization that doesn’t consider the cost to create, maintain, use and enforce the added rigor will miss the actual factors affecting the spread of folksonomies. Where the internet is concerned, betting against ease of use, conceptual simplicity, and maximal user participation, has always been a bad idea.

Comments (12) + TrackBacks (0) | Category: social software

COMMENTS

1. Simon Willison on January 7, 2005 06:26 PM writes...

Further to your points above, I think a key element of folksonomies that is yet to be fully explored is how to improve their support for "emergent" vocabularies.

Here's an example: I'm posting a picture of a squirrel on flickr; do I tag it with "squirrel" or "squirrels" for best effect? I can find out which term will be most effective by seeing how many pictures are already tagged with those two terms respectively, and going with the most popular.

At the moment that's a slightly tedious manual process, and one that many people are unlikely to bother with - but if the software offered a seamless interface for doing that (a Google Suggest style popup showing how many images are tagged with that tag as you type for example) people would be far more likely to form and follow a consensus.

I'm confident that there are a lot of things that can be done to improve the quality of folksonomy-produced metadata, without increasing the price (and rendering them useless).
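
The suggest-as-you-type idea in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `suggest_tags`, the `tag_counts` table, and every count in it are hypothetical, and nothing here reflects how Flickr or del.icio.us actually work.

```python
from collections import Counter


def suggest_tags(prefix, tag_counts, limit=5):
    """Return (tag, count) pairs whose tag starts with `prefix`, most used first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [(tag, n) for tag, n in tag_counts.items() if tag.startswith(prefix)]
    # Sort by descending count, then alphabetically so ties come out in a stable order.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return matches[:limit]


# Hypothetical usage counts; a real site would read these from its own tag index.
tag_counts = Counter({"squirrel": 1200, "squirrels": 450, "squid": 90, "summer": 60000})

print(suggest_tags("squi", tag_counts))
# [('squirrel', 1200), ('squirrels', 450), ('squid', 90)]
```

Showing the counts next to each candidate is the whole point: the person tagging sees that "squirrel" is already the more common form and can follow the consensus without being forced to.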

2. Lou Rosenfeld on January 7, 2005 07:29 PM writes...

Clay, interesting comments, but you seem to have missed my point. True, I shared my concerns about folksonomies; I expect you'd agree that they're no panacea. Nothing is. It'd be silly not to be skeptical about them at this early point in their development.

But I'm also quite skeptical about controlled vocabularies. I've probably read all the same studies you have--perhaps more--detailing their high cost. I spent four years in an LIS program and worked in libraries, so I have a little first-hand knowledge. Oddly, people who attend my IA seminar walk away with the sense that I'm against controlled vocabularies. So shoot, Clay, we actually agree on this point.

But how these two forms of metadata might work together is what's really exciting. (And that's why I used the holistic term "Metadata Ecologies" in my posting's title.) They may be quite complementary, which is wonderful, as salvation lies in neither. I hope we might begin brainstorming how they can work together.

We're not even bringing up how the nature of content, users, and context plays out in all this. Folksonomies might work fine for archives of photos. But I'd prefer that my doctor rely on professional indexing to do his research the next time I'm in urgent care with some strange condition. And I'm hopeful that down the road a medical folksonomy might somehow improve on the performance of MeSH headings, thereby increasing my chances of survival.

In the meantime, is there anything else you'd like me to convey to the "‘well-designed metadata’ crowd" at our next meeting (every second Tuesday at the south entrance to Dewey's mausoleum; be there or be uncontrolled)?

3. Jay Fienberg on January 7, 2005 08:23 PM writes...

I'm glad you connected the folksonomy issue to the Wikipedia one, because I think they're similar stories in terms of the battles of loose vs controlled ways of doing things, and how folks who like one or the other tend to react to the other's approach.

But I think this story of the loose vs controlled battles, however one would tag the two sides, is one that folks like Lou don't fit into so neatly, and I think you overreacted to his points.

I think the implication is wrong that folks who practice information architecture automatically fall into some kind of controlled vocabulary metadata control freak category who oppose all wiki folksonomy tag flipsters.

Likewise, I think the implication is wrong that all ordinary folk are, by nature, free tag lovers who'd only desire controlled vocabularies if it got them out of a deal with the devil.

As Lou suggests, there is a whole interesting realm of possibilities wherein both of these approaches are combined and/or co-exist. Even Wikipedia has forms of control--loose vs control is co-existing there.

And, Flickr / del.icio.us have controls in terms of how one can change tags, once they are created--which are controlled vocabulary techniques that (maybe) could actually be removed, IMO, were those folks really committed to folksonomies!

Personally, the most interesting thing to me is creating ways to allow the one approach to evolve into the other, and vice versa, as IMHO, the "right" way is one that can evolve either way, dynamically (e.g., things can be under organized or over organized, and good organization is a dynamic balance between the two).

4. Dave Evans on January 7, 2005 09:38 PM writes...

I think some meta-data will be more controlled than others. Business environment stuff, like "bought by", "owned by", "works for", "funded by", which are the types of tags I'm using in my visualization system, are pretty easy to standardize. Tagging "squirrel" is probably good enough for most people without having to worry about plural forms, or black or red squirrels. I wonder if there is a way to self-organize tags against the most popular ones that emerge over time? Changing tags in one fell swoop like in Flickr might be a (scary) good thing, like upgrading software for new features.
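
One way to picture "self-organizing tags against the most popular ones" is the rough Python sketch below. It rests on loud assumptions: `naive_key` is a deliberately crude grouping rule (lowercase, drop one trailing "s") rather than real stemming, and the names `preferred_tags`, `retag`, and all the counts are hypothetical.

```python
from collections import defaultdict


def naive_key(tag):
    """Crude grouping key (an assumption, not real stemming):
    lowercase the tag and drop one trailing 's' from tags longer than 3 letters."""
    tag = tag.lower()
    return tag[:-1] if tag.endswith("s") and len(tag) > 3 else tag


def preferred_tags(tag_counts):
    """Map every tag to the most-used variant within its group."""
    groups = defaultdict(list)
    for tag, count in tag_counts.items():
        groups[naive_key(tag)].append((tag, count))
    mapping = {}
    for variants in groups.values():
        # Highest count first; alphabetical order breaks ties.
        best = sorted(variants, key=lambda v: (-v[1], v[0]))[0][0]
        for tag, _ in variants:
            mapping[tag] = best
    return mapping


def retag(tags, mapping):
    """Rewrite one item's tags in a single pass, dropping duplicates but keeping order."""
    result = []
    for tag in tags:
        best = mapping.get(tag, tag)
        if best not in result:
            result.append(best)
    return result


# Hypothetical counts for illustration only.
counts = {"squirrel": 1200, "squirrels": 450, "Squirrel": 30, "bus": 10}
mapping = preferred_tags(counts)
print(retag(["Squirrel", "squirrels", "park"], mapping))
# ['squirrel', 'park']
```

The "scary" part is visible even in this toy: a crude rule will sometimes merge tags that people meant to keep apart, so a bulk rewrite like `retag` is probably best offered as a suggestion rather than applied silently.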

5. Rick Thomas on January 7, 2005 09:53 PM writes...

This is a microcosm of the process of language formation. For matters of consensual reality language is fairly fixed. When there's something new to talk about language is fluid and then converges as the subject is understood. The resulting language will always vary by community - English vs. Russian, engineers vs. marketers - because they have different experiences. Bridging communities depends on multi-lingual people using clever tools.

This is also why it's easy for a million bloggers to write quick opinions, but relatively harder to synthesize collaborative works - there is an unavoidable cost of semantic reconciliation.

Evolution uses this algorithm to create life. Start with any found stability. Produce diversity. Choose better stability. Create highly conserved systems along the way.

6. Shannon Clark on January 8, 2005 12:16 AM writes...

It seems to me that there is another very significant and high "cost" to controlled vocabularies: except in a very few cases, users have to learn the vocabulary (and/or navigate it with other tools) to use it at all, let alone use it effectively.

For example, take the extreme case of a library shelving system: it is not at all trivial or obvious to most users (let alone professionals) where a given book "should" be shelved, and I assume the process of integrating new/emergent categories is a decidedly non-trivial one. A library shelving system also shows one of the major flaws of many formal metadata systems for many users: they assume an either/or system, i.e. a book can only be in one place at a time, so it is either in one category or another, but not both (at least not physically).

Online there are countless cases when a user, very logically, wants something to carry multiple tags: it is both a book on business and a book on technology, or it is a photo of myself as well as a photo containing a monkey.

It is also useful to keep in mind why, where, how and for whom users apply metadata (in non-formal situations). Most of the time in most systems users apply none or very little metadata. It is only when doing so adds value very directly for the user that users, generally speaking, take the time to add metadata.

- blog posts might get metadata if someone wants to make it easier to find their own posts, and/or if they have enough readers that tagging helps those readers find related posts.

- photos may get tagged if someone wants to make it easier for their friends to find specific photos, and at times to open up photos (à la Flickr) to a wider audience, such as other attendees of the same event.

These fairly ad hoc, mostly limited-scope uses of metadata differ very widely from the more formal uses imagined by many people, such as the "Semantic Web" crowd. In those cases the assumption is that metadata (and extensive formal metadata at that) is to some degree inherently valuable and useful, but also that it will enable a new class of applications and uses.

I would argue that most of the time the cost of doing all of this tagging, especially the cost of learning the system for tagging (which is more than just learning the names of the tags - it is also learning how to pick and choose between tags, how to search for the "right" tag(s) etc) is vastly higher than most people (or their companies that pay for their time if done in a professional environment) are willing to incur.

Potentially some tools can be built to automate the process, suggesting tags and applying many of them in a mostly painless and automated way, though all such systems have to guard against inaccuracy as well as the "other" category problem Clay highlights.

In short - an important topic for discussion and one where I pretty much agree with Clay.

Shannon

7. Bill Seitz on January 8, 2005 11:06 AM writes...

I wonder whether folksonomies will just turn into free-text search engines? That's the other extreme of the uncontrolled-vocabulary spectrum...

8. Bill Seitz on January 8, 2005 11:12 AM writes...

Specifying which *contexts* are being discussed seems awfully relevant for discussions like this.

The more coherent (non-diverse?) the "user" "community", the more easily a SharedLanguage can emerge and be maintained...

http://webseitz.fluxent.com/wiki/SharedLanguage

9. pb on January 8, 2005 06:53 PM writes...

Check out this outlandishly clueless call for a "well-designed" web:
http://www.opendemocracy.net/debates/article-8-10-2277.jsp

Not only are all of Thompson's complaints completely wrong, they are the key drivers of the web's crazy success!

10. Edward Vielmetti on January 9, 2005 02:27 AM writes...

This discussion reminds me of the James Fallows NY Times piece on knowledge management where he distinguishes between the "big heap of laundry" approach (= folksonomy) and the "neatly folded PJs" (= taxonomy) approach to handling volumes of information.

Given how much attention people pay to presentation when it comes to materials that they expect to have a big impact or a long lifetime, I can only expect that we'll continue to see both systems in place, sometimes in parallel, as long as there are exclusive categories (Michelin 3-star restaurants) where common-folks opinions aren't the point.

12. bborn on January 21, 2005 10:12 AM writes...

What if the descriptive taxonomy (what this thing is) was open-ended (a folksonomy), but the functional taxonomy (what would you do with this thing) was controlled?

So, say I was bookmarking this post: I could tag it with any words I wanted - tech, library, cataloging, and so on. Those words describe what this item is about in ways that are primarily relevant to me. If they also happen to make sense for someone else, fine.

Then I would also have to choose one or more verbs, words that describe what I want to do with this item. Do I want to read it, save it, comment on it, disagree with it, build something with it, etc.
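
The split proposed in this comment, free-form descriptive tags alongside a controlled list of verbs, could look something like the Python sketch below. Everything named here is an assumption made for illustration: the `Bookmark` class, the `Action` enum and its members (taken from the verbs listed above), and the example URL.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    """Controlled vocabulary: the only verbs a bookmark may carry (hypothetical list)."""
    READ = "read"
    SAVE = "save"
    COMMENT = "comment on"
    DISAGREE = "disagree with"
    BUILD = "build something with"


@dataclass
class Bookmark:
    url: str
    tags: set = field(default_factory=set)     # open-ended: any word the user likes
    actions: set = field(default_factory=set)  # controlled: members of Action only

    def add_tag(self, tag):
        # Free-form tags are accepted as-is, apart from trimming and lowercasing.
        self.tags.add(tag.strip().lower())

    def add_action(self, action):
        # Action(...) raises ValueError for any verb outside the controlled list.
        self.actions.add(Action(action))


post = Bookmark("https://example.com/folksonomies-post")  # placeholder URL
for word in ("tech", "library", "cataloging"):
    post.add_tag(word)
post.add_action("read")
post.add_action("disagree with")

try:
    post.add_action("ponder")  # not in the controlled vocabulary
except ValueError:
    print("rejected: 'ponder' is not a known action")
```

Keeping the controlled part down to a handful of verbs is what keeps the cost low here: the user only has to learn a short fixed list, while the descriptive side stays as cheap as any other folksonomy.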

Wednesday, March 01, 2006
