explicit data | Gravity Blog

Let’s play a game. Say the top five things you’re interested in out loud. GO!

If you’re like me, and the vast majority of people I’ve tried this with over the years, your process went something like this:

Draw a complete blank
Repeat “Ummmmm” a few times
Say some vague words like TV or internet.

Welcome to the wonderful world of explicit user data! Explicit feedback comes in light servings and is corrupted by bashful and boastful self-reflection. It’s a problem we dealt with extensively in the social context at MySpace, and in the personalization space here at Gravity. After reading about Netflix’s experience with explicit data, I thought it might be helpful to dive into some of the things we’ve learned over the years.

While there is infinite variation in the quality of explicitly collected data, problems tend to fall within four broad categories. Let’s go through them.

Blank box syndrome

People are not great at coming up with data about themselves. I am complex, so how do I represent myself in a list? It is not uncommon to watch users in testing completely freeze when presented with a blank “Describe yourself” field. These sorts of open-ended questions tend to produce data sets that are quite incomplete (I can only come up with two things off the top of my head) or overly vague (I say “sports” when all I really like is baseball). How you collect explicit data is critical to its final quality. Help the user give quality responses.

*Note: Also beware of stale data when dealing with blank boxes. At MySpace we found that you could pretty accurately peg the creation date of a profile by what movies were listed as favorites. People tended to pull whatever was currently in theaters and never update that list again.
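One way to guard against stale entries like these is to discount an explicit interest by how long ago it was entered. The sketch below is illustrative only: the name `freshness_weight`, the exponential form, and the one-year half-life are all assumptions, not a description of any production system.

```python
from datetime import date


def freshness_weight(entered_on: date, today: date,
                     half_life_days: float = 365.0) -> float:
    """Discount an explicit profile entry by its age (hypothetical helper).

    half_life_days is an assumed tuning knob: an entry that old counts
    half as much as one entered today.
    """
    # Treat entries dated in the future as brand new rather than boosting them.
    age_days = max((today - entered_on).days, 0)
    return 0.5 ** (age_days / half_life_days)
```

A favorite movie entered on the day the profile was created and never touched again would steadily lose influence under this scheme, while anything the user recently added or confirmed would count close to full weight.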

Peacocking

Peacocking, the introduction of spurious data as “decoration” to create an idealized public persona, is a byproduct of the socialization of the Internet. As more of our online interaction becomes visible to our peers, there is a tendency to present the idealized self. This is rampant in data that users volunteer knowing it will be public, such as social profiles. I don’t like “The Notebook”, but putting it on my profile sure makes me look sensitive.

Self Censorship

This is the flip side of peacocking. “Party in the USA” is a great song, but I probably won’t be sharing it (for an amazing example, check out the last.fm list of songs most deleted from public scrobbles). This is also primarily a problem when dealing with data that will be available to a user’s peers. Where peacocking introduces false data, self-censorship prevents what is often critical information from being made available.

The Aspirational Self

While peacocking is the intentional introduction of spurious data for public view, the aspirational self is trickier. Even in entirely private venues, users will often provide data reflecting the person they want to be rather than who they really are. As the Netflix guys put it to Wired: “People rate movies like Schindler’s List high, as opposed to one of the silly comedies I watch, like Hot Tub Time Machine. If you give users recommendations that are all four- or five-star videos, that doesn’t mean they’ll actually want to watch that video on a Wednesday night after a long day at work. Viewing behavior is the most important data we have.”
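To make the idea of letting behavior outweigh stated preference concrete, here is a minimal sketch. It is not Netflix’s method or ours: the function `blended_score`, the use of completion rate as the behavior signal, and the 0.7 weight are all hypothetical choices made for illustration.

```python
def blended_score(stated_rating: float, completion_rate: float,
                  behavior_weight: float = 0.7) -> float:
    """Blend an explicit star rating with observed viewing behavior.

    stated_rating:   explicit rating on a 1-5 star scale.
    completion_rate: fraction of the title actually watched, 0.0-1.0.
    behavior_weight: assumed share of the score given to behavior.
    """
    if not 1.0 <= stated_rating <= 5.0:
        raise ValueError("stated_rating must be between 1 and 5")
    if not 0.0 <= completion_rate <= 1.0:
        raise ValueError("completion_rate must be between 0 and 1")
    # Rescale stars to 0-1 so both signals share a range before mixing.
    explicit = (stated_rating - 1.0) / 4.0
    return behavior_weight * completion_rate + (1.0 - behavior_weight) * explicit


# A five-star drama abandoned after ten minutes vs. a three-star comedy
# watched to the end: the comedy scores higher once behavior counts.
drama = blended_score(stated_rating=5, completion_rate=0.1)
comedy = blended_score(stated_rating=3, completion_rate=1.0)
```

With these assumed weights the aspirational five-star rating still counts for something, but it cannot outrank a title the user demonstrably finishes.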

So there you go. Explicit data has to be carefully managed on both the collection and interpretation fronts or it can easily lead to incorrect conclusions and courses of action.

-Steve


Exaggerators, Liars and Why Actions Speak Louder than Words
