Google and the Knowledge Graph

I came across a really interesting article the other day on Mashable (Google Knowledge Graph Could Change Search Forever).  Google SVP Amit Singhal lays out their efforts around a more semantic understanding of the web leveraging their purchase of Freebase a couple of years ago.  The gist is that by leveraging a proprietary Knowledge Graph, Google will be able to return search results based on the meaning of documents rather than simply the presence of particular text strings.  It’s a really compelling vision and well worth reading.  Personally, I’m terribly excited about the prospect of not only a truly semantic search, but the proliferation of data systems that are backed by large scale ontologies.  The power of ontology based semantics is a basic tenet of everything we do at Gravity, and it always feels good to see folks like Google moving in the same direction.  For those of you not thoroughly enmeshed in this sort of tech (which is just about everyone), a bit of explanation is probably in order.

What is an ontology?

The simplest way to imagine an ontology is as a graph that shows how things are connected to each other (if you’re already familiar with the nuances of graph theory, RDF, and convergence algos, feel free to skip ahead).  Take the example below from our ontology:

This is a small subset of the many things Kobe Bryant is actually connected to.  An ontology allows you to not only crawl a page and recognize that “Kobe Bryant” is contained in the text and is an entity of note, but also imbue that article with additional meaning.  Kobe’s presence in a document may be indicative of a web page being conceptually about famous people, basketball, the Lakers, or celebrities who cheat.  We can now move past simply understanding what’s on a web page and grasp more concretely what it’s about.

Now that was a single entity in the ontology.  Google’s ontology and our own have millions of entities and abstract concepts all interconnected with hundreds of millions of edges.  Topics run the gamut from every person of note throughout history to every song ever recorded to diseases of every flavor.  I can’t speak for Google’s system, but we maintain various weights on those interconnections (Kobe is more tightly bound to “Los Angeles Lakers Players” than “American expatriates in Italy”).  In this way we are able to more easily infer document aboutness.
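For the curious, here is a minimal Python sketch of how weighted edges might be used to infer aboutness.  The edge weights, the second entity, and the function name infer_aboutness are all made up for illustration; this is not our production ontology or algorithm, just the shape of the idea.

```python
from collections import defaultdict

# Hypothetical weighted edges: entity -> {concept: weight in [0, 1]}.
# Every weight here is invented for illustration.
ONTOLOGY = {
    "Kobe Bryant": {
        "Los Angeles Lakers Players": 0.9,
        "Basketball": 0.8,
        "American expatriates in Italy": 0.2,
    },
    "Pau Gasol": {
        "Los Angeles Lakers Players": 0.8,
        "Basketball": 0.8,
    },
}


def infer_aboutness(entities_in_doc):
    """Sum edge weights across all recognized entities, then rank concepts."""
    scores = defaultdict(float)
    for entity in entities_in_doc:
        for concept, weight in ONTOLOGY.get(entity, {}).items():
            scores[concept] += weight
    # Highest total weight first: the concepts the document is most "about".
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = infer_aboutness(["Kobe Bryant", "Pau Gasol"])
top_concept = ranked[0][0]
```

With these toy weights, a page mentioning both players ranks the Lakers concept first and the Italy concept last, which is the intuition behind weighting the interconnections.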

What’s the point?

Per Mr. Singhal, Google is applying this semantic understanding of content to search.  Would you like results about Kobe as a basketball player, or would you rather see pertinent celebrity gossip?  The ontology allows Google and the user to make that distinction when applied to the set of content that includes Kobe as a component.  You can also introduce any number of semantically proximate suggestions to searchers.  Searchers for “surfing” could easily be presented with the opportunity to explore relevant results for the more abstract “water sports” or the more specific “longboards”.  With an ontology we can place topics in their proper context within the set of everything else that exists.
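The surfing example can be sketched in a few lines of Python.  The BROADER table, the “shortboards” entry, and the suggest function are hypothetical stand-ins; neither Google’s system nor ours is this simple, but the broader/narrower lookup is the basic move.

```python
# Hypothetical "broader than" edges: topic -> its more abstract parent.
BROADER = {
    "surfing": "water sports",
    "longboards": "surfing",
    "shortboards": "surfing",
}


def suggest(topic):
    """Return the broader topic (if any) and the narrower topics for a query."""
    broader = BROADER.get(topic)
    narrower = sorted(child for child, parent in BROADER.items() if parent == topic)
    return {"broader": broader, "narrower": narrower}


suggestions = suggest("surfing")
```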

We leverage similar technology to a very different end.  By understanding what every article is actually about, we can consider what pages you engage with to build a holistic picture of those topics and concepts that actually matter to you (your Interest Graph).  That then can be used to present you with content, ads, and other people that you’ll probably enjoy (see a lot more about that here).

For those of you that are just discovering ontologies, I hope this was a helpful introduction.  If you’re in the space, we always love talking shop.  Drop us a line.


So you’d like to personalize the web…

Gravity was born under interesting circumstances.  Amit, Jim, and I had joined MySpace early on, and by the end of 2008 were running the business, tech, and product initiatives respectively (at a time when that was a good thing to run).  The three of us had been operating as a team for years and always knew that we’d start a company together at some point.  The real question was what to build.

Social was the obvious choice given our backgrounds (we’d gotten on the social train at a time when you had folks willing to violently argue with you that no one would ever put their picture online).  But by the end of 2008 it was pretty clear that social was on a fairly well established trajectory, and, to a certain degree, a solved problem.  Sure, the particulars were still in flux and market dominance was very much in dispute, but the web of “us” was no longer uncharted territory.  The foundational behavioral frameworks were all in place.  So if the problem of “us” had been tackled, what other ridiculously audacious project could we tackle?

I’m not sure exactly which of us suggested it, but the idea of personalizing the whole web for every user came up and seemed appropriately audacious.  We founded Gravity, and here we are bringing that dream to fruition with our first implementations.  I’m reminded of those early MySpace days I spent explaining to every major web company that social networking was going to change everything and getting only blank stares in return.  So let me say something along those lines about personalization.  Personalization is not a feature; it is an infrastructure.  The power of the social web isn’t widgets or share buttons, it is the ability to see the world through the lens of your friends.  The power of the personalized web is not about recommendations, it is the ability to see the web through a lens that is as utterly unique as you are.

All of that being said, it turns out that personalizing the web is pretty tricky, and not simply from a technical execution perspective.  Rather, one of the biggest hurdles of the endeavor is pinning down exactly what is entailed in “personalization.”  What qualifies one particular match candidate (piece of content, potential friend, ad, etc.) as a better personalization result than another?  Having spent a healthy chunk of time thinking about exactly that problem, we have some thoughts to share.

The Gravity Approach

After a lot of meditation and a number of failed attempts, we’ve settled on what we believe to be the right way to go about personalization.  Our method relies on a number of signals to value an object’s inherent worth and then combines that with a holistic picture of a user to render a set of personalized results that should optimize for user happiness.  That’s a mouthful, so we’ll break it down.

The Interest Graph

The foundational component of our system is the Interest Graph.  This is a digital representation of the things you care about and the relative levels of attachment to those things.  I, for instance, am very attached to surfing, startups, and parenting.  I’m only moderately interested in poodles, iPhone apps, and 3D printing.  Not to be confused with simple behavioral targeting that puts me in binary interest buckets, the Interest Graph has attachment gradients, and a memory that allows for calculation of trajectories and trends.  Interests wax and wane (looking at you, LA Gear fans), and properly projecting patterns at the individual or aggregate levels can be very useful.
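To make “attachment gradients” and “memory” a little more concrete, here is a rough Python sketch.  The Interest class, its fields, and the simple first-to-last slope used for the trajectory are assumptions chosen for brevity, not a description of how we actually store or project interests.

```python
from dataclasses import dataclass, field


@dataclass
class Interest:
    """One node in a user's Interest Graph: a graded attachment plus its history."""
    topic: str
    history: list = field(default_factory=list)  # (day, attachment) pairs, oldest first

    def record(self, day, attachment):
        """Remember the attachment level observed on a given day."""
        self.history.append((day, attachment))

    @property
    def attachment(self):
        """Current attachment: the most recent recorded value, or 0.0 if none."""
        return self.history[-1][1] if self.history else 0.0

    def trajectory(self):
        """Average change in attachment per day between first and last observation."""
        if len(self.history) < 2:
            return 0.0
        (d0, a0), (d1, a1) = self.history[0], self.history[-1]
        return (a1 - a0) / (d1 - d0) if d1 != d0 else 0.0


surfing = Interest("surfing")
surfing.record(day=0, attachment=0.5)
surfing.record(day=10, attachment=0.75)

la_gear = Interest("LA Gear")
la_gear.record(day=0, attachment=0.5)
la_gear.record(day=10, attachment=0.25)
```

A positive trajectory means an interest is waxing, a negative one means it is waning, and a binary bucket could tell you neither.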

Building the Interest Graph can be done in a few ways.  It can be explicitly volunteered by a user (What are you interested in?).  It can be implicitly derived (What are you reading on my site?).  It can be inferred from the things you say (Connect your Facebook/Twitter and let’s have a look at what you’ve been liking, status’ing, and tweeting).  Really, any signal of user interest can be employed to increase or decrease a user’s attachment to any topic under the sun.  And if you handle your ontologies correctly, you can infer attachment to the larger related concepts (Love the Lakers?  Here’s what else is hot in the NBA…).

This approach seems simple enough, but, of course, you have to be able to derive the essential meaning of the things with which a user interacts in order to be able to imbue a user with an attachment to the appropriate interests.  This is the hard core semantic science of what we do, and well beyond this simple product guy.  I’ll leave that to the tech gang to explain more competently in another post.

Learn More about Gravity’s Technology here.

Let’s review.  To calculate the Interest Graph for any human:

  1. Understand the objects they create or interact with
  2. Divine the meaning of those objects
  3. Modify their attachment to those meanings based on the type of behavior over time
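Those three steps can be sketched in Python as follows.  Everything named here (OBJECT_TOPICS, PARENT, BEHAVIOR_WEIGHT, PARENT_SHARE, build_interest_graph) is hypothetical, the lookup table stands in for the real semantic analysis in step 2, and the sketch ignores the “over time” part of step 3 entirely.

```python
from collections import defaultdict

# Step 2 stand-in: a toy lookup from object to topics. Real semantic analysis
# is far more involved; this table is purely illustrative.
OBJECT_TOPICS = {
    "article:lakers-win": ["Lakers"],
    "article:swell-forecast": ["surfing"],
}

# Hypothetical parent concepts, so attachment can spread to related ideas.
PARENT = {"Lakers": "NBA", "surfing": "water sports"}

# Hypothetical weights per behavior type (step 3).
BEHAVIOR_WEIGHT = {"read": 1.0, "share": 2.0, "hide": -1.0}

PARENT_SHARE = 0.5  # fraction of each update passed up to the parent concept


def build_interest_graph(events):
    """events: iterable of (object_id, behavior) pairs for one user."""
    graph = defaultdict(float)
    for object_id, behavior in events:                   # step 1: objects touched
        for topic in OBJECT_TOPICS.get(object_id, []):   # step 2: their meaning
            delta = BEHAVIOR_WEIGHT.get(behavior, 0.0)   # step 3: adjust attachment
            graph[topic] += delta
            if topic in PARENT:
                graph[PARENT[topic]] += delta * PARENT_SHARE
    return dict(graph)


graph = build_interest_graph([
    ("article:lakers-win", "read"),
    ("article:lakers-win", "share"),
    ("article:swell-forecast", "read"),
])
```

Note how reading and sharing a Lakers article also nudges the broader NBA concept, which is the “Love the Lakers?” inference in miniature.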

Great, now we have Interest Graphs.  Hurray!  Hold your horses, little buckaroo.  Having an Interest Graph is like having a map: it tells you where to go, but you still have to get there.  Cue the section on personalization.

Personalization

Discovery, executed correctly, is a beautiful thing.  The books you didn’t intend to buy, the people you didn’t set out to meet, these are the serendipitous discoveries that add color to our lives.  This is the ultimate goal of personalization, to show you the things you’ll love that you didn’t know you should be looking for (all needles, no haystacks).  To accomplish this goal, you have to consider a pretty broad set of signals.  Together, they produce a composite score indicative of correctness for a particular user.  Here’s what we consider:

Interest Graph Proximity

Remember our process for calculating a human’s Interest Graph?  We do a similar process for every content object.  Comparing every user to every object, we can confidently say that a particular object is closely relevant to this person’s interests.  The results are actually very good and exceedingly relevant.  The problem with deploying a solution using solely this approach is the lack of serendipity.  It’s predictable and, to a certain degree, boring.  Read a lot about Apple?  Here’s more Apple.  Mostly reading about iPhones from that set?  Now it’s mostly iPhone.  The process tends to winnow results to an unacceptable level of specificity over time.  It’s almost like having a set of saved searches that slowly morph based on their own self-referential activity.  This was one of our early learnings leveraging the Interest Graph, and one we took to heart.  Truly excellent personalization must be something more than this.  Enter content value as a tunable serendipity measure.
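One common way to compare a user’s interests to an object’s topics is cosine similarity over topic weights.  The Python sketch below uses it purely as an example; the function name cosine_proximity and the sample weights are assumptions, and we’re not claiming this is the measure running in production.

```python
import math


def cosine_proximity(user_interests, object_topics):
    """Cosine similarity between two sparse topic -> weight dictionaries."""
    shared = set(user_interests) & set(object_topics)
    dot = sum(user_interests[t] * object_topics[t] for t in shared)
    norm_user = math.sqrt(sum(w * w for w in user_interests.values()))
    norm_obj = math.sqrt(sum(w * w for w in object_topics.values()))
    if norm_user == 0 or norm_obj == 0:
        return 0.0
    return dot / (norm_user * norm_obj)


# Hypothetical user who reads a lot about Apple, plus two candidate articles.
user = {"Apple": 0.9, "iPhone": 0.7, "surfing": 0.2}
iphone_review = {"Apple": 0.6, "iPhone": 0.8}
tsunami_report = {"tsunamis": 1.0}

near = cosine_proximity(user, iphone_review)
far = cosine_proximity(user, tsunami_report)
```

The iPhone review scores high and the tsunami report scores zero, which is exactly the winnowing problem: proximity alone will never surface the tsunami.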

Content Value

If you can effectively determine the inherent value of a content object, this value can be combined with Interest Graph proximity.  Together they give you a set of content that is relevant to your interests and serendipitously important.  The set of things that you both want to see and ought to see.  Not particularly interested in tsunamis?  Doesn’t mean that you won’t be when they happen.  So how do you determine the value of an object?  A few vectors are considered:

  • Editorial weight – There are people out there paid to know what is important.  Call them tastemakers, pundits, or editors, their opinions matter.  Recognizing and weighting their guidance can strongly indicate an object’s importance.
  • Virality – Every share, tweet, digg, link, and search is an indication of collective interest in an object or its associated semantic topic.  Where once there was only the linking behavior of webmasters, the universe of user generated content has enabled each of us to indicate what links matter within the superset.  Monitoring the public streams and meta data provides pure signal of the things that matter right now.  The Twitter firehose, among others, is a great mechanism for teasing the gold from the stream if you know how to properly parse the vastness that these data sets represent.  We combine all of these signals into our virality calculations.
  • Interaction feedback – What happens when an object is presented in a personalization context?  Even when properly targeted based on the combined graph proximity and content value, some content just falls flat while others unexpectedly surge.  Constant tuning based on the interaction of users with the targeted content optimizes the results for everyone.
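Here is a rough Python sketch of combining the three value signals with Interest Graph proximity.  The linear blend, the default weights, the serendipity parameter, and every number below are assumptions picked for illustration; the actual calculation is more involved.

```python
def content_value(editorial, virality, feedback, weights=(0.4, 0.4, 0.2)):
    """Blend the three value signals (each assumed to be in [0, 1]) into one number."""
    w_editorial, w_virality, w_feedback = weights
    return w_editorial * editorial + w_virality * virality + w_feedback * feedback


def composite_score(proximity, value, serendipity=0.3):
    """Linear mix: serendipity=0 ranks purely by interest, 1 purely by content value."""
    return (1 - serendipity) * proximity + serendipity * value


# Two hypothetical candidates for a user who mostly reads about Apple.
iphone_rumor = composite_score(proximity=0.9, value=content_value(0.2, 0.3, 0.5))
tsunami_news = composite_score(proximity=0.1, value=content_value(1.0, 1.0, 0.9))

# Turning the serendipity dial up lets the important story win.
iphone_boosted = composite_score(0.9, content_value(0.2, 0.3, 0.5), serendipity=0.8)
tsunami_boosted = composite_score(0.1, content_value(1.0, 1.0, 0.9), serendipity=0.8)
```

At the default setting the iPhone rumor outranks the tsunami story; with the dial turned up, the ordering flips.  That dial is what we mean by a tunable serendipity measure.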

See what Gravity personalization looks like here.

The future

So where does all this take us?  We imagine a web where every experience is personal, viewed through the lens of my own interests with a healthy dollop of serendipity on top.  Where not only the presentation of content is informed by my interest graph, but the production of content is informed by our collective interests.  Editors are not replaced, but rather they operate with a level of transparency and sophistication previously unheard of.  Where each of us is able to exorcise the noise from our view and focus only on the gems scattered across the web.  It won’t be easy and it won’t be fast, but that’s the future as we see it.

The Future Will Be Personalized

Note: This blog post originally appeared on TechCrunch here as a guest post by Amit.

When my partners and I joined MySpace, we were lucky enough to be at the leading edge of the social revolution that changed how we use the Internet. A new groundswell is coming, transforming the web once again: the personal revolution.

Information Overload

Today, we live in a world where we’re constantly overwhelmed by information. There are over 90M tweets per day, 34 hours of YouTube video uploaded every minute, and every Facebook user has an average of 130 friends who are becoming more and more active all the time. We also experience this with content farms flooding search results and with the thousands of articles available every day on traditional websites like the New York Times and ESPN, of which only a handful appeal to each of our individual interests.

The rampant proliferation of information isn’t a new phenomenon. The signal-to-noise ratio on the web has fluctuated substantially as new technology to organize information has battled with new technology to create and distribute information.

Your Web

Their Web: The Early Days of The Internet

In the early days, content was created and organized by professionals. At first, it was contained in networks like AOL, one of the pioneers of the Internet. As the Internet opened up, Yahoo! brilliantly organized the open web with Yahoo! Directory. But eventually the volume of the information overloaded even the directory, and search companies like Google introduced a better way to find content we were interested in. By understanding how sites linked to each other, Google applied new science to find a solution within the problem itself. It worked so well, every website is search engine optimized for this framework.

Our Web: Present Day

In 2003, user-generated content hit the mainstream via sites like MySpace and YouTube, and the volume of information being created increased dramatically.

“Every two days, we create as much information as we did up to 2003.” –Eric Schmidt, CEO of Google

Search engines weren’t designed to effectively organize this social and real-time data. So innovative companies like Facebook and Twitter created a social filter by empowering our friends and people we trust to organize information for us. This new filter has given us access to more and better information than we ever thought possible. Like search, it’s so effective, every website is socially optimized for this framework.

Many of you reading this are avid users of social technology. Like me, you’re probably beginning to experience information overload in your social streams. There’s great content there, but it’s getting increasingly difficult to find it. In engineering terms, the signal-to-noise ratio is dropping (or, as a corollary, the work-to-reward ratio is increasing). And, as more people become more active in the social and real-time web, the problem will only get worse.

Your Web: The Future

Imagine opening up any web page or application and being presented with an experience that’s entirely personalized to you. Go to ESPN.com and see stories about the sports you love and teams you follow featured on the top. Check your daily Groupon for deals that map to your interests. Receive updates from Foursquare about restaurants you’ll want to visit. This is where things are headed. It’s about shifting from you trying to find the right information to the right information finding you.

In the past, we lacked the data and the technology to make this type of personal experience a reality. But that’s changing quickly. The abundant social data that’s overwhelming our social streams presents not only a problem but also the solution. Using natural language processing and semantic analysis to evaluate your tweets, status updates, likes, shares, and check-ins, it’s possible to build a holistic understanding of who you are and what you’re interested in.

Once the web knows your interests, it can start to change… Any website or app can use knowledge of your interests in order to give you a personal experience.

Music followed a similar evolutionary path. Music discovery has grown from being curated by professionals (DJs, MTV) to being introduced socially (mixtapes, playlists) to being organized around your personal interests (Pandora).

All of this doesn’t mean that editors go away or your friends’ referrals don’t matter. Rather, it’s a new lens focused entirely on you.

Building the Personal Web: Enter Gravity’s Interest Graph

Incredible academic and commercial research in the fields of natural language processing and semantic technology has built the groundwork for where we are today. Still we have a long way to go before the personal web is a reality. Gravity will be one of many companies working on the personal web in the coming years. Our platform will allow partners to personalize their experiences when a user connects to the service. The basis for our platform is what we call the Interest Graph, an online representation of your interests, including your strength of attachment and its trajectory over time.

Gravity @ Web 2.0 Summit

Earlier this afternoon I had a chance to preview some of the exciting stuff we’ve been working on at Gravity on stage at the Web 2.0 Summit in San Francisco. For those of you that couldn’t attend, you can catch the video on YouTube here.

Here’s a recap of what I talked about on stage:

Information overload. The internet is overloaded with information, and every day it gets more unwieldy: 90 million tweets per day, 35 hours of video uploaded per minute, 1.6 million blog posts per day. With so much information created on a daily basis, it’s hard to find what you’re looking for and to know what you’ve missed.

The Interest Graph. Gravity’s answer is the Interest Graph: an online representation of your real world interests and a new lens through which to view the internet. Your interest graph is your own personal electromagnet. It pulls the best stuff to you based on your interests and leaves all the noise at a safe distance where it can’t distract you. We build your interest graph by analyzing social data (like tweets, retweets, status updates, likes and shares) to create a holistic view of who you are and what you’re interested in.

Twinterest. To see your interest graph today, you can play Twinterest. Twinterest is a Twitter-based game that analyzes your tweets to figure out what you’re interested in and shows how your interests compare to your friends’. It’s the first game built on our platform. You can read more about Twinterest here.

The Orbit. I also previewed The Orbit – a newsfeed built by your interest graph. It automagically finds the best content on the web for the topics you care about.

Our Platform. Lastly, and most importantly, I talked about the platform we’re building. Gravity’s mission is to help the right information find you. We’re building a platform that will let any website tap into the Interest Graph so that it can deliver a personal experience to you.

I’ll follow up with a more detailed post soon about projects at Gravity and how Gravity uses social data to deliver personal experiences. Be sure to follow us on Twitter to stay in the loop.

Hello World!

Welcome to Gravity! Thanks for your interest in our project.

At Gravity, our mission is to help the right information find you. For the last year and a half, we’ve been developing the Interest Graph – an online representation of your real world interests based on what you do and say on the social web. We think the Interest Graph is the key to organizing information around people and unlocking the personalized web.

When we started out, we had no data to build the Interest Graph. So, like any scrappy startup would have, we endeavored to create the data we needed from scratch. We built an avant-garde conversation service that empowered people to discuss their passions, which would eventually become a source of the data we needed.

Then the world changed…

Twitter launched the Firehose, Facebook unveiled the Open Graph and several other social companies realized the value and importance of letting users share their data (thank you!). This sea change created a new opportunity. Overnight, a huge dataset of people talking about the many things they’re interested in became accessible…and we had been quietly developing the technology to turn that data into knowledge.

Now, at last, we have the Interest Graph! It’s big, it’s beautiful, it’s interesting, and there are a lot of incredible experiences we’re going to help users unlock with it.

Over the next few months, we’ll be releasing the keys to our platform and several apps built on top of it. With our platform, you’ll be able to turn on personalized experiences on your existing websites and applications. Gravity will also be releasing a slew of useful and fun applications that help you discover Your Web. We don’t want to get into the details because we know you like surprises (right?). But, our goal with the new Gravity is to validate Arthur C. Clarke’s (author of 2001: A Space Odyssey) law of prediction: “any sufficiently advanced technology is indistinguishable from magic.”

Stay tuned and thanks for your support!