I came across a really interesting article the other day on Mashable (Google Knowledge Graph Could Change Search Forever). Google SVP Amit Singhal lays out their efforts around a more semantic understanding of the web leveraging their purchase of Freebase a couple of years ago. The gist is that by leveraging a proprietary Knowledge Graph, Google will be able to return search results based on the meaning of documents rather than simply the presence of particular text strings. It’s a really compelling vision and well worth reading. Personally, I’m terribly excited about the prospect of not only a truly semantic search, but the proliferation of data systems that are backed by large scale ontologies. The power of ontology based semantics is a basic tenet of everything we do at Gravity, and it always feels good to see folks like Google moving in the same direction. For those of you not thoroughly enmeshed in this sort of tech (which is just about everyone), a bit of explanation is probably in order.
What is an ontology?
The simplest way to imagine an ontology is as a graph that shows how things are connected to each other (if you’re already familiar with the nuances of graph theory, RDF, and convergence algos, feel free to skip ahead). Take the example below from our ontology:
This is a small subset of the many things Kobe Bryant is actually connected to. An ontology allows you to not only crawl a page and recognize that “Kobe Bryant” is contained in the text and is an entity of note, but to imbue that article with additional meaning. Kobe’s presence in a document may indicate that a web page is conceptually about famous people, basketball, the Lakers, or celebrities who cheat. We can now move past simply understanding what’s on a web page and grasp more concretely what it’s about.
Now that was a single entity in the ontology. Google’s ontology and our own have millions of entities and abstract concepts, all interconnected with hundreds of millions of edges. Topics run the gamut from every person of note throughout history to every song ever recorded to diseases of every flavor. I can’t speak for Google’s system, but we maintain various weights on those interconnections (Kobe is more tightly bound to “Los Angeles Lakers Players” than to “American expatriates in Italy”). In this way we are able to more easily infer document aboutness.
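To make the idea concrete, here’s a toy sketch of a weighted ontology in Python. The entities, concepts, and weights are invented for illustration; they aren’t Gravity’s actual data.

```python
from collections import defaultdict

# Edges: entity -> {related concept: connection weight}. All values here
# are hypothetical, chosen only to illustrate the idea of weighted edges.
ONTOLOGY = {
    "Kobe Bryant": {
        "Los Angeles Lakers Players": 0.9,
        "Celebrities": 0.6,
        "American expatriates in Italy": 0.2,
    },
    "Pau Gasol": {
        "Los Angeles Lakers Players": 0.9,
        "Spanish Basketball Players": 0.8,
    },
}

def document_aboutness(entities):
    """Score what a document is *about* from the entities it mentions."""
    scores = defaultdict(float)
    for entity in entities:
        for concept, weight in ONTOLOGY.get(entity, {}).items():
            scores[concept] += weight
    # Strongest concepts first.
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))

print(document_aboutness(["Kobe Bryant", "Pau Gasol"]))
# Both entities reinforce "Los Angeles Lakers Players", so it scores highest.
```

The key point is that overlapping edges reinforce each other: one mention of Kobe is ambiguous, but Kobe plus Pau Gasol points strongly at the Lakers.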
What’s the point?
Per Mr. Singhal, Google is applying this semantic understanding of content to search. Would you like results about Kobe as a basketball player, or would you rather see pertinent celebrity gossip? The ontology allows Google and the user to make that distinction when applied to the set of content that includes Kobe as a component. You can also introduce any number of semantically proximate suggestions to searchers. Searchers for “surfing” could easily be presented with the opportunity to explore relevant results for the more abstract “water sports” or the more specific “longboards”. With an ontology we can place topics in their proper context within the set of everything else that exists.
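The broader/narrower suggestions can be sketched with a tiny parent-topic map. The topic links below are assumptions made up for this example, not a real production ontology:

```python
# Hypothetical "is broader than" links: child topic -> list of parent topics.
BROADER = {
    "surfing": ["water sports"],
    "longboards": ["surfing"],
}

def suggestions(topic):
    """Suggest more abstract parents and more specific children of a topic."""
    broader = BROADER.get(topic, [])
    narrower = [child for child, parents in BROADER.items() if topic in parents]
    return {"more abstract": broader, "more specific": narrower}

print(suggestions("surfing"))
# A search for "surfing" can offer "water sports" (up) and "longboards" (down).
```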
We leverage similar technology to a very different end. By understanding what every article is actually about, we can consider what pages you engage with to build a holistic picture of those topics and concepts that actually matter to you (your Interest Graph). That then can be used to present you with content, ads, and other people that you’ll probably enjoy (see a lot more about that here).
For those of you that are just discovering ontologies, I hope this was a helpful introduction. If you’re in the space, we always love talking shop. Drop us a line.
I am pleased to announce the launch of Gravity Labs, an initial peek into our underlying interest graph infrastructure as well as a showcase of some of our open source projects.
For the last 2+ years we have been working on productionizing a web-scale system that leverages a wide range of unique disciplines, from natural language processing to large scale semantics and ontology development, to real time behavioral algorithms, all the way to a variety of different machine learning techniques.
We didn’t set out to build a complex system that encompasses so many different disciplines; we set out to personalize the web. Unfortunately, it didn’t take long for us to realize that the generally accepted collaborative filtering and behavioral targeting algorithms available today didn’t meet many of our core requirements. There are quite a few – but the primary requirements for a cloud-based, web-scale personalization engine are:
Real time capable:
- New user events occur tens, hundreds, and even millions of times per second. A user’s personalized experience needs to update in real time as each event, or group of events, occurs
- New content is created across the web at similar rates. Content needs to be available for recommendation to a user immediately upon its creation.
- Signal generated by a user on one site needs to be applicable to that user’s recommendations on all other sites across the web. Just as a user is now able to interact with their social graph as they use the broader web, they need to be able to take their interest graph (or “personalization profile”) with them to every website they visit, and be able to both apply it and augment it anywhere.
- We are all unique. You can’t put everyone in a bucket. While neighborhood/bucketing-based algorithms do work (and are one, albeit small, component of our infrastructure), they make generalizations about people’s actions in order to enable scalability at the cost of accuracy. A true personalization engine should minimize grouping users together, and treat each individual as a unique entity with a unique set of interests
- The fears of the filter bubble are real, and existing personalization and contextual recommendation engines often drive users down an ever narrower content discovery path. A successful personalization engine needs the capability to inject serendipity into a user’s experience at an individual level. Both the general, real-time consensus on content that matters across the global web regardless of a user’s interests, and the semantic relationships between very different but highly connected interests, need to be taken into account.
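As one illustration of the first requirement, here is a minimal sketch of per-event, real-time profile updates with lazy time decay. The half-life, data structures, and function names are invented for this example, not a description of the production system:

```python
import time
from collections import defaultdict

# Assumed decay constant: an interest's weight halves after a week of silence.
HALF_LIFE = 7 * 24 * 3600
# user_id -> {topic: (score, last_update_timestamp)}
profiles = defaultdict(dict)

def on_signal(user_id, topic, strength, now=None):
    """Apply one inbound signal immediately -- no batch recompute needed."""
    now = now if now is not None else time.time()
    score, last = profiles[user_id].get(topic, (0.0, now))
    # Lazily decay the old score for the time elapsed, then add the new signal.
    score *= 0.5 ** ((now - last) / HALF_LIFE)
    profiles[user_id][topic] = (score + strength, now)

on_signal("u1", "basketball", 1.0, now=0)
on_signal("u1", "basketball", 1.0, now=HALF_LIFE)  # old 1.0 decays to 0.5
assert abs(profiles["u1"]["basketball"][0] - 1.5) < 1e-9
```

Because each event mutates the profile directly, the user’s experience can change on their very next page view, and interests that go quiet fade on their own.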
It has been (and will continue to be) quite a challenge. It has required the minds of very different people with many different core skillsets.
And it took a long time. Candidly, one reason we have been so quiet about our development efforts is that we wanted to make sure we could get far enough ahead of everyone else. We’ve popped in and out of the news with test/data acquisition products here and there, but the goal has always been a system that can accurately process all of the interest-based signal data across the entire web, and leverage it to personalize every user’s internet experience.
We are proud to announce that the above system, or the “Gravity Interest Service” as we call it internally, officially went live at production scale 6 months ago.
Since then we have:
* Created over 400 million user interest graphs
* Served over 13 million pieces of personalized content per day
* Personalized the daily internet experience of tens of millions of users per month
* Processed over 25 million inbound interest signals per day
And with our current growth rate we will be handling 10X all of these numbers in under 6 months.
It’s an exciting time for us here, so we have decided to give a (small) peek under the hood, as well as open source some of our non-core components. We leverage a significant amount of open source software for a good portion of our data storage and processing, and want to contribute what we can back to the community.
Thanks for your interest in Gravity. There is a lot more coming in the very near future, but our new Beta Labs Section should give you enough to play with until then.
Gravity was born under interesting circumstances. Amit, Jim, and I had joined MySpace early on, and by the end of 2008 were running the business, tech, and product initiatives respectively (at a time when that was a good thing to run). The three of us had been operating as a team for years and always knew that we’d start a company together at some point. The real question was what to build.
Social was the obvious choice given our backgrounds (we’d gotten on the social train at a time when you had folks willing to violently argue with you that no one would ever put their picture online). But by the end of 2008 it was pretty clear that social was on a fairly well established trajectory, and, to a certain degree, a solved problem. Sure, the particulars were still in flux and market dominance was very much in dispute, but the web of “us” was no longer uncharted territory. The foundational behavioral frameworks were all in place. So if the problem of “us” had been tackled, what other ridiculously audacious project could we tackle?
I’m not sure exactly which of us suggested it, but the idea of personalizing the whole web for every user came up and seemed appropriately audacious. We founded Gravity, and here we are bringing that dream to fruition with our first implementations. I’m reminded of those early MySpace days I spent explaining to every major web company that social networking was going to change everything and getting only blank stares in return. So let me say something along those lines about personalization. Personalization is not a feature; it is an infrastructure. The power of the social web isn’t widgets or share buttons, it is the ability to see the world through the lens of your friends. The power of the personalized web is not about recommendations, it is the ability to see the web through a lens that is as utterly unique as you are.
All of that being said, it turns out that personalizing the web is pretty tricky, and not simply from a technical execution perspective. Rather, one of the biggest hurdles of the endeavor is pinning down exactly what is entailed in “personalization.” What qualifies one particular match candidate (piece of content, potential friend, ad, etc.) as a better personalization result than another? Having spent a healthy chunk of time thinking about exactly that problem, we have some thoughts to share.
The Gravity Approach
After a lot of meditation and a number of failed attempts, we’ve settled on what we believe to be the right way to go about personalization. Our method relies on a number of signals to value an object’s inherent worth and then combines that with a holistic picture of a user to render a set of personalized results that should optimize for user happiness. That’s a mouthful, so we’ll break it down.
The Interest Graph
The foundational component of our system is the Interest Graph. This is a digital representation of the things you care about and the relative levels of attachment to those things. I, for instance, am very attached to surfing, startups, and parenting. I’m only moderately interested in poodles, iPhone apps, and 3D printing. Not to be confused with simple behavioral targeting that puts me in binary interest buckets, the Interest Graph has attachment gradients, and a memory that allows for calculation of trajectories and trends. Interests wax and wane (looking at you, LA Gear fans), and properly projecting patterns at the individual or aggregate levels can be very useful.
Building the Interest Graph can be done in a few ways. It can be explicitly volunteered by a user (What are you interested in?). It can be implicitly derived (What are you reading on my site?). It can be inferred from the things you say (Connect your Facebook/Twitter and let’s have a look at what you’ve been liking, status’ing, and tweeting). Really, any signal of user interest can be employed to increase or decrease a user’s attachment to any topic under the sun. And if you handle your ontologies correctly, you can infer attachment to the larger related concepts (Love the Lakers? Here’s what else is hot in the NBA…).
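Handling the ontology “correctly” here means, at minimum, letting a direct signal credit the broader related concepts too. Here’s a sketch of that propagation; the topic hierarchy and the 0.5 damping factor are illustrative assumptions, not our real parameters:

```python
# Hypothetical "broader concept" links: topic -> list of parent topics.
PARENTS = {
    "Los Angeles Lakers": ["NBA"],
    "NBA": ["Basketball"],
    "Basketball": ["Sports"],
}

def propagate(topic, strength, damping=0.5):
    """Credit a signal to a topic and, at a discount, to its ancestors."""
    credits = {}
    frontier = [(topic, strength)]
    while frontier:
        t, s = frontier.pop()
        credits[t] = credits.get(t, 0.0) + s
        for parent in PARENTS.get(t, []):
            frontier.append((parent, s * damping))
    return credits

print(propagate("Los Angeles Lakers", 1.0))
# {'Los Angeles Lakers': 1.0, 'NBA': 0.5, 'Basketball': 0.25, 'Sports': 0.125}
```

A user who loves the Lakers thus picks up a weaker but real attachment to the NBA and to basketball in general, which is exactly what makes the “here’s what else is hot in the NBA” suggestion possible.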
This approach seems simple enough, but, of course, you have to be able to derive the essential meaning of the things with which a user interacts in order to imbue a user with an attachment to the appropriate interests. This is the hard-core semantic science of what we do, and well beyond this simple product guy. I’ll leave that to the tech gang to explain more competently in another post.
Learn More about Gravity’s Technology here.
Let’s review. To calculate the Interest Graph for any human:
- Understand the objects they create or interact with
- Divine the meaning of those objects
- Modify their attachment to those meanings based on the type of behavior over time
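Those three steps can be sketched as a toy pipeline. The behavior weights and the hard-coded entity table below are stand-ins for the real NLP and ontology machinery, invented for illustration:

```python
# Assumed weights: how strongly each behavior type moves attachment.
BEHAVIOR_WEIGHT = {"view": 1.0, "comment": 2.0, "share": 3.0}

def extract_meanings(obj_text):
    """Step 2 placeholder: a real system uses NLP plus an ontology here."""
    known = {"Kobe Bryant": ["basketball", "Los Angeles Lakers"]}
    meanings = []
    for entity, topics in known.items():
        if entity in obj_text:
            meanings.extend(topics)
    return meanings

def record_interaction(interest_graph, obj_text, behavior):
    """Steps 1 and 3: take the object the user touched, divine its
    meanings, and bump attachment to each by behavior type."""
    for topic in extract_meanings(obj_text):
        interest_graph[topic] = interest_graph.get(topic, 0.0) + BEHAVIOR_WEIGHT[behavior]
    return interest_graph

graph = record_interaction({}, "Kobe Bryant drops 40 points", "share")
print(graph)  # a share counts for more than a passive view
```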
Great, now we have Interest Graphs. Hurray! Hold your horses, little buckaroo. Having an Interest Graph is like having a map: it tells you where to go, but you still have to get there. Cue the section on personalization.
Discovery, executed correctly, is a beautiful thing. The books you didn’t intend to buy, the people you didn’t set out to meet, these are the serendipitous discoveries that add color to our lives. This is the ultimate goal of personalization, to show you the things you’ll love that you didn’t know you should be looking for (all needles, no haystacks). To accomplish this goal, you have to consider a pretty broad set of signals. Together, they produce a composite score indicative of correctness for a particular user. Here’s what we consider:
Interest Graph Proximity
Remember our process for calculating a human’s Interest Graph? We do a similar process for every content object. Comparing every user to every object, we can confidently say that a particular object is closely relevant to this person’s interests. The results are actually very good and exceedingly relevant. The problem with deploying a solution using solely this approach is the lack of serendipity. It’s predictable and, to a certain degree, boring. Read a lot about Apple? Here’s more Apple. Mostly reading about iPhones from that set? Now it’s mostly iPhone. The process tends to winnow results to an unacceptable level of specificity over time. It’s almost like having a set of saved searches that slowly morph based on their own self-referential activity. This was one of our early learnings leveraging the Interest Graph, and one we took to heart. Truly excellent personalization must be something more than this. Enter content value as a tunable serendipity measure.
If you can effectively determine the inherent value of a content object, this value can be combined with Interest Graph proximity. Together they give you a set of content that is relevant to your interests and serendipitously important. The set of things that you both want to see and ought to see. Not particularly interested in tsunamis? Doesn’t mean that you won’t be when they happen. So how do you determine the value of an object? A few vectors are considered:
- Editorial weight – There are people out there paid to know what is important. Call them tastemakers, pundits, or editors, their opinions matter. Recognizing and weighting their guidance can strongly indicate an object’s importance.
- Virality – Every share, tweet, digg, link, and search is an indication of collective interest in an object or its associated semantic topic. Where once there was only the linking behavior of webmasters, the universe of user-generated content has enabled each of us to indicate which links matter within the superset. Monitoring the public streams and metadata provides pure signal of the things that matter right now. The Twitter firehose, among others, is a great mechanism for teasing the gold from the stream if you know how to properly parse the vastness that these data sets represent. We combine all of these signals into our virality calculations.
- Interaction feedback – What happens when an object is presented in a personalization context? Even when properly targeted based on the combined graph proximity and content value, some content just falls flat while others unexpectedly surge. Constant tuning based on the interaction of users with the targeted content optimizes the results for everyone.
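Taken together with Interest Graph proximity, these vectors can feed a single composite score. The weights below are invented for illustration; in practice they would be tuned continuously against interaction feedback:

```python
# Assumed blend of the four signals discussed above, each normalized to [0, 1].
WEIGHTS = {"proximity": 0.5, "editorial": 0.15, "virality": 0.2, "feedback": 0.15}

def composite_score(signals):
    """Weighted sum of proximity and content-value signals for one user/object pair."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

# A predictable, squarely on-interest article...
on_interest = composite_score({"proximity": 0.9, "virality": 0.1})
# ...versus a breaking story slightly outside the user's interests.
breaking = composite_score(
    {"proximity": 0.4, "editorial": 0.9, "virality": 0.95, "feedback": 0.6}
)
print(on_interest, breaking)  # the breaking story wins: tunable serendipity
```

Raising or lowering the non-proximity weights is exactly the “tunable serendipity” dial: it decides how often something you ought to see beats something you were already going to read.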
See what Gravity personalization looks like here.
So where does all this take us? We imagine a web where every experience is personal, viewed through the lens of my own interests with a healthy dollop of serendipity on top. Where not only the presentation of content is informed by my interest graph, but the production of content is informed by our collective interests. Editors are not replaced; rather, they operate with a level of transparency and sophistication previously unheard of. Where each of us is able to exorcise the noise from our view and focus only on the gems scattered across the web. It won’t be easy and it won’t be fast, but that’s the future as we see it.
Note: This blog post originally appeared on TechCrunch here as a guest post by Amit.
When my partners and I joined MySpace, we were lucky enough to be at the leading edge of the social revolution that changed how we use the Internet. A new groundswell is coming, transforming the web once again: the personal revolution.
Today, we live in a world where we’re constantly overwhelmed by information. There are over 90M tweets per day, 34 hours of YouTube video uploaded every minute, and every Facebook user has an average of 130 friends who are becoming more and more active all the time. We also experience this with content farms flooding search results and with the thousands of articles available every day on traditional websites like the New York Times and ESPN, of which only a handful appeal to each of our individual interests.
The rampant proliferation of information isn’t a new phenomenon. The signal-to-noise ratio on the web has fluctuated substantially as new technology to organize information has battled with new technology to create and distribute information.
Their Web: The Early Days of The Internet
In the early days, content was created and organized by professionals. At first, it was contained in networks like AOL, one of the pioneers of the Internet. As the Internet opened up, Yahoo! brilliantly organized the open web with Yahoo! Directory. But eventually the volume of the information overloaded even the directory, and search companies like Google introduced a better way to find content we were interested in. By understanding how sites linked to each other, Google applied new science to find a solution within the problem itself. It worked so well, every website is search engine optimized for this framework.
Our Web: Present Day
In 2003, user-generated content hit the mainstream via sites like MySpace and YouTube, and the volume of information being created increased dramatically.
“Every two days, we create as much information as we did up to 2003.” –Eric Schmidt, CEO of Google
Search engines weren’t designed to effectively organize this social and real-time data. So innovative companies like Facebook and Twitter created a social filter by empowering our friends and people we trust to organize information for us. This new filter has given us access to more and better information than we ever thought possible. Like search, it’s so effective, every website is socially optimized for this framework.
Many of you reading this are avid users of social technology. Like me, you’re probably beginning to experience information overload in your social streams. There’s great content there, but it’s getting increasingly difficult to find it. In engineering terms, the signal-to-noise ratio is dropping (or, as a corollary, the work-to-reward ratio is increasing). And, as more people become more active in the social and real-time web, the problem will only get worse.
Your Web: The Future
Imagine opening up any web page or application and being presented with an experience that’s entirely personalized to you. Go to ESPN.com and see stories about the sports you love and teams you follow featured on the top. Check your daily Groupon for deals that map to your interests. Receive updates from Foursquare about restaurants you’ll want to visit. This is where things are headed. It’s about shifting from you trying to find the right information to the right information finding you.
In the past, we lacked the data and the technology to make this type of personal experience a reality. But that’s changing quickly. The abundant social data that’s overwhelming our social streams presents not only a problem but the solution. Using natural language processing and semantic analysis to evaluate your tweets, status updates, likes, shares, and check-ins, it’s possible to build a holistic understanding of who you are and what you’re interested in.
Once the web knows your interests, it can start to change… Any website or app can use knowledge of your interests in order to give you a personal experience.
Music followed a similar evolutionary path. Music discovery has grown from being curated by professionals (DJs, MTV) to being introduced socially (mixtapes, playlists) to being organized around your personal interests (Pandora).
All of this doesn’t mean that editors go away or your friends’ referrals don’t matter. Rather, it’s a new lens focused entirely on you.
Building the Personal Web: Enter Gravity’s Interest Graph
Incredible academic and commercial research in the fields of natural language processing and semantic technology has built the groundwork for where we are today. Still we have a long way to go before the personal web is a reality. Gravity will be one of many companies working on the personal web in the coming years. Our platform will allow partners to personalize their experiences when a user connects to the service. The basis for our platform is what we call the Interest Graph, an online representation of your interests, including your strength of attachment and its trajectory over time.
Earlier this afternoon I had a chance to preview some of the exciting stuff we’ve been working on at Gravity on stage at the Web 2.0 Summit in San Francisco. For those of you that couldn’t attend, you can catch the video on YouTube here.
Here’s a recap of what I talked about on stage:
Information overload. The internet is overloaded with information, and every day it gets more unwieldy: 90 million tweets per day, 35 hours of video uploaded per minute, 1.6 million blog posts per day. With so much information created on a daily basis, it’s hard to find what you’re looking for and to know what you’ve missed.
The Interest Graph. Gravity’s answer is the Interest Graph: an online representation of your real world interests and a new lens through which to view the internet. Your interest graph is your own personal electromagnet. It pulls the best stuff to you based on your interests and leaves all the noise at a safe distance where it can’t distract you. We build your interest graph by analyzing social data (like tweets, retweets, status updates, likes and shares) to create a holistic view of who you are and what you’re interested in.
Twinterest. To see your interest graph today, you can play Twinterest. Twinterest is a Twitter-based game that analyzes your tweets to figure out what you’re interested in and shows how your interests compare to your friends’. It’s the first game built on our platform. You can read more about Twinterest here.
The Orbit. I also previewed The Orbit – a newsfeed built by your interest graph. It automagically finds the best content on the web for the topics you care about.
Our Platform. Lastly, and most importantly, I talked about the platform we’re building. Gravity’s mission is to help the right information find you. We’re building a platform that will let any website tap into the Interest Graph so that it can deliver a personal experience to you.
I’ll follow up with a more detailed post soon about projects at Gravity and how Gravity uses social data to deliver personal experiences. Be sure to follow us on Twitter to stay in the loop.
Today we launched Twinterest on stage at The Web 2.0 Summit 2010 in San Francisco!
Twinterest is a Twitter-based game that analyzes your tweets to figure out what you’re into and shows how your interests compare to your friends’. You can play Twinterest here: http://www.gravity.com/labs/twinterest.
Here’s how the game works. First, you connect with Twitter. Next, we pull your tweet history and use our natural language processing technology to determine your interests based on what you’ve said on Twitter. After we process your tweets, we create a personal interest report for you. The report shows your interests and how your interests compare to your friends who have already played. You can tweet your results to your friends and followers or @mention specific friends so that they’ll play and you can compare interests. Give it a whirl and let us know what you think.
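Under the hood, the “compare interests” step boils down to measuring overlap between two extracted interest sets. Here’s a guess at one reasonable metric (Jaccard similarity); it isn’t necessarily what Twinterest actually uses:

```python
def shared_interests(mine, theirs):
    """Return the interests two users share and a 0-1 similarity score
    (size of the intersection over size of the union)."""
    common = set(mine) & set(theirs)
    union = set(mine) | set(theirs)
    similarity = len(common) / len(union) if union else 0.0
    return sorted(common), similarity

common, score = shared_interests(
    ["surfing", "startups", "NBA"],
    ["NBA", "cooking", "surfing"],
)
print(common, score)  # two shared interests out of four total -> 0.5
```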
Twinterest is the first game built on our platform. We built Twinterest for a few reasons. First, we wanted to show you your Interest Graph, an online representation of your real-world interests, and give you a sense for how we can apply it. In this case, Twinterest shows you what you have in common with your friends and followers on Twitter. Second, we wanted to tune our interest graphing algorithms and our ontology. Whenever a user removes an interest, our interest service gets a little smarter. This crowdsourcing is invaluable for making our platform better. Finally, we wanted to establish a connection with users so that we can show them future products built by their interest graphs.
Enjoy Twinterest, and please bear with us while it scales. We’re working hard to make sure you have a great experience!
It’s been another awesome (and somewhat spooky) week at Gravity HQ.
Shenanigans have included an epic pumpkin carving session, multiple all night hackathons, a visit from one of our very cerebral advisors, and a guest appearance by Bert and Ernie. The photos tell the story better than I can.