Reuters Wants The World To Be Tagged - ReadWriteWeb

Written by Alex Iskold / February 6, 2008 1:47 AM

As Richard MacManus recently predicted, in 2008 we'll witness the rise of semantic web services. From the native support for Microformats in Firefox 3, to the New York Times' use of rich header metadata, to this week's release of the Social Graph API by Google, semantics are starting to slip onto the web. The impact is being felt because large companies are starting to focus seriously on structured information.

In the same vein, last week Reuters - an international business and financial news giant - launched an API called Open Calais.

The API performs semantic markup of unstructured HTML documents - recognizing people, places, companies, and events. This technology is the next generation of the Clear Forest offering, which Reuters acquired last year. We have profiled Clear Forest on ReadWriteWeb, and in this post we will look at what Reuters opened up and why.

Open Calais API Basics

The idea behind Calais is simple - identify the interesting bits in a document and turn them into metadata. In this implementation the focus is on People, Companies, Places, and Events, but the technology could surely be adapted to other entities. The heavy lifting is done by a combination of a natural language processing engine and a massive hard-coded, learning database that Clear Forest has built.
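
To make the idea concrete, here is a minimal Python sketch of the crudest possible version of this approach: a hand-made lookup table that turns known names in a piece of text into metadata. It is not the Calais API and not Clear Forest's engine; the table contents, the `Entity` type, and the `tag_entities` function are hypothetical names invented for this illustration. A real system would layer natural language processing on top so that it can recognize names it has never seen before.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical, hand-made lookup table standing in for the "hard-coded"
# database described above. The entries are illustrative only.
KNOWN_ENTITIES: Dict[str, str] = {
    "Reuters": "Company",
    "Clear Forest": "Company",
    "Google": "Company",
    "Richard MacManus": "Person",
    "New York": "Place",
}


@dataclass(frozen=True)
class Entity:
    """One piece of metadata extracted from the text."""
    text: str   # the matched name
    kind: str   # Person, Company, or Place
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


def tag_entities(document: str,
                 known: Dict[str, str] = KNOWN_ENTITIES) -> List[Entity]:
    """Return every known name found in the document, in reading order."""
    found: List[Entity] = []
    for name, kind in known.items():
        # Word boundaries keep "Google" from matching inside "Googleplex".
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, document):
            found.append(Entity(match.group(0), kind,
                                match.start(), match.end()))
    return sorted(found, key=lambda entity: entity.start)
```

Calling `tag_entities("Reuters acquired Clear Forest, said Richard MacManus.")` yields three entities: two companies and one person, each with its position in the text. A pure lookup like this only finds what is already in the table, which is exactly the gap the natural language processing side of a system like Calais is meant to close.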