ACAP
-

Gavin O’Reilly responds to Google on ACAP

Posted by Martin Stabe on 13 March 2008 at 07:00
Tags: ACAP, Google

The consortium of publishing groups behind the Automated Content Access Protocol (ACAP) has responded to comments made about the project yesterday by Google’s head of media and publishing partnerships for Europe, Rob Jonas.

ACAP seeks to establish a new technical standard that lets website publishers specify the different levels of access they wish to grant search engines’ indexing software. The group argues that the existing standard, known as the Robots Exclusion Standard (or “robots.txt”), is insufficient.

Jonas told yesterday’s MediaGuardian Changing Media Summit that “the general view within the company is that the robots.txt provides everything most publishers need to do.”

Gavin O’Reilly, the chairman of the World Association of Newspapers and COO of Independent News & Media, responded in a statement:

It’s strange for Google to be telling publishers what they should think about robots.txt when publishers worldwide across the sector have already very clearly told Google that they disagree. If Google’s reason for not supporting ACAP is that they think publishers should have a different view then we would ask Google to respect the fact that after considerable consideration and work we have identified not only the inadequacies of robots.txt but also come up with a practical and open solution. We call upon Google to adopt ACAP as soon as possible and respect the right of content owners to determine how their content is used.

Last November, Times Online became the first newspaper website to adopt ACAP.

-

Online news, robots.txt, and ACAP

Posted by Martin Stabe on 16 November 2006 at 11:24
Tags: ACAP, Journalism

In the coming year, one of the more technical running debates in online publishing will concern the development of a new standard for automatically informing search engines’ indexing robots about the conditions for accessing online content.

A group of global publishing industry bodies, including the World Association of Newspapers, is proposing a new mechanism for this known as the Automated Content Access Protocol, or ACAP.

Some time in the next two or three weeks, the consortium will launch a year-long pilot programme to develop the new standard. Six or seven online publishers, from both sides of the Atlantic, along with one — or possibly two — of the three major search engines will be involved, according to Mark Bide of Rightscom, who is coordinating the project.

The project, formally launched at the Frankfurt Book Fair last month, was first revealed in September after Belgian newspapers sued Google for copyright infringement. ACAP, it was claimed, would help prevent similar disputes in the future.

At the time, Google seemed to insist that existing opt-out mechanisms were sufficient. On Google’s official blog, the company’s European Director of Communications and Public Affairs, Rachel Whetstone, wrote: “[I]f publishers don’t want their websites to appear in search results (most do) the robots.txt standard (something that webmasters understand) enables them to prevent automatically the indexing of their content.”

The “robots.txt” protocol, more formally known as the Robots Exclusion Standard, has been around for more than a decade — ancient by online standards. It works by letting webmasters include a simple text file in the top level directory of their web site, which tells robots about any sections of the site they should ignore. Here’s Guardian Unlimited’s robots.txt file, for example:

User-agent: *
Disallow: /sendarticle/
Disallow: /Users/

This means search robots are being given full access to the site, except for the two named directories.
Times Online, by contrast, has rather more extensive rules in its robots.txt file.

There is no mechanism for enforcing compliance, but reputable search engines follow the rules set out in robots.txt files.
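To make the mechanics concrete, here is a minimal sketch of how a well-behaved robot might check those two rules before fetching a page. It uses the robots.txt parser in Python's standard library. The host name www.example.com, the robot name "ExampleBot" and the helper may_fetch are placeholders invented for illustration; nothing here is taken from any real search engine's crawler.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above, as a robot would download them.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sendarticle/
Disallow: /Users/
"""

# Parse the rules from memory instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the parsed rules permit this (hypothetical) robot to fetch url."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Placeholder URLs: one ordinary page and one under each excluded directory.
    for url in (
        "https://www.example.com/news/story.html",
        "https://www.example.com/sendarticle/123",
        "https://www.example.com/Users/profile",
    ):
        print(url, "allowed" if may_fetch(url) else "disallowed")
```

The check is entirely voluntary: a robot that never calls anything like may_fetch is not stopped by the file.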

The problem is that the rules that can be set with robots.txt are fairly limited. It can be used to welcome search engines or to tell them to stay away, but that’s about it. ACAP’s supporters suggest that this binary on-off switch is inadequate. Their new standard, they say, would build on robots.txt by allowing publishers to set more detailed terms and conditions of access.

Publishers might, for example, want to allow search engines to index their site but not to make a cached version available to users, something that becomes important when a libelous article, already removed from the publisher’s web site, remains visible in a search engine’s cache feature.

Timing might also be important for some publishers. For whatever reason, publishers might not want their content to appear on aggregators for the first hour after publication, or they might want to allow archive material to remain available in a cache only for the first 10 days following publication. None of this can be achieved with the existing standard.

Or publishers might want to insist that a search engine displays a page synopsis of the publisher’s choosing rather than simply drawing on a few lines of copy from the page.
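The kinds of conditions described above can be pictured as a small data model. The sketch below is purely illustrative: the ACAP specification has not been published at the time of writing, so the class AccessPolicy, its field names and the three helper functions are all assumptions made up for this example, not ACAP syntax.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical per-publisher terms; not taken from any ACAP document."""
    allow_index: bool = True                    # may the page be indexed at all?
    allow_cache: bool = True                    # may a cached copy be shown?
    embargo: timedelta = timedelta(0)           # delay before aggregators may list it
    cache_lifetime: Optional[timedelta] = None  # None means no expiry
    synopsis: Optional[str] = None              # publisher-chosen summary, if any


def may_aggregate(policy: AccessPolicy, published: datetime, now: datetime) -> bool:
    """Indexing is allowed and the embargo period has passed."""
    return policy.allow_index and now - published >= policy.embargo


def may_serve_cache(policy: AccessPolicy, published: datetime, now: datetime) -> bool:
    """Caching is allowed and the cached copy has not yet expired."""
    if not (policy.allow_index and policy.allow_cache):
        return False
    if policy.cache_lifetime is None:
        return True
    return now - published <= policy.cache_lifetime


def snippet(policy: AccessPolicy, page_text: str, length: int = 80) -> str:
    """Prefer the publisher's synopsis; otherwise fall back to the opening copy."""
    return policy.synopsis if policy.synopsis is not None else page_text[:length]


# Example terms: one-hour embargo, 10-day cache window, custom synopsis.
policy = AccessPolicy(
    embargo=timedelta(hours=1),
    cache_lifetime=timedelta(days=10),
    synopsis="Publisher's own summary",
)
published = datetime(2006, 11, 16, 11, 24)
```

With robots.txt, only the equivalent of allow_index can be expressed; the other four fields have no counterpart in the existing standard.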

The ACAP consortium is already concerned about how it is perceived, particularly in the blogosphere, Bide said at a briefing at the Association of Online Publishers yesterday afternoon. Bide is concerned, specifically, that the project is being wrongly framed as a case of “Europe vs. America” or “publishers vs. Google”. ACAP have set aside budgets for lobbying and PR.

There are likely to be many questions about ACAP in the coming months. Is it really necessary? What are the details of the specifications? How will web developers implement it in existing sites? Will the search engines cooperate?
See also: Rebuilding Media, Steve Yelvington.
