Thoughts on BPR3

Posted on August 16th, 2007

Last week, Dave Munger at Cog Daily started a discussion about creating a standard icon for science bloggers to use when reporting on peer-reviewed research. The discussion snowballed into an idea, and now a new website, Bloggers for Peer-Reviewed Research Reporting, has been launched.

Still, we don’t have an icon. This morning I emailed Dave Munger (full email copy posted at BPR3) thinking out loud about the icon and how it fits well with the EasyPg plugin. I’ve been letting this brew in my mind all day, so here are some more thoughts about the icon and the BPR3 initiative.

  • The icon should be square as that will allow easier resizing: the same icon can be used at various sizes, including 16px by 16px as the favicon of the BPR3 website. It will also fit better with the current trend for icons, like the square RSS icon. One exception is the 80×16px ‘chicklet’ buttons, which are also in fashion, like the one below (forgive my lack of design skills). This little example shows what we can do: the left-hand side of the chicklet can be the BPR3 icon, and we can have text on the right, such as simply "BPR3". Example uses:
    • BPR3 image example
    • For differentiating peer-review links, something like:

      …Via CNN: a new paper talks about…

  • The icon needs to be accessible, i.e., colour-blind friendly. I’m no expert in this area, so I’ll leave it to others.
  • Copyright: I think we need an enforceable licence, like a Creative Commons one. I suggest CC Attribution-ShareAlike 3.0 Unported. As long as we have attribution, even if BPR3 fails, the idea will live on.
  • The tag line/slogan: Right now it stands as "Report on Peer Reviewed Research". How about ‘(Yet) Another Peer Review’? We can even go all Web 2.0 on it and call it ‘yapr’ :)

The other idea of BPR3 is aggregation of blog posts discussing peer reviewed research. I have a few thoughts on this too:

  • It should use HTML markup that can be easily embedded in blog posts. I suggest we use Pg’s markup to maintain consistency and avoid creating two ‘standards’. Unless, of course, there is something wrong with it, but I can’t see anything. We can also use Technorati tags and pull the data live out of Technorati using their API.
  • We should create plugins for the most popular blogging platforms. If we use Pg’s markup, my EasyPg plugin would be ready to go. Otherwise, we need to create new plugins to promote usage.
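
To make the aggregation idea a bit more concrete, here is a minimal sketch of how an aggregator or plugin might pick peer-review links out of a post. It assumes the markup is the [review=URL]text[/review] form that appears in posts on this blog. The function names, the CSS class and the HTML it produces are placeholders of mine, not what EasyPg or BPR3 actually do.

```python
import re
from html import escape

# Assumed markup: [review=URL]link text[/review]
REVIEW_TAG = re.compile(
    r"\[review=(?P<url>[^\]]+)\](?P<text>.*?)\[/review\]", re.DOTALL
)


def find_review_links(post):
    """Return a (url, text) pair for every review tag found in the post."""
    return [(m.group("url"), m.group("text")) for m in REVIEW_TAG.finditer(post)]


def render_review_links(post, css_class="peer-review"):
    """Replace each review tag with an HTML link.

    The class name is a hypothetical hook for styling or for an icon.
    """
    def to_anchor(match):
        return '<a class="{}" href="{}">{}</a>'.format(
            escape(css_class, quote=True),
            escape(match.group("url"), quote=True),
            escape(match.group("text")),
        )

    return REVIEW_TAG.sub(to_anchor, post)
```

An aggregator would only need `find_review_links` to collect the cited papers, while a blogging plugin would use something like `render_review_links` when displaying the post.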

What do you think? Post back here or, better, at BPR3.

Shaking Nanodetector

Posted on August 14th, 2007

In-plane shaking of nanoresonators throws off impurities.

A cool new technology for detection of bacteria, viruses, DNA and other biological molecules has been demonstrated. Resonators (cantilevers) made of narrow strips of silicon a few millionths of a meter long with bound antibodies can be used to detect bacteria suspended in a sample. The bacteria specifically attach to the antibodies and so alter the vibration in a detectable way. The problem is when other impurities in the sample attach non-specifically to the cantilever.

The new [review=http://pubs3.acs.org/acs/journals/doilookup?in_doi=10.1021/nl0621950]paper[/review] shows that non-specifically bound material can be shaken off if the resonator vibrates ‘in plane’, i.e., side to side. In-plane vibration can be created by hitting the base of the cantilever with a laser beam pulsing at a certain frequency. To measure in-plane motion the researchers shone another laser on the free end of the cantilever and detected the chopping of the beam as the cantilever moved.

So the new technology could work like this: the cantilever is first vibrated up and down to let the sample components attach to it. Then it is made to vibrate sideways to shake off things that didn’t attach specifically. What is left are those components you’re trying to detect (if they are there) and so you can reliably measure their presence.
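
Why does bound material alter the vibration at all? A rough way to see it is to treat the cantilever as an ideal spring-mass oscillator. This is a textbook simplification, not the model used in the paper, and the numbers below are made up for illustration. In that picture the resonant frequency is sqrt(k/m)/(2π), so any extra mass stuck to the cantilever lowers the frequency.

```python
import math


def resonant_frequency(stiffness, effective_mass):
    """Resonant frequency in Hz of an ideal spring-mass oscillator."""
    return math.sqrt(stiffness / effective_mass) / (2 * math.pi)


def frequency_shift(stiffness, effective_mass, added_mass):
    """Change in resonant frequency (Hz) after added_mass binds to the resonator."""
    loaded = resonant_frequency(stiffness, effective_mass + added_mass)
    bare = resonant_frequency(stiffness, effective_mass)
    return loaded - bare


# Hypothetical values, chosen only to show the direction of the effect
k = 0.01    # spring constant in N/m
m = 1e-15   # effective mass of the cantilever in kg
dm = 1e-17  # mass of whatever has bound to it in kg

print(frequency_shift(k, m, dm))  # negative: the loaded cantilever vibrates more slowly
```

In this simple picture, non-specifically bound junk adds mass just like the target does, which is why shaking it off matters before taking a reading.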

Imagine in the future going to a doctor who takes a sample and sends it to the lab. Instead of waiting hours or days to find out if there is an infection and if so what’s causing it, the wait could potentially be reduced to minutes. There is a lot of work needed to get to that, but we’re on our way! Very neat stuff.

Wouldn’t it be nice if Global Warming Went Away?

Posted on August 12th, 2007

Wouldn’t it be nice if global warming just, you know, just went away? Yes, it would, but that will happen through hard work, not through a statistical fudge.

You see, a lot of people are beaming with joy that NASA’s US temperature data had a small error in its analysis, so 1998 is now no longer the hottest year on record, but 1934 is. This led to some seriously flawed claims, such as that global warming is just a statistical farce and that the NASA scientists who made the mistake (and fixed it, and thanked the person who reported it) should be fired. In reality, 1998 and 1934 were always very close to each other (statistically indistinguishable) but with 1998 just nosing ahead. When the error was fixed, 1934 nosed ahead, but the two are still statistically indistinguishable. Yippie.

However, the global trend of warming is still there. The past decade is still the warmest on record. CO2 concentrations are at extraordinary levels. Global warming is still very much here.

Reading about the story is very amusing, so read it as it unfolded in the following order:

[tags]global warming, 1934[/tags]

Summary of Academic Publishers Cloaking Discussion

Posted on August 5th, 2007

So the dust seems to have settled a bit about the issue of academic publishers cloaking their pages to Google. This post is a summary of the facts that emerged and the observations made, a quick recap tying it all together, and a suggestion for the next step.

For reference, the links around the web are:

Facts and Observations

There are three facets to this debate:

  1. Technical side re how this cloaking is implemented
  2. Google’s policies regarding this issue
  3. User perception of this issue

So with that, the following points have been made:

    • These publishers are part of the Google Scholar program. Google initially contacted a few major publishers to join the program.
    • The cloaking is IP-based as a simple switch of the User Agent to Googlebot’s doesn’t work. That’s not surprising to us in the field; I linked to how you can do that (with full code) in my previous post.
    • Definition of cloaking from Google’s guidelines:

      Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.

      So there is no doubt this is an example of cloaking. The question is whether this is acceptable or not.

    • A relevant quote from the Google Scholar Publisher Policies:

      Google users must be offered at least a complete abstract. This is a crucial component of our indexing program. For papers with access restrictions, a full author-written abstract will help users choose among the results which paper is the most likely to have the information they are looking for.

      Some people pointed out that this is not always happening with some SpringerLink articles.

    • A lot of academics are annoyed by this cloaking. The sentiments of the comments on John Baez’s original post speak loudly. The blogosphere has other posts from annoyed academics.
    • I get the feeling that people would be happy to keep the for-pay results in the Scholar search results but keep them away from the main search results. If that happens (and I think it should), the for-pay results need to be labelled clearly. John Mueller wrote a comment on the Sphinn story about how this already happens with Google News.
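
The user-agent test mentioned in the list above is easy to try yourself. Here is a minimal sketch: request the same URL once with a browser-like User-Agent and once with Googlebot’s, then compare what comes back. The function names and the browser string are my own placeholders, and comparing only the content type is a simplification. As noted above, IP-based cloaking will not be exposed this way, so a result of "no difference" does not prove the page is clean.

```python
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"  # placeholder browser string


def fetch_content_type(url, user_agent):
    """Fetch url with the given User-Agent and return the response content type."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get_content_type()


def differs_by_user_agent(url, fetch=fetch_content_type):
    """True if the server answers Googlebot's User-Agent differently from a browser's.

    The fetch argument can be swapped out, e.g. for testing without a network.
    """
    return fetch(url, GOOGLEBOT_UA) != fetch(url, BROWSER_UA)
```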

So what now?

Some people are clearly upset. Some people are upset at expensive publishers in general (and so having them in Google’s results makes things worse) and some people are upset that Google is letting some publishers break its terms of service/policies so obviously without any perceived reward for the user.

Fundamentally, I believe the question of what’s acceptable cloaking and what isn’t boils down to user perception and expectation. If users expect to see for-pay content in the search results, they are OK with it, but please label it properly. Pubmed, a major aggregator of bioscience papers, has two icons to depict whether the paper is freely available (via an Open Access license) or only the abstract is freely available. There is no reason why Google shouldn’t do this too.

The key question is what happens when cloaked results appear unexpectedly. Clearly people find this (very) annoying. Of course, Google’s policy so far has been to allow it, since Google needs to see the full papers in order to index them for Google Scholar. Well, Google, consider this set of posts as very vocal customer feedback: Take out for-pay content from the main search engine results pages. We’re OK to keep them in the Scholar results, but label them.

And academics, you can do something about it! There are three things you can do:

  • In the short term, file a spam report with Google. Very inconveniently, there are two ways to do this. You can use the so-called unauthenticated submission form, which is publicly accessible. Owners of websites can use the so-called authenticated form through Webmaster Central. More details about spam reporting from the horse’s mouth.

    The spam details are as follows: state that you have found evidence for cloaking in the main search engine results pages (SERPs). Submit the full URL of the results page, state the apparent URL of the result (right click and copy the link location - exact wording varies in each browser), state that the result is labelled as a PDF file, and submit the URL you actually end at. This gives the spam team a full audit trail. If you can submit more than one example, do so. And tell them this is annoying you if it is.

  • Stop using Google! If their search results are not useful to you, use another search engine. MSN has a great Academic Search, and for general searches, try Yahoo!. I recommend Hakia as a decent search engine (it’s still in beta, so the results can be spammy or a bit irrelevant) and there are hundreds of alternatives. Take your pick and vote with your feet!
  • In the long-term, if access is important to you, publish in prestigious journals that have an Open Access policy you agree with. If enough people do that, the Open Access journals will get an increase in their impact factor and the administrators will be happy again. Having a debate about it in the journals themselves is also helpful. This question is about awareness but it can happen with time.

So that’s it for now. I’ve already submitted an authenticated spam report to Google. Let’s hope there is a response!

Newsci Roundup 5

Posted on August 4th, 2007

A selection of reading for your pleasure…

Academic Publishers as Spammers

Posted on August 2nd, 2007

The promise of high search engine rankings, and the ensuing traffic, is making some very large academic publishers use a black-hat spamming technique called ‘cloaking’ to attract visitors to their sites via the search engines. The idea of cloaking is simple: when the search engine indexing crawler requests a page, the website gives the crawler one version of the page. When a human visitor requests the same page, they see a different version. Distinguishing search engine crawlers from human traffic is very easy, and in the case of Google and MSN/Live, it’s 100% fool-proof.
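
To show why telling crawlers from humans is so easy, here is a minimal sketch of the reverse-then-forward DNS check that Google itself describes for verifying Googlebot. It illustrates the general technique and is not the publishers’ actual code. The function names and the `choose_page` helper are made up for this example.

```python
import socket

# Hostname suffixes a genuine Googlebot address is expected to resolve to
CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Check an IP with a reverse DNS lookup followed by a forward lookup.

    The lookup functions can be replaced, e.g. for testing without a network.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.lower().endswith(CRAWLER_DOMAINS):
            return False
        # The hostname must resolve back to the same IP, or it could be spoofed
        return ip in forward_lookup(host)
    except OSError:
        # Failed lookups are treated as "not a crawler"
        return False


def choose_page(ip, crawler_page, visitor_page, **lookups):
    """The cloaking decision: one page for verified crawlers, another for everyone else."""
    return crawler_page if is_verified_crawler(ip, **lookups) else visitor_page
```

Because the decision depends on the visitor’s IP address rather than anything the visitor sends, a human cannot get the crawler’s version of the page just by changing browser settings.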

The publishers in question all behave in a similar pattern: the search engine results are for a PDF file, presumably a paper that Google thinks is relevant. When a user clicks the link to the PDF, they are instead presented with information about how to purchase the article. So in this case, the cloaked version of the page is the PDF and the real version is the purchase form.

I’ve been getting more and more annoyed by these publishers, and two days ago John Baez from the Department of Mathematics, University of California, Riverside, complained about this spam. The comments show just how annoyed people are about this. John’s post prompted me to join the naming and shaming of these spammers, and so without further delay…

SpringerLink

They seem to be the best at gaming Google. The papers hosted on springerlink.com rank highly in many fields of knowledge. The simplest way to out them is the search for [site:springerlink.com filetype:pdf], which queries Google for all PDF files hosted on springerlink.com. A screenshot of the results I’m getting is below. Just click any of these purportedly PDF files and see what happens. For the first result in the screen shot, I’m getting this page.

SpringerLink spamming Google

IngentaConnect

Next up is ingentaconnect.com. They don’t have much of a search engine presence, but they still cloak. An example is the search in the screenshot below using the search term [site:www.ingentaconnect.com intitle:"journal"]. The third result I get is a PDF that, when I click on it, leads to this page.

ingentaconnect.com spamming Google

Royal Society of Chemistry

Yep, rsc.org. The screenshot below shows how to find cloaked PDFs by searching for [site:rsc.org filetype:pdf "carbon dioxide"]. The very first result leads to a page asking for £22.

rsc.org spamming Google

Taylor & Francis

T&F host a lot (all?) of their journals on informaworld.com. So what does Google say for [site:www.informaworld.com filetype:pdf "carbon dioxide"]? The third result in the screenshot below takes me to a page asking for £18 this time.

T&F spamming Google
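
If you want to repeat the four searches above without retyping them, here is a small sketch that turns each query into a Google search URL. The queries are the ones used in this post. The helper name is my own, and the URL pattern is simply the usual `q=` search parameter.

```python
from urllib.parse import urlencode

# The queries used in this post, without the surrounding square brackets
QUERIES = [
    'site:springerlink.com filetype:pdf',
    'site:www.ingentaconnect.com intitle:"journal"',
    'site:rsc.org filetype:pdf "carbon dioxide"',
    'site:www.informaworld.com filetype:pdf "carbon dioxide"',
]


def google_search_url(query):
    """Build a Google search URL with the query properly percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


for query in QUERIES:
    print(google_search_url(query))
```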

Conclusion

Well, the publishers are clearly cloaking their pages for the Google crawler. This contravenes Google’s webmaster guidelines, and lesser websites have been removed from the Google index for similar tactics. What’s amusing is that Google keeps making a big fuss about how cloaking is bad but doesn’t do anything about these big publishers.

There is a question here we have to discuss: is this really cloaking if paying customers eventually end up seeing the same content that the search engine crawler sees? Some think in these instances it is not cloaking, but I beg to differ: What the average user sees should be identical to what the search engine crawlers see. If a page is in the search engine index it means that anyone can have access to it, without the need to register (even if free) or pay. That’s my 2c. Take it or leave it.


Robot Learns to Walk

Posted on August 1st, 2007

Just like a human!

A [review=http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2Fjournal.pcbi.0030134]paper[/review] published a few weeks ago by German and Scottish scientists in PLoS Computational Biology describes a robot that walks on two legs (bipedal) and learns to adapt its gait to different types of terrain.

I will not go into the full details because the video below (one of two published by the authors) says it all. Here the robot is walking along a flat surface and suddenly encounters an incline. When humans encounter an incline (like going up a hill), we usually lean a little bit forward (thus bringing our center of balance a bit forward), our gait becomes a bit shorter, and we slow a bit. This allows us to be better balanced for climbing the incline. You can see the same effect with the new robot in the video. The first few times the robot encounters the incline, it falls flat on its back (!) because it is not adapting its gait to the incline. However, it eventually learns what it needs to do and manages to successfully climb the incline every time. Compare its gait, speed, and leaning on the flat surface with those on the incline.

The key here is that the robot is learning to do this - the adaptation is not programmed. Thus it’s one model of the human neural networks that govern human walking.

And with this intro…

[tags]robot, bipedal, walking[/tags]