Friday, 12 November 2010

Dramatic study shows participants are affected by psychological phenomena from the future

Perhaps there's something in the drinking water at Cornell University. A new study involving hundreds of Cornell undergrads has provided a dramatic demonstration of numerous 'retroactive' psi effects - that is, phenomena that are inexplicable according to current scientific knowledge (pdf).

Rather than having the students read each other's minds or wear sliced ping-pong balls over their eyes, Daryl Bem has taken the unusual, yet elegantly simple, approach of testing a raft of classic psychological phenomena, backwards.

Take priming - the effect whereby a subliminal (i.e. too fast for conscious detection) presentation of a word or concept speeds subsequent reaction times for recognition of a related stimulus. Bem turned this around by having participants categorise pictures as negative or positive and then presenting them subliminally with a negative or positive word. That is, the primes came afterwards. Students were quicker, by an average of 16.5ms, to categorise negative pictures as negative when they were followed by a negative subliminal word (e.g. 'threatening'), almost as if that word were acting as a prime working backwards in time.

If psi abilities have really evolved, it makes sense that they should confer survival advantages by helping us find mates and avoid danger. In another experiment Bem had dozens of undergrads guess which set of curtains in a pair on a computer screen was concealing an erotic picture. Participants were accurate on 53.1 per cent of trials, compared with the 50 per cent accuracy you'd expect if they were simply guessing. This accuracy was increased to 57 per cent among students who scored higher on a measure of thrill-seeking. By contrast, no such psi effects were observed for neutral stimuli.

In another experiment participants looked at successive pairs of neutral mirror images and chose their favourite - the left or right. After each pair, an unpleasant picture was flashed subliminally on one side or the other. You guessed it, participants tended to favour the mirror image on the side of the screen opposite to where an unpleasant picture was about to appear.

The examples keep coming. The mere exposure effect is when subliminal presentation of a particular object, word or symbol causes us to favour that target afterwards. Bem turned this backwards so that participants chose between pairs of negative pictures, and then just one of them was flashed subliminally several times. Female participants tended to favour the negative images that went on to be flashed subliminally, as if the mere exposure effect were working backwards through time.

This backward mere exposure effect didn't work for male undergrads, perhaps because the images weren't arousing enough, so Bem replicated the experiment using more extreme negative images and erotic images. This time a 'backwards' mere exposure effect was found with men for unpleasant images. For positive imagery, mere exposure traditionally has a negative effect, as the stimuli are made to become more boring. Bem showed this effect could also happen from the future. Presented with pairs of erotic images, male undergrads showed less favour for the images that went on to be flashed subliminally multiple times. It's as if the participants knew which images were going to become boring before they had.

Finally, we all know that practice improves learning. Bem tested students' memory for word lists and then had them engage in extensive practice (e.g. typing out) for some of the words but not others. His finding? That memory performance was superior for words that the students went on to practise afterwards - a kind of reverse learning effect whereby your memory is improved now based on study you do later.

These reverse effects seem bizarre but they are backed up by some rigorous methodology. For example, Bem used two types of randomisation for the stimuli: one based on computer algorithms, which produce a kind of pseudo-randomisation in the sense that a given distribution of stimuli is decided in advance, and another based on hardware that produces true randomisation that unfolds over time as an experiment plays out. Also, throughout his paper, Bem uses multiple forms of simple statistical test and reports results for each, thus demonstrating that he hasn't simply cherry-picked the approach that produces the right result. Across all nine experiments the mean effect size for the psi effects was 0.22 - this is small, but noteworthy given the nature of the results.

So what's going on? Bem doesn't proffer too many answers although he argues that his psi phenomena vary with subject variables, just like mainstream psychological effects do. For example, the phenomena were nearly always exaggerated in the more extravert, thrill-seeking participants. From a physics perspective, he believes the explanations may lie in quantum effects. 'Those who follow contemporary developments in modern physics ... will be aware that several features of quantum phenomena are themselves incompatible with our everyday conception of physical reality,' Bem argues. 'Many psi researchers see sufficiently compelling parallels between these phenomena and characteristics of psi to warrant considering them as potential candidates for theories of psi.'

Daryl Bem (2010). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology. In press.

Post written by Christian Jarrett (@psych_writer) for the BPS Research Digest.

UPDATE, plucked from the comments:
Failures to replicate this study: 1, 2, 3
A successful replication.
A criticism of the stats methods used.
A flaw in the methodology.
A registry of replication attempts.


trumpeting said...

53% isn't that far off from 50%, especially for a sample size of 1000. Would you be surprised if a coin toss turned up 53% heads after 1000 trials?

I find this all very dubious. The quick and dirty reference to quantum physics at the end really kills it.

I appreciate that the BPS blog doesn't discriminate on the types of studies it publishes but there is a risk of discrediting other fields of psychology if a report like this shows up on the same playing field.
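
The coin-toss question above can be checked directly. The following is a minimal sketch, assuming 1,000 independent tosses of a fair coin (the commenter's figures, not necessarily the design or analysis used in Bem's paper); the function name is illustrative.

```python
from math import comb

def tail_probability_fair_coin(n_tosses: int, min_heads: int) -> float:
    """Exact probability of at least `min_heads` heads in `n_tosses` fair tosses."""
    # Count the favourable outcomes exactly with integers, then divide once.
    favourable = sum(comb(n_tosses, k) for k in range(min_heads, n_tosses + 1))
    return favourable / 2 ** n_tosses

# Figures from the comment: 53% heads in 1000 tosses, i.e. 530 or more heads.
p_one_sided = tail_probability_fair_coin(1000, 530)
print(f"P(at least 530 heads in 1000 fair tosses) = {p_one_sided:.3f}")
```

Under these assumptions the one-sided probability comes out at roughly 3 per cent, so such a result would be uncommon but far from impossible by chance alone.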

Unknown said...

Hi Trumpeting, thanks for your comments. I should point out that this research is published in a mainstream, respected journal of the American Psychological Association. The stats appear sound. The discussion of quantum physics in the paper is far more in-depth than the brief mention in my report.

Rift said...

So what's your opinion? You report on it, but are you convinced? Did you suspect Psi existed beforehand? Is evidence extraordinary enough to support the nature of his claim?

Unknown said...

Hi Rift, My opinion is that it's a great study. Rigorously conducted. Eloquently written (provides great intro to the field). But this isn't proof of psi, far from it. Needs to be replicated. I like how Bem has used standard psychological tasks as a way to explore psi. Makes it easier for other labs to try to replicate.

Anonymous said...


I don't think it's enough to dismiss it because it's only a 3% difference. We deal with small effects all the time in psychology, and it appears this study has the statistical power to back it up.

It's just a matter of seeing if the results replicate across studies.

I do agree the part that doesn't involve data (i.e., the evolutionary explanation and quantum physics) sounds kind of like post-hoc rationalizing to me.

John Taylor said...

I downloaded this yesterday and have started reading it. It does appear that the study was rigorously conducted. We will have to see if it can be replicated and generalized. I am hoping not, really, since I prefer a universe in which cause precedes effect.

Neuroskeptic said...

It sounds like a remarkable set of studies, but, I'll wait on the independent replication.

One of the tasks, the retroactive practice effect for word stimuli, would be especially easy to replicate since it doesn't require any programming - beyond a good random-number generator - just pen & paper.

Andrew said...

A fairly detailed analysis of the statistical problems is here.

At least one failure to replicate has already surfaced.

Plus Bem is renowned for one very telling quote:
“There are two possible articles you can write: (1) the article you planned to write when you designed your study or (2) the article that makes the most sense now that you have seen the results. They are rarely the same, and the correct answer is (2).” (Bem, 2003, pp. 171-172)

Kurt said...

Grad student in social psych here. My impression was the same as Christian's: I do not accept psi and think more evidence is needed, but Bem does a heckuva job writing a convincing article. He takes well-known psychological effects and tweaks them in simple but compelling ways. Bem knows how to write in a way that speaks to psychologists, that is for sure.

Andrew, I know exactly what you mean about Bem's writing philosophy. I think it makes for more interesting papers but I would also like to see a replication.

One philosophical problem is that he has not demonstrated a plausible mechanism for his effects. The paper's conclusions rest on him ruling out many alternative explanations, supposedly leaving "psi" as the only remaining explanation, but he has not provided positive evidence for why his effects are occurring.

The first two studies were weaker, but I found the memory and mere exposure studies especially plausible.

Unknown said...

@ trumpeting:
53% is huge within a sample of 1,000. and, like 88% in a sample of 1,000, it can be chance. the chances are pretty small, though.

the quantum physics excuse is not a very good one for this kind of data.

and without going too much into the details of the paper itself: assuming the data are correct (which i doubt, for a start), it's probably some kind of instructional bias effect.


Unknown said...

Hi Andrew
The failed replication that you cite was conducted via an online survey (according to New Scientist), so hardly a sound test. The paper on the stats problems sounds interesting - thanks for flagging that up. I note however that it appears to be a criticism on the use of stats in psychology in general, not just by Bem. In other words, he used stats that would normally be considered sound.

Andrew said...

Which would be fine, except for the fact that psychologists have been using p values badly for a long time. I got taught GLM stats in grad school (because you have to have ANOVAs to publish) by a professor who's spent a lot of time recently advocating for Bayesian methods in analysis; he knew GLM was problematic but we have to know how to do what the journals expect (and reviewers do freak out at non-standard analyses).

The study I mentioned did happen online, which is obviously a difference. But there's plenty of research showing that, with care, running studies online can be fine, and Bem's answer is simply that you lose total control over the study this way.

This review mentions two more failures to replicate here and here.

Kurt, re mechanism: that review also mentions a quote from Bem saying it's "absurd" that he should be expected to come up with a theory to explain his data. He's so entirely incorrect about that it's not funny.

Kurt said...

re: failed replications.

I fail to replicate well-established effects all the time. What I would LOVE to see is somebody who identifies a moderating factor that can show why Bem's studies worked and properly controlled studies don't work. Something like, "The effects only occur when participants are told the study is about psi and when the RA believes in psi. When participants are ignorant and the RA is skeptical the studies don't work." And they would actually randomly assign people to informed/believer or ignorant/skeptical conditions.

re: mechanism.

I think there can be a place for initial publications (even in JPSP) where a novel phenomenon is presented, without necessarily showing what is happening. For example, in his early priming studies Bargh didn't have a mechanism. (In fact, they kind of argue against any psychological mechanism, that there is a direct prime-to-behaviour-link.) If the evidence for the effect is relatively solid, it should be published in order to provoke more work in that area.

Here though, the effect is ridiculously implausible a priori. So I agree that a mechanism would help us to confirm the effect because we would know WHY it's happening, instead of leaving us with the suspicion that it's just some sort of unidentified, uncontrolled third variable.

Anonymous said...

From personal practice of a lot of weird things (shamanism, mediumship, I-Ching, etc...) I can tell you there are a lot of "anomalous" experiences going on, however they are of course difficult to tell apart from self-delusion, thus it is a good thing that well designed studies begin to investigate these "anomalies".
It would be very damaging if these studies were to be abandoned under the pressure of "conformity" and it is quite commendable of Bem to dare to dabble in such perilous topics.
I am not surprised by the weakness of the effects (53-57%); the psi effects are notoriously unreliable but they seem undeniable in the long run when you experience them directly and not through hearsay.

As for the mechanism(s) involved, THIS is the real challenge, because, if you know about sensitivity to initial conditions you quickly realize that prediction is impossible (the famous "weather problem"), so, if some prediction, even imperfect (3% boost), is actually occurring this means that some information escapes the mangling of the chaotic attractors.
An interesting physics problem isn't it?
So good luck with the "why" question...

Neuroskeptic said...

An internet-based failure to replicate is still a failure to replicate if the methods are solid.

Also, note that we already have an enormous literature on responses to stimuli, and they come after the stimuli, not before. Every EEG study of event-related potentials or even more simply, every study of the acoustic startle response, is (implicitly) a test for psi. I have done the acoustic startle on some 100 British people and none of them were psychic.

kat said...

The Neuroskeptic makes a very interesting point in that we already have a lot of data from stimulus-response experiments and nobody's noticed these sorts of effects before. I would imagine it would be fairly easy to look at some random sets of experimental data and scrutinize them for retroactive effects.

It would be interesting to see if these sorts of effects turn up in other species-- do chimpanzees, capuchins, pigeons, rats, etc. demonstrate retroactive stimulus response? If there's no evidence for such in non-human animals I would become a lot more suspicious (and I'm already suspicious!) of any supposed "psi" explanation of the Bem experiment/any replications thereof.

All in all this is very strange and like everyone else I'm waiting for a robust and consistent effect to be demonstrated through replication.

Unknown said...

Hi kat - in Bem's intro he claims similar effects have been found in two animal studies

Anonymous said...

The Neuroskeptic makes a very interesting point in that we already have a lot of data from stimulus-response experiments and nobody's noticed these sorts of effects before.

No, this is NOT an "interesting point".
Why would studies designed to test responses coming after the stimuli show any kind of retroactive effects while such effects are so weak and require very stringent methodology to be reliably detected?
This is just ordinary prejudice at work, probably boosted by the same kind of anguish as shown above by John Taylor: the sky is falling, the sky is falling :-)

Coert Visser said...

Here is a critical analysis of Bem's work:

lorenzo said...

We're all scientists here, right? So, either the results are a correct explanation of the phenomena or not.

If they are, then either
-- the study is biased (there is something surfacing); or
-- we have psi powers

If they are not, i.e. the results do NOT describe a causality link, then the methodology has a flaw.

The latter is a very bad situation: if the methodology is "standard" and "accepted", then this study proves that the statistical framework/experimental setup/method is flawed and it would render void all studies conducted with the same method.

To use logic:

A implies B

the implication is true if and only if A is true AND B is true (this is the case where we all have psi powers).

If A is true and B is false, then the study is biased.

But it could be also that A is false; in this case it does not matter what B yields to, because the implication is false too (this is the latter case).

I am not saying we definitely have psi powers or not; I am just saying that a hypothesis is held to be valid until disproved. And this requires more testing and more replication before it becomes a 'theory'.
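
For reference, the textbook truth table for "A implies B" can be enumerated with a short sketch. This is standard propositional logic rather than anything taken from Bem's paper, and the function name `implies` is an illustrative choice.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

# Enumerate all four combinations of truth values for A and B.
for a, b in product([True, False], repeat=2):
    print(f"A={a!s:<5} B={b!s:<5} A implies B = {implies(a, b)}")
```

Under this standard definition the implication is false only in the row where A is true and B is false; when A is false it counts as true whatever B is.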

kat said...

"Why would studies designed to test responses coming after the stimuli show any kind of retroactive effects while such effects are so weak and require very stringent methodology to be reliably detected?
This is just ordinary prejudice at work, probably boosted by the same kind of anguish as shown above by John Taylor: the sky is falling, the sky is falling :-)"

My statement did not come from any "prejudice" against psi. What I meant is that in an experiment, let's say, where you have repeated presentation of a stimulus, you could easily look at the raw data for any anomalous anticipation of the next presentation. Many experiments are set up with repeated trials and you could look at the results from the first trial, for example, to see if the subject's responses show retroactive effects from what is presented in the second trial.

Yes, these effects are small, but Bem's point is that they are detectable using standard statistical tools and a relatively simple experimental set-up (Bem's methods aren't unusually "stringent" except for his concerns about clairvoyance vs. premonition and the randomness of his random number generators). Given the absolutely huge number of these sorts of studies that have been conducted I would have thought that somebody would have seen this sort of effect before somewhere. It is possible that no one's bothered to look; this is why I think it would be interesting to actually analyze some subset of the existing data (experiments that were conducted in a way such that they would plausibly show influence from the future) to see if these retroactive effects pop up in more mundane situations.

I actually think this is important because if they are detectable in regular stimulus-response situations, it presents an extremely baffling extra variable to account for. How are we supposed to control for effects-- that we don't even understand yet!-- from the future? How does this alter our understanding of previous experiments that weren't controlled for future influence? If it is actually the case that there is such a thing as retroactive influence (regardless of its cause) this changes the way we have to design experiments and analyze data. This means lengthening buffering times, being extra careful about order of presentation and repetition of stimuli, etc.

I don't believe in psi, but that is because until I saw this article I had no evidence (personal or scientific) for it. I still don't believe in psi, but I am definitely interested in seeing where this line of research goes. I could yet change my mind.

Anonymous said...

If it is actually the case that there is such a thing as retroactive influence (regardless of its cause) this changes the way we have to design experiments and analyze data.

That will be difficult to swallow for many (the sky is falling..) but this is just an extra variable to care about, don't we already have something akin to that with double blind experiments:
Why is it that the experimenter has to be in the dark?
Is he so dishonest (consciously or unconsciously) that he could not refrain from cheating?
This cannot possibly be a placebo effect or can it be?
That is, in some cases, he could actually alter the results by some psi effects.
I am thinking here of Jacques Benveniste whom I personally met, who was absolutely honest up to the point of screwing his own career (not "confessing" nor recanting) and whose water memory theory was totally bogus while his experiments did work.

Anonymous said...


I dunno if statistician Cosma Shalizi already read about this study but he definitely doesn't like it :-D

OTOH Cosma Shalizi is under the "spell" of a very heavy political/philosophical bias: he even denies the existence of intelligence!
Which for someone like him is pretty baffling.

Marcus Beasley said...

I think it's funny the researcher claims to have found evidence for precognition yet can't tell what hypothesis is being tested before he's seen the results.

Jon Reston said...

Summary of my views here:

The confusion between exploratory and confirmatory analyses is problematic, and the fact that the statistical errors made are common in psychology does nothing to exonerate this case.

Generally well reported paper though, plenty of detail so that we can draw our own conclusions and attempt meaningful replications. Nice to see this sort of thing published, even if it serves as nothing more than a dos and don'ts example for research methods courses.

We have to see how attempted replications (using confirmatory analyses) turn out before we draw any concrete conclusions, of course...

Anonymous said...

Undergrads talk.

Anonymous said...

Anonymous said...
Undergrads talk.

Yup, this is the "problem": only undergrads dare challenge the consensus.
(Maybe not what you thought of?)

Anonymous said...

Sorry, I meant that it is impossible to control the extent to which study participants (particularly undergraduates...) talk to each other about the experiments they have participated in.

Anyone who has run behavioural studies that involve concealing the purpose of the experiment until the debrief will know all about this problem...

For this study I can't immediately see what specific pieces of information they may have (wittingly or unwittingly) passed on to other participants. However it's another factor that is likely to be having at least some influence (and this may be why the replication conducted over the web may have failed to find the same results?)

Anonymous said...

If precognition exists, it would be exploited for financial gain the second anyone discovered it. This has not occurred, therefore it does not exist.

Please don't tell me it has been used for financial gain, because a secret like that ain't staying secret -- especially if Bem's results show precognition is fairly common.

Anonymous said...

If precognition exists, it would be exploited for financial gain the second anyone discovered it. This has not occurred, therefore it does not exist.

Black and white thinking, it would be so if precognition were reliable and it is not according to "common" statistical tests, which may be conceptually ill designed (not catching all significant clues).
I personally know someone who got banned from gambling because he was "too successful" playing Baccarat, making a living from it, and he is not able to describe any method he used.

Anonymous said...

The problem with Bem's research is not the quality of the protocols, or even the sample size alone. He has done a great service in this respect. It suffers from the fact that the effects were derived after the fact, as in exploratory. The more effects being tested, the larger the sample size has to be to demonstrate said effect. In open-ended, after-the-fact derived effects, it means very little.

Certainly the excellent protocols should be repeated, with predefined effects to test. Any other effects, besides what is preselected for testing, don't count. Do a separate test for them. No multiple tests per dataset. If the effects remain, then skeptics such as myself will have to put our thinking caps back on. At present, no such evidence exists. I look forward to such a real test.

Anonymous said...

Any other effects, besides what is preselected for testing doesn't count.

This is the point, you blind yourself to anything you are not already expecting.
Since no one has any idea about a possible mechanism (see comments above), how will you design the protocol, how will you choose what to test?

If you acted like this in everyday life you would be "removed from the gene pool" pretty quickly.

trumpeting said...

@Christian: I appreciate your direct feedback and apologize for being a knee-jerk skeptic. A friend pointed out the extremely low probability of the null hypothesis which gave me a better grasp of how the results are intriguing.

Looking forward to the studies that attempt to reproduce the effect.

Anonymous said...

I would look into the software used to present the stimuli. I have programmed many reaction time experiments and can start to tell which stimulus will be next (even when randomized) after running it 50 or so times.

I believe this is from variable load times for pictures and the amount of memory needed to run different experimental conditions. A subtle difference will be produced when code needs to execute just a picture vs. a picture followed by the word "threatening".

The fact that "thrill seekers" are more prone to the effect suggests some kind of attention effect, meaning these people are more sensitive to barely perceptible differences in onset times.

I think this is the most plausible explanation.

Anonymous said...

Also, the RT difference of 16.5ms is roughly equivalent to one refresh cycle of a CRT monitor set at 60Hz. (16.5ms * 60 = 990ms, close to one second)

This is a common error difference I see when programming RT experiments.
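
The refresh-rate arithmetic in this comment can be sketched as follows. The 60 Hz figure is the commenter's assumption about the display, not a detail confirmed from Bem's paper, and the names are illustrative.

```python
def frame_duration_ms(refresh_hz: float) -> float:
    """Duration of one screen refresh cycle in milliseconds."""
    return 1000.0 / refresh_hz

reported_difference_ms = 16.5        # reaction-time difference quoted in the post
frame_ms = frame_duration_ms(60.0)   # assumed 60 Hz monitor
frames = reported_difference_ms / frame_ms

print(f"One frame at 60 Hz lasts {frame_ms:.2f} ms")
print(f"The reported difference spans {frames:.2f} frames")
```

One frame at 60 Hz lasts about 16.67 ms, so the reported 16.5 ms difference corresponds to about 0.99 of a frame.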

dowder said...

As an undergrad I "volunteered" to see how long I could hold my arm in a bath of ice. Then I had to sit in the waiting room before trying a second time. In the waiting room, the next subject asked me what the study was about. I sang like a canary. The "next subject" was, of course, in on the study. And that was the experiment, not the ice bath. Undergrads talk, is right... I'm just sayin'...

Anonymous said...

I think this is the most plausible explanation.

I do believe too that this is one plausible explanation among many, even horses can do that.
But how many covert channels are there to carry information from the future to the present?

As I said above in one of the early comments, the truly interesting problem is: does some information survive the sensitivity to initial conditions which prevents any long term prediction?
Or, more modestly, for which time span is (approximate) prediction possible via means we don't yet know of?

Anonymous said...

One successful replication of the effect can be found here, for what it's worth:

Retrocausal Habituation and Induction of Boredom. A Successful Replication of Bem (2010; Studies 5 and 7), by Alexander Batthyány at SSRN

Unknown said...

Thanks Anonymous at 1:14pm, I've added a link to the foot of the blog post.

Anonymous said...

Train an animal to get a treat if it chooses, say, a circle and not a square, and then make it predict where the circle will be next.

Anonymous said...

If undergrads talk, then why not train an animal to get a treat if it chooses, say, a circle and not a square, and then make it predict where the circle will be next. Surely animals would've had the same benefits of psi powers as humans.

psi_des said...

Just a few notes.

It's clear that Bem doesn't claim the mechanism to be related to quantum entanglement effects. The use of quantum entanglement in his arguments is merely to show that something just as incredible already happens so maybe we shouldn't consider this to be too impossible.

Also, a failure of a couple of replications is to be expected but I do believe that within the next 10 years this paper will have changed the way we look at the world. This sort of research has been going on for quite a while but usually the experimental design doesn't survive the vicious attacks of skeptics, let alone debunkers. What Bem has done here is truly amazing and extremely brave.

Anonymous said...

On the bright side, now you can study AFTER exams :) thank you experiment #8

Anonymous said...

The 53% isn't statistically significant. The study used 100 people (experiment 1), and you would have needed 60% to be on the 1-sigma boundary. Even for a thousand people, you would need 53.3% for it to be on the 1-sigma boundary, regardless of the 3-sigma boundary that most fields of science require observations to be on. The only way 53% would have been significant would be if 10,000 people were used in the experiment.

Same criticism for the second, third, fourth, fifth, sixth, seventh, eighth.. oh, and ninth. Wait, that's all the experiments conducted. None of them are statistically significant? How odd!

Oh, and they discarded data from the third experiment. You NEVER do that in science, especially in a psychology experiment where perception is key. They discarded data from people who viewed the wrong picture as pleasant or unpleasant when the person was then told to pick a picture based on their perceptions. Why? I can only assume that the data was discarded because it did not fit the pattern.

All in all, results and analysis weren't great, sample size in each experiment doesn't merit statistically significant observations, and the results don't merit the "dramatic" conclusions he's drawn. On the other hand, methodology seems good.
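
The sigma boundaries discussed in this comment can be recomputed with a minimal sketch. It assumes that each of n observations is an independent 50/50 guess, which may not match how the paper counted trials or which tests it used; the function name is illustrative.

```python
from math import sqrt

def sigma_thresholds(n_trials: int, sigmas=(1, 2, 3)) -> dict:
    """Hit rates lying k standard errors above 50% under pure guessing."""
    # Standard error of a proportion when the true hit rate is 0.5.
    standard_error = sqrt(0.5 * 0.5 / n_trials)
    return {k: 0.5 + k * standard_error for k in sigmas}

for n in (100, 1000, 10000):
    row = ", ".join(f"{k}-sigma: {rate:.1%}" for k, rate in sigma_thresholds(n).items())
    print(f"n = {n:>5}: {row}")
```

Under these assumptions, 60 per cent sits two standard errors above chance for n = 100, and about 53.2 per cent sits two standard errors above chance for n = 1,000.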

Unknown said...

I have to admit that I haven't looked at the report in detail, but this sounds a lot like some 'scientific' article from the HeartMath Institute that was pretty easy to debunk: My simulation also ended up at about 54 %, which is pretty close to the number mentioned here.
