Steven Backer
sbacker
at ccrma.stanford.edu
EE 362 / PSYCH 221 Project
Stanford
University
Winter 2006
Introduction
Motivation
Summary of Previous Experiment
Methods
Experiment Paradigms
Test Conditions
Software Implementation
Results
Analysis
Subject #1
Subject #2
Mean
Comparison with Original Experiment
Conclusions
References
The purpose of this project was two-fold: First, it was meant to serve as an exercise in implementing a perceptual experiment. As I am an engineer and not a psychology student, I have very little (read: zero) experience in conducting this sort of psychological experiment. However, I am quite interested in perception (especially auditory) and this was a good opportunity to gain some practical experience conducting an experiment involving perception. Second, confirming the results from a previous experiment involving multi-modal sensory interactions would further strengthen my belief (and hopefully that of others) that what we see and hear is not always as it seems. As the human mind is immensely complex, affirming that sensory interactions can and do occur in our formulation of what constitutes an 'event' or an 'object' may or may not be important in the larger picture. Looking at sensory perception from this macro level is nothing new - there exists a large literature on these topics. I chose to focus on a single paper that addresses the topic of 'Auditory Influences on Visual Temporal Rate Perception'.
Since this is a course project, and not a full-on research project, the results derived below do not constitute anything 'new', per se. Rather, they are a study of what is already known, and nothing more.
A fundamental question in experimental psychology is: Under what conditions do we perceive events as being simultaneous, or even as the same event, given that information about the temporal occurrence of such events is detected through multiple senses? Furthermore, when such multi-modal information contains discrepancies or is contradictory, how do we determine what information to believe as true? At some level, the brain will integrate information from multiple senses (e.g. sight and hearing) and may or may not determine that the sensory information originated from the same object or event. The consequences of such judgments can be far reaching, and it should be obvious why, in today's computer-driven almost-virtual reality, it would be advantageous to understand exactly how these judgments are made.
One prevalent theory is that of 'modality appropriateness'. That is, when the brain receives conflicting information from multiple senses about an event, it gives more weight to the information from the sense that has greater acuity in the given environment and situation. For spatial tasks (i.e. identifying where an object is in space), the visual system usually acquires the most accurate information. For temporal tasks (i.e. identifying when an event occurs), the auditory system is most often superior in its ability to detect timing cues. However, this hierarchy does not hold under all circumstances. It may be that the brain has separate streams for what and where information, in which the what and where information from each of the senses integrates into a super-sensory version of what and where.
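This weighting idea is often formalized as an inverse-variance (reliability-weighted) combination of the unimodal estimates. The sketch below is one common model of this behavior, not something taken from Recanzone's paper, and the numbers in the example are purely illustrative:

```python
def combine_estimates(est_a, var_a, est_v, var_v):
    """Reliability-weighted combination of two sensory estimates.

    Each cue is weighted by its inverse variance, so the more
    reliable sense (lower variance) dominates the combined percept.
    """
    w_a = 1.0 / var_a   # auditory weight
    w_v = 1.0 / var_v   # visual weight
    return (w_a * est_a + w_v * est_v) / (w_a + w_v)

# Temporal task: audition is assumed more precise (smaller variance),
# so the combined rate estimate lies closer to the auditory estimate.
combined = combine_estimates(est_a=4.6, var_a=0.1, est_v=4.0, var_v=0.4)
```

With these illustrative variances the combined estimate lands much nearer the auditory 4.6 Hz than the visual 4.0 Hz, which is the qualitative behavior the modality-appropriateness account predicts for rate judgments.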
This project focuses almost entirely on an article by Gregg H. Recanzone published in the Journal of Neurophysiology, 2003: 'Auditory Influences on Visual Temporal Rate Perception' [2]. Below is an attempt to summarize the key points from his work, especially those that are relevant to this project.
Recanzone is mainly concerned here with how decisions about temporal perception are made based on sensory input from the eyes and ears - specifically, what effect, if any, the auditory system has on the visual system's ability to detect differences in temporal rates.
The inverse of this question has been researched much more heavily, which has given rise to the identification of a phenomenon known as the 'ventriloquism effect'. The name refers to a situation where we see information originating from one spatial location, yet we hear information originating from another location. Nonetheless, due to the superior spatial acuity of the visual system, we are made to believe that the sound comes from where we see it coming from. An interesting aspect of this effect is that if one wears prism goggles for an extended period of time, a shift in spatial perception occurs and will persist for some time even after the goggles have been removed. Once this shift occurs, it can be shown that auditory localization of sounds uniformly reflects the shift in the direction induced by the goggles. This is often called an 'auditory aftereffect' and represents a 'change in the neural representation of space and time.' Thus it has been shown that the visual system can have profound influence on auditory spatial perception.
Recanzone attempts to illustrate that a similar aftereffect exists in the time domain. By attempting to recreate a similar shift in temporal perception, and comparing visual rate perception before and after such a shift, one might similarly show that the auditory system has significant influence on visual rate perception. He does exactly this, and analysis of visual rate perception in the presence of auditory-induced aftereffects is the main objective in the experiments I have implemented for this project.
In summary, Recanzone found that the auditory system does have a profound impact on the visual system. The degree of influence depends almost solely upon the perceived difference in temporal rates between the two senses. He found that the effect is greater at higher frequencies, which is consistent with the notion that visual rate processing begins to deteriorate as rate increases while auditory processing remains quite reliable at these rates. By observing a predictable shift in responses to a bimodal task before and after the aftereffect is induced, it is possible to demonstrate the existence of auditory influence on visual perception.
There are two main 'paradigms' that make up this experiment. They are the same 'Paradigm 2' and 'Paradigm 3' referenced in Recanzone's paper, and are summarized below. Essentially, Paradigm 3 is the overall procedure, though note that it includes Paradigm 2 as a subset. Paradigm 2 is where all of the important results and data come from. The entire experiment takes approximately 30 minutes.
> A single sequence of both auditory and visual stimuli is presented to the subject as four consecutive beeps and flashes
> The subject chooses whether or not the stimuli are temporally aligned (same/different)
> Sequences consist of one of 15 different types:
> Same: both stimuli aligned at 3.0Hz, 4.0Hz, or 5.4Hz
> Different: visual stimuli presented at 4.0Hz; auditory stimuli at 3.0-5.4 Hz in 0.2 Hz steps
> Sequences are aligned at the center
> Recanzone shows that the onset differences have minimal effect on results
> A total of 90 sequences (8 of each type) are presented in a randomly interleaved fashion
> Recanzone used 200 total; reduced here in interest of time and willingness of subjects to participate
> The first 10% of the results are discarded as 'practice trials'
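The trial-generation logic above can be sketched in Python. This is a hypothetical helper, not the actual Pd patch; 6 repetitions per type are assumed here, since 15 types at 6 repetitions each gives the 90 total sequences:

```python
import random

def build_trial_list(reps=6, seed=None):
    """Build the randomly interleaved trial list for Paradigm 2.

    'Same' trials: auditory and visual stimuli aligned at 3.0, 4.0, or 5.4 Hz.
    'Different' trials: visual fixed at 4.0 Hz; auditory swept 3.0-5.4 Hz
    in 0.2 Hz steps (skipping 4.0 Hz, which is already a 'same' trial).
    Each trial is an (aud_hz, vis_hz) pair.
    """
    same = [(f, f) for f in (3.0, 4.0, 5.4)]
    different = [(round(3.0 + 0.2 * i, 1), 4.0)
                 for i in range(13) if round(3.0 + 0.2 * i, 1) != 4.0]
    trials = (same + different) * reps      # 15 types x reps sequences
    rng = random.Random(seed)
    rng.shuffle(trials)                     # randomly interleaved presentation
    return trials

trials = build_trial_list()                 # 90 sequences total
practice = trials[:len(trials) // 10]       # first 10% discarded as practice
scored = trials[len(trials) // 10:]
```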
Below is an illustration of what a 'different' sequence might be represented as (taken directly from Recanzone's paper). One stream represents the auditory stimulus, and one the visual stimulus. The stream with the faster temporal rate is shown on the bottom.

> Three steps:
> 1st: perform Paradigm #2
> 2nd: perform a training sequence
> 3rd: immediately perform Paradigm #2 again
> Training sequence:
> Seven sequences of four beeps and light flashes (as above)
> 1s separation between sequences
> Light shown at lower luminance (approx. 25%) after 1-7 sequences
> Subjects indicate during which of the seven sequences the light first becomes dimmer (before the 4th flash)
> Results are not important and are discarded
> Feedback via click sound on correct trials
> Purpose is to keep subject alert and attentive to the visual stimulus
> Total of 45 trials over approximately 15 minutes
> Recanzone uses 60 trials over 20 minutes
> Auditory rate is always 0.6 Hz faster than the visual temporal rate
> aud:3.6-5.6Hz, vis:3.0-5.0Hz in 0.2Hz steps
> Key Point: Serves to induce a perceptual shift in rate perception over the training period, which endures during the repetition of paradigm 2
> Recanzone also uses types where the audio is slower, or the same as the visual rate
> These cases show less influence of audio on visual rate perception, and were not implemented here
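The training-sequence parameters above (a constant +0.6 Hz auditory disparity, with the light dimming during one of the seven sequences) can be expressed as a short Python sketch; the function and field names are illustrative, not the actual implementation:

```python
import random

def build_training_trials(n_trials=45, seed=None):
    """Training trials for Paradigm 3.

    The auditory rate is always 0.6 Hz faster than the visual rate
    (visual 3.0-5.0 Hz in 0.2 Hz steps, auditory 3.6-5.6 Hz).
    'dim_at' marks which of the seven sequences shows the dimmer light.
    """
    rng = random.Random(seed)
    vis_rates = [round(3.0 + 0.2 * i, 1) for i in range(11)]  # 3.0 .. 5.0
    trials = []
    for _ in range(n_trials):
        vis = rng.choice(vis_rates)
        aud = round(vis + 0.6, 1)        # constant +0.6 Hz bimodal disparity
        dim_at = rng.randint(1, 7)       # sequence in which the light dims
        trials.append({'vis_hz': vis, 'aud_hz': aud, 'dim_at': dim_at})
    return trials
```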
Essentially, all test conditions were replicated from the original as closely as possible to eliminate any differences between the experimental setups. Where there are differences, they are noted below. It should be noted that Recanzone found, through variations in his experiments, that variations in spatial location, intensity, and spectral content had no significant impact on temporal rate perception. (This is not true for spatial perception, where both spatial and temporal information are important in reproducing the ventriloquism effect.) That is, the conditions necessary to reproduce the desired aftereffects are independent of location, intensity, and timbre, at least to a moderate extent.
> Test tones
> 1 kHz sinusoidal
> Recanzone also tries white noise, and determines spectral content does not influence results
> Moderate volume level
> Recanzone determines intensity does not significantly affect results
> 5ms rise / fall time
> Auditory display: Sony MDR-7506 studio-standard headphones
> Recanzone uses 9cm speaker 1.5m from interaural axis center
> Flash
> Location in approximate center of gaze, 1m distance
> Fixation point slightly above center of gaze
> 3 degrees visual angle red dot on LCD, instead of LED placed on speaker
> User control
> User initiated trials triggered after 1.5s delay
> Mouse device used for both trigger and data entry
> Recanzone uses a switch instead
The experiment was implemented using the software package Pd. 'Pd (aka Pure Data) is a real-time graphical programming environment for audio, video, and graphical processing.' [4] There is a commercial version of the software, Max/MSP, that is virtually identical to Pd; both stem from the same author, Miller Puckette. The difference is that Pd is open source, and is thus widely available for all to use. Thus, the 'patches' I have created for this project can easily be downloaded, used, and modified by anyone wishing to recreate this same experiment or any variant thereof. This is especially helpful if someone is interested in changing one or two parameters and seeing how the results are affected. Pd runs on Windows, Mac, or Linux (I used Linux for this project).
Using this software was particularly advantageous because I was able to automate data collection, and thus bypass any tedious data entry steps. See the patches below for more details.
To run this experiment, you will need to install some additional libraries, or 'externs':
Gem - Graphics Environment for Multimedia
Zexy - needed for generating random numbers without replacement
IEMLib - various utilities
Here is the actual patch for implementing 'Paradigm 2' as described above:
Here are some screenshots that should give you an idea of how it all fits together:
The patch:
Screenshots:
Although Pd is capable of creating the tones (one of its most basic functions), all of the test tones were synthesized beforehand to minimize the load on the CPU during the experiment. This further ensured that the timing would be as accurate as possible. The tones were created in MATLAB, and then simply loaded into Pd's memory before the experiment began.
The individual tones consist of pure 1kHz sinusoids of varying durations. The duration is determined by the temporal frequency at which the tone will be repeated. The temporal frequencies range from 3.0Hz to 5.6Hz in increments of 0.2Hz.
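The MATLAB script itself is linked below; as an illustration, here is an equivalent Python/NumPy sketch of the tone synthesis, including the 5ms rise/fall ramps noted in the test conditions. The 50% duty cycle is an assumption (the exact on/off ratio within each repetition period is not specified above):

```python
import numpy as np

FS = 44100          # sample rate (Hz)
F_TONE = 1000.0     # 1 kHz carrier
RAMP = 0.005        # 5 ms rise/fall time

def make_tone(rate_hz, duty=0.5, fs=FS):
    """Synthesize one 1 kHz tone burst for a given temporal rate.

    `duty` is an assumption: the tone fills this fraction of the
    repetition period at `rate_hz`.
    """
    dur = duty / rate_hz                 # seconds of tone per repetition
    n = int(round(dur * fs))
    t = np.arange(n) / fs
    tone = np.sin(2 * np.pi * F_TONE * t)
    # linear 5 ms rise/fall ramps to avoid onset/offset clicks
    nr = int(RAMP * fs)
    env = np.ones(n)
    env[:nr] = np.linspace(0.0, 1.0, nr)
    env[-nr:] = np.linspace(1.0, 0.0, nr)
    return tone * env

# one tone per temporal rate from 3.0 to 5.6 Hz in 0.2 Hz steps
rates = [round(3.0 + 0.2 * i, 1) for i in range(14)]
tones = {r: make_tone(r) for r in rates}
```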
MATLAB script to synthesize tones:
Archive containing all of the tone wav files (includes click.wav for feedback in paradigm 3):
The data collected during the experiment was analyzed in MATLAB. The analysis is fairly straightforward, and inspection of the MATLAB source code below will make clear exactly how all results were derived. Data was collected for only two subjects; Recanzone's original experiment only used four subjects. As time was limited both for myself and the subjects, it was difficult to find more participants to take the 30+ minute experiment. Nonetheless, it is assumed that data from only two subjects is fairly representative.
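The analysis amounts to tallying '% different' responses per auditory rate and then taking the post-minus-pre shift. A Python sketch of those two steps follows; the (aud_hz, responded_different) record layout is illustrative, not the actual data-file format:

```python
from collections import defaultdict

def percent_different(trials):
    """Compute '% different' responses per auditory rate.

    `trials` is a list of (aud_hz, responded_different) pairs.
    """
    counts = defaultdict(lambda: [0, 0])    # aud_hz -> [n_different, n_total]
    for aud_hz, said_different in trials:
        counts[aud_hz][0] += int(said_different)
        counts[aud_hz][1] += 1
    return {f: 100.0 * d / n for f, (d, n) in sorted(counts.items())}

def post_minus_pre(pre, post):
    """Shift in '% different' after training, per auditory rate."""
    return {f: post[f] - pre[f] for f in pre if f in post}
```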
Aside: I did try taking the experiment myself, and the results looked quite random! I'm not sure what this implies, other than the obvious!
MATLAB files for data analysis:
Data files can be found here:
Subject #1: Pre-training [1 2] Post-training [1 2]
Subject #2: Pre-training [1 2] Post-training [1 2]
The first test subject was a male, age 35.
First, we look at the percentage of responses where the subject indicated the two stimuli were 'different', or not temporally aligned. This is paradigm 2 from above. Responses for both the pre- and post- training session are shown on a single graph. Clearly, a perceptual shift has occurred in the higher frequency range.
Figure 1: '% Different' for Pre/Post Training
In the second figure, the difference between the post and pre responses is plotted versus frequency of the auditory stimulus. It is seen that the difference peaks at 50% for this subject - a very large shift at 4.8Hz.
Figure 2: Post - Pre % Difference
The second test subject was a female, age 24
Curiously, this subject showed more of a shift at the lower frequencies, and the second plot suggests that the data here is not quite as consistent as the first subject's. This could be due to any of many factors too numerous to speculate upon. However, it is reiterated that the number of samples used in this experiment was roughly half of the number Recanzone used, and the training period was reduced from 20 minutes to 15.
Figure 3: '% Different' for Pre/Post Training
Figure 4: Post - Pre % Difference
Results for the two subjects were averaged, and summarized in the plots below. The averaged data for the two subjects indeed reflects a very uniform and predictable shift of the curve to the right. Looking at the average difference function in Figure 6, it is seen that the difference peaks at just under 30% for both high and low frequencies. This finding is consistent with Recanzone's results.
Figure 5: '% Different' for Pre/Post Training
Figure 6: Post - Pre % Difference
Below are the similar images from Recanzone's experiment (one-time, educational use permitted). It can be seen that while his images are a bit more pronounced and consistent (likely due in part to a larger number of samples, more subjects, and 5 minutes longer exposure to the training sequence), the results of this project's experiment agree quite well with those of the original.
Figure 7: '% Different' for Pre/Post Training for Single Subject, and Mean
Figure 8: Post - Pre % Difference for Four Subjects, and Mean
It has been shown through experiment that visual temporal rate perception is indeed influenced by auditory sensory information, although the extent to which this occurs remains unclear. It is certain, though, that under these circumstances, auditory-induced aftereffects can be observed after exposure to approximately 15-20 minutes of a constant bimodal disparity in which the auditory rate exceeds the visual rate by 0.6Hz. These findings not only confirm the methods and results of Recanzone's experiment, but also support the hypothesis that we are more apt to believe information from the sense with greater acuity in a given situation - in this case, the auditory system's superior ability to detect when an event occurs.
As far as my experience conducting this experiment goes, I feel like I took a great deal more out of the project than just the subject matter. I gained a deeper appreciation for the attention to detail, precision, and consistency with which these types of experiments must be conducted. If I had to do it over again, I would try to use more samples, longer tests, and more subjects (of course, I would probably have to find participants that weren't busy grad students!). Hopefully I've contributed something as well; anyone who has read this far and is interested in using the experimental setup I've provided here should feel free to email me for assistance. The results of the project just go to show that understanding the human mind is a complex (grey) matter, and when attempting to do so, we cannot always believe what we see - or hear, taste, smell, or feel, for that matter!
[1] Fujisaki, W., Nishida, S. (2005) Temporal frequency characteristics of synchrony-asynchrony discrimination of audio-visual signals. Exp Brain Res 166:455-464
[2] Recanzone, G.H. (2003) Auditory Influences on Visual Temporal Rate Perception. J Neurophysiol 89:1078-1093
[3] Wada, Y., Kitagawa, N., Noguchi, K. (2003) Audio-visual integration in temporal perception. Intl J Psychophysiology 50:117-124
[4] Puckette, M. Pd (Pure Data). http://puredata.info
Last Updated 3/19/2006 6:09PM