The Unity Effect for Non-speech Stimuli: A Top-down or Bottom-up Process?

Title: The Unity Effect for Non-speech Stimuli: A Top-down or Bottom-up Process?
Publication Type: Conference Papers
Year of Publication: 2014
Authors: Angelaki, S, Vatakis, A
Conference Name: Procedia - Social and Behavioral Sciences
Keywords: Unity effect

Abstract: A voice emanates from a human face and an impact sound from a hammer. These are examples of everyday multisensory experiences, but how exactly do we perceive unified multisensory events? Close temporal and spatial proximity of sensory inputs enhances the probability that those inputs belong to a single event. However, the fact that multiple inputs belonging to different events may be presented in close spatio-temporal coherence raises the issue of what leads to the correct binding of those inputs. The "unity effect" holds that events that "go together" are the ones that eventually get integrated. In these cases, our perceptual system is more likely to treat the "related" sensory inputs as referring to the same multisensory event rather than to separate unimodal events (Vatakis & Spence, 2007, 2008). A number of studies have investigated this effect directly or have reported findings that can be interpreted according to the unity assumption. For example, Laurienti et al. (2004) presented a series of congruent conditions (visual: red or blue, auditory: "red" or "blue", respectively) and incongruent conditions in which an irrelevant stimulus was presented (e.g., visual: red, auditory: "yellow"). In the former case, redundant audiovisual information was presented, whereas in the latter case the information was conflicting. Speeded detection of targets (red or blue) was required. The results showed better and faster target detection in the congruent cases than in the incongruent ones. Furthermore, Vatakis and Spence (2007) evaluated the influence of the "unity effect" on the multisensory integration of audiovisual speech stimuli using an orthogonal task (no response was required regarding the matching/mismatching of the stimuli). The speech stimuli (auditory and visual) were either gender matched/mismatched or utterance matched/mismatched. They found participant performance in a temporal order judgment (TOJ) task to be better for mismatched than for matched cases. 
The poor performance in the case of matched pairs is attributed to the fact that integration is taking place, which makes it harder for participants to judge the order of presentation. Vatakis and Spence (2008) demonstrated the unity effect for speech stimuli, but no such effect was found for non-speech dynamic stimuli (e.g., smashing ice with a hammer) or animal calls (including humans imitating animal calls). Parise and Spence (2009), however, did demonstrate the effect using simple stimuli related to crossmodal correspondences. It is as yet unclear why no unity effect was obtained for non-speech stimuli. One could argue that this might be due to the "special" nature of speech in terms of its temporal coherence. Thus, the missing temporal coherence in the non-speech stimuli presented in previous studies could have led to a failure to show unity for stimuli other than speech. The purpose of this study, therefore, is to investigate whether unity can be obtained for non-speech stimuli that are ecologically valid but not dynamic, and whether the unity effect is driven by bottom-up or top-down processes. The visual stimuli were static images of a cell phone, a flashlight, and a lighter, in both their proper and scrambled forms. The auditory stimuli were the sounds of a cell phone ringing, a flashlight button, and a lighter. Scrambled images were used in order to examine whether familiarity or low-level factors lead to the unity effect. The visual stimuli were first presented in the off state, and subsequently their on state was presented along with the matching or mismatching sound. White noise was presented throughout the experiment. Two tasks were completed: an implicit TOJ task, as in the Vatakis and Spence studies, and an explicit reaction time (RT) task, as in the Laurienti et al. studies. During the TOJ task, the stimuli were presented in a matched or mismatched format at 8 different stimulus onset asynchronies (±250, ±130, ±95, ±75, 0 msec). 
During the RT task, the participants had to detect whether they heard, saw, or both heard and saw a cell phone or a flashlight, with the lighter being the irrelevant stimulus (all other combinations will also be tested). We expect, through the TOJ and RT tasks, to demonstrate the unity effect for non-speech stimuli, given the control of the temporal coherence of the stimuli. Additionally, through the TOJ task and the use of scrambled images, we aim to investigate, for the first time, whether the unity effect is driven by top-down or bottom-up processes.
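
The TOJ design described in the abstract can be illustrated by enumerating its conditions. The following is only a minimal sketch under stated assumptions, not the authors' procedure: the function and variable names are hypothetical, the sign convention (negative = sound leads) is assumed because the abstract does not state it, and a full crossing of image, sound, and scrambling is assumed. The SOA values are taken as listed in the abstract; note that these yield nine signed values when 0 msec is counted, whereas the text refers to 8.

```python
from itertools import product

# SOA magnitudes in msec, as listed in the abstract; 0 is added separately.
SOA_MAGNITUDES_MS = [250, 130, 95, 75]
# The three objects used for both the images and the sounds.
OBJECTS = ["cell phone", "flashlight", "lighter"]


def soa_values(magnitudes):
    """Return the sorted signed SOAs, including the synchronous 0 msec case."""
    signed = {0}
    for m in magnitudes:
        signed.update((-m, m))
    return sorted(signed)


def toj_conditions(objects, soas):
    """Cross image, image form, sound, and SOA (hypothetical full crossing).

    Assumption: a negative SOA means the sound leads the image; the
    abstract does not specify the sign convention.
    """
    conditions = []
    for image, scrambled, sound, soa in product(objects, [False, True], objects, soas):
        conditions.append({
            "image": image,
            "scrambled": scrambled,    # intact vs. scrambled image
            "sound": sound,
            "matched": image == sound,  # matched vs. mismatched pairing
            "soa_ms": soa,
        })
    return conditions


soas = soa_values(SOA_MAGNITUDES_MS)
conditions = toj_conditions(OBJECTS, soas)
matched = [c for c in conditions if c["matched"]]
print(len(soas), len(conditions), len(matched))
```

Flagging each condition as matched or mismatched at creation time keeps the later comparison of TOJ performance between the two pairings a simple filter over the condition list.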

Citation Key: Angelaki2014156