[P2 evaluation] Articles

Choose two articles from the list, from two different lecturers (indicated by their initials). Articles from BPC may only be chosen for the written evaluation.


AdC1: Neuroimage. 2018 Feb 1;166:60-70.

Encoding of natural timbre dimensions in human auditory cortex.

Allen EJ, Moerel M, Lage-Castellanos A, De Martino F, Formisano E, Oxenham AJ.

Timbre, or sound quality, is a crucial but poorly understood dimension of auditory perception that is important in describing speech, music, and environmental sounds. The present study investigates the cortical representation of different timbral dimensions. Encoding models have typically incorporated the physical characteristics of sounds as features when attempting to understand their neural representation with functional MRI. Here we test an encoding model that is based on five subjectively derived dimensions of timbre to predict cortical responses to natural orchestral sounds. Results show that this timbre model can outperform other models based on spectral characteristics, and can perform as well as a complex joint spectrotemporal modulation model. In cortical regions at the medial border of Heschl's gyrus, bilaterally, and regions at its posterior adjacency in the right hemisphere, the timbre model outperforms even the complex joint spectrotemporal modulation model. These findings suggest that the responses of cortical neuronal populations in auditory cortex may reflect the encoding of perceptual timbre dimensions.
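
To make the encoding-model idea concrete, here is a minimal Python sketch (not the authors' code): ridge regression predicting each voxel's response to held-out sounds from per-sound feature values (for example, five timbre dimensions). Data sizes, feature values, and the regularization setting are all hypothetical toy choices.

    # Minimal voxelwise encoding-model sketch: predict each voxel's response
    # to held-out sounds from per-sound features, using ridge regression.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(0)
    n_sounds, n_features, n_voxels = 42, 5, 200        # hypothetical sizes
    X = rng.standard_normal((n_sounds, n_features))     # timbre ratings per sound
    Y = X @ rng.standard_normal((n_features, n_voxels)) \
        + 0.5 * rng.standard_normal((n_sounds, n_voxels))  # toy voxel responses

    scores = np.zeros(n_voxels)
    for train, test in KFold(n_splits=6).split(X):
        model = Ridge(alpha=1.0).fit(X[train], Y[train])
        pred = model.predict(X[test])
        # accumulate cross-validated prediction accuracy (correlation) per voxel
        for v in range(n_voxels):
            scores[v] += np.corrcoef(pred[:, v], Y[test][:, v])[0, 1] / 6
    print("median cross-validated r across voxels:", np.median(scores))

Model comparison in the paper amounts to repeating this procedure with different feature sets and comparing prediction accuracies.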


AdC2: Curr Biol. 2018 Mar 5;28(5):803-809

Electrophysiological Correlates of Semantic Dissimilarity Reflect the Comprehension of Natural, Narrative Speech.

Broderick MP, Anderson AJ, Di Liberto GM, Crosse MJ, Lalor EC.

People routinely hear and understand speech at rates of 120-200 words per minute [1, 2]. Thus, speech comprehension must involve rapid, online neural mechanisms that process words' meanings in an approximately time-locked fashion. However, electrophysiological evidence for such time-locked processing has been lacking for continuous speech. Although valuable insights into semantic processing have been provided by the N400 component of the event-related potential [3-6], this literature has been dominated by paradigms using incongruous words within specially constructed sentences, with less emphasis on natural, narrative speech comprehension. Building on the discovery that cortical activity tracks the dynamics of running speech [7-9] and psycholinguistic work demonstrating [10-12] and modeling [13-15] how context impacts on word processing, we describe a new approach for deriving an electrophysiological correlate of natural speech comprehension. We used a computational model [16] to quantify the meaning carried by words based on how semantically dissimilar they were to their preceding context and then regressed this measure against electroencephalographic (EEG) data recorded from subjects as they listened to narrative speech. This produced a prominent negativity at a time lag of 200-600 ms on centro-parietal EEG channels, characteristics common to the N400. Applying this approach to EEG datasets involving time-reversed speech, cocktail party attention, and audiovisual speech-in-noise demonstrated that this response was very sensitive to whether or not subjects understood the speech they heard. These findings demonstrate that, when successfully comprehending natural speech, the human brain responds to the contextual semantic content of each word in a relatively time-locked fashion.
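
As an illustration of the analysis logic, here is a sketch of a time-lagged regression relating a sparse "semantic dissimilarity" regressor (an impulse at each word onset, scaled by dissimilarity) to multichannel EEG. This is a generic lagged least-squares stand-in, not the authors' pipeline; sampling rate, lag range, and data are toy assumptions.

    # Sketch of a time-lagged (TRF-style) regression of EEG on a sparse
    # semantic-dissimilarity regressor placed at word onsets (toy data).
    import numpy as np

    fs = 128                                   # Hz, assumed sampling rate
    n_samples, n_channels = fs * 60, 32        # 1 minute of toy EEG
    rng = np.random.default_rng(1)
    eeg = rng.standard_normal((n_samples, n_channels))

    stim = np.zeros(n_samples)                 # impulse at each word onset,
    onsets = rng.choice(n_samples - fs, 150, replace=False)
    stim[onsets] = rng.uniform(0, 1, 150)      # scaled by semantic dissimilarity

    lags = np.arange(0, int(0.6 * fs))         # 0-600 ms lags, as in the abstract
    X = np.column_stack([np.roll(stim, L) for L in lags])
    X[: lags.max()] = 0                        # discard wrap-around samples
    lam = 1.0                                  # ridge-regularized least squares:
    trf = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    print(trf.shape)                           # (n_lags, n_channels) response functions

The N400-like effect reported in the paper corresponds to a negative deflection of such response functions at 200-600 ms over centro-parietal channels.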


AdC3: Proc Natl Acad Sci U S A. 2017 Jan 31;114(5):E840-E848.

Harmonic template neurons in primate auditory cortex underlying complex sound processing.

Feng L, Wang X.

Harmonicity is a fundamental element of music, speech, and animal vocalizations. How the auditory system extracts harmonic structures embedded in complex sounds and uses them to form a coherent unitary entity is not fully understood. Despite the prevalence of sounds rich in harmonic structures in our everyday hearing environment, it has remained largely unknown what neural mechanisms are used by the primate auditory cortex to extract these biologically important acoustic structures. In this study, we discovered a unique class of harmonic template neurons in the core region of auditory cortex of a highly vocal New World primate, the common marmoset (Callithrix jacchus), across the entire hearing frequency range. Marmosets have a rich vocal repertoire and a similar hearing range to that of humans. Responses of these neurons show nonlinear facilitation to harmonic complex sounds over inharmonic sounds, selectivity for particular harmonic structures beyond two-tone combinations, and sensitivity to harmonic number and spectral regularity. Our findings suggest that the harmonic template neurons in auditory cortex may play an important role in processing sounds with harmonic structures, such as animal vocalizations, human speech, and music.


AdC4: Proc Natl Acad Sci U S A. 2018 Feb 6;115(6):E1309-E1318

The eardrums move when the eyes move: A multisensory effect on the mechanics of hearing.

Gruters KG, Murphy DLK, Jenson CD, Smith DW, Shera CA, Groh JM.

Interactions between sensory pathways such as the visual and auditory systems are known to occur in the brain, but where they first occur is uncertain. Here, we show a multimodal interaction evident at the eardrum. Ear canal microphone measurements in humans (n = 19 ears in 16 subjects) and monkeys (n = 5 ears in three subjects) performing a saccadic eye movement task to visual targets indicated that the eardrum moves in conjunction with the eye movement. The eardrum motion was oscillatory and began as early as 10 ms before saccade onset in humans or with saccade onset in monkeys. These eardrum movements, which we dub eye movement-related eardrum oscillations (EMREOs), occurred in the absence of a sound stimulus. The amplitude and phase of the EMREOs depended on the direction and horizontal amplitude of the saccade. They lasted throughout the saccade and well into subsequent periods of steady fixation. We discuss the possibility that the mechanisms underlying EMREOs create eye movement-related binaural cues that may aid the brain in evaluating the relationship between visual and auditory stimulus locations as the eyes move.
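
The core measurement is an epoch-and-average analysis: ear-canal microphone traces time-locked to saccade onset, averaged separately by saccade direction. A toy sketch follows (sampling rate, epoch window, and data are hypothetical, and this is not the authors' processing pipeline).

    # Toy sketch: average ear-canal microphone traces time-locked to
    # saccade onsets, separately for leftward and rightward saccades.
    import numpy as np

    fs = 48000                                  # assumed microphone sampling rate
    rng = np.random.default_rng(2)
    mic = rng.standard_normal(fs * 120) * 1e-3  # 2 minutes of toy recording
    saccade_onsets = np.sort(rng.choice(np.arange(fs, fs * 119), 80, replace=False))
    directions = rng.choice(["left", "right"], 80)

    pre, post = int(0.01 * fs), int(0.08 * fs)  # 10 ms before to 80 ms after onset
    epochs = np.stack([mic[t - pre: t + post] for t in saccade_onsets])
    for d in ("left", "right"):
        avg = epochs[directions == d].mean(axis=0)
        print(d, "peak-to-peak of average trace:", avg.max() - avg.min())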


AdC5: PLoS One. 2014 Jan 27;9(1):e85791

Two distinct dynamic modes subtend the detection of unexpected sounds.

King JR, Gramfort A, Schurger A, Naccache L, Dehaene S.

The brain response to auditory novelty comprises two main EEG components: an early mismatch negativity and a late P300. Whereas the former has been proposed to reflect a prediction error, the latter is often associated with working memory updating. Interestingly, these two proposals predict fundamentally different dynamics: prediction errors are thought to propagate serially through several distinct brain areas, while working memory supposes that activity is sustained over time within a stable set of brain areas. Here we test this temporal dissociation by showing how the generalization of brain activity patterns across time can characterize the dynamics of the underlying neural processes. This method is applied to magnetoencephalography (MEG) recordings acquired from healthy participants who were presented with two types of auditory novelty. Following our predictions, the results show that the mismatch evoked by a local novelty leads to the sequential recruitment of distinct and short-lived patterns of brain activity. In sharp contrast, the global novelty evoked by an unexpected sequence of five sounds elicits a sustained state of brain activity that lasts for several hundreds of milliseconds. The present results highlight how MEG combined with multivariate pattern analyses can characterize the dynamics of human cortical processes.
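
The "generalization across time" method can be sketched in a few lines: train a classifier on sensor patterns at each time point and test it at every other time point. Serial, short-lived processes give a diagonal accuracy matrix; sustained processes give a square of off-diagonal generalization. The sketch below uses toy data and scikit-learn (MNE-Python offers a ready-made GeneralizingEstimator, but the loop makes the logic explicit).

    # Sketch of temporal generalization decoding on toy MEG-like data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    n_trials, n_sensors, n_times = 120, 30, 40
    X = rng.standard_normal((n_trials, n_sensors, n_times))
    y = rng.integers(0, 2, n_trials)            # e.g., standard vs. deviant
    X[y == 1, 0, 15:25] += 1.0                  # inject a sustained toy effect

    half = n_trials // 2                        # simple train/test split of trials
    gen = np.zeros((n_times, n_times))
    for t_train in range(n_times):
        clf = LogisticRegression(max_iter=1000).fit(X[:half, :, t_train], y[:half])
        for t_test in range(n_times):
            gen[t_train, t_test] = clf.score(X[half:, :, t_test], y[half:])
    print("peak generalization accuracy:", gen.max())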


AdC6: Curr Biol. 2017 Mar 6;27(5):743-750.

Frogs Exploit Statistical Regularities in Noisy Acoustic Scenes to Solve Cocktail-Party-like Problems.

Lee N, Ward JL, Vélez A, Micheyl C, Bee MA.

Noise is a ubiquitous source of errors in all forms of communication [1]. Noise-induced errors in speech communication, for example, make it difficult for humans to converse in noisy social settings, a challenge aptly named the cocktail party problem [2]. Many nonhuman animals also communicate acoustically in noisy social groups and thus face biologically analogous problems [3]. However, we know little about how the perceptual systems of receivers are evolutionarily adapted to avoid the costs of noise-induced errors in communication. In this study of Cope's gray treefrog (Hyla chrysoscelis; Hylidae), we investigated whether receivers exploit a potential statistical regularity present in noisy acoustic scenes to reduce errors in signal recognition and discrimination. We developed an anatomical/physiological model of the peripheral auditory system to show that temporal correlation in amplitude fluctuations across the frequency spectrum (comodulation) [4-6] is a feature of the noise generated by large breeding choruses of sexually advertising males. In four psychophysical experiments, we investigated whether females exploit comodulation in background noise to mitigate noise-induced errors in evolutionarily critical mate-choice decisions. Subjects experienced fewer errors in recognizing conspecific calls and in selecting the calls of high-quality mates in the presence of simulated chorus noise that was comodulated. These data show unequivocally, and for the first time, that exploiting statistical regularities present in noisy acoustic scenes is an important biological strategy for solving cocktail-party-like problems in nonhuman animal communication.
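
The key stimulus manipulation, comodulation, can be illustrated with a short sketch: narrowband noise bands that either share one slow amplitude envelope (comodulated) or receive independent envelopes. Band edges, cutoff frequency, and durations below are arbitrary toy values, not the chorus-noise model from the paper.

    # Sketch: comodulated vs. independently modulated narrowband noise bands.
    # Comodulation = the same slow envelope imposed on every frequency band.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    fs, dur = 16000, 2.0
    t = np.arange(int(fs * dur)) / fs
    rng = np.random.default_rng(4)

    def slow_envelope():
        e = sosfiltfilt(butter(2, 10, fs=fs, output="sos"),
                        rng.standard_normal(t.size))   # <10 Hz fluctuations
        return e - e.min() + 0.01                      # make it positive

    bands = [(500, 1000), (1000, 2000), (2000, 4000)]  # hypothetical bands (Hz)
    shared = slow_envelope()
    comod, indep = np.zeros(t.size), np.zeros(t.size)
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        carrier = sosfiltfilt(sos, rng.standard_normal(t.size))
        comod += carrier * shared                      # same envelope in all bands
        indep += carrier * slow_envelope()             # independent envelopes
    print("RMS comodulated:", comod.std(), "independent:", indep.std())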


AdC7: Hear Res. 2018 Dec;370:201-208

No otoacoustic evidence for a peripheral basis of absolute pitch.

McKetton L, Purcell D, Stone V, Grahn J, Bergevin C.

Absolute pitch (AP) is the ability to identify the perceived pitch of a sound without an external reference. AP is relatively rare, with an incidence of approximately 1/10,000, and its underlying mechanisms are not well understood. This study examined otoacoustic emissions (OAEs) to determine if there is evidence of a peripheral (i.e., cochlear) basis for AP. Two OAE types were examined: spontaneous emissions (SOAEs) and stimulus-frequency emissions (SFOAEs). Our motivations to explore a peripheral foundation for AP were several-fold. First is the observation that pitch judgment accuracy has been reported to decrease with age due to age-dependent physiological changes in cochlear biomechanics. Second is the notion that SOAEs, which are indirectly related to perception, could act as a fixed frequency reference. Third, SFOAE delays, which have been demonstrated to serve as a proxy measure for cochlear frequency selectivity, could indicate tuning differences between groups. These led us to the hypotheses that AP subjects would (relative to controls) exhibit (a) greater SOAE activity and (b) sharper cochlear tuning. To test these notions, measurements were made in normal-hearing control (N = 33) and AP-possessor (N = 20) populations. In short, no substantial difference in SOAE activity was found between groups, indicating no evidence for one or more strong SOAEs that could act as a fixed cue. SFOAE phase-gradient delays, measured at several different probe levels (20-50 dB SPL), also showed no significant differences between groups. This observation argues against sharper cochlear frequency selectivity in AP subjects. Taken together, these data support the prevailing view that AP mechanisms predominantly arise at a processing level in the central nervous system (CNS) at the brainstem or higher, not within the cochlea.
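
The phase-gradient delay used here as a tuning proxy is just the (negative) slope of the emission phase versus frequency, divided by 2*pi; expressing it in stimulus periods (delay times frequency) is a common convention. A toy sketch, with a made-up constant delay, follows.

    # Sketch: phase-gradient delay from an SFOAE phase-vs-frequency measurement.
    # tau(f) = -(1/(2*pi)) * d(phase)/d(frequency); N_SFOAE = tau * f gives the
    # delay in stimulus periods, a common proxy for cochlear tuning sharpness.
    import numpy as np

    freqs = np.linspace(1000.0, 4000.0, 61)        # Hz, hypothetical probe grid
    true_delay = 0.008                             # 8 ms, toy constant delay
    phase = -2 * np.pi * true_delay * freqs        # unwrapped phase (radians)

    tau = -np.gradient(phase, freqs) / (2 * np.pi) # seconds
    n_sfoae = tau * freqs                          # delay in periods
    print("mean delay (ms):", 1e3 * tau.mean(), " mean N_SFOAE:", n_sfoae.mean())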


AdC8: Curr Biol. 2018 May 7;28(9):1405-1418

Adaptive and Selective Time Averaging of Auditory Scenes.

McWalter R, McDermott JH.

To overcome variability, estimate scene characteristics, and compress sensory input, perceptual systems pool data into statistical summaries. Despite growing evidence for statistical representations in perception, the underlying mechanisms remain poorly understood. One example of such representations occurs in auditory scenes, where background texture appears to be represented with time-averaged sound statistics. We probed the averaging mechanism using "texture steps": textures containing subtle shifts in stimulus statistics. Although generally imperceptible, steps occurring in the previous several seconds biased texture judgments, indicative of a multi-second averaging window. Listeners seemed unable to willfully extend or restrict this window but showed signatures of longer integration times for temporally variable textures. In all cases the measured timescales were substantially longer than previously reported integration times in the auditory system. Integration also showed signs of being restricted to sound elements attributed to a common source. The results suggest an integration process that depends on stimulus characteristics, integrating over longer extents when it benefits statistical estimation of variable signals and selectively integrating stimulus components likely to have a common cause in the world. Our methodology could be naturally extended to examine statistical representations of other types of sensory signals.


AdC9: J Neurosci. 2016 Mar 9;36(10):2986-94.

Pitch-Responsive Cortical Regions in Congenital Amusia.

Norman-Haignere SV, Albouy P, Caclin A, McDermott JH, Kanwisher NG, Tillmann B.

Congenital amusia is a lifelong deficit in music perception thought to reflect an underlying impairment in the perception and memory of pitch. The neural basis of amusic impairments is actively debated. Some prior studies have suggested that amusia stems from impaired connectivity between auditory and frontal cortex. However, it remains possible that impairments in pitch coding within auditory cortex also contribute to the disorder, in part because prior studies have not measured responses from the cortical regions most implicated in pitch perception in normal individuals. We addressed this question by measuring fMRI responses in 11 subjects with amusia and 11 age- and education-matched controls to a stimulus contrast that reliably identifies pitch-responsive regions in normal individuals: harmonic tones versus frequency-matched noise. Our findings demonstrate that amusic individuals with a substantial pitch perception deficit exhibit clusters of pitch-responsive voxels that are comparable in extent, selectivity, and anatomical location to those of control participants. We discuss possible explanations for why amusics might be impaired at perceiving pitch relations despite exhibiting normal fMRI responses to pitch in their auditory cortex: (1) individual neurons within the pitch-responsive region might exhibit abnormal tuning or temporal coding not detectable with fMRI, (2) anatomical tracts that link pitch-responsive regions to other brain areas (e.g., frontal cortex) might be altered, and (3) cortical regions outside of pitch-responsive cortex might be abnormal. The ability to identify pitch-responsive regions in individual amusic subjects will make it possible to ask more precise questions about their role in amusia in future work.


AdC10: Trends Hear. 2014 Sep 9;18.

Perceptual consequences of hidden hearing loss.

Plack CJ, Barker D, Prendergast G.

Dramatic results from recent animal experiments show that noise exposure can cause a selective loss of high-threshold auditory nerve fibers without affecting absolute sensitivity permanently. This cochlear neuropathy has been described as hidden hearing loss, as it is not thought to be detectable using standard measures of audiometric threshold. It is possible that hidden hearing loss is a common condition in humans and may underlie some of the perceptual deficits experienced by people with clinically normal hearing. There is some evidence that a history of noise exposure is associated with difficulties in speech discrimination and temporal processing, even in the absence of any audiometric loss. There is also evidence that the tinnitus experienced by listeners with clinically normal hearing is associated with cochlear neuropathy, as measured using Wave I of the auditory brainstem response. To date, however, there has been no direct link made between noise exposure, cochlear neuropathy, and perceptual difficulties. Animal experiments also reveal that the aging process itself, in the absence of significant noise exposure, is associated with loss of auditory nerve fibers. Evidence from human temporal bone studies and auditory brainstem response measures suggests that this form of hidden loss is common in humans and may have perceptual consequences, in particular, regarding the coding of the temporal aspects of sounds. Hidden hearing loss is potentially a major health issue, and investigations are ongoing to identify the causes and consequences of this troubling condition.


AdC11: Proc Natl Acad Sci U S A. 2016 Mar 1;113(9):2508-13

Midbrain auditory selectivity to natural sounds.

Wohlgemuth MJ, Moss CF.

This study investigated auditory stimulus selectivity in the midbrain superior colliculus (SC) of the echolocating bat, an animal that relies on hearing to guide its orienting behaviors. Multichannel, single-unit recordings were taken across laminae of the midbrain SC of the awake, passively listening big brown bat, Eptesicus fuscus. Species-specific frequency-modulated (FM) echolocation sound sequences with dynamic spectrotemporal features served as acoustic stimuli along with artificial sound sequences matched in bandwidth, amplitude, and duration but differing in spectrotemporal structure. Neurons in dorsal sensory regions of the bat SC responded selectively to elements within the FM sound sequences, whereas neurons in ventral sensorimotor regions showed broad response profiles to natural and artificial stimuli. Moreover, a generalized linear model (GLM) constructed on responses in the dorsal SC to artificial linear FM stimuli failed to predict responses to natural sounds and vice versa, but the GLM produced accurate response predictions in ventral SC neurons. This result suggests that auditory selectivity in the dorsal extent of the bat SC arises through nonlinear mechanisms, which extract species-specific sensory information. Importantly, auditory selectivity appeared only in responses to stimuli containing the natural statistics of acoustic signals used by the bat for spatial orientation (sonar vocalizations), offering support for the hypothesis that sensory selectivity enables rapid species-specific orienting behaviors. The results of this study are the first, to our knowledge, to show auditory spectrotemporal selectivity to natural stimuli in SC neurons and serve to inform a more general understanding of mechanisms guiding sensory selectivity for natural, goal-directed orienting behaviors.


BPC1: Brain. 2002 Feb;125(Pt 2):238-51.

Congenital amusia: a group study of adults afflicted with a music-specific disorder.

Ayotte J, Peretz I, Hyde K.

The condition of congenital amusia, commonly known as tone-deafness, has been described for more than a century, but has received little empirical attention. In the present study, a research effort has been made to document in detail the behavioural manifestations of congenital amusia. A group of 11 adults, fitting stringent criteria of musical disabilities, were examined in a series of tests originally designed to assess the presence and specificity of musical disorders in brain-damaged patients. The results show that congenital amusia is related to severe deficiencies in processing pitch variations. The deficit extends to impairments in music memory and recognition as well as in singing and the ability to tap in time to music. Interestingly, the disorder appears specific to the musical domain. Congenital amusic individuals process and recognize speech, including speech prosody, common environmental sounds and human voices, as well as control subjects. Thus, the present study convincingly demonstrates the existence of congenital amusia as a new class of learning disabilities that affect musical abilities.


BPC2: Psychomusicology: Music, Mind, and Brain. 2018 Vol. 28, No. 3, 178–188

A Cross-Cultural Comparison of Tonality Perception in Japanese, Chinese, Vietnamese, Indonesian, and American Listeners

Matsunaga et al.

We investigated tonal perception of melodies from 2 cultures (Western and traditional Japanese) by 5 different cultural groups (44 Japanese, 25 Chinese, 16 Vietnamese, 18 Indonesians, and 25 U.S. citizens). Listeners rated the degree of “melodic completeness” of the final tone (a tonic vs. a nontonic) and “happiness–sadness” in the mode (major vs. minor, YOH vs. IN) of each melody. When Western melodies were presented, American and Japanese listeners responded similarly, such that they reflected implicit tonal knowledge of Western music. By contrast, the responses of Chinese, Vietnamese, and Indonesian listeners were different from those of American and Japanese listeners. When traditional Japanese melodies were presented, Japanese listeners exhibited responses that reflected implicit tonal knowledge of traditional Japanese music. American listeners also showed responses that were like the Japanese; however, the pattern of responses differed between the 2 groups. Alternatively, Chinese, Vietnamese, and Indonesian listeners exhibited different responses from the Japanese. These results show large differences between the Chinese/Vietnamese/Indonesian group and the American/Japanese group. Furthermore, the differences in responses to Western melodies between Americans and Japanese were less pronounced than those between Chinese, Vietnamese, and Indonesians. These findings imply that cultural differences in tonal perception are more diverse and distinctive than previously believed.


BPC3: Cereb Cortex. 2009 Mar;19(3):712-23.

Musical training influences linguistic abilities in 8-year-old children: more evidence for brain plasticity.

Moreno S, Marques C, Santos A, Santos M, Castro SL, Besson M.

We conducted a longitudinal study with 32 nonmusician children over 9 months to determine 1) whether functional differences between musician and nonmusician children reflect specific predispositions for music or result from musical training and 2) whether musical training improves nonmusical brain functions such as reading and linguistic pitch processing. Event-related brain potentials were recorded while 8-year-old children performed tasks designed to test the hypothesis that musical training improves pitch processing not only in music but also in speech. Following the first testing sessions nonmusician children were pseudorandomly assigned to music or to painting training for 6 months and were tested again after training using the same tests. After musical (but not painting) training, children showed enhanced reading and pitch discrimination abilities in speech. Remarkably, 6 months of musical training thus suffices to significantly improve behavior and to influence the development of neural processes as reflected in specific patterns of brain waves. These results reveal positive transfer from music to speech and highlight the influence of musical training. Finally, they demonstrate brain plasticity in showing that relatively short periods of training have strong consequences on the functional organization of the children's brain.


BPC4: J Neurosci. 2011 Mar 9;31(10):3843-52.

Functional anatomy of language and music perception: temporal and structural factors investigated using functional magnetic resonance imaging.

Rogalsky C, Rong F, Saberi K, Hickok G.

Language and music exhibit similar acoustic and structural properties, and both appear to be uniquely human. Several recent studies suggest that speech and music perception recruit shared computational systems, and a common substrate in Broca's area for hierarchical processing has recently been proposed. However, this claim has not been tested by directly comparing the spatial distribution of activations to speech and music processing within subjects. In the present study, participants listened to sentences, scrambled sentences, and novel melodies. As expected, large swaths of activation for both sentences and melodies were found bilaterally in the superior temporal lobe, overlapping in portions of auditory cortex. However, substantial nonoverlap was also found: sentences elicited more ventrolateral activation, whereas the melodies elicited a more dorsomedial pattern, extending into the parietal lobe. Multivariate pattern classification analyses indicate that even within the regions of blood oxygenation level-dependent response overlap, speech and music elicit distinguishable patterns of activation. Regions involved in processing hierarchical aspects of sentence perception were identified by contrasting sentences with scrambled sentences, revealing a bilateral temporal lobe network. Music perception showed no overlap whatsoever with this network. Broca's area was not robustly activated by any stimulus type. Overall, these findings suggest that basic hierarchical processing for music and speech recruits distinct cortical networks, neither of which involves Broca's area. We suggest that previous claims are based on data from tasks that tap higher-order cognitive processes, such as working memory and/or cognitive control, which can operate in both speech and music domains.


BPC5: Neuropsychologia. 2018 Aug;117:67-74.

Seeing music: The perception of melodic 'ups and downs' modulates the spatial processing of visual stimuli.

Romero-Rivas C, Vera-Constán F, Rodríguez-Cuadrado S, Puigcerver L, Fernández-Prieto I, Navarra J.

Musical melodies have "peaks" and "valleys". Although the vertical component of pitch and music is well-known, the mechanisms underlying its mental representation still remain elusive. We show evidence regarding the importance of previous experience with melodies for crossmodal interactions to emerge. The impact of these crossmodal interactions on other perceptual and attentional processes was also studied. Melodies including two tones with different frequency (e.g., E4 and D3) were repeatedly presented during the study. These melodies could either generate strong predictions (e.g., E4-D3-E4-D3-E4-[D3]) or not (e.g., E4-D3-E4-E4-D3-[?]). After the presentation of each melody, the participants had to judge the colour of a visual stimulus that appeared in a position that was, according to the traditional vertical connotations of pitch, either congruent (e.g., high-low-high-low-[up]), incongruent (high-low-high-low-[down]) or unpredicted with respect to the melody. Behavioural and electroencephalographic responses to the visual stimuli were obtained. Congruent visual stimuli elicited faster responses at the end of the experiment than at the beginning. Additionally, incongruent visual stimuli that broke the spatial prediction generated by the melody elicited larger P3b amplitudes (reflecting 'surprise' responses). Our results suggest that the passive (but repeated) exposure to melodies elicits spatial predictions that modulate the processing of other sensory events.


BPC6: Cognition. 1999 Feb 1;70(1):27-52.

Statistical learning of tone sequences by human infants and adults.

Saffran JR, Johnson EK, Aslin RN, Newport EL.

Previous research suggests that language learners can detect and use the statistical properties of syllable sequences to discover words in continuous speech (e.g. Aslin, R.N., Saffran, J.R., Newport, E.L., 1998. Computation of conditional probability statistics by 8-month-old infants. Psychological Science 9, 321-324; Saffran, J.R., Aslin, R.N., Newport, E.L., 1996. Statistical learning by 8-month-old infants. Science 274, 1926-1928; Saffran, J.R., Newport, E.L., Aslin, R.N., 1996. Word segmentation: the role of distributional cues. Journal of Memory and Language 35, 606-621; Saffran, J.R., Newport, E.L., Aslin, R.N., Tunick, R.A., Barrueco, S., 1997. Incidental language learning: Listening (and learning) out of the corner of your ear. Psychological Science 8, 101-195). In the present research, we asked whether this statistical learning ability is uniquely tied to linguistic materials. Subjects were exposed to continuous non-linguistic auditory sequences whose elements were organized into 'tone words'. As in our previous studies, statistical information was the only word boundary cue available to learners. Both adults and 8-month-old infants succeeded at segmenting the tone stream, with performance indistinguishable from that obtained with syllable streams. These results suggest that a learning mechanism previously shown to be involved in word segmentation can also be used to segment sequences of non-linguistic stimuli.
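
The statistic at the heart of this paradigm is the transitional probability TP(B|A) = count(AB)/count(A), which is high within a "tone word" and low across word boundaries. A toy sketch with three hypothetical three-tone words (not the actual stimuli):

    # Sketch: transitional probabilities in a continuous tone stream built
    # from three toy three-tone "words".
    import random
    from collections import Counter

    words = [("C", "D", "E"), ("G", "A", "B"), ("F#", "C#", "D#")]  # toy tone words
    random.seed(0)
    stream = [tone for _ in range(300) for tone in random.choice(words)]

    pairs = Counter(zip(stream, stream[1:]))
    singles = Counter(stream[:-1])
    def tp(a, b):                       # TP(b | a) = count(ab) / count(a)
        return pairs[(a, b)] / singles[a]

    print("within-word TP      P(D|C):", round(tp("C", "D"), 2))   # near 1.0
    print("across-boundary TP  P(G|E):", round(tp("E", "G"), 2))   # near 0.33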


BPC7: Psychol Sci. 2004 Aug;15(8):511-4.

Music lessons enhance IQ

Schellenberg EG

The idea that music makes you smarter has received considerable attention from scholars and the media. The present report is the first to test this hypothesis directly with random assignment of a large sample of children (N = 144) to two different types of music lessons (keyboard or voice) or to control groups that received drama lessons or no lessons. IQ was measured before and after the lessons. Compared with children in the control groups, children in the music groups exhibited greater increases in full-scale IQ. The effect was relatively small, but it generalized across IQ subtests, index scores, and a standardized measure of academic achievement. Unexpectedly, children in the drama group exhibited substantial pre- to post-test improvements in adaptive social behavior that were not evident in the music groups.


CL1: J Acoust Soc Am. 2014 Dec;136(6):3325

Speech-cue transmission by an algorithm to increase consonant recognition in noise for hearing-impaired listeners.

Healy EW, Yoho SE, Wang Y, Apoux F, Wang D.

Consonant recognition was assessed following extraction of speech from noise using a more efficient version of the speech-segregation algorithm described in Healy, Yoho, Wang, and Wang [(2013) J. Acoust. Soc. Am. 134, 3029-3038]. Substantial increases in recognition were observed following algorithm processing, which were significantly larger for hearing-impaired (HI) than for normal-hearing (NH) listeners in both speech-shaped noise and babble backgrounds. As observed previously for sentence recognition, older HI listeners having access to the algorithm performed as well or better than young NH listeners in conditions of identical noise. It was also found that the binary masks estimated by the algorithm transmitted speech features to listeners in a fashion highly similar to that of the ideal binary mask (IBM), suggesting that the algorithm is estimating the IBM with substantial accuracy. Further, the speech features associated with voicing, manner of articulation, and place of articulation were all transmitted with relative uniformity and at relatively high levels, indicating that the algorithm and the IBM transmit speech cues without obvious deficiency. Because the current implementation of the algorithm is much more efficient, it should be more amenable to real-time implementation in devices such as hearing aids and cochlear implants.
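
The ideal binary mask (IBM) referenced here is defined from the premixed signals: keep the time-frequency units where the local SNR exceeds a criterion and discard the rest. A toy sketch with surrogate signals and a 0 dB criterion (criterion value and signals are assumptions, not the study's parameters):

    # Sketch of the ideal binary mask (IBM): in each time-frequency unit,
    # keep the mixture where the local SNR exceeds a criterion (here 0 dB).
    import numpy as np
    from scipy.signal import stft, istft

    fs = 16000
    rng = np.random.default_rng(5)
    speech = rng.standard_normal(fs * 2) * np.sin(2 * np.pi * 3 * np.arange(fs * 2) / fs) ** 2
    noise = rng.standard_normal(fs * 2)
    mixture = speech + noise

    f, t, S = stft(speech, fs=fs, nperseg=512)
    _, _, N = stft(noise, fs=fs, nperseg=512)
    _, _, M = stft(mixture, fs=fs, nperseg=512)

    local_snr_db = 20 * np.log10(np.abs(S) + 1e-12) - 20 * np.log10(np.abs(N) + 1e-12)
    ibm = (local_snr_db > 0).astype(float)          # 0 dB local criterion
    _, enhanced = istft(ibm * M, fs=fs, nperseg=512)
    print("fraction of units retained:", ibm.mean())

The algorithm evaluated in the paper estimates such a mask from the noisy mixture alone; the sketch above only shows what the "ideal" target looks like when clean and noise signals are available.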


CL2: J Acoust Soc Am. 2017 Jun;141(6):4230

An algorithm to increase intelligibility for hearing-impaired listeners in the presence of a competing talker.

Healy EW, Delfarah M, Vasko JL, Carter BL, Wang D.

Individuals with hearing impairment have particular difficulty perceptually segregating concurrent voices and understanding a talker in the presence of a competing voice. In contrast, individuals with normal hearing perform this task quite well. This listening situation represents a very different problem for both the human and machine listener, when compared to perceiving speech in other types of background noise. A machine learning algorithm is introduced here to address this listening situation. A deep neural network was trained to estimate the ideal ratio mask for a male target talker in the presence of a female competing talker. The monaural algorithm was found to produce sentence-intelligibility increases for hearing-impaired (HI) and normal-hearing (NH) listeners at various signal-to-noise ratios (SNRs). This benefit was largest for the HI listeners and averaged 59%-points at the least-favorable SNR, with a maximum of 87%-points. The mean intelligibility achieved by the HI listeners using the algorithm was equivalent to that of young NH listeners without processing, under conditions of identical interference. Possible reasons for the limited ability of HI listeners to perceptually segregate concurrent voices are reviewed as are possible implementation considerations for algorithms like the current one.
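
The ideal ratio mask (IRM) that serves as the network's training target is a soft 0-1 gain per time-frequency unit, computed from the premixed target and interferer. A toy sketch (surrogate signals; STFT settings are arbitrary assumptions):

    # Sketch of the ideal ratio mask (IRM), the DNN training target mentioned
    # in the abstract, computed from the premixed target and interferer.
    import numpy as np
    from scipy.signal import stft

    fs = 16000
    rng = np.random.default_rng(6)
    target = rng.standard_normal(fs * 2)         # stand-in for the target talker
    interferer = rng.standard_normal(fs * 2)     # stand-in for the competing talker

    _, _, T = stft(target, fs=fs, nperseg=512)
    _, _, I = stft(interferer, fs=fs, nperseg=512)
    irm = np.sqrt(np.abs(T) ** 2 / (np.abs(T) ** 2 + np.abs(I) ** 2 + 1e-12))
    print("IRM range:", irm.min(), "-", irm.max())
    # In the study, a deep network learns to predict this mask from features
    # of the mixture alone; the predicted mask is then applied to the mixture.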


CL3: Trends Neurosci. 2018 Apr;41(4):174-185.

Why Do Hearing Aids Fail to Restore Normal Auditory Perception?

Lesica NA

Hearing loss is a widespread condition that is linked to declines in quality of life and mental health. Hearing aids remain the treatment of choice, but, unfortunately, even state-of-the-art devices provide only limited benefit for the perception of speech in noisy environments. While traditionally viewed primarily as a loss of sensitivity, hearing loss is also known to cause complex distortions of sound-evoked neural activity that cannot be corrected by amplification alone. This Opinion article describes the effects of hearing loss on neural activity to illustrate the reasons why current hearing aids are insufficient and to motivate the use of new technologies to explore directions for improving the next generation of devices.


CL4: Trends Hear. 2017 Jan-Dec;21:2331216517730526

Predictors of Hearing-Aid Outcomes.

Lopez-Poveda EA, Johannesen PT, Pérez-González P, Blanco JL, Kalluri S, Edwards B.

Over 360 million people worldwide suffer from disabling hearing loss. Most of them can be treated with hearing aids. Unfortunately, performance with hearing aids and the benefit obtained from using them vary widely across users. Here, we investigate the reasons for such variability. Sixty-eight hearing-aid users or candidates were fitted bilaterally with nonlinear hearing aids using standard procedures. Treatment outcome was assessed by measuring aided speech intelligibility in a time-reversed two-talker background and self-reported improvement in hearing ability. Statistical predictive models of these outcomes were obtained using linear combinations of 19 predictors, including demographic and audiological data, indicators of cochlear mechanical dysfunction and auditory temporal processing skills, hearing-aid settings, working memory capacity, and pretreatment self-perceived hearing ability. Aided intelligibility tended to be better for younger hearing-aid users with good unaided intelligibility in quiet and with good temporal processing abilities. Intelligibility tended to improve by increasing amplification for low-intensity sounds and by using more linear amplification for high-intensity sounds. Self-reported improvement in hearing ability was hard to predict but tended to be smaller for users with better working memory capacity. Indicators of cochlear mechanical dysfunction, alone or in combination with hearing settings, did not affect outcome predictions. The results may be useful for improving hearing aids and setting patients' expectations.


CL5: Front Neurosci. 2014 Oct 30;8:348.

Why do I hear but not understand? Stochastic undersampling as a model of degraded neural encoding of speech.

Lopez-Poveda EA

Hearing impairment is a serious disease with increasing prevalence. It is defined based on increased audiometric thresholds but increased thresholds are only partly responsible for the greater difficulty understanding speech in noisy environments experienced by some older listeners or by hearing-impaired listeners. Identifying the additional factors and mechanisms that impair intelligibility is fundamental to understanding hearing impairment but these factors remain uncertain. Traditionally, these additional factors have been sought in the way the speech spectrum is encoded in the pattern of impaired mechanical cochlear responses. Recent studies, however, are steering the focus toward impaired encoding of the speech waveform in the auditory nerve. In our recent work, we gave evidence that a significant factor might be the loss of afferent auditory nerve fibers, a pathology that comes with aging or noise overexposure. Our approach was based on a signal-processing analogy whereby the auditory nerve may be regarded as a stochastic sampler of the sound waveform and deafferentation may be described in terms of waveform undersampling. We showed that stochastic undersampling simultaneously degrades the encoding of soft and rapid waveform features, and that this degrades speech intelligibility in noise more than in quiet without significant increases in audiometric thresholds. Here, we review our recent work in a broader context and argue that the stochastic undersampling analogy may be extended to study the perceptual consequences of various different hearing pathologies and their treatment.
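
A toy rendering of the stochastic-undersampling analogy: treat each auditory nerve fiber as a Bernoulli sampler whose firing probability follows the rectified waveform, and compare the pooled estimate with many versus few fibers. All parameters below are illustrative, not the model values used in the authors' work.

    # Toy version of the stochastic-undersampling analogy: fewer "fibers"
    # yield a noisier reconstruction of the waveform envelope, degrading
    # soft and rapid features most.
    import numpy as np

    fs = 16000
    t = np.arange(fs) / fs
    wave = np.sin(2 * np.pi * 200 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
    drive = np.clip(wave, 0, None)               # half-wave rectified drive
    drive /= drive.max()
    rng = np.random.default_rng(7)

    def reconstruct(n_fibers):
        spikes = rng.random((n_fibers, fs)) < 0.05 * drive   # Bernoulli sampling
        return spikes.mean(axis=0)                            # pooled rate estimate

    for n in (500, 20):                                       # intact vs. deafferented
        r = np.corrcoef(reconstruct(n), drive)[0, 1]
        print(f"{n:4d} fibers: correlation with waveform drive = {r:.3f}")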


DP1: Atten Percept Psychophys. 2015 Apr;77(3):896-906

Auditory frequency perception adapts rapidly to the immediate past.

Alais D, Orchard-Mills E, Van der Burg E.

Frequency modulation is critical to human speech. Evidence from psychophysics, neurophysiology, and neuroimaging suggests that there are neuronal populations tuned to this property of speech. Consistent with this, extended exposure to frequency change produces direction specific aftereffects in frequency change detection. We show that this aftereffect occurs extremely rapidly, requiring only a single trial of just 100-ms duration. We demonstrate this using a long, randomized series of frequency sweeps (both upward and downward, by varying amounts) and analyzing intertrial adaptation effects. We show the point of constant frequency is shifted systematically towards the previous trial's sweep direction (i.e., a frequency sweep aftereffect). Furthermore, the perception of glide direction is also independently influenced by the glide presented two trials previously. The aftereffect is frequency tuned, as exposure to a frequency sweep from a set centered on 1,000 Hz does not influence a subsequent trial drawn from a set centered on 400 Hz. More generally, the rapidity of adaptation suggests the auditory system is constantly adapting and "tuning" itself to the most recent environmental conditions.
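
One way to picture the intertrial analysis: model the probability of reporting "up" as a function of the current sweep size plus the previous trial's sweep direction; a negative previous-trial weight is the repulsive aftereffect. The sketch below simulates such data and recovers the weights (a generic logistic-regression stand-in, not the authors' analysis).

    # Sketch of an intertrial adaptation analysis on simulated responses.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(8)
    n_trials = 2000
    sweep = rng.uniform(-2, 2, n_trials)            # current sweep (arbitrary units)
    prev_dir = np.sign(np.r_[0, sweep[:-1]])        # previous trial's direction
    # simulate responses with a built-in repulsive aftereffect (-0.5 * prev_dir)
    p_up = 1 / (1 + np.exp(-(2.0 * sweep - 0.5 * prev_dir)))
    resp = rng.random(n_trials) < p_up

    X = np.column_stack([sweep, prev_dir])
    coefs = LogisticRegression().fit(X, resp).coef_[0]
    print("current-sweep weight:", coefs[0], " previous-trial weight:", coefs[1])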


DP2: Front Neurosci. 2017 Jul 6;11:387

Auditory Mismatch Negativity in Response to Changes of Counter-Balanced Interaural Time and Level Differences.

Altmann CF, Ueda R, Furukawa S, Kashino M, Mima T, Fukuyama H.

Interaural time differences (ITD) and interaural level differences (ILD) both signal horizontal sound source location. To achieve a unified percept of our acoustic environment, these two cues require integration. In the present study, we tested this integration of ITD and ILD with electroencephalography (EEG) by measuring the mismatch negativity (MMN). The MMN can arise in response to spatial changes and is at least partly generated in auditory cortex. In our study, we aimed at testing for an MMN in response to stimuli with counter-balanced ITD/ILD cues. To this end, we employed a roving oddball paradigm with alternating sound sequences in two types of blocks: (a) lateralized stimuli with congruently combined ITD/ILD cues and (b) midline stimuli created by counter-balanced, incongruently combined ITD/ILD cues. We observed a significant MMN peaking at about 112-128 ms after change onset for the congruent ITD/ILD cues, for both lower (0.5 kHz) and higher carrier frequency (4 kHz). More importantly, we also observed significant MMN peaking at about 129 ms for incongruently combined ITD/ILD cues, but this effect was only detectable in the lower frequency range (0.5 kHz). There were no significant differences of the MMN responses for the two types of cue combinations (congruent/incongruent). These results suggest that-at least in the lower frequency ranges (0.5 kHz)-ITD and ILD are processed independently at the level of the MMN in auditory cortex.


DP3: J Acoust Soc Am. 2018 Nov;144(5):2983.

Development and validation of a spectro-temporal processing test for cochlear-implant listeners.

Archer-Boyd AW, Southwell RV, Deeks JM, Turner RE, Carlyon RP.

Psychophysical tests of spectro-temporal resolution may aid the evaluation of methods for improving hearing by cochlear implant (CI) listeners. Here the STRIPES (Spectro-Temporal Ripple for Investigating Processor EffectivenesS) test is described and validated. Like speech, the test requires both spectral and temporal processing to perform well. Listeners discriminate between complexes of sine sweeps which increase or decrease in frequency; difficulty is controlled by changing the stimulus spectro-temporal density. Care was taken to minimize extraneous cues, forcing listeners to perform the task only on the direction of the sweeps. Vocoder simulations with normal hearing listeners showed that the STRIPES test was sensitive to the number of channels and temporal information fidelity. An evaluation with CI listeners compared a standard processing strategy with one having very wide filters, thereby spectrally blurring the stimulus. Psychometric functions were monotonic for both strategies and five of six participants performed better with the standard strategy. An adaptive procedure revealed significant differences, all in favour of the standard strategy, at the individual listener level for six of eight CI listeners. Subsequent measures validated a faster version of the test, and showed that STRIPES could be performed by recently implanted listeners having no experience of psychophysical testing.


DP4: J Acoust Soc Am. 2018 Jul;144(1):172

Improving competing voices segregation for hearing impaired listeners using a low-latency deep neural network algorithm.

Bramslow L, Naithani G, Hafez A, Barker T, Pontoppidan NH, Virtanen T.

Hearing aid users are challenged in listening situations with noise and especially speech-on-speech situations with two or more competing voices. Specifically, the task of attending to and segregating two competing voices is particularly hard, unlike for normal-hearing listeners, as shown in a small sub-experiment. In the main experiment, the competing voices benefit of a deep neural network (DNN) based stream segregation enhancement algorithm was tested on hearing-impaired listeners. A mixture of two voices was separated using a DNN and presented to the two ears as individual streams and tested for word score. Compared to the unseparated mixture, there was a 13%-point benefit from the separation, while attending to both voices. If only one output was selected as in a traditional target-masker scenario, a larger benefit of 37%-points was found. The results agreed well with objective metrics and show that for hearing-impaired listeners, DNNs have a large potential for improving stream segregation and speech intelligibility in difficult scenarios with two equally important targets without any prior selection of a primary target stream. An even higher benefit can be obtained if the user can select the preferred target via remote control.


DP5: Hear Res. 2017 Sep;352:49-69.

Speech-in-noise perception in musicians: A review.

Coffey EBJ, Mogilever NB, Zatorre RJ.

The ability to understand speech in the presence of competing sound sources is an important neuroscience question in terms of how the nervous system solves this computational problem. It is also a critical clinical problem that disproportionally affects the elderly, children with language-related learning disorders, and those with hearing loss. Recent evidence that musicians have an advantage on this multifaceted skill has led to the suggestion that musical training might be used to improve or delay the decline of speech-in-noise (SIN) function. However, enhancements have not been universally reported, nor have the relative contributions of different bottom-up versus top-down processes, and their relation to preexisting factors, been disentangled. This information would be helpful to establish whether there is a real effect of experience, what exactly its nature is, and how future training-based interventions might target the most relevant components of cognitive processes. These questions are complicated by important differences in study design and uneven coverage of neuroimaging modality. In this review, we aim to systematize recent results from studies that have specifically looked at musician-related differences in SIN by their study design properties, to summarize the findings, and to identify knowledge gaps for future work.


DP6: Sci Adv. 2015 Nov 13;1(10):e1500677.

Does the mismatch negativity operate on a consciously accessible memory trace?

Dykstra AR, Gutschalk A.

The extent to which the contents of short-term memory are consciously accessible is a fundamental question of cognitive science. In audition, short-term memory is often studied via the mismatch negativity (MMN), a change-related component of the auditory evoked response that is elicited by violations of otherwise regular stimulus sequences. The prevailing functional view of the MMN is that it operates on preattentive and even preconscious stimulus representations. We directly examined the preconscious notion of the MMN using informational masking and magnetoencephalography. Spectrally isolated and otherwise suprathreshold auditory oddball sequences were occasionally rendered inaudible by embedding them in random multitone masker "clouds." Despite identical stimulation/task contexts and a clear representation of all stimuli in auditory cortex, MMN was only observed when the preceding regularity (that is, the standard stream) was consciously perceived. The results call into question the preconscious interpretation of MMN and raise the possibility that it might index partial awareness in the absence of overt behavior.


DP7: Curr Biol. 2017 Feb 6;27(3):359-370.

Integer Ratio Priors on Musical Rhythm Revealed Cross-culturally by Iterated Reproduction.

Jacoby N, McDermott JH.

Probability distributions over external states (priors) are essential to the interpretation of sensory signals. Priors for cultural artifacts such as music and language remain largely uncharacterized, but likely constrain cultural transmission, because only those signals with high probability under the prior can be reliably reproduced and communicated. We developed a method to estimate priors for simple rhythms via iterated reproduction of random temporal sequences. Listeners were asked to reproduce random "seed" rhythms; their reproductions were fed back as the stimulus and over time became dominated by internal biases, such that the prior could be estimated by applying the procedure multiple times. We validated that the measured prior was consistent across the modality of reproduction and that it correctly predicted perceptual discrimination. We then measured listeners' priors over the entire space of two- and three-interval rhythms. Priors in US participants showed peaks at rhythms with simple integer ratios and were similar for musicians and non-musicians. An analogous procedure produced qualitatively different results for spoken phrases, indicating some specificity to music. Priors measured in members of a native Amazonian society were distinct from those in US participants but also featured integer ratio peaks. The results do not preclude biological constraints favoring integer ratios, but they suggest that priors on musical rhythm are substantially modulated by experience and may simply reflect the empirical distribution of rhythm that listeners encounter. The proposed method can efficiently map out a high-resolution view of biases that shape transmission and stability of simple reproducible patterns within a culture.


DP8: Neuroscience. 2018 Oct 1;389:152-160

Normal Aging Slows Spontaneous Switching in Auditory and Visual Bistability.

Kondo HM, Kochiyama T.

Age-related changes in auditory and visual perception have an impact on the quality of life. It has been debated how perceptual organization is influenced by advancing age. From the neurochemical perspective, we investigated age effects on auditory and visual bistability. In perceptual bistability, a sequence of sensory inputs induces spontaneous switching between different perceptual objects. We used different modality tasks of auditory streaming and visual plaids. Young and middle-aged participants (20-60 years) were instructed to indicate by a button press whenever their perception changed from one stable state to the other. The number of perceptual switches decreased with participants' ages. We employed magnetic resonance spectroscopy to measure non-invasively concentrations of the inhibitory neurotransmitter (γ-aminobutyric acid, GABA) in the brain regions of interest. When participants were asked to voluntarily modulate their perception, the amount of effective volitional control was positively correlated with the GABA concentration in the auditory and motion-sensitive areas corresponding to each sensory modality. However, no correlation was found in the prefrontal cortex and anterior cingulate cortex. In addition, effective volitional control was reduced with advancing age. Our results suggest that sequential scene analysis in auditory and visual domains is influenced by both age-related and neurochemical factors.


DP9: Nature Human Behaviour (2017)

Diversity in pitch perception revealed by task dependence

McPherson, MJ, McDermott, JH.

Pitch conveys critical information in speech, music and other natural sounds, and is conventionally defined as the perceptual correlate of a sound's fundamental frequency (F0). Although pitch is widely assumed to be subserved by a single F0 estimation process, real-world pitch tasks vary enormously, raising the possibility of underlying mechanistic diversity. To probe pitch mechanisms, we conducted a battery of pitch-related music and speech tasks using conventional harmonic sounds and inharmonic sounds whose frequencies lack a common F0. Some pitch-related abilities (those relying on musical interval or voice recognition) were strongly impaired by inharmonicity, suggesting a reliance on F0. However, other tasks, including those dependent on pitch contours in speech and music, were unaffected by inharmonicity, suggesting a mechanism that tracks the frequency spectrum rather than the F0. The results suggest that pitch perception is mediated by several different mechanisms, only some of which conform to traditional notions of pitch.
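
The harmonic/inharmonic contrast can be illustrated by synthesizing a complex tone and a counterpart whose partials are frequency-jittered so they no longer share a common F0. The sketch below is a generic stand-in; the published study uses a specific jitter procedure and tone parameters.

    # Sketch: a harmonic complex tone vs. an inharmonic counterpart made by
    # jittering each partial's frequency (toy parameters).
    import numpy as np

    fs, dur, f0, n_harm = 16000, 0.5, 200.0, 10
    t = np.arange(int(fs * dur)) / fs
    rng = np.random.default_rng(9)

    def complex_tone(jitter=0.0):
        tone = np.zeros(t.size)
        for k in range(1, n_harm + 1):
            f = k * f0 * (1 + jitter * rng.uniform(-0.5, 0.5))   # jittered partial
            tone += np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
        return tone / n_harm

    harmonic = complex_tone(jitter=0.0)
    inharmonic = complex_tone(jitter=0.3)       # up to +/-15% jitter per partial
    print("RMS:", harmonic.std(), inharmonic.std())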


DP10: J Acoust Soc Am. 2017 Mar;141(3):1985.

Auditory inspired machine learning techniques can improve speech intelligibility and quality for hearing-impaired listeners

Monaghan JJ, Goehring T, Yang X, Bolner F, Wang S, Wright MC, Bleeck S.

Machine-learning based approaches to speech enhancement have recently shown great promise for improving speech intelligibility for hearing-impaired listeners. Here, the performance of three machine-learning algorithms and one classical algorithm, Wiener filtering, was compared. Two algorithms based on neural networks were examined, one using a previously reported feature set and one using a feature set derived from an auditory model. The third machine-learning approach was a dictionary-based sparse-coding algorithm. Speech intelligibility and quality scores were obtained for participants with mild-to-moderate hearing impairments listening to sentences in speech-shaped noise and multi-talker babble following processing with the algorithms. Intelligibility and quality scores were significantly improved by each of the three machine-learning approaches, but not by the classical approach. The largest improvements for both speech intelligibility and quality were found by implementing a neural network using the feature set based on auditory modeling. Furthermore, neural network based techniques appeared more promising than dictionary-based, sparse coding in terms of performance and ease of implementation.


DP11: Neuroscience. 2018 Oct 1;389:4-18

Temporal Processing in Audition: Insights from Music.

Rajendran VG1, Teki S1, Schnupp JWH2.

Music is a curious example of a temporally patterned acoustic stimulus, and a compelling pan-cultural phenomenon. This review strives to bring some insights from decades of music psychology and sensorimotor synchronization (SMS) literature into the mainstream auditory domain, arguing that musical rhythm perception is shaped in important ways by temporal processing mechanisms in the brain. The feature that unites these disparate disciplines is an appreciation of the central importance of timing, sequencing, and anticipation. Perception of musical rhythms relies on an ability to form temporal predictions, a general feature of temporal processing that is equally relevant to auditory scene analysis, pattern detection, and speech perception. By bringing together findings from the music and auditory literature, we hope to inspire researchers to look beyond the conventions of their respective fields and consider the cross-disciplinary implications of studying auditory temporal sequence processing. We begin by highlighting music as an interesting sound stimulus that may provide clues to how temporal patterning in sound drives perception. Next, we review the SMS literature and discuss possible neural substrates for the perception of, and synchronization to, musical beat. We then move away from music to explore the perceptual effects of rhythmic timing in pattern detection, auditory scene analysis, and speech perception. Finally, we review the neurophysiology of general timing processes that may underlie aspects of the perception of rhythmic patterns. We conclude with a brief summary and outlook for future research.


DP12: Front Psychol. 2016 Jan 5;6:1977.

Acoustic and Categorical Dissimilarity of Musical Timbre: Evidence from Asymmetries Between Acoustic and Chimeric Sounds.

Siedenburg K, Jones-Mollerup K, McAdams S.

This paper investigates the role of acoustic and categorical information in timbre dissimilarity ratings. Using a Gammatone-filterbank-based sound transformation, we created tones that were rated as less familiar than recorded tones from orchestral instruments and that were harder to associate with an unambiguous sound source (Experiment 1). A subset of transformed tones, a set of orchestral recordings, and a mixed set were then rated on pairwise dissimilarity (Experiment 2A). We observed that recorded instrument timbres clustered into subsets that distinguished timbres according to acoustic and categorical properties. For the subset of cross-category comparisons in the mixed set, we observed asymmetries in the distribution of ratings, as well as a stark decay of inter-rater agreement. These effects were replicated in a more robust within-subjects design (Experiment 2B) and cannot be explained by acoustic factors alone. We finally introduced a novel model of timbre dissimilarity based on partial least-squares regression that compared the contributions of both acoustic and categorical timbre descriptors. The best model fit (R^2 = 0.88) was achieved when both types of descriptors were taken into account. These findings are interpreted as evidence for an interplay of acoustic and categorical information in timbre dissimilarity perception.
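
The model comparison described here can be sketched with scikit-learn's PLSRegression: predict pairwise dissimilarity ratings from acoustic descriptors, categorical descriptors, or both, and compare cross-validated fits. Data and descriptor names below are toy assumptions, not the paper's descriptors.

    # Sketch of a partial least-squares model of dissimilarity ratings from
    # acoustic and/or categorical descriptors (toy data).
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(10)
    n_pairs = 300
    acoustic = rng.standard_normal((n_pairs, 6))                  # e.g., spectral differences
    categorical = rng.integers(0, 2, (n_pairs, 3)).astype(float)  # e.g., same family or not
    X = np.hstack([acoustic, categorical])
    y = X @ rng.standard_normal(9) + 0.3 * rng.standard_normal(n_pairs)

    for name, feats in [("acoustic", acoustic), ("categorical", categorical), ("both", X)]:
        r2 = cross_val_score(PLSRegression(n_components=2), feats, y, cv=5)  # R^2 by default
        print(f"{name:11s} mean cross-validated R^2 = {r2.mean():.2f}")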


DP13: J Acoust Soc Am. 2018 Apr;143(4):2460.

Discovering acoustic structure of novel sounds.

Stilp CE, Kiefte M, Kluender KR.

Natural sounds have substantial acoustic structure (predictability, nonrandomness) in their spectral and temporal compositions. Listeners are expected to exploit this structure to distinguish simultaneous sound sources; however, previous studies confounded acoustic structure and listening experience. Here, sensitivity to acoustic structure in novel sounds was measured in discrimination and identification tasks. Complementary signal-processing strategies independently varied relative acoustic entropy (the inverse of acoustic structure) across frequency or time. In one condition, instantaneous frequency of low-pass-filtered 300-ms random noise was rescaled to 5 kHz bandwidth and resynthesized. In another condition, the instantaneous frequency of a short gated 5-kHz noise was resampled up to 300 ms. In both cases, entropy relative to full bandwidth or full duration was a fraction of that in 300-ms noise sampled at 10 kHz. Discrimination of sounds improved with less relative entropy. Listeners identified a probe sound as a target sound (1%, 3.2%, or 10% relative entropy) that repeated amidst distractor sounds (1%, 10%, or 100% relative entropy) at 0 dB SNR. Performance depended on differences in relative entropy between targets and background. Lower-relative-entropy targets were better identified against higher-relative-entropy distractors than lower-relative-entropy distractors; higher-relative-entropy targets were better identified amidst lower-relative-entropy distractors. Results were consistent across signal-processing strategies.


DP14: Front Neurosci. 2016 Nov 24;10:490.

Long Term Memory for Noise: Evidence of Robust Encoding of Very Short Temporal Acoustic Patterns.

Viswanathan J, Rémy F, Bacon-Macé N, Thorpe SJ.

Recent research has demonstrated that humans are able to implicitly encode and retain repeating patterns in meaningless auditory noise. Our study aimed at testing the robustness of long-term implicit recognition memory for these learned patterns. Participants performed a cyclic/non-cyclic discrimination task, during which they were presented with either 1-s cyclic noises (CNs) (the two halves of the noise were identical) or 1-s plain random noises (Ns). Among CNs and Ns presented once, target CNs were implicitly presented multiple times within a block, and implicit recognition of these target CNs was tested 4 weeks later using a similar cyclic/non-cyclic discrimination task. Furthermore, robustness of implicit recognition memory was tested by presenting participants with looped (shifting the origin) and scrambled (chopping sounds into 10- and 20-ms bits before shuffling) versions of the target CNs. We found that participants had robust implicit recognition memory for learned noise patterns after 4 weeks, right from the first presentation. Additionally, this memory was remarkably resistant to acoustic transformations, such as looping and scrambling of the sounds. Finally, implicit recognition of sounds was dependent on participant's discrimination performance during learning. Our findings suggest that meaningless temporal features as short as 10 ms can be implicitly stored in long-term auditory memory. Moreover, successful encoding and storage of such fine features may vary between participants, possibly depending on individual attention and auditory discrimination abilities. Significance Statement: Meaningless auditory patterns could be implicitly encoded and stored in long-term memory. Acoustic transformations of learned meaningless patterns could be implicitly recognized after 4 weeks. Implicit long-term memories can be formed for meaningless auditory features as short as 10 ms. Successful encoding and long-term implicit recognition of meaningless patterns may strongly depend on individual attention and auditory discrimination abilities.
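
The stimulus constructions described above (cyclic noise, looped and scrambled versions) are straightforward to sketch. The code below is an illustrative reconstruction under assumed parameters (44.1-kHz sampling, 10-ms scrambling bits), not the authors' stimulus code.

    # Sketch of the stimulus types: a 1-s cyclic noise (two identical 500-ms halves),
    # a looped version (shifted origin), and a scrambled version (shuffled 10-ms bits).
    import numpy as np

    fs = 44100
    rng = np.random.default_rng(2)

    half = rng.normal(size=fs // 2)
    cyclic = np.concatenate([half, half])      # 1-s cyclic noise (CN)

    looped = np.roll(cyclic, fs // 4)          # same cycle with an arbitrarily shifted origin

    bit = int(0.010 * fs)                      # 10-ms bits
    n_bits = len(cyclic) // bit
    bits = cyclic[:n_bits * bit].reshape(n_bits, bit)
    scrambled = bits[rng.permutation(n_bits)].ravel()

    plain = rng.normal(size=fs)                # 1-s plain random noise (N)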


DP15: Curr Biol. 2017 Nov 6;27(21):3237-3247.e6.

Audiomotor Perceptual Training Enhances Speech Intelligibility in Background Noise.

Whitton JP, Hancock KE, Shannon JM, Polley DB.

Sensory and motor skills can be improved with training, but learning is often restricted to practice stimuli. As an exception, training on closed-loop (CL) sensorimotor interfaces, such as action video games and musical instruments, can impart a broad spectrum of perceptual benefits. Here we ask whether computerized CL auditory training can enhance speech understanding in levels of background noise that approximate a crowded restaurant. Elderly hearing-impaired subjects trained for 8 weeks on a CL game that, like a musical instrument, challenged them to monitor subtle deviations between predicted and actual auditory feedback as they moved their fingertip through a virtual soundscape. We performed our study as a randomized, double-blind, placebo-controlled trial by training other subjects in an auditory working-memory (WM) task. Subjects in both groups improved at their respective auditory tasks and reported comparable expectations for improved speech processing, thereby controlling for placebo effects. Whereas speech intelligibility was unchanged after WM training, subjects in the CL training group could correctly identify 25% more words in spoken sentences or digit sequences presented in high levels of background noise. Numerically, CL audiomotor training provided more than three times the benefit of our subjects' hearing aids for speech processing in noisy listening conditions. Gains in speech intelligibility could be predicted from gameplay accuracy and baseline inhibitory control. However, benefits did not persist in the absence of continuing practice. These studies employ stringent clinical standards to demonstrate that perceptual learning on a computerized audio game can transfer to "real-world" communication challenges.


DP16: Hum Brain Mapp. 2018 Nov;39(11):4623-4632

Context-dependent role of selective attention for change detection in multi-speaker scenes.

Starzynski C, Gutschalk A.

Disappearance of a voice or other sound source may often go unnoticed when the auditory scene is crowded. We explored the role of selective attention for this change deafness with magnetoencephalography in multi-speaker scenes. Each scene was presented two times in direct succession, and one target speaker was frequently omitted in Scene 2. When listeners were previously cued to the target speaker, activity in auditory cortex time locked to the target speaker's sound envelope was selectively enhanced in Scene 1, as was determined by a cross-correlation analysis. Moreover, the response was stronger for hit trials than for miss trials, confirming that selective attention played a role for subsequent change detection. If selective attention to the streams where the change occurred was generally required for successful change detection, neural enhancement of this stream would also be expected without cue in hit compared to miss trials. However, when listeners were not previously cued to the target, no enhanced activity for the target speaker was observed for hit trials, and there was no significant difference between hit and miss trials. These results, first, confirm a role for attention in change detection for situations where the target source is known. Second, they suggest that the omission of a speaker, or more generally an auditory stream, can alternatively be detected without selective attentional enhancement of the target stream. Several models and strategies could be envisaged for change detection in this case, including global comparison of the subsequent scenes.
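
A minimal sketch of a cross-correlation analysis of this kind is given below: a neural channel is correlated with the target speaker's amplitude envelope over a range of lags. All signals are synthetic placeholders, and the 100-ms delay and 200-Hz sampling rate are assumptions for illustration.

    # Correlate a (fake) neural channel with a (fake) speaker envelope at several lags.
    import numpy as np
    from scipy.signal import hilbert

    fs = 200                                    # assumed downsampled rate, Hz
    t = np.arange(0, 10, 1 / fs)
    rng = np.random.default_rng(3)

    speech = rng.normal(size=t.size) * (1 + np.sin(2 * np.pi * 3 * t))  # toy amplitude-modulated signal
    envelope = np.abs(hilbert(speech))

    delay = int(0.1 * fs)                       # neural channel = envelope delayed by 100 ms, plus noise
    neural = np.roll(envelope, delay) + rng.normal(scale=0.5, size=envelope.size)

    lags_ms, corrs = [], []
    for lag in range(0, int(0.4 * fs)):         # test lags from 0 to 400 ms
        r = np.corrcoef(envelope[:envelope.size - lag or None], neural[lag:])[0, 1]
        lags_ms.append(1000 * lag / fs)
        corrs.append(r)

    best = int(np.argmax(corrs))
    print(f"peak correlation {corrs[best]:.2f} at lag {lags_ms[best]:.0f} ms")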


MC1: Curr Biol. 2015 Aug 3;25(15):2051-6.

Human screams occupy a privileged niche in the communication soundscape.

Arnal LH, Flinker A, Kleinschmidt A, Giraud AL, Poeppel D.

Screaming is arguably one of the most relevant communication signals for survival in humans. Despite their practical relevance and their theoretical significance as innate [1] and virtually universal [2, 3] vocalizations, what makes screams a unique signal and how they are processed is not known. Here, we use acoustic analyses, psychophysical experiments, and neuroimaging to isolate those features that confer to screams their alarming nature, and we track their processing in the human brain. Using the modulation power spectrum (MPS [4, 5]), a recently developed, neurally informed characterization of sounds, we demonstrate that human screams cluster within a restricted portion of the acoustic space (between ∼30 and 150 Hz modulation rates) that corresponds to a well-known perceptual attribute, roughness. In contrast to the received view that roughness is irrelevant for communication [6], our data reveal that the acoustic space occupied by the rough vocal regime is segregated from other signals, including speech, a pre-requisite to avoid false alarms in normal vocal communication. We show that roughness is present in natural alarm signals as well as in artificial alarms and that the presence of roughness in sounds boosts their detection in various tasks. Using fMRI, we show that acoustic roughness engages subcortical structures critical to rapidly appraise danger. Altogether, these data demonstrate that screams occupy a privileged acoustic niche that, being separated from other communication signals, ensures their biological and ultimately social efficiency.
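
A crude, one-dimensional stand-in for the roughness measure is sketched below: the energy of a sound's amplitude envelope falling in the 30-150 Hz temporal modulation band. The paper relies on the full two-dimensional modulation power spectrum, so this is only an approximation of the idea, applied to two synthetic tones.

    # Compare a 70-Hz-modulated ("rough") tone with a 4-Hz-modulated (speech-rate) tone.
    import numpy as np
    from scipy.signal import hilbert, welch

    fs = 16000
    t = np.arange(0, 1.0, 1 / fs)
    rough = np.sin(2 * np.pi * 500 * t) * (1 + np.sin(2 * np.pi * 70 * t))
    slow = np.sin(2 * np.pi * 500 * t) * (1 + np.sin(2 * np.pi * 4 * t))

    def roughness_index(x, fs):
        env = np.abs(hilbert(x))                              # amplitude envelope
        f, pxx = welch(env - env.mean(), fs=fs, nperseg=4096)
        band = (f >= 30) & (f <= 150)                         # "rough" modulation band
        return pxx[band].sum() / pxx.sum()

    print("rough tone:", round(roughness_index(rough, fs), 2))
    print("slow-modulated tone:", round(roughness_index(slow, fs), 2))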


MC2: J Neurosci. 2016 Sep 21;36(38):9888-95.

Eye Can Hear Clearly Now: Inverse Effectiveness in Natural Audiovisual Speech Processing Relies on Long-Term Crossmodal Temporal Integration.

Crosse MJ, Di Liberto GM, Lalor EC.

Speech comprehension is improved by viewing a speaker's face, especially in adverse hearing conditions, a principle known as inverse effectiveness. However, the neural mechanisms that help to optimize how we integrate auditory and visual speech in such suboptimal conversational environments are not yet fully understood. Using human EEG recordings, we examined how visual speech enhances the cortical representation of auditory speech at a signal-to-noise ratio that maximized the perceptual benefit conferred by multisensory processing relative to unisensory processing. We found that the influence of visual input on the neural tracking of the audio speech signal was significantly greater in noisy than in quiet listening conditions, consistent with the principle of inverse effectiveness. Although envelope tracking during audio-only speech was greatly reduced by background noise at an early processing stage, it was markedly restored by the addition of visual speech input. In background noise, multisensory integration occurred at much lower frequencies and was shown to predict the multisensory gain in behavioral performance at a time lag of ∼250 ms. Critically, we demonstrated that inverse effectiveness, in the context of natural audiovisual (AV) speech processing, relies on crossmodal integration over long temporal windows. Our findings suggest that disparate integration mechanisms contribute to the efficient processing of AV speech in background noise.
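
One simple way to quantify low-frequency neural tracking of the speech envelope (not the authors' exact analysis pipeline) is magnitude-squared coherence between an EEG channel and the envelope, as sketched below with synthetic data for a quiet-like and a noise-like condition.

    # Coherence between a speech envelope and two fake EEG channels with strong vs. weak tracking.
    import numpy as np
    from scipy.signal import coherence, hilbert

    fs = 128
    t = np.arange(0, 60, 1 / fs)
    rng = np.random.default_rng(9)

    envelope = np.abs(hilbert(rng.normal(size=t.size) * (1 + np.sin(2 * np.pi * 2 * t))))
    eeg_quiet = 0.8 * envelope + rng.normal(scale=1.0, size=t.size)   # strong envelope tracking
    eeg_noise = 0.2 * envelope + rng.normal(scale=1.0, size=t.size)   # degraded tracking

    f, c_quiet = coherence(envelope, eeg_quiet, fs=fs, nperseg=512)
    f, c_noise = coherence(envelope, eeg_noise, fs=fs, nperseg=512)
    low = f < 4                                                       # low-frequency band
    print("mean coherence below 4 Hz, quiet-like:", round(c_quiet[low].mean(), 2))
    print("mean coherence below 4 Hz, noise-like:", round(c_noise[low].mean(), 2))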


MC3: Psychol Sci. 2018 Oct;29(10):1575-1583.

Familiar Voices Are More Intelligible, Even if They Are Not Recognized as Familiar.

Holmes E, Domingo Y, Johnsrude IS.

We can recognize familiar people by their voices, and familiar talkers are more intelligible than unfamiliar talkers when competing talkers are present. However, whether the acoustic voice characteristics that permit recognition and those that benefit intelligibility are the same or different is unknown. Here, we recruited pairs of participants who had known each other for 6 months or longer and manipulated the acoustic correlates of two voice characteristics (vocal tract length and glottal pulse rate). These had different effects on explicit recognition of and the speech-intelligibility benefit realized from familiar voices. Furthermore, even when explicit recognition of familiar voices was eliminated, they were still more intelligible than unfamiliar voices-demonstrating that familiar voices do not need to be explicitly recognized to benefit intelligibility. Processing familiar-voice information appears therefore to depend on multiple, at least partially independent, systems that are recruited depending on the perceptual goal of the listener.


MC4: Neuron. 2018 May 2;98(3):630-644

A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy.

Kell AJE, Yamins DLK, Shook EN, Norman-Haignere SV, McDermott JH.

A core goal of auditory neuroscience is to build quantitative models that predict cortical responses to natural sounds. Reasoning that a complete model of auditory cortex must solve ecologically relevant tasks, we optimized hierarchical neural networks for speech and music recognition. The best-performing network contained separate music and speech pathways following early shared processing, potentially replicating human cortical organization. The network performed both tasks as well as humans and exhibited human-like errors despite not being optimized to do so, suggesting common constraints on network and human performance. The network predicted fMRI voxel responses substantially better than traditional spectrotemporal filter models throughout auditory cortex. It also provided a quantitative signature of cortical representational hierarchy-primary and non-primary responses were best predicted by intermediate and late network layers, respectively. The results suggest that task optimization provides a powerful set of tools for modeling sensory systems.
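
The branched architecture can be illustrated with a toy PyTorch model: shared early convolutional layers feeding separate word-recognition and genre-recognition heads. Layer sizes and task dimensions below are placeholders, not the network reported in the paper.

    # Toy branched network: shared trunk, then task-specific speech and music heads.
    import torch
    import torch.nn as nn

    class BranchedAudioNet(nn.Module):
        def __init__(self, n_words=500, n_genres=40):           # placeholder task sizes
            super().__init__()
            self.shared = nn.Sequential(                         # early layers shared by both tasks
                nn.Conv2d(1, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            )
            def branch(n_out):                                   # task-specific pathway
                return nn.Sequential(
                    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, n_out),
                )
            self.speech_head = branch(n_words)
            self.music_head = branch(n_genres)

        def forward(self, cochleagram):                          # input: batch x 1 x freq x time
            z = self.shared(cochleagram)
            return self.speech_head(z), self.music_head(z)

    net = BranchedAudioNet()
    dummy = torch.randn(2, 1, 128, 200)                          # two fake cochleagrams
    word_logits, genre_logits = net(dummy)
    print(word_logits.shape, genre_logits.shape)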


MC5: Proc Natl Acad Sci U S A. 2018 Apr 3;115(14):E3313-E3322

Schema learning for the cocktail party problem.

Woods KJP, McDermott JH.

The cocktail party problem requires listeners to infer individual sound sources from mixtures of sound. The problem can be solved only by leveraging regularities in natural sound sources, but little is known about how such regularities are internalized. We explored whether listeners learn source "schemas"-the abstract structure shared by different occurrences of the same type of sound source-and use them to infer sources from mixtures. We measured the ability of listeners to segregate mixtures of time-varying sources. In each experiment a subset of trials contained schema-based sources generated from a common template by transformations (transposition and time dilation) that introduced acoustic variation but preserved abstract structure. Across several tasks and classes of sound sources, schema-based sources consistently aided source separation, in some cases producing rapid improvements in performance over the first few exposures to a schema. Learning persisted across blocks that did not contain the learned schema, and listeners were able to learn and use multiple schemas simultaneously. No learning was evident when schema were presented in the task-irrelevant (i.e., distractor) source. However, learning from task-relevant stimuli showed signs of being implicit, in that listeners were no more likely to report that sources recurred in experiments containing schema-based sources than in control experiments containing no schema-based sources. The results implicate a mechanism for rapidly internalizing abstract sound structure, facilitating accurate perceptual organization of sound sources that recur in the environment.
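
The schema manipulation can be sketched as follows: a source template is a sequence of (frequency, duration) segments, and schema-based variants are produced by transposition (scaling all frequencies) and time dilation (scaling all durations). The values and the sine-tone synthesis are illustrative assumptions, not the paper's stimuli.

    # Generate a schema-based variant of a (frequency, duration) template.
    import numpy as np

    rng = np.random.default_rng(4)
    template = [(440.0, 0.20), (550.0, 0.15), (330.0, 0.25), (660.0, 0.10)]   # (Hz, s)

    def schema_variant(template, transpose_semitones, dilation):
        factor = 2 ** (transpose_semitones / 12)                 # transposition preserves relative structure
        return [(f * factor, d * dilation) for f, d in template]

    def synthesize(segments, fs=16000):
        parts = []
        for f, d in segments:
            t = np.arange(0, d, 1 / fs)
            parts.append(np.sin(2 * np.pi * f * t))
        return np.concatenate(parts)

    variant = schema_variant(template, transpose_semitones=rng.integers(-6, 7),
                             dilation=rng.uniform(0.8, 1.25))
    signal = synthesize(variant)                 # in the experiments, this source would be mixed with another
    print(f"{len(signal)} samples, {len(variant)} segments")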


MC6: Proc Natl Acad Sci U S A. 2016 Apr 5;113(14):3873-8.

Spatiotemporal dynamics of auditory attention synchronize with speech.

Wöstmann M, Herrmann B, Maess B, Obleser J.

Attention plays a fundamental role in selectively processing stimuli in our environment despite distraction. Spatial attention induces increasing and decreasing power of neural alpha oscillations (8-12 Hz) in brain regions ipsilateral and contralateral to the locus of attention, respectively. This study tested whether the hemispheric lateralization of alpha power codes not just the spatial location but also the temporal structure of the stimulus. Participants attended to spoken digits presented to one ear and ignored tightly synchronized distracting digits presented to the other ear. In the magnetoencephalogram, spatial attention induced lateralization of alpha power in parietal, but notably also in auditory cortical regions. This alpha power lateralization was not maintained steadily but fluctuated in synchrony with the speech rate and lagged the time course of low-frequency (1-5 Hz) sensory synchronization. Higher amplitude of alpha power modulation at the speech rate was predictive of a listener's enhanced performance of stream-specific speech comprehension. Our findings demonstrate that alpha power lateralization is modulated in tune with the sensory input and acts as a spatiotemporal filter controlling the read-out of sensory content.
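
An alpha-lateralization measure in this spirit can be sketched by band-passing two channels around 8-12 Hz, taking Hilbert power, and forming a normalized ipsilateral-minus-contralateral index. The code below uses synthetic signals whose alpha amplitude is modulated at an assumed 3-Hz rate standing in for the speech rate.

    # Alpha lateralization index from synthetic ipsi- and contralateral channels.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    fs = 250
    t = np.arange(0, 10, 1 / fs)
    rng = np.random.default_rng(5)

    mod = 1 + 0.5 * np.sin(2 * np.pi * 3 * t)                     # amplitude modulation at a 3-Hz "speech rate"
    ipsi = mod * np.sin(2 * np.pi * 10 * t) + rng.normal(scale=0.5, size=t.size)
    contra = 0.5 * np.sin(2 * np.pi * 10 * t) + rng.normal(scale=0.5, size=t.size)

    b, a = butter(4, [8 / (fs / 2), 12 / (fs / 2)], btype="band")
    def alpha_power(x):
        return np.abs(hilbert(filtfilt(b, a, x))) ** 2            # 8-12 Hz power envelope

    p_ipsi, p_contra = alpha_power(ipsi), alpha_power(contra)
    ali = (p_ipsi - p_contra) / (p_ipsi + p_contra)               # lateralization index over time
    print("mean alpha lateralization index:", round(float(ali.mean()), 2))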


YB1: Neuron. 2016 Apr 6;90(1):191-203.

Unmasking Latent Inhibitory Connections in Human Cortex to Reveal Dormant Cortical Memories.

Barron HC, Vogels TP, Emir UE, Makin TR, O'Shea J, Clare S, Jbabdi S, Dolan RJ, Behrens TE.

Balance of cortical excitation and inhibition (EI) is thought to be disrupted in several neuropsychiatric conditions, yet it is not clear how it is maintained in the healthy human brain. When EI balance is disturbed during learning and memory in animal models, it can be restabilized via formation of inhibitory replicas of newly formed excitatory connections. Here we assess evidence for such selective inhibitory rebalancing in humans. Using fMRI repetition suppression we measure newly formed cortical associations in the human brain. We show that expression of these associations reduces over time despite persistence in behavior, consistent with inhibitory rebalancing. To test this, we modulated excitation/inhibition balance with transcranial direct current stimulation (tDCS). Using ultra-high-field (7T) MRI and spectroscopy, we show that reducing GABA allows cortical associations to be re-expressed. This suggests that in humans associative memories are stored in balanced excitatory-inhibitory ensembles that lie dormant unless latent inhibitory connections are unmasked.


YB2: Neuron. 2012 Oct 18;76(2):435-49.

Discrete neocortical dynamics predict behavioral categorization of sounds.

Bathellier B, Ushakova L, Rumpel S.

The ability to group stimuli into perceptual categories is essential for efficient interaction with the environment. Discrete dynamics that emerge in brain networks are believed to be the neuronal correlate of category formation. Observations of such dynamics have recently been made; however, it is still unresolved if they actually match perceptual categories. Using in vivo two-photon calcium imaging in the auditory cortex of mice, we show that local network activity evoked by sounds is constrained to few response modes. Transitions between response modes are characterized by an abrupt switch, indicating attractor-like, discrete dynamics. Moreover, we show that local cortical responses quantitatively predict discrimination performance and spontaneous categorization of sounds in behaving mice. Our results therefore demonstrate that local nonlinear dynamics in the auditory cortex generate spontaneous sound categories which can be selected for behavioral or perceptual decisions.
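
One simple way to look for a small number of discrete response modes in population data (not the authors' exact clustering analysis) is to cluster single-trial response vectors, as sketched below with synthetic data built from three underlying patterns.

    # Cluster synthetic single-trial population responses into a few "response modes".
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(6)
    n_neurons, n_trials = 60, 300
    modes = rng.normal(size=(3, n_neurons))                        # three underlying population patterns
    labels_true = rng.integers(0, 3, size=n_trials)
    responses = modes[labels_true] + rng.normal(scale=0.5, size=(n_trials, n_neurons))

    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(responses)
    print("trials per recovered mode:", np.bincount(km.labels_))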


YB3: Proc Natl Acad Sci U S A. 2017 Sep 12;114(37):9972-9977.

Top-down modulation of sensory cortex gates perceptual learning.

Caras ML, Sanes DH.

Practice sharpens our perceptual judgments, a process known as perceptual learning. Although several brain regions and neural mechanisms have been proposed to support perceptual learning, formal tests of causality are lacking. Furthermore, the temporal relationship between neural and behavioral plasticity remains uncertain. To address these issues, we recorded the activity of auditory cortical neurons as gerbils trained on a sound detection task. Training led to improvements in cortical and behavioral sensitivity that were closely matched in terms of magnitude and time course. Surprisingly, the degree of neural improvement was behaviorally gated. During task performance, cortical improvements were large and predicted behavioral outcomes. In contrast, during nontask listening sessions, cortical improvements were weak and uncorrelated with perceptual performance. Targeted reduction of auditory cortical activity during training diminished perceptual learning while leaving psychometric performance largely unaffected. Collectively, our findings suggest that training facilitates perceptual learning by strengthening both bottom-up sensory encoding and top-down modulation of auditory cortex.
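
The matched behavioral and neural sensitivity measures compared above are typically expressed as d-prime values. The sketch below computes a behavioral d' from hit and false-alarm rates and a neural d' from firing-rate distributions; all numbers are illustrative, not data from the study.

    # Behavioral and neural d-prime from illustrative numbers.
    import numpy as np
    from scipy.stats import norm

    hit_rate, fa_rate = 0.85, 0.20                      # detection-task performance (illustrative)
    d_behav = norm.ppf(hit_rate) - norm.ppf(fa_rate)

    rng = np.random.default_rng(7)
    rates_signal = rng.normal(12.0, 3.0, size=200)      # spikes/s on signal trials (fake)
    rates_catch = rng.normal(9.0, 3.0, size=200)        # spikes/s on catch trials (fake)
    d_neural = (rates_signal.mean() - rates_catch.mean()) / np.sqrt(
        0.5 * (rates_signal.var() + rates_catch.var()))

    print(f"behavioral d' = {d_behav:.2f}, neural d' = {d_neural:.2f}")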


YB4: Elife. 2016 Mar 4;5. pii: e12577.

The auditory representation of speech sounds in human motor cortex.

Cheung C, Hamilton LS, Johnson K, Chang EF.

In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different than during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech, but rather, represents auditory vocal information.


YB5: Nat Neurosci. 2011 Jan;14(1):108-14

Auditory cortex spatial sensitivity sharpens during task performance.

Lee CC, Middlebrooks JC.

Activity in the primary auditory cortex (A1) is essential for normal sound localization behavior, but previous studies of the spatial sensitivity of neurons in A1 have found broad spatial tuning. We tested the hypothesis that spatial tuning sharpens when an animal engages in an auditory task. Cats performed a task that required evaluation of the locations of sounds and one that required active listening, but in which sound location was irrelevant. Some 26-44% of the units recorded in A1 showed substantially sharpened spatial tuning during the behavioral tasks as compared with idle conditions, with the greatest sharpening occurring during the location-relevant task. Spatial sharpening occurred on a scale of tens of seconds and could be replicated multiple times in ∼1.5-h test sessions. Sharpening resulted primarily from increased suppression of responses to sounds at least-preferred locations. That and an observed increase in latencies suggest an important role of inhibitory mechanisms.


YB6: J Neurosci. 2018 Nov 14;38(46):9955-9966.

Implicit Memory for Complex Sounds in Higher Auditory Cortex of the Ferret.

Lu K, Liu W, Zan P, David SV, Fritz JB, Shamma SA.

Responses of auditory cortical neurons encode sound features of incoming acoustic stimuli and also are shaped by stimulus context and history. Previous studies of mammalian auditory cortex have reported a variable time course for such contextual effects ranging from milliseconds to minutes. However, in secondary auditory forebrain areas of songbirds, long-term stimulus-specific neuronal habituation to acoustic stimuli can persist for much longer periods of time, ranging from hours to days. Such long-term habituation in the songbird is a form of long-term auditory memory that requires gene expression. Although such long-term habituation has been demonstrated in avian auditory forebrain, this phenomenon has not previously been described in the mammalian auditory system. Utilizing a similar version of the avian habituation paradigm, we explored whether such long-term effects of stimulus history also occur in auditory cortex of a mammalian auditory generalist, the ferret. Following repetitive presentation of novel complex sounds, we observed significant response habituation in secondary auditory cortex, but not in primary auditory cortex. This long-term habituation appeared to be independent for each novel stimulus and often lasted for at least 20 min. These effects could not be explained by simple neuronal fatigue in the auditory pathway, because time-reversed sounds induced undiminished responses similar to those elicited by completely novel sounds. A parallel set of pupillometric response measurements in the ferret revealed long-term habituation effects similar to observed long-term neural habituation, supporting the hypothesis that habituation to passively presented stimuli is correlated with implicit learning and long-term recognition of familiar sounds.


YB7: Science. 2014 Feb 28;343(6174):1006-10.

Phonetic feature encoding in human superior temporal gyrus.

Mesgarani N, Cheung C, Johnson K, Chang EF.

During speech perception, linguistic elements such as consonants and vowels are extracted from a complex acoustic speech signal. The superior temporal gyrus (STG) participates in high-order auditory processing of speech, but how it encodes phonetic information is poorly understood. We used high-density direct cortical surface recordings in humans while they listened to natural, continuous speech to reveal the STG representation of the entire English phonetic inventory. At single electrodes, we found response selectivity to distinct phonetic features. Encoding of acoustic properties was mediated by a distributed population response. Phonetic features could be directly related to tuning for spectrotemporal acoustic cues, some of which were encoded in a nonlinear fashion or by integration of multiple cues. These findings demonstrate the acoustic-phonetic representation of speech in human STG.


YB8: Cereb Cortex. 2018 Dec 1;28(12):4222-4233.

Neural Encoding of Auditory Features during Music Perception and Imagery.

Martin S, Mikutta C, Leonard MK, Hungate D, Koelsch S, Shamma S, Chang EF, Millán JDR, Knight RT, Pasley BN.

Despite many behavioral and neuroimaging investigations, it remains unclear how the human cortex represents spectrotemporal sound features during auditory imagery, and how this representation compares to auditory perception. To assess this, we recorded electrocorticographic signals from an epileptic patient with proficient music ability in 2 conditions. First, the participant played 2 piano pieces on an electronic piano with the sound volume of the digital keyboard on. Second, the participant replayed the same piano pieces, but without auditory feedback, and the participant was asked to imagine hearing the music in his mind. In both conditions, the sound output of the keyboard was recorded, thus allowing precise time-locking between the neural activity and the spectrotemporal content of the music imagery. This novel task design provided a unique opportunity to apply receptive field modeling techniques to quantitatively study neural encoding during auditory mental imagery. In both conditions, we built encoding models to predict high gamma neural activity (70-150 Hz) from the spectrogram representation of the recorded sound. We found robust spectrotemporal receptive fields during auditory imagery with substantial, but not complete overlap in frequency tuning and cortical location compared to receptive fields measured during auditory perception.
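
A minimal sketch of a spectrotemporal encoding model of this general kind is given below: ridge regression from time-lagged spectrogram features to a neural signal standing in for high-gamma power. The data are synthetic, and the lag count, regularization strength, and dimensions are arbitrary assumptions.

    # Forward encoding model: lagged spectrogram -> (fake) high-gamma signal, via ridge regression.
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(8)
    n_times, n_freqs, n_lags = 2000, 32, 10

    spec = rng.normal(size=(n_times, n_freqs))                     # placeholder spectrogram (time x frequency)
    true_strf = rng.normal(size=(n_lags, n_freqs)) * np.exp(-np.arange(n_lags))[:, None]

    # Build the lagged design matrix: each row holds the last n_lags spectrogram frames.
    X = np.stack([spec[i - n_lags:i].ravel() for i in range(n_lags, n_times)])
    y = X @ true_strf.ravel() + rng.normal(scale=1.0, size=X.shape[0])

    split = X.shape[0] // 2
    model = Ridge(alpha=10.0).fit(X[:split], y[:split])
    pred = model.predict(X[split:])
    r = np.corrcoef(pred, y[split:])[0, 1]
    print(f"prediction correlation on held-out data: r = {r:.2f}")
    strf = model.coef_.reshape(n_lags, n_freqs)                    # estimated spectrotemporal receptive field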


YB9: J Neurosci. 2006 May 3;26(18):4970-82.

Perceptual learning directs auditory cortical map reorganization through top-down influences.

Polley DB, Steinberg EE, Merzenich MM.

The primary sensory cortex is positioned at a confluence of bottom-up dedicated sensory inputs and top-down inputs related to higher-order sensory features, attentional state, and behavioral reinforcement. We tested whether topographic map plasticity in the adult primary auditory cortex and a secondary auditory area, the suprarhinal auditory field, was controlled by the statistics of bottom-up sensory inputs or by top-down task-dependent influences. Rats were trained to attend to independent parameters, either frequency or intensity, within an identical set of auditory stimuli, allowing us to vary task demands while holding the bottom-up sensory inputs constant. We observed a clear double-dissociation in map plasticity in both cortical fields. Rats trained to attend to frequency cues exhibited an expanded representation of the target frequency range within the tonotopic map but no change in sound intensity encoding compared with controls. Rats trained to attend to intensity cues expressed an increased proportion of nonmonotonic intensity response profiles preferentially tuned to the target intensity range but no change in tonotopic map organization relative to controls. The degree of topographic map plasticity within the task-relevant stimulus dimension was correlated with the degree of perceptual learning for rats in both tasks. These data suggest that enduring receptive field plasticity in the adult auditory cortex may be shaped by task-specific top-down inputs that interact with bottom-up sensory inputs and reinforcement-based neuromodulator release. Top-down inputs might confer the selectivity necessary to modify a single feature representation without affecting other spatially organized feature representations embedded within the same neural circuitry.