[P2 evaluation] Articles

Choose two articles from the list, from two different lecturers (identified by their initials). Articles from BPC may only be chosen for the written exam.


AdC1: Neuroimage. 2018 Feb 1;166:60-70.

Encoding of natural timbre dimensions in human auditory cortex.

Allen EJ, Moerel M, Lage-Castellanos A, De Martino F, Formisano E, Oxenham AJ.

Timbre, or sound quality, is a crucial but poorly understood dimension of auditory perception that is important in describing speech, music, and environmental sounds. The present study investigates the cortical representation of different timbral dimensions. Encoding models have typically incorporated the physical characteristics of sounds as features when attempting to understand their neural representation with functional MRI. Here we test an encoding model that is based on five subjectively derived dimensions of timbre to predict cortical responses to natural orchestral sounds. Results show that this timbre model can outperform other models based on spectral characteristics, and can perform as well as a complex joint spectrotemporal modulation model. In cortical regions at the medial border of Heschl's gyrus, bilaterally, and regions at its posterior adjacency in the right hemisphere, the timbre model outperforms even the complex joint spectrotemporal modulation model. These findings suggest that the responses of cortical neuronal populations in auditory cortex may reflect the encoding of perceptual timbre dimensions.


AdC2: Curr Biol. 2018 Mar 5;28(5):803-809

Electrophysiological Correlates of Semantic Dissimilarity Reflect the Comprehension of Natural, Narrative Speech.

Broderick MP, Anderson AJ, Di Liberto GM, Crosse MJ, Lalor EC.

People routinely hear and understand speech at rates of 120-200 words per minute [1, 2]. Thus, speech comprehension must involve rapid, online neural mechanisms that process words' meanings in an approximately time-locked fashion. However, electrophysiological evidence for such time-locked processing has been lacking for continuous speech. Although valuable insights into semantic processing have been provided by the N400 component of the event-related potential [3-6], this literature has been dominated by paradigms using incongruous words within specially constructed sentences, with less emphasis on natural, narrative speech comprehension. Building on the discovery that cortical activity tracks the dynamics of running speech [7-9] and psycholinguistic work demonstrating [10-12] and modeling [13-15] how context impacts on word processing, we describe a new approach for deriving an electrophysiological correlate of natural speech comprehension. We used a computational model [16] to quantify the meaning carried by words based on how semantically dissimilar they were to their preceding context and then regressed this measure against electroencephalographic (EEG) data recorded from subjects as they listened to narrative speech. This produced a prominent negativity at a time lag of 200-600 ms on centro-parietal EEG channels, characteristics common to the N400. Applying this approach to EEG datasets involving time-reversed speech, cocktail party attention, and audiovisual speech-in-noise demonstrated that this response was very sensitive to whether or not subjects understood the speech they heard. These findings demonstrate that, when successfully comprehending natural speech, the human brain responds to the contextual semantic content of each word in a relatively time-locked fashion.


AdC3: Proc Natl Acad Sci U S A. 2017 Jan 31;114(5):E840-E848.

Harmonic template neurons in primate auditory cortex underlying complex sound processing.

Feng L, Wang X.

Harmonicity is a fundamental element of music, speech, and animal vocalizations. How the auditory system extracts harmonic structures embedded in complex sounds and uses them to form a coherent unitary entity is not fully understood. Despite the prevalence of sounds rich in harmonic structures in our everyday hearing environment, it has remained largely unknown what neural mechanisms are used by the primate auditory cortex to extract these biologically important acoustic structures. In this study, we discovered a unique class of harmonic template neurons in the core region of auditory cortex of a highly vocal New World primate, the common marmoset (Callithrix jacchus), across the entire hearing frequency range. Marmosets have a rich vocal repertoire and a similar hearing range to that of humans. Responses of these neurons show nonlinear facilitation to harmonic complex sounds over inharmonic sounds, selectivity for particular harmonic structures beyond two-tone combinations, and sensitivity to harmonic number and spectral regularity. Our findings suggest that the harmonic template neurons in auditory cortex may play an important role in processing sounds with harmonic structures, such as animal vocalizations, human speech, and music.


AdC4: Proc Natl Acad Sci U S A. 2018 Feb 6;115(6):E1309-E1318

The eardrums move when the eyes move: A multisensory effect on the mechanics of hearing.

Gruters KG, Murphy DLK, Jenson CD, Smith DW, Shera CA, Groh JM.

Interactions between sensory pathways such as the visual and auditory systems are known to occur in the brain, but where they first occur is uncertain. Here, we show a multimodal interaction evident at the eardrum. Ear canal microphone measurements in humans (n = 19 ears in 16 subjects) and monkeys (n = 5 ears in three subjects) performing a saccadic eye movement task to visual targets indicated that the eardrum moves in conjunction with the eye movement. The eardrum motion was oscillatory and began as early as 10 ms before saccade onset in humans or with saccade onset in monkeys. These eardrum movements, which we dub eye movement-related eardrum oscillations (EMREOs), occurred in the absence of a sound stimulus. The amplitude and phase of the EMREOs depended on the direction and horizontal amplitude of the saccade. They lasted throughout the saccade and well into subsequent periods of steady fixation. We discuss the possibility that the mechanisms underlying EMREOs create eye movement-related binaural cues that may aid the brain in evaluating the relationship between visual and auditory stimulus locations as the eyes move.


AdC5: PLoS One. 2014 Jan 27;9(1):e85791

Two distinct dynamic modes subtend the detection of unexpected sounds.

King JR, Gramfort A, Schurger A, Naccache L, Dehaene S.

The brain response to auditory novelty comprises two main EEG components: an early mismatch negativity and a late P300. Whereas the former has been proposed to reflect a prediction error, the latter is often associated with working memory updating. Interestingly, these two proposals predict fundamentally different dynamics: prediction errors are thought to propagate serially through several distinct brain areas, while working memory supposes that activity is sustained over time within a stable set of brain areas. Here we test this temporal dissociation by showing how the generalization of brain activity patterns across time can characterize the dynamics of the underlying neural processes. This method is applied to magnetoencephalography (MEG) recordings acquired from healthy participants who were presented with two types of auditory novelty. Following our predictions, the results show that the mismatch evoked by a local novelty leads to the sequential recruitment of distinct and short-lived patterns of brain activity. In sharp contrast, the global novelty evoked by an unexpected sequence of five sounds elicits a sustained state of brain activity that lasts for several hundreds of milliseconds. The present results highlight how MEG combined with multivariate pattern analyses can characterize the dynamics of human cortical processes.


AdC6: Curr Biol. 2017 Mar 6;27(5):743-750.

Frogs Exploit Statistical Regularities in Noisy Acoustic Scenes to Solve Cocktail-Party-like Problems.

Lee N, Ward JL, Vélez A, Micheyl C, Bee MA.

Noise is a ubiquitous source of errors in all forms of communication [1]. Noise-induced errors in speech communication, for example, make it difficult for humans to converse in noisy social settings, a challenge aptly named the cocktail party problem [2]. Many nonhuman animals also communicate acoustically in noisy social groups and thus face biologically analogous problems [3]. However, we know little about how the perceptual systems of receivers are evolutionarily adapted to avoid the costs of noise-induced errors in communication. In this study of Cope's gray treefrog (Hyla chrysoscelis; Hylidae), we investigated whether receivers exploit a potential statistical regularity present in noisy acoustic scenes to reduce errors in signal recognition and discrimination. We developed an anatomical/physiological model of the peripheral auditory system to show that temporal correlation in amplitude fluctuations across the frequency spectrum (comodulation) [4-6] is a feature of the noise generated by large breeding choruses of sexually advertising males. In four psychophysical experiments, we investigated whether females exploit comodulation in background noise to mitigate noise-induced errors in evolutionarily critical mate-choice decisions. Subjects experienced fewer errors in recognizing conspecific calls and in selecting the calls of high-quality mates in the presence of simulated chorus noise that was comodulated. These data show unequivocally, and for the first time, that exploiting statistical regularities present in noisy acoustic scenes is an important biological strategy for solving cocktail-party-like problems in nonhuman animal communication.


AdC7: Hear Res. 2018 Dec;370:201-208

No otoacoustic evidence for a peripheral basis of absolute pitch.

McKetton L, Purcell D, Stone V, Grahn J, Bergevin C.

Absolute pitch (AP) is the ability to identify the perceived pitch of a sound without an external reference. Relatively rare, with an incidence of approximately 1/10,000, the mechanisms underlying AP are not well understood. This study examined otoacoustic emissions (OAEs) to determine if there is evidence of a peripheral (i.e., cochlear) basis for AP. Two OAE types were examined: spontaneous emissions (SOAEs) and stimulus-frequency emissions (SFOAEs). Our motivations to explore a peripheral foundation for AP were several-fold. First is the observation that pitch judgment accuracy has been reported to decrease with age due to age-dependent physiological changes in cochlear biomechanics. Second is the notion that SOAEs, which are indirectly related to perception, could act as a fixed frequency reference. Third, SFOAE delays, which have been demonstrated to serve as a proxy measure for cochlear frequency selectivity, could indicate tuning differences between groups. These led us to the hypotheses that AP subjects would (relative to controls) exhibit a. greater SOAE activity and b. sharper cochlear tuning. To test these notions, measurements were made in normal-hearing control (N = 33) and AP-possessor (N = 20) populations. In short, no substantial difference in SOAE activity was found between groups, indicating no evidence for one or more strong SOAEs that could act as a fixed cue. SFOAE phase-gradient delays, measured at several different probe levels (20-50 dB SPL), also showed no significant differences between groups. This observation argues against sharper cochlear frequency selectivity in AP subjects. Taken together, these data support the prevailing view that AP mechanisms predominantly arise at a processing level in the central nervous system (CNS) at the brainstem or higher, not within the cochlea.


AdC8: Curr Biol. 2018 May 7;28(9):1405-1418

Adaptive and Selective Time Averaging of Auditory Scenes.

McWalter R, McDermott JH.

To overcome variability, estimate scene characteristics, and compress sensory input, perceptual systems pool data into statistical summaries. Despite growing evidence for statistical representations in perception, the underlying mechanisms remain poorly understood. One example of such representations occurs in auditory scenes, where background texture appears to be represented with time-averaged sound statistics. We probed the averaging mechanism using "texture steps" (textures containing subtle shifts in stimulus statistics). Although generally imperceptible, steps occurring in the previous several seconds biased texture judgments, indicative of a multi-second averaging window. Listeners seemed unable to willfully extend or restrict this window but showed signatures of longer integration times for temporally variable textures. In all cases the measured timescales were substantially longer than previously reported integration times in the auditory system. Integration also showed signs of being restricted to sound elements attributed to a common source. The results suggest an integration process that depends on stimulus characteristics, integrating over longer extents when it benefits statistical estimation of variable signals and selectively integrating stimulus components likely to have a common cause in the world. Our methodology could be naturally extended to examine statistical representations of other types of sensory signals.


AdC9: J Neurosci. 2016 Mar 9;36(10):2986-94.

Pitch-Responsive Cortical Regions in Congenital Amusia.

Norman-Haignere SV, Albouy P, Caclin A, McDermott JH, Kanwisher NG, Tillmann B.

Congenital amusia is a lifelong deficit in music perception thought to reflect an underlying impairment in the perception and memory of pitch. The neural basis of amusic impairments is actively debated. Some prior studies have suggested that amusia stems from impaired connectivity between auditory and frontal cortex. However, it remains possible that impairments in pitch coding within auditory cortex also contribute to the disorder, in part because prior studies have not measured responses from the cortical regions most implicated in pitch perception in normal individuals. We addressed this question by measuring fMRI responses in 11 subjects with amusia and 11 age- and education-matched controls to a stimulus contrast that reliably identifies pitch-responsive regions in normal individuals: harmonic tones versus frequency-matched noise. Our findings demonstrate that amusic individuals with a substantial pitch perception deficit exhibit clusters of pitch-responsive voxels that are comparable in extent, selectivity, and anatomical location to those of control participants. We discuss possible explanations for why amusics might be impaired at perceiving pitch relations despite exhibiting normal fMRI responses to pitch in their auditory cortex: (1) individual neurons within the pitch-responsive region might exhibit abnormal tuning or temporal coding not detectable with fMRI, (2) anatomical tracts that link pitch-responsive regions to other brain areas (e.g., frontal cortex) might be altered, and (3) cortical regions outside of pitch-responsive cortex might be abnormal. The ability to identify pitch-responsive regions in individual amusic subjects will make it possible to ask more precise questions about their role in amusia in future work.


AdC10: Trends Hear. 2014 Sep 9;18.

Perceptual consequences of hidden hearing loss.

Plack CJ, Barker D, Prendergast G.

Dramatic results from recent animal experiments show that noise exposure can cause a selective loss of high-threshold auditory nerve fibers without affecting absolute sensitivity permanently. This cochlear neuropathy has been described as hidden hearing loss, as it is not thought to be detectable using standard measures of audiometric threshold. It is possible that hidden hearing loss is a common condition in humans and may underlie some of the perceptual deficits experienced by people with clinically normal hearing. There is some evidence that a history of noise exposure is associated with difficulties in speech discrimination and temporal processing, even in the absence of any audiometric loss. There is also evidence that the tinnitus experienced by listeners with clinically normal hearing is associated with cochlear neuropathy, as measured using Wave I of the auditory brainstem response. To date, however, there has been no direct link made between noise exposure, cochlear neuropathy, and perceptual difficulties. Animal experiments also reveal that the aging process itself, in the absence of significant noise exposure, is associated with loss of auditory nerve fibers. Evidence from human temporal bone studies and auditory brainstem response measures suggests that this form of hidden loss is common in humans and may have perceptual consequences, in particular, regarding the coding of the temporal aspects of sounds. Hidden hearing loss is potentially a major health issue, and investigations are ongoing to identify the causes and consequences of this troubling condition.


AdC11: Proc Natl Acad Sci U S A. 2016 Mar 1;113(9):2508-13

Midbrain auditory selectivity to natural sounds.

Wohlgemuth MJ, Moss CF.

This study investigated auditory stimulus selectivity in the midbrain superior colliculus (SC) of the echolocating bat, an animal that relies on hearing to guide its orienting behaviors. Multichannel, single-unit recordings were taken across laminae of the midbrain SC of the awake, passively listening big brown bat, Eptesicus fuscus. Species-specific frequency-modulated (FM) echolocation sound sequences with dynamic spectrotemporal features served as acoustic stimuli along with artificial sound sequences matched in bandwidth, amplitude, and duration but differing in spectrotemporal structure. Neurons in dorsal sensory regions of the bat SC responded selectively to elements within the FM sound sequences, whereas neurons in ventral sensorimotor regions showed broad response profiles to natural and artificial stimuli. Moreover, a generalized linear model (GLM) constructed on responses in the dorsal SC to artificial linear FM stimuli failed to predict responses to natural sounds and vice versa, but the GLM produced accurate response predictions in ventral SC neurons. This result suggests that auditory selectivity in the dorsal extent of the bat SC arises through nonlinear mechanisms, which extract species-specific sensory information. Importantly, auditory selectivity appeared only in responses to stimuli containing the natural statistics of acoustic signals used by the bat for spatial orientation (sonar vocalizations), offering support for the hypothesis that sensory selectivity enables rapid species-specific orienting behaviors. The results of this study are the first, to our knowledge, to show auditory spectrotemporal selectivity to natural stimuli in SC neurons and serve to inform a more general understanding of mechanisms guiding sensory selectivity for natural, goal-directed orienting behaviors.


BPC1: PLoS One. 2019 May 16;14(5):e0216874.

Music training with Démos program positively influences cognitive functions in children from low socio-economic backgrounds.

Barbaroux M, Dittinger E, Besson M.

This study aimed at evaluating the impact of a classic music training program (Démos) on several aspects of the cognitive development of children from low socio-economic backgrounds. We were specifically interested in general intelligence, phonological awareness and reading abilities, and in other cognitive abilities that may be improved by music training such as auditory and visual attention, working and short-term memory and visuomotor precision. We used a longitudinal approach with children presented with standardized tests before the start and after 18 months of music training. To test for pre-to-post training improvements while discarding maturation and developmental effects, raw scores for each child and for each test were normalized relative to their age group. Results showed that Démos music training improved musicality scores, total IQ and Symbol Search scores as well as concentration abilities and reading precision. In line with previous results, these findings demonstrate the positive impact of an ecologically-valid music training program on the cognitive development of children from low socio-economic backgrounds and strongly encourage the broader implementation of such programs in disadvantaged school-settings.


BPC2: Brain Lang. 2018 Oct;185:30-37

The language of music: Common neural codes for structured sequences in music and natural language.

Chiang JN, Rosenberg MH, Bufford CA, Stephens D, Lysy A, Monti MM.

The ability to process structured sequences is a central feature of natural language but also characterizes many other domains of human cognition. In this fMRI study, we measured brain metabolic response in musicians as they generated structured and non-structured sequences in language and music. We employed a univariate and multivariate cross-classification approach to provide evidence that a common neural code underlies the production of structured sequences across the two domains. Crucially, the common substrate includes Broca's area, a region well known for processing structured sequences in language. These findings have several implications. First, they directly support the hypothesis that language and music share syntactic integration mechanisms. Second, they show that Broca's area is capable of operating supramodally across these two domains. Finally, these results dismiss the recent hypothesis that domain general processes of neighboring neural substrates explain the previously observed "overlap" between neuroimaging activations across the two domains.


BPC3: Brain Lang. 2019 Mar;190:10-15.

Musical meaning modulates word acquisition.

Fritz TH, Schütte F, Steixner A, Contier O, Obrig H, Villringer A.

Musical excerpts have been shown to have the capacity to prime the processing of target words and vice versa, strongly suggesting that music can convey concepts. However, to date no study has investigated an influence of musical semantics on novel word acquisition, thus corroborating evidence for a similarity of underlying semantic processing of music and words behaviourally. The current study investigates whether semantic content of music can assist the acquisition of novel words. Forty novel words and their German translation were visually presented to 26 participants accompanied by either semantically congruent or incongruent music. Semantic congruence between music and words was expected to increase performance in the subsequent forced-choice recognition test. Participants performed significantly better on the retention of novel words presented with semantically congruent music compared to those presented with semantically incongruent music. This provides first evidence that semantic enrichment by music during novel word learning can augment novel word acquisition. This finding may lead to novel approaches in foreign language acquisition and language rehabilitation, and further strongly supports the concept that music has a strong capacity to iconically convey meaning.


BPC4: Cortex. 2019 Apr;113:229-238.

The co-occurrence of pitch and rhythm disorders in congenital amusia.

Lagrois ME, Peretz I.

The most studied form of congenital amusia is characterized by a difficulty with detecting pitch anomalies in melodies, also referred to as pitch deafness. Here, we tested for the presence of associated deficits in rhythm processing, beat in particular, in pitch deafness. In Experiment 1, participants performed beat perception and production tasks with musical excerpts of various genres. The results show a beat finding disorder in six of the ten assessed pitch-deaf participants. In order to remove a putative interference of pitch variations with beat extraction, the same participants were tested with percussive rhythms in Experiment 2 and showed a similar impairment. Furthermore, musical pitch and beat processing abilities were correlated. These new results highlight the tight connection between melody and rhythm in music processing that can nevertheless dissociate in some individuals.


BPC5: J Neurosci. 2011 Mar 9;31(10):3843-52.

Functional anatomy of language and music perception: temporal and structural factors investigated using functional magnetic resonance imaging.

Rogalsky C, Rong F, Saberi K, Hickok G.

Language and music exhibit similar acoustic and structural properties, and both appear to be uniquely human. Several recent studies suggest that speech and music perception recruit shared computational systems, and a common substrate in Broca's area for hierarchical processing has recently been proposed. However, this claim has not been tested by directly comparing the spatial distribution of activations to speech and music processing within subjects. In the present study, participants listened to sentences, scrambled sentences, and novel melodies. As expected, large swaths of activation for both sentences and melodies were found bilaterally in the superior temporal lobe, overlapping in portions of auditory cortex. However, substantial nonoverlap was also found: sentences elicited more ventrolateral activation, whereas the melodies elicited a more dorsomedial pattern, extending into the parietal lobe. Multivariate pattern classification analyses indicate that even within the regions of blood oxygenation level-dependent response overlap, speech and music elicit distinguishable patterns of activation. Regions involved in processing hierarchical aspects of sentence perception were identified by contrasting sentences with scrambled sentences, revealing a bilateral temporal lobe network. Music perception showed no overlap whatsoever with this network. Broca's area was not robustly activated by any stimulus type. Overall, these findings suggest that basic hierarchical processing for music and speech recruits distinct cortical networks, neither of which involves Broca's area. We suggest that previous claims are based on data from tasks that tap higher-order cognitive processes, such as working memory and/or cognitive control, which can operate in both speech and music domains.


BPC6: Hear Res. 2019 Sep 1;380:108-122.

Why and how music can be used to rehabilitate and develop speech and language skills in hearing-impaired children.

Torppa R, Huotilainen M.

This paper presents evidence for a strong connection between the development of speech and language skills and musical activities of children and adolescents with hearing impairment and/or cochlear implants. This conclusion is partially based on findings for typically hearing children and adolescents, showing better speech and language skills in children and adolescents with musical training, and importantly, showing increases of speech and language skills in children and adolescents taking part in musical training. Further, studies of hearing-impaired children show connections between musical skills, involvement in musical hobbies, and speech and language skills. Even though the field is still lacking large-scale randomised controlled trials on the effects of musical interventions on the speech and language skills of children and adolescents with hearing impairments and cochlear implants, the current evidence seems enough to urge speech therapists, music therapists, music teachers, parents, and children and adolescents with hearing impairments and/or cochlear implants to start using music for enhancing speech and language skills. For this reason, we give our recommendations on how to use music for language skill enhancement in this group.


CL1: Nat Commun. 2018 Oct 16;9(1):4298.

Hidden hearing loss selectively impairs neural adaptation to loud sound environments.

Bakay WMH, Anderson LA, Garcia-Lazaro JA, McAlpine D, Schaette R.

Exposure to even a single episode of loud noise can damage synapses between cochlear hair cells and auditory nerve fibres, causing hidden hearing loss (HHL) that is not detected by audiometry. Here we investigate the effects of noise-induced HHL on functional hearing by measuring the ability of neurons in the auditory midbrain of mice to adapt to sound environments containing quiet and loud periods. Neurons from noise-exposed mice show less capacity for adaptation to loud environments, convey less information about sound intensity in those environments, and adaptation to the longer-term statistical structure of fluctuating sound environments is impaired. Adaptation comprises a cascade of both threshold and gain adaptation. Although noise exposure only impairs threshold adaptation directly, the preserved function of gain adaptation surprisingly aggravates coding deficits for loud environments. These deficits might help to understand why many individuals with seemingly normal hearing struggle to follow a conversation in background noise.


CL2: J Acoust Soc Am. 2012 May;131(5):4030-41.

Across-site patterns of modulation detection: relation to speech recognition.

Garadat SN, Zwolan TA, Pfingst BE.

The aim of this study was to identify across-site patterns of modulation detection thresholds (MDTs) in subjects with cochlear implants and to determine if removal of sites with the poorest MDTs from speech processor programs would result in improved speech recognition. Five hundred millisecond trains of symmetric-biphasic pulses were modulated sinusoidally at 10 Hz and presented at a rate of 900 pps using monopolar stimulation. Subjects were asked to discriminate a modulated pulse train from an unmodulated pulse train for all electrodes in quiet and in the presence of an interleaved unmodulated masker presented on the adjacent site. Across-site patterns of masked MDTs were then used to construct two 10-channel MAPs such that one MAP consisted of sites with the best masked MDTs and the other MAP consisted of sites with the worst masked MDTs. Subjects' speech recognition skills were compared when they used these two different MAPs. Results showed that MDTs were variable across sites and were elevated in the presence of a masker by various amounts across sites. Better speech recognition was observed when the processor MAP consisted of sites with best masked MDTs, suggesting that temporal modulation sensitivity has important contributions to speech recognition with a cochlear implant.
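
As a reading aid, the stimulus described above (a 500 ms pulse train at 900 pps, sinusoidally modulated at 10 Hz) can be sketched as a list of per-pulse amplitudes. This is only an illustrative sketch, not the authors' stimulus code: the function name, the `depth` value, and the unit base amplitude are assumptions that do not come from the abstract.

```python
import math

def modulated_pulse_amplitudes(duration_s=0.5, rate_pps=900, mod_hz=10.0,
                               depth=0.5, base_amplitude=1.0):
    """Amplitude of each pulse in a sinusoidally amplitude-modulated train.

    duration_s, rate_pps and mod_hz follow the abstract; depth and
    base_amplitude are hypothetical values chosen for illustration.
    """
    n_pulses = int(round(duration_s * rate_pps))
    amplitudes = []
    for k in range(n_pulses):
        t = k / rate_pps  # onset time of pulse k, in seconds
        modulation = 1.0 + depth * math.sin(2.0 * math.pi * mod_hz * t)
        amplitudes.append(base_amplitude * modulation)
    return amplitudes
```

Setting `depth=0` gives the unmodulated reference train; in a modulation detection task the listener has to tell the two apart, and the threshold is the smallest depth at which this is still possible.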


CL3: J Acoust Soc Am. 2017 Jun;141(6):4230

An algorithm to increase intelligibility for hearing-impaired listeners in the presence of a competing talker.

Healy EW, Delfarah M, Vasko JL, Carter BL, Wang D.

Individuals with hearing impairment have particular difficulty perceptually segregating concurrent voices and understanding a talker in the presence of a competing voice. In contrast, individuals with normal hearing perform this task quite well. This listening situation represents a very different problem for both the human and machine listener, when compared to perceiving speech in other types of background noise. A machine learning algorithm is introduced here to address this listening situation. A deep neural network was trained to estimate the ideal ratio mask for a male target talker in the presence of a female competing talker. The monaural algorithm was found to produce sentence-intelligibility increases for hearing-impaired (HI) and normal-hearing (NH) listeners at various signal-to-noise ratios (SNRs). This benefit was largest for the HI listeners and averaged 59%-points at the least-favorable SNR, with a maximum of 87%-points. The mean intelligibility achieved by the HI listeners using the algorithm was equivalent to that of young NH listeners without processing, under conditions of identical interference. Possible reasons for the limited ability of HI listeners to perceptually segregate concurrent voices are reviewed as are possible implementation considerations for algorithms like the current one.
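
For readers unfamiliar with the ideal ratio mask (IRM) that the network above is trained to estimate, the sketch below computes it from known target and interferer energies. The abstract does not give the formula; this uses one common definition (square root of target energy over total energy in each time-frequency unit) and should be read as an assumption, with hypothetical function and argument names.

```python
import math

def ideal_ratio_mask(target_energy, interferer_energy):
    """Ideal ratio mask for a grid of time-frequency units.

    Both arguments are lists of rows holding non-negative energies.
    Assumed definition: sqrt(S / (S + N)) per unit, 0 where both are zero.
    """
    mask = []
    for t_row, n_row in zip(target_energy, interferer_energy):
        row = []
        for s, n in zip(t_row, n_row):
            total = s + n
            row.append(math.sqrt(s / total) if total > 0 else 0.0)
        mask.append(row)
    return mask
```

Values near 1 mark units dominated by the target talker and values near 0 mark units dominated by the competing talker. The mask is "ideal" because computing it requires the two voices separately, which is why an algorithm working on the mixture has to estimate it.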


CL4: J Neurosci. 2019 Jul 10;39(28):5517-5533.

Cascaded Tuning to Amplitude Modulation for Natural Sound Recognition.

Koumura T, Terashima H, Furukawa S.

The auditory system converts the physical properties of a sound waveform to neural activities and processes them for recognition. During the process, the tuning to amplitude modulation (AM) is successively transformed by a cascade of brain regions. To test the functional significance of the AM tuning, we conducted single-unit recording in a deep neural network (DNN) trained for natural sound recognition. We calculated the AM representation in the DNN and quantitatively compared it with those reported in previous neurophysiological studies. We found that an auditory-system-like AM tuning emerges in the optimized DNN. Better-recognizing models showed greater similarity to the auditory system. We isolated the factors forming the AM representation in the different brain regions. Because the model was not designed to reproduce any anatomical or physiological properties of the auditory system other than the cascading architecture, the observed similarity suggests that the AM tuning in the auditory system might also be an emergent property for natural sound recognition during evolution and development. SIGNIFICANCE STATEMENT This study suggests that neural tuning to amplitude modulation may be a consequence of the auditory system evolving for natural sound recognition. We modeled the function of the entire auditory system; that is, recognizing sounds from raw waveforms with as few anatomical or physiological assumptions as possible. We analyzed the model using single-unit recording, which enabled a fair comparison with neurophysiological data with as few methodological biases as possible. Interestingly, our results imply that frequency decomposition in the inner ear might not be necessary for processing amplitude modulation. This implication could not have been obtained if we had used a model that assumes frequency decomposition.


CL5: J Acoust Soc Am. 2019 Aug;146(2):1207.

Effect of age on envelope regularity discrimination.

Moore BCJ, Vinay.

The ability to discriminate irregular from regular amplitude modulation was compared for young and older adults with audiometric thresholds within the normal range for frequencies from 250 to 8000 Hz, using the envelope regularity discrimination (ERD) test. The amount of irregularity was parametrically varied and quantified by an irregularity index. The carrier frequency was 2000 Hz, the modulation rate was 8 Hz, and the baseline modulation index was 0.3. Stimuli were presented both at 80 dB sound pressure level (SPL) and at 20 dB sensation level (SL) in the presence of a threshold-equalizing noise. There was a significant effect of level, performance being better at 80 dB SPL than at 20 dB SL. There was also a significant effect of age, performance being worse for the older subjects. There was no significant interaction of level and age. The thresholds for the ERD test were not significantly correlated with absolute thresholds at the test carrier frequency of 2000 Hz, for either group, or for the two groups combined. The worse envelope regularity discrimination for the older group may be related to the age-related synaptopathy that has been established from recent studies of human temporal bones.


CL6: Hear Res. 2017 Feb;344:170-182

Evidence that hidden hearing loss underlies amplitude modulation encoding deficits in individuals with and without tinnitus.

Paul BT, Bruce IC, Roberts LE.

Damage to auditory nerve fibers that expresses with suprathreshold sounds but is hidden from the audiogram has been proposed to underlie deficits in temporal coding ability observed among individuals with otherwise normal hearing, and to be present in individuals experiencing chronic tinnitus with clinically normal audiograms. We tested whether these individuals may have hidden synaptic losses on auditory nerve fibers with low spontaneous rates of firing (low-SR fibers) that are important for coding suprathreshold sounds in noise while high-SR fibers determining threshold responses in quiet remain relatively unaffected. Tinnitus and control subjects were required to detect the presence of amplitude modulation (AM) in a 5 kHz, suprathreshold tone (a frequency in the tinnitus frequency region of the tinnitus subjects, whose audiometric thresholds were normal to 12 kHz). The AM tone was embedded within background noise intended to degrade the contribution of high-SR fibers, such that AM coding was preferentially reliant on low-SR fibers. We also recorded by electroencephalography the envelope following response (EFR, generated in the auditory midbrain) to a 5 kHz, 85 Hz AM tone presented in the same background noise, and also in quiet (both low-SR and high-SR fibers contributing to AM coding in the latter condition). Control subjects with EFRs that were comparatively resistant to the addition of background noise had better AM detection thresholds than controls whose EFRs were more affected by noise. Simulated auditory nerve responses to our stimulus conditions using a well-established peripheral model suggested that low-SR fibers were better preserved in the former cases. Tinnitus subjects had worse AM detection thresholds and reduced EFRs overall compared to controls. Simulated auditory nerve responses found that in addition to severe low-SR fiber loss, a degree of high-SR fiber loss that would not be expected to affect audiometric thresholds was needed to explain the results in tinnitus subjects. The results indicate that hidden hearing loss could be sufficient to account for impaired temporal coding in individuals with normal audiograms as well as for cases of tinnitus without audiometric hearing loss.


DP1: Nat Commun. 2019 Aug 14;10(1):3671.

The rough sound of salience enhances aversion through neural synchronisation.

Arnal LH, Kleinschmidt A, Spinelli L, Giraud AL, Mégevand P.

Being able to produce sounds that capture attention and elicit rapid reactions is the prime goal of communication. One strategy, exploited by alarm signals, consists in emitting fast but perceptible amplitude modulations in the roughness range (30-150 Hz). Here, we investigate the perceptual and neural mechanisms underlying aversion to such temporally salient sounds. By measuring subjective aversion to repetitive acoustic transients, we identify a nonlinear pattern of aversion restricted to the roughness range. Using human intracranial recordings, we show that rough sounds do not merely affect local auditory processes but instead synchronise large-scale, supramodal, salience-related networks in a steady-state, sustained manner. Rough sounds synchronise activity throughout superior temporal regions, subcortical and cortical limbic areas, and the frontal cortex, a network classically involved in aversion processing. This pattern correlates with subjective aversion in all these regions, consistent with the hypothesis that roughness enhances auditory aversion through spreading of neural synchronisation.


DP2: Eur J Neurosci. 2018 Jun;47(12):1525-1533

Reading memory formation from the eyes.

Bergt A, Urai AE, Donner TH, Schwabe L.

At any time, we are processing thousands of stimuli, but only few of them will be remembered hours or days later. Is there any way to predict which ones? Here, we tested whether the pupil response to ongoing stimuli, an indicator of physiological arousal known to be relevant for memory formation, is a reliable predictor of long-term memory for these stimuli, over at least 1 day. Pupil dilation was tracked while participants performed visual and auditory encoding tasks. Memory was tested immediately after encoding and 24 hr later. Irrespective of the encoding modality, trial-by-trial variations in pupil dilation predicted reliably which stimuli were recalled in the immediate and 24 hr-delayed tests, in particular for emotionally arousing stimuli. These results show that our eyes may provide a window into the formation of long-term memories. Furthermore, our findings underline the important role of central arousal systems in the rapid formation of memories in the brain, possibly by gating synaptic plasticity mechanisms in the neocortex.


DP3: Curr Biol. 2017 Dec 4;27(23):3643-3649

Auditory Sensitivity and Decision Criteria Oscillate at Different Frequencies Separately for the Two Ears.

Ho HT, Leung J, Burr DC, Alais D, Morrone MC.

Many behavioral measures of visual perception fluctuate continually in a rhythmic manner, reflecting the influence of endogenous brain oscillations, particularly theta (4-7 Hz) and alpha (8-12 Hz) rhythms [1-3]. However, it is unclear whether these oscillations are unique to vision or whether auditory performance also oscillates [4, 5]. Several studies report no oscillatory modulation in audition [6, 7], while those with positive findings suffer from confounds relating to neural entrainment [8-10]. Here, we used a bilateral pitch-identification task to investigate rhythmic fluctuations in auditory performance separately for the two ears and applied signal detection theory (SDT) to test for oscillations of both sensitivity and criterion (changes in decision boundary) [11, 12]. Using uncorrelated dichotic white noise to induce a phase reset of oscillations, we demonstrate that, as with vision, both auditory sensitivity and criterion showed strong oscillations over time, at different frequencies: 6 Hz (theta range) for sensitivity and 8 Hz (low alpha range) for criterion, implying distinct underlying sampling mechanisms [13]. The modulation in sensitivity in left and right ears was in antiphase, suggestive of attention-like mechanisms sampling alternatively from the two ears.


DP4: Curr Biol. 2019 Oct 7;29(19):3229-3243

Universal and Non-universal Features of Musical Pitch Perception Revealed by Singing.

Jacoby N, Undurraga EA, McPherson MJ, Valdés J, Ossandón T, McDermott JH.

Musical pitch perception is argued to result from nonmusical biological constraints and thus to have similar characteristics across cultures, but its universality remains unclear. We probed pitch representations in residents of the Bolivian Amazon-the Tsimane', who live in relative isolation from Western culture-as well as US musicians and non-musicians. Participants sang back tone sequences presented in different frequency ranges. Sung responses of Amazonian and US participants approximately replicated heard intervals on a logarithmic scale, even for tones outside the singing range. Moreover, Amazonian and US reproductions both deteriorated for high-frequency tones even though they were fully audible. But whereas US participants tended to reproduce notes an integer number of octaves above or below the heard tones, Amazonians did not, ignoring the note chroma (C, D, etc.). Chroma matching in US participants was more pronounced in US musicians than non-musicians, was not affected by feedback, and was correlated with similarity-based measures of octave equivalence as well as the ability to match the absolute f0 of a stimulus in the singing range. The results suggest the cross-cultural presence of logarithmic scales for pitch, and biological constraints on the limits of pitch, but indicate that octave equivalence may be culturally contingent, plausibly dependent on pitch representations that develop from experience with particular musical systems.


DP5: J Acoust Soc Am. 2007 Jul;122(1):418-35.

Individual differences in auditory abilities.

Kidd GR, Watson CS, Gygi B.

Performance on 19 auditory discrimination and identification tasks was measured for 340 listeners with normal hearing. Test stimuli included single tones, sequences of tones, amplitude-modulated and rippled noise, temporal gaps, speech, and environmental sounds. Principal components analysis and structural equation modeling of the data support the existence of a general auditory ability and four specific auditory abilities. The specific abilities are (1) loudness and duration (overall energy) discrimination; (2) sensitivity to temporal envelope variation; (3) identification of highly familiar sounds (speech and nonspeech); and (4) discrimination of unfamiliar simple and complex spectral and temporal patterns. Examination of Scholastic Aptitude Test (SAT) scores for a large subset of the population revealed little or no association between general or specific auditory abilities and general intellectual ability. The findings provide a basis for research to further specify the nature of the auditory abilities. Of particular interest are results suggestive of a familiar sound recognition (FSR) ability, apparently specialized for sound recognition on the basis of limited or distorted information. This FSR ability is independent of normal variation in both spectral-temporal acuity and of general intellectual ability.


DP6: Front Psychol. 2018 Sep 28;9:1590

Pitch Class and Envelope Effects in the Tritone Paradox Are Mediated by Differently Pronounced Frequency Preference Regions.

Malek S.

Shepard tones (octave complex tones) are well defined in pitch chroma but are ambiguous in pitch height. Pitch direction judgments of Shepard tones depend on the clockwise distance of the pitch classes on the pitch class circle, indicating the proximity principle in auditory perception. The tritone paradox emerges when two Shepard tones that form a tritone interval are presented successively. In this case, no proximity cue is available and judgments depend on the first tone and vary from person to person. A common explanation for the tritone paradox is the assumption of a specific pitch class comparison mechanism based on a pitch class template that is differently orientated from person to person. In contrast, psychoacoustic approaches (e.g., the Terhardt virtual pitch theory) explain it with common pitch-processing mechanisms. The present paper proposes a probabilistic threshold model, which estimates Shepard tone pitch height by a probabilistic fundamental frequency extraction. In the first processing stage, only those frequency components whose amplitudes are above specific randomly distributed threshold values are selected for further processing, and whose expected values are determined by a threshold function. The lowest of these nonfiltered components is dedicated to the pitch height. The model is designed for tone pairs and provides occurrence probabilities for descending judgments. In a pitch-matching pretest, 12 Shepard tones (generated under a cosine envelope centered at 261 Hz) were compared to pure tones, whose frequencies were adjusted by an up-down staircase method. Matched frequencies corresponded to frequency components but were ambiguous in octave position. In order to test the model, Shepard tones were generated under six cosine envelopes centered over a wide frequency range (65.41, 261, 370, 440, 523.25, 1244.51 Hz). The model predicted pitch class effects and envelope effects. Steep threshold functions caused pronounced pitch class, whereas flat threshold functions caused pronounced envelope effects. The model provides an alternative explanation to the pitch class template theory and serves as a psychoacoustic framework for the perception of Shepard tones.


DP7: Hum Brain Mapp. 2018 Nov;39(11):4623-4632

Context-dependent role of selective attention for change detection in multi-speaker scenes.

Starzynski C, Gutschalk A.

Disappearance of a voice or other sound source may often go unnoticed when the auditory scene is crowded. We explored the role of selective attention for this change deafness with magnetoencephalography in multi-speaker scenes. Each scene was presented two times in direct succession, and one target speaker was frequently omitted in Scene 2. When listeners were previously cued to the target speaker, activity in auditory cortex time locked to the target speaker's sound envelope was selectively enhanced in Scene 1, as was determined by a cross-correlation analysis. Moreover, the response was stronger for hit trials than for miss trials, confirming that selective attention played a role for subsequent change detection. If selective attention to the streams where the change occurred was generally required for successful change detection, neural enhancement of this stream would also be expected without cue in hit compared to miss trials. However, when listeners were not previously cued to the target, no enhanced activity for the target speaker was observed for hit trials, and there was no significant difference between hit and miss trials. These results, first, confirm a role for attention in change detection for situations where the target source is known. Second, they suggest that the omission of a speaker, or more generally an auditory stream, can alternatively be detected without selective attentional enhancement of the target stream. Several models and strategies could be envisaged for change detection in this case, including global comparison of the subsequent scenes.


DP8: J Cogn Neurosci. 2019 May;31(5):669-685

Evidence for Linear but Not Helical Automatic Representation of Pitch in the Human Auditory System.

Regev TI, Nelken I, Deouell LY.

The perceptual organization of pitch is frequently described as helical, with a monotonic dimension of pitch height and a circular dimension of pitch chroma, accounting for the repeating structure of the octave. Although the neural representation of pitch height is widely studied, the way in which pitch chroma representation is manifested in neural activity is currently debated. We tested the automaticity of pitch chroma processing using the MMN-an ERP component indexing automatic detection of deviations from auditory regularity. Musicians trained to classify pure or complex tones across four octaves, based on chroma-C versus G (21 participants, Experiment 1) or C versus F# (27, Experiment 2). Next, they were passively exposed to MMN protocols designed to test automatic detection of height and chroma deviations. Finally, in an attend chroma block, participants had to detect the chroma deviants in a sequence similar to the passive MMN sequence. The chroma deviant tones were accurately detected in the training and the attend chroma parts both for pure and complex tones, with a slightly better performance for complex tones. However, in the passive blocks, a significant MMN was found only to height deviations and complex tone chroma deviations, but not to pure tone chroma deviations, even for perfect performers in the active tasks. These results indicate that, although height is represented preattentively, chroma is not. Processing the musical dimension of chroma may require higher cognitive processes, such as attention and working memory.


DP9: J Acoust Soc Am. 2019 Feb;145(2):1078

Specifying the perceptual relevance of onset transients for musical instrument identification.

Siedenburg K.

Sound onsets are commonly considered to play a privileged role in the identification of musical instruments, but the underlying acoustic features remain unclear. By using sounds resynthesized with and without rapidly varying transients (not to be confused with the onset as a whole), this study set out to specify precisely the role of transients and quasi-stationary components in the perception of musical instrument sounds. In experiment 1, listeners were trained to identify ten instruments from 250 ms sounds. In a subsequent test phase, listeners identified instruments from 64 ms segments of sounds presented with or without transient components, either taken from the onset, or from the middle portion of the sounds. The omission of transient components at the onset impaired overall identification accuracy only by 6%, even though experiment 2 suggested that their omission was discriminable. Shifting the position of the gate from the onset to the middle portion of the tone impaired overall identification accuracy by 25%. Taken together, these findings confirm the prominent status of onsets in musical instrument identification, but suggest that rapidly varying transients are less indicative of instrument identity compared to the relatively slow buildup of sinusoidal components during onsets.


DP10: Neuroscience. 2018 Oct 1;389:152-160

Normal Aging Slows Spontaneous Switching in Auditory and Visual Bistability.

Kondo HM, Kochiyama T.

Age-related changes in auditory and visual perception have an impact on the quality of life. It has been debated how perceptual organization is influenced by advancing age. From the neurochemical perspective, we investigated age effects on auditory and visual bistability. In perceptual bistability, a sequence of sensory inputs induces spontaneous switching between different perceptual objects. We used different modality tasks of auditory streaming and visual plaids. Young and middle-aged participants (20-60 years) were instructed to indicate by a button press whenever their perception changed from one stable state to the other. The number of perceptual switches decreased with participants' ages. We employed magnetic resonance spectroscopy to measure non-invasively concentrations of the inhibitory neurotransmitter (γ-aminobutyric acid, GABA) in the brain regions of interest. When participants were asked to voluntarily modulate their perception, the amount of effective volitional control was positively correlated with the GABA concentration in the auditory and motion-sensitive areas corresponding to each sensory modality. However, no correlation was found in the prefrontal cortex and anterior cingulate cortex. In addition, effective volitional control was reduced with advancing age. Our results suggest that sequential scene analysis in auditory and visual domains is influenced by both age-related and neurochemical factors.


DP11: Nature Human Behaviour (2017)

Diversity in pitch perception revealed by task dependence

McPherson MJ, McDermott JH.

Pitch conveys critical information in speech, music and other natural sounds, and is conventionally defined as the perceptual correlate of a sound's fundamental frequency (F0). Although pitch is widely assumed to be subserved by a single F0 estimation process, real-world pitch tasks vary enormously, raising the possibility of underlying mechanistic diversity. To probe pitch mechanisms, we conducted a battery of pitch-related music and speech tasks using conventional harmonic sounds and inharmonic sounds whose frequencies lack a common F0. Some pitch-related abilities-those relying on musical interval or voice recognition-were strongly impaired by inharmonicity, suggesting a reliance on F0. However, other tasks, including those dependent on pitch contours in speech and music, were unaffected by inharmonicity, suggesting a mechanism that tracks the frequency spectrum rather than the F0. The results suggest that pitch perception is mediated by several different mechanisms, only some of which conform to traditional notions of pitch.


DP12: J Acoust Soc Am. 2018 Apr;143(4):2460.

Discovering acoustic structure of novel sounds.

Stilp CE, Kiefte M, Kluender KR.

Natural sounds have substantial acoustic structure (predictability, nonrandomness) in their spectral and temporal compositions. Listeners are expected to exploit this structure to distinguish simultaneous sound sources; however, previous studies confounded acoustic structure and listening experience. Here, sensitivity to acoustic structure in novel sounds was measured in discrimination and identification tasks. Complementary signal-processing strategies independently varied relative acoustic entropy (the inverse of acoustic structure) across frequency or time. In one condition, instantaneous frequency of low-pass-filtered 300-ms random noise was rescaled to 5 kHz bandwidth and resynthesized. In another condition, the instantaneous frequency of a short gated 5-kHz noise was resampled up to 300 ms. In both cases, entropy relative to full bandwidth or full duration was a fraction of that in 300-ms noise sampled at 10 kHz. Discrimination of sounds improved with less relative entropy. Listeners identified a probe sound as a target sound (1%, 3.2%, or 10% relative entropy) that repeated amidst distractor sounds (1%, 10%, or 100% relative entropy) at 0 dB SNR. Performance depended on differences in relative entropy between targets and background. Lower-relative-entropy targets were better identified against higher-relative-entropy distractors than lower-relative-entropy distractors; higher-relative-entropy targets were better identified amidst lower-relative-entropy distractors. Results were consistent across signal-processing strategies.


MC1: Curr Biol. 2018 Dec 17;28(24):3976-3983.

Rapid Transformation from Auditory to Linguistic Representations of Continuous Speech.

Brodbeck C, Hong LE, Simon JZ.

During speech perception, a central task of the auditory cortex is to analyze complex acoustic patterns to allow detection of the words that encode a linguistic message [1]. It is generally thought that this process includes at least one intermediate, phonetic, level of representations [2-6], localized bilaterally in the superior temporal lobe [7-9]. Phonetic representations reflect a transition from acoustic to linguistic information, classifying acoustic patterns into linguistically meaningful units, which can serve as input to mechanisms that access abstract word representations [10, 11]. While recent research has identified neural signals arising from successful recognition of individual words in continuous speech [12-15], no explicit neurophysiological signal has been found demonstrating the transition from acoustic and/or phonetic to symbolic, lexical representations. Here, we report a response reflecting the incremental integration of phonetic information for word identification, dominantly localized to the left temporal lobe. The short response latency, approximately 114 ms relative to phoneme onset, suggests that phonetic information is used for lexical processing as soon as it becomes available. Responses also tracked word boundaries, confirming previous reports of immediate lexical segmentation [16, 17]. These new results were further investigated using a cocktail-party paradigm [18, 19] in which participants listened to a mix of two talkers, attending to one and ignoring the other. Analysis indicates neural lexical processing of only the attended, but not the unattended, speech stream. Thus, while responses to acoustic features reflect attention through selective amplification of attended speech, responses consistent with a lexical processing model reveal categorically selective processing.


MC2: Nat Commun. 2017 Jun 27;8:15801

Aging affects the balance of neural entrainment and top-down neural modulation in the listening brain.

Henry MJ, Herrmann B, Kunke D, Obleser J.

Healthy aging is accompanied by listening difficulties, including decreased speech comprehension, that stem from an ill-understood combination of sensory and cognitive changes. Here, we use electroencephalography to demonstrate that auditory neural oscillations of older adults entrain less firmly and less flexibly to speech-paced (∼3 Hz) rhythms than younger adults' during attentive listening. These neural entrainment effects are distinct in magnitude and origin from the neural response to sound per se. Non-entrained parieto-occipital alpha (8-12 Hz) oscillations are enhanced in young adults, but suppressed in older participants, during attentive listening. Entrained neural phase and task-induced alpha amplitude exert opposite, complementary effects on listening performance: higher alpha amplitude is associated with reduced entrainment-driven behavioural performance modulation. Thus, alpha amplitude as a task-driven, neuro-modulatory signal can counteract the behavioural corollaries of neural entrainment. Balancing these two neural strategies may present new paths for intervention in age-related listening difficulties.


MC3: Proc Natl Acad Sci U S A. 2019 Mar 5;116(10):4671-4680.

Role of the striatum in incidental learning of sound categories.

Lim SJ, Fiez JA, Holt LL.

Humans are born as universal listeners without a bias toward any particular language. However, over the first year of life, infants' perception is shaped by learning native speech categories. Acoustically different sounds-such as the same word produced by different speakers-come to be treated as functionally equivalent. In natural environments, these categories often emerge incidentally without overt categorization or explicit feedback. However, the neural substrates of category learning have been investigated almost exclusively using overt categorization tasks with explicit feedback about categorization decisions. Here, we examined whether the striatum, previously implicated in category learning, contributes to incidental acquisition of sound categories. In the fMRI scanner, participants played a videogame in which sound category exemplars aligned with game actions and events, allowing sound categories to incidentally support successful game play. An experimental group heard nonspeech sound exemplars drawn from coherent category spaces, whereas a control group heard acoustically similar sounds drawn from a less structured space. Although the groups exhibited similar in-game performance, generalization of sound category learning and activation of the posterior striatum were significantly greater in the experimental than control group. Moreover, the experimental group showed brain-behavior relationships related to the generalization of all categories, while in the control group these relationships were restricted to the categories with structured sound distributions. Together, these results demonstrate that the striatum, through its interactions with the left superior temporal sulcus, contributes to incidental acquisition of sound category representations emerging from naturalistic learning environments.


MC4: Neuron. 2018 May 2;98(3):630-644

A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy.

Kell AJE, Yamins DLK, Shook EN, Norman-Haignere SV, McDermott JH.

A core goal of auditory neuroscience is to build quantitative models that predict cortical responses to natural sounds. Reasoning that a complete model of auditory cortex must solve ecologically relevant tasks, we optimized hierarchical neural networks for speech and music recognition. The best-performing network contained separate music and speech pathways following early shared processing, potentially replicating human cortical organization. The network performed both tasks as well as humans and exhibited human-like errors despite not being optimized to do so, suggesting common constraints on network and human performance. The network predicted fMRI voxel responses substantially better than traditional spectrotemporal filter models throughout auditory cortex. It also provided a quantitative signature of cortical representational hierarchy-primary and non-primary responses were best predicted by intermediate and late network layers, respectively. The results suggest that task optimization provides a powerful set of tools for modeling sensory systems.


MC5: PLoS Comput Biol. 2018 May 29;14(5):e1006162.

Detecting change in stochastic sound sequences.

Skerritt-Davis B, Elhilali M.

Our ability to parse our acoustic environment relies on the brain's capacity to extract statistical regularities from surrounding sounds. Previous work in regularity extraction has predominantly focused on the brain's sensitivity to predictable patterns in sound sequences. However, natural sound environments are rarely completely predictable, often containing some level of randomness, yet the brain is able to effectively interpret its surroundings by extracting useful information from stochastic sounds. It has been previously shown that the brain is sensitive to the marginal lower-order statistics of sound sequences (i.e., mean and variance). In this work, we investigate the brain's sensitivity to higher-order statistics describing temporal dependencies between sound events through a series of change detection experiments, where listeners are asked to detect changes in randomness in the pitch of tone sequences. Behavioral data indicate listeners collect statistical estimates to process incoming sounds, and a perceptual model based on Bayesian inference shows a capacity in the brain to track higher-order statistics. Further analysis of individual subjects' behavior indicates an important role of perceptual constraints in listeners' ability to track these sensory statistics with high fidelity. In addition, the inference model facilitates analysis of neural electroencephalography (EEG) responses, anchoring the analysis relative to the statistics of each stochastic stimulus. This reveals both a deviance response and a change-related disruption in phase of the stimulus-locked response that follow the higher-order statistics. These results shed light on the brain's ability to process stochastic sound sequences.


MC6: Nat Commun. 2019 Jun 7;10(1):2509

Adaptation of the human auditory cortex to changing background noise.

Khalighinejad B, Herrero JL, Mehta AD, Mesgarani N.

Speech communication in real-world environments requires adaptation to changing acoustic conditions. How the human auditory cortex adapts as a new noise source appears in or disappears from the acoustic scene remains unclear. Here, we directly measured neural activity in the auditory cortex of six human subjects as they listened to speech with abruptly changing background noises. We report rapid and selective suppression of acoustic features of noise in the neural responses. This suppression results in enhanced representation and perception of speech acoustic features. The degree of adaptation to different background noises varies across neural sites and is predictable from the tuning properties and speech specificity of the sites. Moreover, adaptation to background noise is unaffected by the attentional focus of the listener. The convergence of these neural and perceptual effects reveals the intrinsic dynamic mechanisms that enable a listener to filter out irrelevant sound sources in a changing acoustic scene.


YB1: Neuron. 2016 Apr 6;90(1):191-203.

Unmasking Latent Inhibitory Connections in Human Cortex to Reveal Dormant Cortical Memories.

Barron HC, Vogels TP, Emir UE, Makin TR, O'Shea J, Clare S, Jbabdi S, Dolan RJ, Behrens TE.

Balance of cortical excitation and inhibition (EI) is thought to be disrupted in several neuropsychiatric conditions, yet it is not clear how it is maintained in the healthy human brain. When EI balance is disturbed during learning and memory in animal models, it can be restabilized via formation of inhibitory replicas of newly formed excitatory connections. Here we assess evidence for such selective inhibitory rebalancing in humans. Using fMRI repetition suppression we measure newly formed cortical associations in the human brain. We show that expression of these associations reduces over time despite persistence in behavior, consistent with inhibitory rebalancing. To test this, we modulated excitation/inhibition balance with transcranial direct current stimulation (tDCS). Using ultra-high-field (7T) MRI and spectroscopy, we show that reducing GABA allows cortical associations to be re-expressed. This suggests that in humans associative memories are stored in balanced excitatory-inhibitory ensembles that lie dormant unless latent inhibitory connections are unmasked.


YB2: Neuron. 2012 Oct 18;76(2):435-49.

Discrete neocortical dynamics predict behavioral categorization of sounds.

Bathellier B, Ushakova L, Rumpel S.

The ability to group stimuli into perceptual categories is essential for efficient interaction with the environment. Discrete dynamics that emerge in brain networks are believed to be the neuronal correlate of category formation. Observations of such dynamics have recently been made; however, it is still unresolved if they actually match perceptual categories. Using in vivo two-photon calcium imaging in the auditory cortex of mice, we show that local network activity evoked by sounds is constrained to few response modes. Transitions between response modes are characterized by an abrupt switch, indicating attractor-like, discrete dynamics. Moreover, we show that local cortical responses quantitatively predict discrimination performance and spontaneous categorization of sounds in behaving mice. Our results therefore demonstrate that local nonlinear dynamics in the auditory cortex generate spontaneous sound categories which can be selected for behavioral or perceptual decisions.


YB3: Proc Natl Acad Sci U S A. 2017 Sep 12;114(37):9972-9977.

Top-down modulation of sensory cortex gates perceptual learning.

Caras ML, Sanes DH.

Practice sharpens our perceptual judgments, a process known as perceptual learning. Although several brain regions and neural mechanisms have been proposed to support perceptual learning, formal tests of causality are lacking. Furthermore, the temporal relationship between neural and behavioral plasticity remains uncertain. To address these issues, we recorded the activity of auditory cortical neurons as gerbils trained on a sound detection task. Training led to improvements in cortical and behavioral sensitivity that were closely matched in terms of magnitude and time course. Surprisingly, the degree of neural improvement was behaviorally gated. During task performance, cortical improvements were large and predicted behavioral outcomes. In contrast, during nontask listening sessions, cortical improvements were weak and uncorrelated with perceptual performance. Targeted reduction of auditory cortical activity during training diminished perceptual learning while leaving psychometric performance largely unaffected. Collectively, our findings suggest that training facilitates perceptual learning by strengthening both bottom-up sensory encoding and top-down modulation of auditory cortex.


YB4: Elife. 2016 Mar 4;5. pii: e12577.

The auditory representation of speech sounds in human motor cortex.

Cheung C, Hamilton LS, Johnson K, Chang EF.

In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different than during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech, but rather, represents auditory vocal information.


YB5: Nat Neurosci. 2011 Jan;14(1):108-14

Auditory cortex spatial sensitivity sharpens during task performance.

Lee CC, Middlebrooks JC.

Activity in the primary auditory cortex (A1) is essential for normal sound localization behavior, but previous studies of the spatial sensitivity of neurons in A1 have found broad spatial tuning. We tested the hypothesis that spatial tuning sharpens when an animal engages in an auditory task. Cats performed a task that required evaluation of the locations of sounds and one that required active listening, but in which sound location was irrelevant. Some 26-44% of the units recorded in A1 showed substantially sharpened spatial tuning during the behavioral tasks as compared with idle conditions, with the greatest sharpening occurring during the location-relevant task. Spatial sharpening occurred on a scale of tens of seconds and could be replicated multiple times in ∼1.5-h test sessions. Sharpening resulted primarily from increased suppression of responses to sounds at least-preferred locations. That and an observed increase in latencies suggest an important role of inhibitory mechanisms.


YB6: J Neurosci. 2018 Nov 14;38(46):9955-9966.

Implicit Memory for Complex Sounds in Higher Auditory Cortex of the Ferret.

Lu K, Liu W, Zan P, David SV, Fritz JB, Shamma SA.

Responses of auditory cortical neurons encode sound features of incoming acoustic stimuli and also are shaped by stimulus context and history. Previous studies of mammalian auditory cortex have reported a variable time course for such contextual effects ranging from milliseconds to minutes. However, in secondary auditory forebrain areas of songbirds, long-term stimulus-specific neuronal habituation to acoustic stimuli can persist for much longer periods of time, ranging from hours to days. Such long-term habituation in the songbird is a form of long-term auditory memory that requires gene expression. Although such long-term habituation has been demonstrated in avian auditory forebrain, this phenomenon has not previously been described in the mammalian auditory system. Utilizing a similar version of the avian habituation paradigm, we explored whether such long-term effects of stimulus history also occur in auditory cortex of a mammalian auditory generalist, the ferret. Following repetitive presentation of novel complex sounds, we observed significant response habituation in secondary auditory cortex, but not in primary auditory cortex. This long-term habituation appeared to be independent for each novel stimulus and often lasted for at least 20 min. These effects could not be explained by simple neuronal fatigue in the auditory pathway, because time-reversed sounds induced undiminished responses similar to those elicited by completely novel sounds. A parallel set of pupillometric response measurements in the ferret revealed long-term habituation effects similar to observed long-term neural habituation, supporting the hypothesis that habituation to passively presented stimuli is correlated with implicit learning and long-term recognition of familiar sounds.


YB7: Science. 2014 Feb 28;343(6174):1006-10.

Phonetic feature encoding in human superior temporal gyrus.

Mesgarani N, Cheung C, Johnson K, Chang EF.

During speech perception, linguistic elements such as consonants and vowels are extracted from a complex acoustic speech signal. The superior temporal gyrus (STG) participates in high-order auditory processing of speech, but how it encodes phonetic information is poorly understood. We used high-density direct cortical surface recordings in humans while they listened to natural, continuous speech to reveal the STG representation of the entire English phonetic inventory. At single electrodes, we found response selectivity to distinct phonetic features. Encoding of acoustic properties was mediated by a distributed population response. Phonetic features could be directly related to tuning for spectrotemporal acoustic cues, some of which were encoded in a nonlinear fashion or by integration of multiple cues. These findings demonstrate the acoustic-phonetic representation of speech in human STG.


YB8: Cereb Cortex. 2018 Dec 1;28(12):4222-4233.

Neural Encoding of Auditory Features during Music Perception and Imagery.

Martin S, Mikutta C, Leonard MK, Hungate D, Koelsch S, Shamma S, Chang EF, Millán JDR, Knight RT, Pasley BN.

Despite many behavioral and neuroimaging investigations, it remains unclear how the human cortex represents spectrotemporal sound features during auditory imagery, and how this representation compares to auditory perception. To assess this, we recorded electrocorticographic signals from an epileptic patient with proficient music ability in 2 conditions. First, the participant played 2 piano pieces on an electronic piano with the sound volume of the digital keyboard on. Second, the participant replayed the same piano pieces, but without auditory feedback, and the participant was asked to imagine hearing the music in his mind. In both conditions, the sound output of the keyboard was recorded, thus allowing precise time-locking between the neural activity and the spectrotemporal content of the music imagery. This novel task design provided a unique opportunity to apply receptive field modeling techniques to quantitatively study neural encoding during auditory mental imagery. In both conditions, we built encoding models to predict high gamma neural activity (70-150 Hz) from the spectrogram representation of the recorded sound. We found robust spectrotemporal receptive fields during auditory imagery with substantial, but not complete overlap in frequency tuning and cortical location compared to receptive fields measured during auditory perception.


YB9: J Neurosci. 2006 May 3;26(18):4970-82.

Perceptual learning directs auditory cortical map reorganization through top-down influences.

Polley DB, Steinberg EE, Merzenich MM.

The primary sensory cortex is positioned at a confluence of bottom-up dedicated sensory inputs and top-down inputs related to higher-order sensory features, attentional state, and behavioral reinforcement. We tested whether topographic map plasticity in the adult primary auditory cortex and a secondary auditory area, the suprarhinal auditory field, was controlled by the statistics of bottom-up sensory inputs or by top-down task-dependent influences. Rats were trained to attend to independent parameters, either frequency or intensity, within an identical set of auditory stimuli, allowing us to vary task demands while holding the bottom-up sensory inputs constant. We observed a clear double-dissociation in map plasticity in both cortical fields. Rats trained to attend to frequency cues exhibited an expanded representation of the target frequency range within the tonotopic map but no change in sound intensity encoding compared with controls. Rats trained to attend to intensity cues expressed an increased proportion of nonmonotonic intensity response profiles preferentially tuned to the target intensity range but no change in tonotopic map organization relative to controls. The degree of topographic map plasticity within the task-relevant stimulus dimension was correlated with the degree of perceptual learning for rats in both tasks. These data suggest that enduring receptive field plasticity in the adult auditory cortex may be shaped by task-specific top-down inputs that interact with bottom-up sensory inputs and reinforcement-based neuromodulator release. Top-down inputs might confer the selectivity necessary to modify a single feature representation without affecting other spatially organized feature representations embedded within the same neural circuitry.