[P2 evaluation] Articles

Choose two articles from the list, from two different lecturers (indicated by their initials). The BPC articles may only be chosen for the written exam.


AdC1: J Assoc Res Otolaryngol. 2016 Feb;17(1):69-79.

Pitch Discrimination in Musicians and Non-Musicians: Effects of Harmonic Resolvability and Processing Effort.

Bianchi F, Santurette S, Wendt D, Dau T.

Musicians typically show enhanced pitch discrimination abilities compared to non-musicians. The present study investigated this perceptual enhancement behaviorally and objectively for resolved and unresolved complex tones to clarify whether the enhanced performance in musicians can be ascribed to increased peripheral frequency selectivity and/or to a different processing effort in performing the task. In a first experiment, pitch discrimination thresholds were obtained for harmonic complex tones with fundamental frequencies (F0s) between 100 and 500 Hz, filtered in either a low- or a high-frequency region, leading to variations in the resolvability of audible harmonics. The results showed that pitch discrimination performance in musicians was enhanced for resolved and unresolved complexes to a similar extent. Additionally, the harmonics became resolved at a similar F0 in musicians and non-musicians, suggesting similar peripheral frequency selectivity in the two groups of listeners. In a follow-up experiment, listeners' pupil dilations were measured as an indicator of the required effort in performing the same pitch discrimination task for conditions of varying resolvability and task difficulty. Pupillometry responses indicated a lower processing effort in the musicians versus the non-musicians, although the processing demand imposed by the pitch discrimination task was individually adjusted according to the behavioral thresholds. Overall, these findings indicate that the enhanced pitch discrimination abilities in musicians are unlikely to be related to higher peripheral frequency selectivity and may suggest an enhanced pitch representation at more central stages of the auditory system in musically trained listeners.


AdC2: J Assoc Res Otolaryngol. 2014 Jun;15(3):465-82.

Implications of within-fiber temporal coding for perceptual studies of F0 discrimination and discrimination of harmonic and inharmonic tone complexes.

Kale S, Micheyl C, Heinz MG.

Recent psychophysical studies suggest that normal-hearing (NH) listeners can use acoustic temporal-fine-structure (TFS) cues for accurately discriminating shifts in the fundamental frequency (F0) of complex tones, or equal shifts in all component frequencies, even when the components are peripherally unresolved. The present study quantified both envelope (ENV) and TFS cues in single auditory-nerve (AN) fiber responses (henceforth referred to as neural ENV and TFS cues) from NH chinchillas in response to harmonic and inharmonic complex tones similar to those used in recent psychophysical studies. The lowest component in the tone complex (i.e., harmonic rank N) was systematically varied from 2 to 20 to produce various resolvability conditions in chinchillas (partially resolved to completely unresolved). Neural responses to different pairs of TEST (F0 or frequency shifted) and standard or reference (REF) stimuli were used to compute shuffled cross-correlograms, from which cross-correlation coefficients representing the degree of similarity between responses were derived separately for TFS and ENV. For a given F0 shift, the dissimilarity (TEST vs. REF) was greater for neural TFS than ENV. However, this difference was stimulus-based; the sensitivities of the neural TFS and ENV metrics were equivalent for equal absolute shifts of their relevant frequencies (center component and F0, respectively). For the F0-discrimination task, both ENV and TFS cues were available and could in principle be used for task performance. However, in contrast to human performance, neural TFS cues quantified with our cross-correlation coefficients were unaffected by phase randomization, suggesting that F0 discrimination for unresolved harmonics does not depend solely on TFS cues. For the frequency-shift (harmonic-versus-inharmonic) discrimination task, neural ENV cues were not available. Neural TFS cues were available and could in principle support performance in this task; however, in contrast to human-listeners' performance, these TFS cues showed no dependence on N. We conclude that while AN-fiber responses contain TFS-related cues, which can in principle be used to discriminate changes in F0 or equal shifts in component frequencies of peripherally unresolved harmonics, performance in these two psychophysical tasks appears to be limited by other factors (e.g., central processing noise).
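For readers new to this analysis, the following Python sketch illustrates the general logic of shuffled correlograms: spike-time differences are tallied across all pairs of spike trains from two response sets and normalized so that uncorrelated responses hover around 1, and a crude similarity index is then formed from the zero-lag values. This is an illustrative reconstruction under assumed conventions, not the authors' code; the exact normalization and the separation of ENV and TFS components used in the paper are not reproduced here.

```python
import numpy as np

def shuffled_correlogram(trains_a, trains_b, dur, binwidth=50e-6, maxlag=5e-3):
    """All-pairs histogram of spike-time differences (in seconds) between two sets
    of spike trains, normalized so uncorrelated responses hover around 1."""
    edges = np.arange(-maxlag, maxlag + binwidth, binwidth)
    counts = np.zeros(len(edges) - 1)
    for ta in trains_a:
        for tb in trains_b:
            diffs = np.subtract.outer(np.asarray(ta), np.asarray(tb)).ravel()
            counts += np.histogram(diffs, bins=edges)[0]
    rate_a = sum(len(t) for t in trains_a) / (len(trains_a) * dur)
    rate_b = sum(len(t) for t in trains_b) / (len(trains_b) * dur)
    norm = len(trains_a) * len(trains_b) * rate_a * rate_b * binwidth * dur
    lags = 0.5 * (edges[:-1] + edges[1:])
    return lags, counts / norm

def similarity_index(ref_trains, test_trains, dur):
    """Zero-lag correlogram value for REF vs. TEST, scaled by the within-condition
    values (even vs. odd trials, so a train is never paired with itself).
    A rough, illustrative summary of response similarity."""
    def zero_lag(a, b):
        lags, scc = shuffled_correlogram(a, b, dur)
        return scc[np.argmin(np.abs(lags))]
    within = np.sqrt(zero_lag(ref_trains[::2], ref_trains[1::2]) *
                     zero_lag(test_trains[::2], test_trains[1::2]))
    return zero_lag(ref_trains, test_trains) / within
```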


AdC3: J Neurosci. 2016 Jan 27;36(4):1416-28.

Functional Topography of Human Auditory Cortex.

Leaver AM, Rauschecker JP.

Functional and anatomical studies have clearly demonstrated that auditory cortex is populated by multiple subfields. However, functional characterization of those fields has been largely the domain of animal electrophysiology, limiting the extent to which human and animal research can inform each other. In this study, we used high-resolution functional magnetic resonance imaging to characterize human auditory cortical subfields using a variety of low-level acoustic features in the spectral and temporal domains. Specifically, we show that topographic gradients of frequency preference, or tonotopy, extend along two axes in human auditory cortex, thus reconciling historical accounts of a tonotopic axis oriented medial to lateral along Heschl's gyrus and more recent findings emphasizing tonotopic organization along the anterior-posterior axis. Contradictory findings regarding topographic organization according to temporal modulation rate in acoustic stimuli, or "periodotopy," are also addressed. Although isolated subregions show a preference for high rates of amplitude-modulated white noise (AMWN) in our data, large-scale "periodotopic" organization was not found. Organization by AM rate was correlated with dominant pitch percepts in AMWN in many regions. In short, our data expose early auditory cortex chiefly as a frequency analyzer, and spectral frequency, as imposed by the sensory receptor surface in the cochlea, seems to be the dominant feature governing large-scale topographic organization across human auditory cortex.


AdC4: Hear Res. 2016 Jun;336:53-62.

Frequency selectivity of the human cochlea: Suppression tuning of spontaneous otoacoustic emissions.

Manley GA, van Dijk P.

Frequency selectivity is a key functional property of the inner ear and, since hearing research began, the frequency resolution of the human ear has been a central question. In contrast to animal studies, which permit invasive recording of neural activity, human studies must rely on indirect methods to determine hearing selectivity. Psychophysical studies, which used masking of a tone by other sounds, indicate a modest frequency selectivity in humans. By contrast, estimates using the phase delays of stimulus-frequency otoacoustic emissions (SFOAE) predict a remarkably high selectivity, unique among mammals. Alternative measures of cochlear frequency selectivity are suppression tuning curves of spontaneous otoacoustic emissions (SOAE). Several animal studies show that these measures are in excellent agreement with neural frequency selectivity. Here we contribute a large data set from normal-hearing young humans on suppression tuning curves (STC) of spontaneous otoacoustic emissions (SOAE). The frequency selectivities of human STC measured near threshold levels agree with the earlier, much lower, psychophysical estimates. They differ, however, from the typical patterns seen in animal auditory nerve data in that the selectivity is remarkably independent of frequency. In addition, SOAE are suppressed by higher-level tones in narrow frequency bands clearly above the main suppression frequencies. These narrow suppression bands suggest interactions between the suppressor tone and a cochlear standing wave corresponding to the SOAE frequency being suppressed. The data show that the relationship between pre-neural mechanical processing in the cochlea and neural coding at the hair-cell/auditory nerve synapse needs to be reconsidered.


AdC5: J Neurosci. 2015 Feb 4;35(5):2058-73

State-dependent population coding in primary auditory cortex.

Pachitariu M, Lyamzin DR, Sahani M, Lesica NA.

Sensory function is mediated by interactions between external stimuli and intrinsic cortical dynamics that are evident in the modulation of evoked responses by cortical state. A number of recent studies across different modalities have demonstrated that the patterns of activity in neuronal populations can vary strongly between synchronized and desynchronized cortical states, i.e., in the presence or absence of intrinsically generated up and down states. Here we investigated the impact of cortical state on the population coding of tones and speech in the primary auditory cortex (A1) of gerbils, and found that responses were qualitatively different in synchronized and desynchronized cortical states. Activity in synchronized A1 was only weakly modulated by sensory input, and the spike patterns evoked by tones and speech were unreliable and constrained to a small range of patterns. In contrast, responses to tones and speech in desynchronized A1 were temporally precise and reliable across trials, and different speech tokens evoked diverse spike patterns with extremely weak noise correlations, allowing responses to be decoded with nearly perfect accuracy. Restricting the analysis of synchronized A1 to activity within up states yielded similar results, suggesting that up states are not equivalent to brief periods of desynchronization. These findings demonstrate that the representational capacity of A1 depends strongly on cortical state, and suggest that cortical state should be considered as an explicit variable in all studies of sensory processing.


AdC6: J Neurosci. 2016 Nov 23;36(47):12010-12026.

Attenuation of Responses to Self-Generated Sounds in Auditory Cortical Neurons.

Rummell BP, Klee JL, Sigurdsson T.

Many of the sounds that we perceive are caused by our own actions, for example when speaking or moving, and must be distinguished from sounds caused by external events. Studies using macroscopic measurements of brain activity in human subjects have consistently shown that responses to self-generated sounds are attenuated in amplitude. However, the underlying manifestation of this phenomenon at the cellular level is not well understood. To address this, we recorded the activity of neurons in the auditory cortex of mice in response to sounds generated by their own behavior. We found that the responses of auditory cortical neurons to these self-generated sounds were consistently attenuated, compared with the same sounds generated independently of the animals' behavior. This effect was observed in both putative pyramidal neurons and in interneurons and was stronger in lower layers of auditory cortex. Downstream of the auditory cortex, we found that responses of hippocampal neurons to self-generated sounds were almost entirely suppressed. Responses to self-generated optogenetic stimulation of auditory thalamocortical terminals were also attenuated, suggesting a cortical contribution to this effect. Further analyses revealed that the attenuation of self-generated sounds was not simply due to the nonspecific effects of movement or behavioral state on auditory responsiveness. However, the strength of attenuation depended on the degree to which self-generated sounds were expected to occur, in a cell-type-specific manner. Together, these results reveal the cellular basis underlying attenuated responses to self-generated sounds and suggest that predictive processes contribute to this effect.


AdC7: Neuroscience. 2015 Aug 6;300:325-37.

Descending and tonotopic projection patterns from the auditory cortex to the inferior colliculus.

Straka MM, Hughes R, Lee P, Lim HH.

The inferior colliculus (IC) receives many corticofugal projections, which can mediate plastic changes such as shifts in frequency tuning or excitability of IC neurons. While the densest projections are found in the IC's external cortices, fibers originating from the primary auditory cortex (AI) have been observed throughout the IC's central nucleus (ICC), and these projections have been shown to be organized tonotopically. Some studies have also found projections from other core and non-core cortical regions, though the organization and function of these projections are less well understood. In guinea pig, there exists a non-core ventrorostral belt (VRB) region that has primary-like properties and has often been mistaken for AI, with the clearest differentiating characteristic being VRB's longer response latencies. To better understand the auditory corticofugal descending system beyond AI, we investigated whether there are projections from VRB to the ICC and whether they exhibit a different projection pattern than those from AI. In this study, we performed experiments in ketamine-anesthetized guinea pigs, in which we positioned 32-site electrode arrays within AI, VRB, and ICC. We identified the monosynaptic connections between AI-to-ICC and VRB-to-ICC using an antidromic stimulation method, and we analyzed their locations across the midbrain using three-dimensional histological techniques. Compared to the corticocollicular projections to the ICC from AI, there were fewer projections to the ICC from VRB, and these projections had a weaker tonotopic organization. The majority of VRB projections were observed in the caudal-medial versus the rostral-lateral region along an isofrequency lamina of the ICC, which is in contrast to the AI projections that were scattered throughout an ICC lamina. These findings suggest that the VRB directly modulates sound information within the ascending lemniscal pathway with a different or complementary role compared to the modulatory effects of AI, which may have implications for treating hearing disorders.


AdC8: Hear Res. 2016 Dec;342:112-123.

Musicians' edge: A comparison of auditory processing, cognitive abilities and statistical learning.

Mandikal Vasuki PR, Sharma M, Demuth K, Arciuli J.

It has been hypothesized that musical expertise is associated with enhanced auditory processing and cognitive abilities. Recent research has examined the relationship between musicians' advantage and implicit statistical learning skills. In the present study, we assessed a variety of auditory processing skills, cognitive processing skills, and statistical learning (auditory and visual forms) in age-matched musicians (N = 17) and non-musicians (N = 18). Musicians had significantly better performance than non-musicians on frequency discrimination and backward digit span. A key finding was that musicians had better auditory, but not visual, statistical learning than non-musicians. Performance on the statistical learning tasks was not correlated with performance on auditory and cognitive measures. Musicians' superior performance on auditory (but not visual) statistical learning suggests that musical expertise is associated with an enhanced ability to detect statistical regularities in auditory stimuli.


AdC9: J Neurosci. 2015 Mar 4;35(9):3815-24.

Attending to pitch information inhibits processing of pitch information: the curious case of amusia.

Zendel BR, Lagrois ME, Robitaille N, Peretz I.

In normal listeners, the tonal rules of music guide musical expectancy. In a minority of individuals, known as amusics, the processing of tonality is disordered, which results in severe musical deficits. It has been shown that the tonal rules of music are neurally encoded, but not consciously available in amusics. Previous neurophysiological studies have not explicitly controlled the level of attention in tasks where participants ignored the tonal structure of the stimuli. Here, we test whether access to tonal knowledge can be demonstrated in congenital amusia when attention is controlled. Electric brain responses were recorded while asking participants to detect an individually adjusted near-threshold click in a melody. In half the melodies, a note was inserted that violated the tonal rules of music. In a second task, participants were presented with the same melodies but were required to detect the tonal deviation. Both tasks required sustained attention, thus conscious access to the rules of tonality was manipulated. In the click-detection task, the pitch deviants evoked an early right anterior negativity (ERAN) in both groups. In the pitch-detection task, the pitch deviants evoked an ERAN and P600 in controls but not in amusics. These results indicate that pitch regularities are represented in the cortex of amusics, but are not consciously available. Moreover, performing a pitch-judgment task eliminated the ERAN in amusics, suggesting that attending to pitch information interferes with perception of pitch. We propose that an impaired top-down frontotemporal projection is responsible for this disorder.


BPC1: Philos Trans R Soc Lond B Biol Sci. 2015 Mar 19;370(1664)

Neural overlap in processing music and speech.

Peretz I, Vuvan D, Lagrois ME, Armony JL.

Neural overlap in processing music and speech, as measured by the co-activation of brain regions in neuroimaging studies, may suggest that parts of the neural circuitries established for language may have been recycled during evolution for musicality, or vice versa that musicality served as a springboard for language emergence. Such a perspective has important implications for several topics of general interest besides evolutionary origins. For instance, neural overlap is an important premise for the possibility of music training to influence language acquisition and literacy. However, neural overlap in processing music and speech does not entail sharing neural circuitries. Neural separability between music and speech may occur in overlapping brain regions. In this paper, we review the evidence and outline the issues faced in interpreting such neural data, and argue that converging evidence from several methodologies is needed before neural overlap is taken as evidence of sharing.


BPC2: J Neurosci. 2011 Mar 9;31(10):3843-52.

Functional anatomy of language and music perception: temporal and structural factors investigated using functional magnetic resonance imaging.

Rogalsky C, Rong F, Saberi K, Hickok G.

Language and music exhibit similar acoustic and structural properties, and both appear to be uniquely human. Several recent studies suggest that speech and music perception recruit shared computational systems, and a common substrate in Broca's area for hierarchical processing has recently been proposed. However, this claim has not been tested by directly comparing the spatial distribution of activations to speech and music processing within subjects. In the present study, participants listened to sentences, scrambled sentences, and novel melodies. As expected, large swaths of activation for both sentences and melodies were found bilaterally in the superior temporal lobe, overlapping in portions of auditory cortex. However, substantial nonoverlap was also found: sentences elicited more ventrolateral activation, whereas the melodies elicited a more dorsomedial pattern, extending into the parietal lobe. Multivariate pattern classification analyses indicate that even within the regions of blood oxygenation level-dependent response overlap, speech and music elicit distinguishable patterns of activation. Regions involved in processing hierarchical aspects of sentence perception were identified by contrasting sentences with scrambled sentences, revealing a bilateral temporal lobe network. Music perception showed no overlap whatsoever with this network. Broca's area was not robustly activated by any stimulus type. Overall, these findings suggest that basic hierarchical processing for music and speech recruits distinct cortical networks, neither of which involves Broca's area. We suggest that previous claims are based on data from tasks that tap higher-order cognitive processes, such as working memory and/or cognitive control, which can operate in both speech and music domains.


BPC3: Front Hum Neurosci. 2014 May 12;8:294.

Music as a mnemonic to learn gesture sequences in normal aging and Alzheimer's disease.

Moussard A, Bigand E, Belleville S, Peretz I.

Strong links between music and motor functions suggest that music could represent an interesting aid for motor learning. The present study aims for the first time to test the potential of music to assist in the learning of sequences of gestures in normal and pathological aging. Participants with mild Alzheimer's disease (AD) and healthy older adults (controls) learned sequences of meaningless gestures that were accompanied by either music or a metronome. We also manipulated the learning procedure such that participants had to imitate the gestures to-be-memorized in synchrony with the experimenter or after the experimenter during encoding. Results show different patterns of performance for the two groups. Overall, musical accompaniment had no impact on the controls' performance but improved that of the AD participants. Conversely, synchronization of gestures during learning helped controls but seemed to interfere with retention in AD. We discuss these findings regarding their relevance for a better understanding of auditory-motor memory, and we propose recommendations to maximize the mnemonic effect of music for motor sequence learning in dementia care.


BPC4: Cereb Cortex. 2009 Mar;19(3):712-23.

Musical training influences linguistic abilities in 8-year-old children: more evidence for brain plasticity.

Moreno S, Marques C, Santos A, Santos M, Castro SL, Besson M.

We conducted a longitudinal study with 32 nonmusician children over 9 months to determine 1) whether functional differences between musician and nonmusician children reflect specific predispositions for music or result from musical training and 2) whether musical training improves nonmusical brain functions such as reading and linguistic pitch processing. Event-related brain potentials were recorded while 8-year-old children performed tasks designed to test the hypothesis that musical training improves pitch processing not only in music but also in speech. Following the first testing sessions, nonmusician children were pseudorandomly assigned to music or to painting training for 6 months and were tested again after training using the same tests. After musical (but not painting) training, children showed enhanced reading and pitch discrimination abilities in speech. Remarkably, 6 months of musical training thus suffices to significantly improve behavior and to influence the development of neural processes as reflected in specific patterns of brain waves. These results reveal positive transfer from music to speech and highlight the influence of musical training. Finally, they demonstrate brain plasticity in showing that relatively short periods of training have strong consequences for the functional organization of the children's brain.


BPC5: Cereb Cortex. 2013 Sep;23(9):2038-43.

Music training for the development of speech segmentation.

François C, Chobert J, Besson M, Schön D.

The role of music training in fostering brain plasticity and developing high cognitive skills, notably linguistic abilities, is of great interest from both a scientific and a societal perspective. Here, we report results of a longitudinal study over 2 years using both behavioral and electrophysiological measures and a test-training-retest procedure to examine the influence of music training on speech segmentation in 8-year-old children. Children were pseudo-randomly assigned to either music or painting training and were tested on their ability to extract meaningless words from a continuous flow of nonsense syllables. While no between-group differences were found before training, both behavioral and electrophysiological measures showed improved speech segmentation skills across testing sessions for the music group only. These results show that music training directly causes facilitation in speech segmentation, thereby pointing to the importance of music for speech perception and more generally for children's language development. Finally these results have strong implications for promoting the development of music-based remediation strategies for children with language-based learning impairments.


BPC6: Science. 1996 Dec 13;274(5294):1926-8.

Statistical learning by 8-month-old infants.

Saffran JR, Aslin RN, Newport EL.

Learners rely on a combination of experience-independent and experience-dependent mechanisms to extract information from the environment. Language acquisition involves both types of mechanisms, but most theorists emphasize the relative importance of experience-independent mechanisms. The present study shows that a fundamental task of language acquisition, segmentation of words from fluent speech, can be accomplished by 8-month-old infants based solely on the statistical relationships between neighboring speech sounds. Moreover, this word segmentation was based on statistical learning from only 2 minutes of exposure, suggesting that infants have access to a powerful mechanism for the computation of statistical properties of the language input.
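The statistical cue exploited in this study is the forward transitional probability between adjacent syllables, which is high within words and drops at word boundaries. The minimal Python sketch below illustrates the computation on a toy syllable stream; the syllables and stream length are made up for illustration and are not the study's stimuli.

```python
import random
from collections import Counter

def transitional_probabilities(stream):
    """Forward transitional probability P(next | current) for each adjacent
    syllable pair: count(XY) / count(X in first position)."""
    pair_counts = Counter(zip(stream[:-1], stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Toy stream built from three made-up trisyllabic "words" in random order.
words = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku")]
stream = [syllable for _ in range(200) for syllable in random.choice(words)]
tps = transitional_probabilities(stream)

# Within-word transitions (e.g. tu -> pi) come out near 1.0, whereas transitions
# spanning a word boundary (e.g. ro -> go) hover around 1/3: the dip that infants
# appear to exploit when segmenting words.
print(tps.get(("tu", "pi")), tps.get(("ro", "go")))
```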


BPC7: Percept Psychophys. 1987 Jun;41(6):519-24.

Priming of chords: spreading activation or overlapping frequency spectra?

Bharucha JJ, Stoeckig K.

A chord generates expectancies for related chords to follow. Expectancies can be studied by measuring the time to discriminate between a target chord and a mistuned foil as a function of the target's relatedness to a preceding prime chord. This priming paradigm has been employed to demonstrate that related targets are processed more quickly and are perceived to be more consonant than are unrelated targets (Bharucha & Stoeckig, 1986). The priming experiments in the present paper were designed to determine whether expectancies are generated at a cognitive level, by activation spreading through a network that represents harmonic relationships, or solely at a sensory level, by the activation of frequency-specific units. In Experiment 1, prime-target pairs shared no component tones, but related pairs had overlapping frequency spectra. In Experiment 2, all overlapping frequency components were eliminated. Priming was equally strong in both experiments. We conclude that frequency-specific repetition priming cannot account for expectancies in harmony, suggesting that activation spreads at a cognitive level of representation.


CL1: Hear Res. 2017 Feb;344:170-182.

Evidence that hidden hearing loss underlies amplitude modulation encoding deficits in individuals with and without tinnitus.

Paul BT, Bruce IC, Roberts LE.

Damage to auditory nerve fibers that expresses with suprathreshold sounds but is hidden from the audiogram has been proposed to underlie deficits in temporal coding ability observed among individuals with otherwise normal hearing, and to be present in individuals experiencing chronic tinnitus with clinically normal audiograms. We tested whether these individuals may have hidden synaptic losses on auditory nerve fibers with low spontaneous rates of firing (low-SR fibers) that are important for coding suprathreshold sounds in noise while high-SR fibers determining threshold responses in quiet remain relatively unaffected. Tinnitus and control subjects were required to detect the presence of amplitude modulation (AM) in a 5 kHz, suprathreshold tone (a frequency in the tinnitus frequency region of the tinnitus subjects, whose audiometric thresholds were normal to 12 kHz). The AM tone was embedded within background noise intended to degrade the contribution of high-SR fibers, such that AM coding was preferentially reliant on low-SR fibers. We also recorded by electroencephalography the envelope following response (EFR, generated in the auditory midbrain) to a 5 kHz, 85 Hz AM tone presented in the same background noise, and also in quiet (both low-SR and high-SR fibers contributing to AM coding in the latter condition). Control subjects with EFRs that were comparatively resistant to the addition of background noise had better AM detection thresholds than controls whose EFRs were more affected by noise. Simulated auditory nerve responses to our stimulus conditions using a well-established peripheral model suggested that low-SR fibers were better preserved in the former cases. Tinnitus subjects had worse AM detection thresholds and reduced EFRs overall compared to controls. Simulated auditory nerve responses found that in addition to severe low-SR fiber loss, a degree of high-SR fiber loss that would not be expected to affect audiometric thresholds was needed to explain the results in tinnitus subjects. The results indicate that hidden hearing loss could be sufficient to account for impaired temporal coding in individuals with normal audiograms as well as for cases of tinnitus without audiometric hearing loss.


CL2: Nat Neurosci. 2012 Oct;15(10):1362-4

Diminished temporal coding with sensorineural hearing loss emerges in background noise.

Henry KS, Heinz MG.

Behavioral studies in humans suggest that sensorineural hearing loss (SNHL) decreases sensitivity to the temporal structure of sound, but neurophysiological studies in mammals provide little evidence for diminished temporal coding. We found that SNHL in chinchillas degraded peripheral temporal coding in background noise substantially more than in quiet. These results resolve discrepancies between previous studies and help to explain why perceptual difficulties in hearing-impaired listeners often emerge in noisy situations.


CL3: J Neurosci. 2016 Feb 17;36(7):2227-37

Distorted Tonotopic Coding of Temporal Envelope and Fine Structure with Noise-Induced Hearing Loss.

Henry KS, Kale S, Heinz MG.

People with cochlear hearing loss have substantial difficulty understanding speech in real-world listening environments (e.g., restaurants), even with amplification from a modern digital hearing aid. Unfortunately, a disconnect remains between human perceptual studies implicating diminished sensitivity to fast acoustic temporal fine structure (TFS) and animal studies showing minimal changes in neural coding of TFS or slower envelope (ENV) structure. Here, we used general system-identification (Wiener kernel) analyses of chinchilla auditory nerve fiber responses to Gaussian noise to reveal pronounced distortions in tonotopic coding of TFS and ENV following permanent, noise-induced hearing loss. In basal fibers with characteristic frequencies (CFs) >1.5 kHz, hearing loss introduced robust nontonotopic coding (i.e., at the wrong cochlear place) of low-frequency TFS, while ENV responses typically remained at CF. As a consequence, the highest dominant frequency of TFS coding in response to Gaussian noise was 2.4 kHz in noise-overexposed fibers compared with 4.5 kHz in control fibers. Coding of ENV also became nontonotopic in more pronounced cases of cochlear damage. In apical fibers, more classical hearing-loss effects were observed, i.e., broadened tuning without a significant shift in best frequency. Because these distortions and dissociations of TFS/ENV disrupt tonotopicity, a fundamental principle of auditory processing necessary for robust signal coding in background noise, these results have important implications for understanding communication difficulties faced by people with hearing loss. Further, hearing aids may benefit from distinct amplification strategies for apical and basal cochlear regions to address fundamentally different coding deficits.
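As background on the analysis mentioned above: for a Gaussian-noise stimulus, the first-order Wiener kernel of a fiber's response is proportional to the spike-triggered average of the stimulus (the classical reverse correlation, or "revcor"). The Python sketch below shows that computation together with a simple dominant-frequency summary; it is illustrative only, and the parameter choices and array conventions are assumptions rather than the authors' pipeline.

```python
import numpy as np

def revcor(noise, spike_times, fs, window=0.010):
    """Spike-triggered average of a Gaussian-noise stimulus (the reverse
    correlation, proportional to the first-order Wiener kernel)."""
    n = int(window * fs)
    indices = (np.asarray(spike_times) * fs).astype(int)
    segments = [noise[i - n:i] for i in indices if n <= i <= len(noise)]
    return np.mean(segments, axis=0)

def dominant_frequency(kernel, fs):
    """Frequency of the largest spectral peak of the kernel, a simple way of
    asking where along the frequency axis TFS coding is concentrated."""
    spectrum = np.abs(np.fft.rfft(kernel * np.hanning(len(kernel))))
    return np.fft.rfftfreq(len(kernel), 1 / fs)[np.argmax(spectrum)]
```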


CL4: Trends Hear. 2016 Sep 7;20. pii: 2331216516641055.

The Influence of Cochlear Mechanical Dysfunction, Temporal Processing Deficits, and Age on the Intelligibility of Audible Speech in Noise for Hearing-Impaired Listeners.

Johannesen PT, Pérez-González P, Kalluri S, Blanco JL, Lopez-Poveda EA.

The aim of this study was to assess the relative importance of cochlear mechanical dysfunction, temporal processing deficits, and age on the ability of hearing-impaired listeners to understand speech in noisy backgrounds. Sixty-eight listeners took part in the study. They were provided with linear, frequency-specific amplification to compensate for their audiometric losses, and intelligibility was assessed for speech-shaped noise (SSN) and a time-reversed two-talker masker (R2TM). Behavioral estimates of cochlear gain loss and residual compression were available from a previous study and were used as indicators of cochlear mechanical dysfunction. Temporal processing abilities were assessed using frequency modulation detection thresholds. Age, audiometric thresholds, and the difference between audiometric threshold and cochlear gain loss were also included in the analyses. Stepwise multiple linear regression models were used to assess the relative importance of the various factors for intelligibility. Results showed that (a) cochlear gain loss was unrelated to intelligibility, (b) residual cochlear compression was related to intelligibility in SSN but not in an R2TM, (c) temporal processing was strongly related to intelligibility in an R2TM and much less so in SSN, and (d) age per se impaired intelligibility. In summary, all factors affected intelligibility, but their relative importance varied across maskers.
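For readers unfamiliar with stepwise regression, the sketch below shows a greedy forward-selection variant in Python: predictors are added one at a time, keeping whichever most improves R², until the improvement becomes negligible. The stopping rule, the R² criterion, and the example predictor names are illustrative assumptions, not the exact procedure used in the study.

```python
import numpy as np

def forward_stepwise(X, y, names, min_gain=0.01):
    """Greedy forward selection for a linear model: repeatedly add the predictor
    that most improves R^2, stopping once the improvement falls below min_gain."""
    def r_squared(cols):
        A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
        residuals = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
        return 1.0 - residuals.var() / y.var()

    chosen, best = [], 0.0
    while True:
        candidates = [c for c in range(X.shape[1]) if c not in chosen]
        if not candidates:
            break
        gains = {c: r_squared(chosen + [c]) - best for c in candidates}
        c_best = max(gains, key=gains.get)
        if gains[c_best] < min_gain:
            break
        chosen.append(c_best)
        best += gains[c_best]
    return [names[c] for c in chosen], best

# Hypothetical usage: columns of X could hold cochlear gain loss, residual
# compression, FM detection threshold, and age; y holds intelligibility scores.
```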


CL5: J Assoc Res Otolaryngol. 2015 Dec;16(6):727-45

Towards a Diagnosis of Cochlear Neuropathy with Envelope Following Responses.

Shaheen LA, Valero MD, Liberman MC.

Listeners with normal audiometric thresholds can still have suprathreshold deficits, for example, in the ability to discriminate sounds in complex acoustic scenes. One likely source of these deficits is cochlear neuropathy, a loss of auditory nerve (AN) fibers without hair cell damage, which can occur due to both aging and moderate acoustic overexposure. Since neuropathy can affect up to 50 % of AN fibers, its impact on suprathreshold hearing is likely profound, but progress is hindered by lack of a robust non-invasive test of neuropathy in humans. Reduction of suprathreshold auditory brainstem responses (ABRs) can be used to quantify neuropathy in inbred mice. However, ABR amplitudes are highly variable in humans, and thus more challenging to use. Since noise-induced neuropathy is selective for AN fibers with high thresholds, and because phase locking to temporal envelopes is particularly strong in these fibers, the envelope following response (EFR) might be a more robust measure. We compared EFRs to sinusoidally amplitude-modulated tones and ABRs to tone-pips in mice following a neuropathic noise exposure. EFR amplitude, EFR phase-locking value, and ABR amplitude were all reduced in noise-exposed mice. However, the changes in EFRs were more robust: the variance was smaller, thus inter-group differences were clearer. Optimum detection of neuropathy was achieved with high modulation frequencies and moderate levels. Analysis of group delays was used to confirm that the AN population was dominating the responses at these high modulation frequencies. Application of these principles in clinical testing can improve the differential diagnosis of sensorineural hearing loss.
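The EFR metrics referred to here are typically read off the spectrum of the recorded response at the stimulus modulation frequency. A minimal Python sketch, assuming a matrix of single-trial epochs, computes both the amplitude of the averaged response and a phase-locking value (PLV) across trials; details such as windowing, artifact rejection, and noise-floor estimation are omitted.

```python
import numpy as np

def efr_metrics(trials, fs, fm):
    """EFR amplitude and phase-locking value (PLV) at the modulation frequency fm,
    from an array of single-trial responses with shape (n_trials, n_samples)."""
    freqs = np.fft.rfftfreq(trials.shape[1], 1 / fs)
    k = np.argmin(np.abs(freqs - fm))           # FFT bin closest to fm
    bins = np.fft.rfft(trials, axis=1)[:, k]    # complex value at fm, per trial
    amplitude = np.abs(bins.mean())             # amplitude of the averaged response
    plv = np.abs(np.mean(bins / np.abs(bins)))  # 1 = perfectly consistent phase
    return amplitude, plv
```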


DP1: Hear Res. 2000 Nov;149(1-2):24-32.

An auditory negative after-image as a human model of tinnitus.

Norena A, Micheyl C, Chery-Croze S.

The Zwicker tone (ZT) is an auditory after-image, i.e. a tonal sensation that occurs following the presentation of notched noise. In the present study, the hypothesis that neural lateral inhibition is involved in the generation of this auditory illusion was investigated in humans through differences in perceptual detection thresholds measured following broadband noise, notched noise, and low-pass noise stimulation. The detection thresholds were measured using probe tones at several frequencies, within as well as outside the suppressed frequency range of the notched noise, and below as well as above the corner frequency of the low-pass noise. Thresholds measured after broadband noise using a sequence of four 130-ms probe tones (with a 130-ms inter-burst interval) proved to be significantly smaller than those measured using the same probe tones after notched noise at frequencies falling within the notch, but larger for frequencies on the outer edges of the noise. Thresholds measured following low-pass noise using the same sequence of probe tones were found to be smaller at frequencies slightly above the corner frequency, but larger at lower, neighboring frequencies. This pattern of results is consistent with the hypothesis that the changes in auditory sensitivity induced by stimuli containing sharp spectral contrasts reflect lateral inhibition processes in the auditory system. The potential implications of these findings for the understanding of the mechanisms underlying the generation of auditory illusions like the ZT or tinnitus are discussed.


DP2: Hear Res. 1996 Oct;100(1-2):171-80.

Auditory enhancement at the absolute threshold of hearing and its relationship to the Zwicker tone.

Wiegrebe L, Kössl M, Schmidt S.

Auditory enhancement describes an improvement in the detection of a tonal signal in a broad-band masker with a spectral gap at the signal frequency if the signal is delayed in its onset relative to the masker. This auditory enhancement may be based on an increase of the effective signal level instead of a decline in the effective masker level. In order to evaluate whether this signal enhancement also exists at the threshold of hearing, we measured the absolute threshold for pure-tone pulses of different frequencies with and without preceding band-rejected noise. Such noise also causes the sensation of the Zwicker tone, a faint pure tone lasting for a few seconds immediately after the noise presentation. The pitch of this sensation is a complex function of the noise parameters but always lies at a frequency within the rejected band. During the Zwicker tone sensation, auditory sensitivity for tone pulses at frequencies adjacent to the Zwicker tone was improved by up to 13 dB instead of being reduced, which might be expected due to the presence of the simultaneously audible Zwicker tone. The failure to influence this threshold shift with low-frequency tones and measurements of the ear's acoustical response indicate that this threshold improvement may be produced through neuronal disinhibition rather than through a release from mechanical suppression in the cochlea.


DP3: Nat Hum Behav. 2017.

Diversity in pitch perception revealed by task dependence

McPherson MJ, McDermott JH.

Pitch conveys critical information in speech, music and other natural sounds, and is conventionally defined as the perceptual correlate of a sound's fundamental frequency (F0). Although pitch is widely assumed to be subserved by a single F0 estimation process, real-world pitch tasks vary enormously, raising the possibility of underlying mechanistic diversity. To probe pitch mechanisms, we conducted a battery of pitch-related music and speech tasks using conventional harmonic sounds and inharmonic sounds whose frequencies lack a common F0. Some pitch-related abilities (those relying on musical interval or voice recognition) were strongly impaired by inharmonicity, suggesting a reliance on F0. However, other tasks, including those dependent on pitch contours in speech and music, were unaffected by inharmonicity, suggesting a mechanism that tracks the frequency spectrum rather than the F0. The results suggest that pitch perception is mediated by several different mechanisms, only some of which conform to traditional notions of pitch.


DP4: J Assoc Res Otolaryngol. 2017 Dec;18(6):789-802.

Vocoder Simulations Explain Complex Pitch Perception Limitations Experienced by Cochlear Implant Users.

Mehta AH, Oxenham AJ.

Pitch plays a crucial role in speech and music, but is highly degraded for people with cochlear implants, leading to severe communication challenges in noisy environments. Pitch is determined primarily by the first few spectrally resolved harmonics of a tone. In implants, access to this pitch is limited by poor spectral resolution, due to the limited number of channels and interactions between adjacent channels. Here we used noise-vocoder simulations to explore how many channels, and how little channel interaction, are required to elicit pitch. Results suggest that two to four times as many channels as are available in current devices are needed, along with channel interactions reduced by an order of magnitude. These new constraints not only provide insights into the basic mechanisms of pitch coding in normal hearing but also suggest that spectrally based complex pitch is unlikely to be generated in implant users without significant changes in the method or site of stimulation.
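A noise vocoder of the kind used in such simulations splits the signal into frequency bands, extracts each band's temporal envelope, and uses it to modulate band-limited noise before summing the channels. The Python sketch below is a minimal version under assumed parameters (8 log-spaced channels, 100 Hz to 8 kHz, sampling rate well above 16 kHz); it does not implement the controlled channel-interaction manipulation that is central to the study.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(x, fs, n_channels=8, fmin=100.0, fmax=8000.0):
    """Minimal noise vocoder: split the input into log-spaced bands, extract each
    band's envelope, and use it to modulate band-limited noise (fs >> 2 * fmax)."""
    edges = np.geomspace(fmin, fmax, n_channels + 1)
    noise = np.random.randn(len(x))
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        envelope = np.abs(hilbert(sosfilt(sos, x)))   # band envelope of the input
        out += sosfilt(sos, noise) * envelope         # envelope-modulated noise carrier
    return out / (np.max(np.abs(out)) + 1e-12)
```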


DP5: Front Psychol. 2017 Apr 13;8:587.

Perceptually Salient Regions of the Modulation Power Spectrum for Musical Instrument Identification.

Thoret E, Depalle P, McAdams S.

The ability of a listener to recognize sound sources, and in particular musical instruments from the sounds they produce, raises the question of determining the acoustical information used to achieve such a task. It is now well known that the shapes of the temporal and spectral envelopes are crucial to the recognition of a musical instrument. More recently, Modulation Power Spectra (MPS) have been shown to be a representation that potentially explains the perception of musical instrument sounds. Nevertheless, the question of which specific regions of this representation characterize a musical instrument is still open. An identification task was applied to two subsets of musical instruments: tuba, trombone, cello, saxophone, and clarinet on the one hand, and marimba, vibraphone, guitar, harp, and viola pizzicato on the other. The sounds were processed with filtered spectrotemporal modulations with 2D Gaussian windows. The most relevant regions of this representation for instrument identification were determined for each instrument and reveal the regions essential for their identification. The method used here is based on a molecular approach, the so-called bubbles method. Globally, the instruments were correctly identified and the lower values of spectrotemporal modulations are the most important regions of the MPS for recognizing instruments. Interestingly, instruments that were confused with each other led to non-overlapping regions and were confused when they were filtered in the most salient region of the other instrument. These results suggest that musical instrument timbres are characterized by specific spectrotemporal modulations, information which could contribute to music information retrieval tasks such as automatic source recognition.
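The Modulation Power Spectrum can be obtained as the two-dimensional Fourier transform of a (log) spectrogram, giving spectral modulations along one axis and temporal modulations along the other; the "bubbles" then correspond to multiplying this representation by 2D Gaussian windows before resynthesis. The Python sketch below covers only the forward MPS computation, with assumed parameters; the Gaussian filtering and sound resynthesis steps used in the study are not shown.

```python
import numpy as np
from scipy.signal import spectrogram

def modulation_power_spectrum(x, fs, nperseg=512, noverlap=384):
    """Modulation power spectrum: magnitude of the 2D FFT of the log spectrogram,
    with spectral modulations (cycles/kHz) along one axis and temporal
    modulations (Hz) along the other."""
    f, t, S = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    log_s = np.log(S + 1e-12)
    mps = np.abs(np.fft.fftshift(np.fft.fft2(log_s - log_s.mean())))
    spec_mod = np.fft.fftshift(np.fft.fftfreq(len(f), d=(f[1] - f[0]) / 1000.0))
    temp_mod = np.fft.fftshift(np.fft.fftfreq(len(t), d=t[1] - t[0]))
    return spec_mod, temp_mod, mps
```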


DP6: Front Neurosci. 2017 Jul 6;11:387

Auditory Mismatch Negativity in Response to Changes of Counter-Balanced Interaural Time and Level Differences.

Altmann CF, Ueda R, Furukawa S, Kashino M, Mima T, Fukuyama H.

Interaural time differences (ITD) and interaural level differences (ILD) both signal horizontal sound source location. To achieve a unified percept of our acoustic environment, these two cues require integration. In the present study, we tested this integration of ITD and ILD with electroencephalography (EEG) by measuring the mismatch negativity (MMN). The MMN can arise in response to spatial changes and is at least partly generated in auditory cortex. In our study, we aimed at testing for an MMN in response to stimuli with counter-balanced ITD/ILD cues. To this end, we employed a roving oddball paradigm with alternating sound sequences in two types of blocks: (a) lateralized stimuli with congruently combined ITD/ILD cues and (b) midline stimuli created by counter-balanced, incongruently combined ITD/ILD cues. We observed a significant MMN peaking at about 112-128 ms after change onset for the congruent ITD/ILD cues, for both the lower (0.5 kHz) and the higher (4 kHz) carrier frequency. More importantly, we also observed a significant MMN peaking at about 129 ms for incongruently combined ITD/ILD cues, but this effect was only detectable in the lower frequency range (0.5 kHz). There were no significant differences between the MMN responses for the two types of cue combinations (congruent/incongruent). These results suggest that, at least in the lower frequency range (0.5 kHz), ITD and ILD are processed independently at the level of the MMN in auditory cortex.


DP7: Sci Adv. 2015 Nov 13;1(10):e1500677.

Does the mismatch negativity operate on a consciously accessible memory trace?

Dykstra AR, Gutschalk A.

The extent to which the contents of short-term memory are consciously accessible is a fundamental question of cognitive science. In audition, short-term memory is often studied via the mismatch negativity (MMN), a change-related component of the auditory evoked response that is elicited by violations of otherwise regular stimulus sequences. The prevailing functional view of the MMN is that it operates on preattentive and even preconscious stimulus representations. We directly examined the preconscious notion of the MMN using informational masking and magnetoencephalography. Spectrally isolated and otherwise suprathreshold auditory oddball sequences were occasionally rendered inaudible by embedding them in random multitone masker "clouds." Despite identical stimulation/task contexts and a clear representation of all stimuli in auditory cortex, MMN was only observed when the preceding regularity (that is, the standard stream) was consciously perceived. The results call into question the preconscious interpretation of MMN and raise the possibility that it might index partial awareness in the absence of overt behavior.


DP8: J Neurosci. 2017 Nov 1;37(44):10645-10655.

A Crucial Test of the Population Separation Model of Auditory Stream Segregation in Macaque Primary Auditory Cortex

Fishman YI, Kim M, Steinschneider M.

An important aspect of auditory scene analysis is auditory stream segregation-the organization of sound sequences into perceptual streams reflecting different sound sources in the environment. Several models have been proposed to account for stream segregation. According to the "population separation" (PS) model, alternating ABAB tone sequences are perceived as a single stream or as two separate streams when "A" and "B" tones activate the same or distinct frequency-tuned neuronal populations in primary auditory cortex (A1), respectively. A crucial test of the PS model is whether it can account for the observation that A and B tones are generally perceived as a single stream when presented synchronously, rather than in an alternating pattern, even if they are widely separated in frequency. Here, we tested the PS model by recording neural responses to alternating (ALT) and synchronous (SYNC) tone sequences in A1 of male macaques. Consistent with predictions of the PS model, a greater effective tonotopic separation of A and B tone responses was observed under ALT than under SYNC conditions, thus paralleling the perceptual organization of the sequences. While other models of stream segregation, such as temporal coherence, are not excluded by the present findings, we conclude that PS is sufficient to account for the perceptual organization of ALT and SYNC sequences and thus remains a viable model of auditory stream segregation.


DP9: Curr Biol. 2017 Mar 6;27(5):743-750.

Frogs Exploit Statistical Regularities in Noisy Acoustic Scenes to Solve Cocktail-Party-like Problems.

Lee N, Ward JL, Vélez A, Micheyl C, Bee MA.

Noise is a ubiquitous source of errors in all forms of communication [1]. Noise-induced errors in speech communication, for example, make it difficult for humans to converse in noisy social settings, a challenge aptly named the cocktail party problem [2]. Many nonhuman animals also communicate acoustically in noisy social groups and thus face biologically analogous problems [3]. However, we know little about how the perceptual systems of receivers are evolutionarily adapted to avoid the costs of noise-induced errors in communication. In this study of Cope's gray treefrog (Hyla chrysoscelis; Hylidae), we investigated whether receivers exploit a potential statistical regularity present in noisy acoustic scenes to reduce errors in signal recognition and discrimination. We developed an anatomical/physiological model of the peripheral auditory system to show that temporal correlation in amplitude fluctuations across the frequency spectrum (comodulation) [4-6] is a feature of the noise generated by large breeding choruses of sexually advertising males. In four psychophysical experiments, we investigated whether females exploit comodulation in background noise to mitigate noise-induced errors in evolutionarily critical mate-choice decisions. Subjects experienced fewer errors in recognizing conspecific calls and in selecting the calls of high-quality mates in the presence of simulated chorus noise that was comodulated. These data show unequivocally, and for the first time, that exploiting statistical regularities present in noisy acoustic scenes is an important biological strategy for solving cocktail-party-like problems in nonhuman animal communication.
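Comodulation here means that the slow amplitude fluctuations of the masker are shared across its frequency bands. The Python sketch below generates a crude chorus-like masker in which the band envelopes are either shared or statistically independent; the band edges, envelope rate, and envelope-extraction method are assumptions for illustration, not the stimuli used in the experiments.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def chorus_like_masker(fs, dur, bands, comodulated=True, seed=0):
    """Multi-band noise masker whose slow amplitude fluctuations are either shared
    across bands (comodulated) or drawn independently per band."""
    rng = np.random.default_rng(seed)
    n = int(fs * dur)
    lowpass = butter(2, 10.0, fs=fs, output="sos")   # ~10 Hz envelope fluctuations

    def slow_envelope():
        return np.abs(hilbert(sosfilt(lowpass, rng.standard_normal(n))))

    shared = slow_envelope()
    masker = np.zeros(n)
    for lo, hi in bands:
        band = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        carrier = sosfilt(band, rng.standard_normal(n))
        masker += carrier * (shared if comodulated else slow_envelope())
    return masker / np.max(np.abs(masker))
```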


DP10: Front Neurosci. 2016 Nov 15;10:524.

Computational Models of Auditory Scene Analysis: A Review

Szabó BT, Denham SL, Winkler I.

Auditory scene analysis (ASA) refers to the process (es) of parsing the complex acoustic input into auditory perceptual objects representing either physical sources or temporal sound patterns, such as melodies, which contributed to the sound waves reaching the ears. A number of new computational models accounting for some of the perceptual phenomena of ASA have been published recently. Here we provide a theoretically motivated review of these computational models, aiming to relate their guiding principles to the central issues of the theoretical framework of ASA. Specifically, we ask how they achieve the grouping and separation of sound elements and whether they implement some form of competition between alternative interpretations of the sound input. We consider the extent to which they include predictive processes, as important current theories suggest that perception is inherently predictive, and also how they have been evaluated. We conclude that current computational models of ASA are fragmentary in the sense that rather than providing general competing interpretations of ASA, they focus on assessing the utility of specific processes (or algorithms) for finding the causes of the complex acoustic signal. This leaves open the possibility for integrating complementary aspects of the models into a more comprehensive theory of ASA.


DP11: J Acoust Soc Am. 2017 Mar;141(3):1985.

Auditory inspired machine learning techniques can improve speech intelligibility and quality for hearing-impaired listeners

Monaghan JJ, Goehring T, Yang X, Bolner F, Wang S, Wright MC, Bleeck S.

Machine-learning-based approaches to speech enhancement have recently shown great promise for improving speech intelligibility for hearing-impaired listeners. Here, the performance of three machine-learning algorithms and one classical algorithm, Wiener filtering, was compared. Two algorithms based on neural networks were examined, one using a previously reported feature set and one using a feature set derived from an auditory model. The third machine-learning approach was a dictionary-based sparse-coding algorithm. Speech intelligibility and quality scores were obtained for participants with mild-to-moderate hearing impairments listening to sentences in speech-shaped noise and multi-talker babble following processing with the algorithms. Intelligibility and quality scores were significantly improved by each of the three machine-learning approaches, but not by the classical approach. The largest improvements for both speech intelligibility and quality were found by implementing a neural network using the feature set based on auditory modeling. Furthermore, neural-network-based techniques appeared more promising than dictionary-based sparse coding in terms of performance and ease of implementation.
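The classical baseline mentioned in the abstract, Wiener filtering, attenuates each time-frequency bin according to its estimated signal-to-noise ratio. The Python sketch below shows one common single-channel variant with the noise spectrum estimated from an assumed noise-only lead-in segment; it is a generic textbook formulation, not the specific algorithm evaluated in the paper.

```python
import numpy as np
from scipy.signal import stft, istft

def wiener_enhance(noisy, fs, noise_seconds=0.5, gain_floor=0.05, nperseg=512):
    """Single-channel spectral Wiener filter: estimate the noise power from an
    initial noise-only segment, then scale each time-frequency bin by
    SNR / (SNR + 1), with a floor to limit musical noise."""
    f, t, X = stft(noisy, fs=fs, nperseg=nperseg)
    hop = nperseg // 2                                   # default scipy hop size
    n_frames = max(1, int(noise_seconds * fs / hop))
    noise_psd = np.mean(np.abs(X[:, :n_frames]) ** 2, axis=1, keepdims=True)
    snr = np.maximum(np.abs(X) ** 2 / noise_psd - 1.0, 0.0)
    gain = np.maximum(snr / (snr + 1.0), gain_floor)
    _, enhanced = istft(gain * X, fs=fs, nperseg=nperseg)
    return enhanced
```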


DP12: Curr Biol. 2017 Nov 6;27(21):3237-3247.e6.

Audiomotor Perceptual Training Enhances Speech Intelligibility in Background Noise

Whitton JP, Hancock KE, Shannon JM, Polley DB.

Sensory and motor skills can be improved with training, but learning is often restricted to practice stimuli. As an exception, training on closed-loop (CL) sensorimotor interfaces, such as action video games and musical instruments, can impart a broad spectrum of perceptual benefits. Here we ask whether computerized CL auditory training can enhance speech understanding in levels of background noise that approximate a crowded restaurant. Elderly hearing-impaired subjects trained for 8 weeks on a CL game that, like a musical instrument, challenged them to monitor subtle deviations between predicted and actual auditory feedback as they moved their fingertip through a virtual soundscape. We performed our study as a randomized, double-blind, placebo-controlled trial by training other subjects in an auditory working-memory (WM) task. Subjects in both groups improved at their respective auditory tasks and reported comparable expectations for improved speech processing, thereby controlling for placebo effects. Whereas speech intelligibility was unchanged after WM training, subjects in the CL training group could correctly identify 25% more words in spoken sentences or digit sequences presented in high levels of background noise. Numerically, CL audiomotor training provided more than three times the benefit of our subjects' hearing aids for speech processing in noisy listening conditions. Gains in speech intelligibility could be predicted from gameplay accuracy and baseline inhibitory control. However, benefits did not persist in the absence of continuing practice. These studies employ stringent clinical standards to demonstrate that perceptual learning on a computerized audio game can transfer to "real-world" communication challenges.


DP13: Curr Biol. 2017 Feb 6;27(3):359-370.

Integer Ratio Priors on Musical Rhythm Revealed Cross-culturally by Iterated Reproduction.

Jacoby N, McDermott JH.

Probability distributions over external states (priors) are essential to the interpretation of sensory signals. Priors for cultural artifacts such as music and language remain largely uncharacterized, but likely constrain cultural transmission, because only those signals with high probability under the prior can be reliably reproduced and communicated. We developed a method to estimate priors for simple rhythms via iterated reproduction of random temporal sequences. Listeners were asked to reproduce random "seed" rhythms; their reproductions were fed back as the stimulus and over time became dominated by internal biases, such that the prior could be estimated by applying the procedure multiple times. We validated that the measured prior was consistent across the modality of reproduction and that it correctly predicted perceptual discrimination. We then measured listeners' priors over the entire space of two- and three-interval rhythms. Priors in US participants showed peaks at rhythms with simple integer ratios and were similar for musicians and non-musicians. An analogous procedure produced qualitatively different results for spoken phrases, indicating some specificity to music. Priors measured in members of a native Amazonian society were distinct from those in US participants but also featured integer ratio peaks. The results do not preclude biological constraints favoring integer ratios, but they suggest that priors on musical rhythm are substantially modulated by experience and may simply reflect the empirical distribution of rhythm that listeners encounter. The proposed method can efficiently map out a high-resolution view of biases that shape transmission and stability of simple reproducible patterns within a culture.
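The logic of iterated reproduction can be illustrated with a small simulation: each reproduction becomes the next stimulus, so after a few iterations the sequence is dominated by the reproducer's internal biases. In the hypothetical Python sketch below, a toy "listener" is biased toward simple integer-ratio three-interval rhythms; the bias model, noise level, and ratio grid are invented for illustration and are not the estimation procedure used in the study.

```python
import numpy as np

def iterate_reproduction(seed_intervals, reproduce, n_iterations=5):
    """Iterated reproduction: each reproduction becomes the next stimulus, so the
    sequence drifts toward the reproducer's internal biases (the prior)."""
    rhythm = np.asarray(seed_intervals, dtype=float)
    for _ in range(n_iterations):
        rhythm = reproduce(rhythm)
    return rhythm

def toy_listener(intervals, pull=0.4, noise_sd=0.03, rng=np.random.default_rng(1)):
    """Hypothetical reproducer biased toward the nearest simple integer-ratio
    pattern for three-interval rhythms (for illustration only)."""
    proportions = intervals / intervals.sum()
    grid = np.array([[1, 1, 1], [1, 1, 2], [1, 2, 1], [2, 1, 1],
                     [1, 2, 2], [2, 1, 2], [2, 2, 1]], dtype=float)
    grid /= grid.sum(axis=1, keepdims=True)
    target = grid[np.argmin(((grid - proportions) ** 2).sum(axis=1))]
    biased = (1 - pull) * proportions + pull * target
    noisy = np.maximum(biased + rng.normal(0.0, noise_sd, size=3), 0.01)
    return intervals.sum() * noisy / noisy.sum()

# A random seed rhythm drifts toward an integer-ratio pattern within a few iterations.
print(iterate_reproduction([0.43, 0.91, 0.66], toy_listener))
```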


DP14: Front Neurosci. 2016 Nov 24;10:490.

Long Term Memory for Noise: Evidence of Robust Encoding of Very Short Temporal Acoustic Patterns.

Viswanathan J, Rémy F, Bacon-Macé N, Thorpe SJ.

Recent research has demonstrated that humans are able to implicitly encode and retain repeating patterns in meaningless auditory noise. Our study aimed at testing the robustness of long-term implicit recognition memory for these learned patterns. Participants performed a cyclic/non-cyclic discrimination task, during which they were presented with either 1-s cyclic noises (CNs) (the two halves of the noise were identical) or 1-s plain random noises (Ns). Among CNs and Ns presented once, target CNs were implicitly presented multiple times within a block, and implicit recognition of these target CNs was tested 4 weeks later using a similar cyclic/non-cyclic discrimination task. Furthermore, robustness of implicit recognition memory was tested by presenting participants with looped (shifting the origin) and scrambled (chopping sounds into 10- and 20-ms bits before shuffling) versions of the target CNs. We found that participants had robust implicit recognition memory for learned noise patterns after 4 weeks, right from the first presentation. Additionally, this memory was remarkably resistant to acoustic transformations, such as looping and scrambling of the sounds. Finally, implicit recognition of sounds was dependent on participants' discrimination performance during learning. Our findings suggest that meaningless temporal features as short as 10 ms can be implicitly stored in long-term auditory memory. Moreover, successful encoding and storage of such fine features may vary between participants, possibly depending on individual attention and auditory discrimination abilities. Significance statement: Meaningless auditory patterns could be implicitly encoded and stored in long-term memory. Acoustic transformations of learned meaningless patterns could be implicitly recognized after 4 weeks. Implicit long-term memories can be formed for meaningless auditory features as short as 10 ms. Successful encoding and long-term implicit recognition of meaningless patterns may strongly depend on individual attention and auditory discrimination abilities.
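The stimulus constructions described above (cyclic noise, looping, scrambling) are straightforward to sketch. The Python snippet below generates a 1-s cyclic noise from a repeated 0.5-s snippet, a "looped" version with a shifted cycle origin, and a "scrambled" version chopped into short bits and shuffled; the chunk size and shift fraction are illustrative parameters, not necessarily those of the study.

```python
import numpy as np

def make_cyclic_noise(fs, dur=1.0, seed=0):
    """1-s cyclic noise: a 0.5-s Gaussian noise snippet repeated back to back."""
    half = np.random.default_rng(seed).standard_normal(int(fs * dur / 2))
    return np.tile(half, 2)

def loop_shift(x, fraction=0.25):
    """'Looped' version: the same cycle with a shifted starting point."""
    return np.roll(x, int(fraction * len(x)))

def scramble(x, fs, chunk_ms=10, seed=0):
    """'Scrambled' version: chop the sound into short bits and shuffle their order."""
    n = int(fs * chunk_ms / 1000)
    chunks = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
    np.random.default_rng(seed).shuffle(chunks)
    return np.concatenate(chunks)
```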


MC1: Neuron. 2013 Mar 6;77(5):980-91.

Mechanisms underlying selective neuronal tracking of attended speech at a "cocktail party".

Zion Golumbic EM, Ding N, Bickel S, Lakatos P, Schevon CA, McKhann GM, Goodman RR, Emerson R, Mehta AD, Simon JZ, Poeppel D, Schroeder CE.

The ability to focus on and understand one talker in a noisy social environment is a critical social-cognitive capacity, whose underlying neuronal mechanisms are unclear. We investigated the manner in which speech streams are represented in brain activity and the way that selective attention governs the brain's representation of speech using a Cocktail Party paradigm, coupled with direct recordings from the cortical surface in surgical epilepsy patients. We find that brain activity dynamically tracks speech streams using both low-frequency phase and high-frequency amplitude fluctuations and that optimal encoding likely combines the two. In and near low-level auditory cortices, attention modulates the representation by enhancing cortical tracking of attended speech streams, but ignored speech remains represented. In higher-order regions, the representation appears to become more selective, in that there is no detectable tracking of ignored speech. This selectivity itself seems to sharpen as a sentence unfolds.


MC2: Elife. 2016 Mar 7;5:e11476.

Neural signatures of perceptual inference.

Sedley W, Gander PE, Kumar S, Kovach CK, Oya H, Kawasaki H, Howard MA, Griffiths TD.

Generative models, such as predictive coding, posit that perception results from a combination of sensory input and prior prediction, each weighted by its precision (inverse variance), with incongruence between these termed prediction error (deviation from prediction) or surprise (negative log probability of the sensory input). However, direct evidence for such a system, and the physiological basis of its computations, is lacking. Using an auditory stimulus whose pitch value changed according to specific rules, we controlled and separated the three key computational variables underlying perception, and discovered, using direct recordings from human auditory cortex, that surprise due to prediction violations is encoded by local field potential oscillations in the gamma band (>30 Hz), changes to predictions in the beta band (12-30 Hz), and that the precision of predictions appears to quantitatively relate to alpha band oscillations (8-12 Hz). These results confirm oscillatory codes for critical aspects of generative models of perception.
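
(Illustrative note: the three computational variables defined in the abstract (prediction error, surprise as negative log probability, and precision as inverse variance) have simple closed forms if the prediction and the sensory noise are taken to be Gaussian. The sketch below is a toy numerical example under that assumption, not the authors' model or analysis; all values are invented.)

    import numpy as np

    mu_prior, var_prior = 440.0, 25.0        # prior prediction of the pitch and its variance
    var_sensory = 100.0                      # assumed sensory noise variance
    x = 470.0                                # observed pitch value

    precision_prior = 1.0 / var_prior        # precision = inverse variance
    precision_sensory = 1.0 / var_sensory

    prediction_error = x - mu_prior          # deviation from the prediction

    # Surprise: negative log probability of the observation under the prediction
    var_pred = var_prior + var_sensory
    surprise = 0.5 * (np.log(2 * np.pi * var_pred) + prediction_error**2 / var_pred)

    # Precision-weighted update: how much the prediction changes given the input
    gain = precision_sensory / (precision_sensory + precision_prior)
    mu_updated = mu_prior + gain * prediction_error

    print(prediction_error, surprise, mu_updated)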


MC3: J Neurosci. 2016 Sep 21;36(38):9888-95.

Eye Can Hear Clearly Now: Inverse Effectiveness in Natural Audiovisual Speech Processing Relies on Long-Term Crossmodal Temporal Integration.

Crosse MJ, Di Liberto GM, Lalor EC.

Speech comprehension is improved by viewing a speaker's face, especially in adverse hearing conditions, a principle known as inverse effectiveness. However, the neural mechanisms that help to optimize how we integrate auditory and visual speech in such suboptimal conversational environments are not yet fully understood. Using human EEG recordings, we examined how visual speech enhances the cortical representation of auditory speech at a signal-to-noise ratio that maximized the perceptual benefit conferred by multisensory processing relative to unisensory processing. We found that the influence of visual input on the neural tracking of the audio speech signal was significantly greater in noisy than in quiet listening conditions, consistent with the principle of inverse effectiveness. Although envelope tracking during audio-only speech was greatly reduced by background noise at an early processing stage, it was markedly restored by the addition of visual speech input. In background noise, multisensory integration occurred at much lower frequencies and was shown to predict the multisensory gain in behavioral performance at a time lag of ∼250 ms. Critically, we demonstrated that inverse effectiveness, in the context of natural audiovisual (AV) speech processing, relies on crossmodal integration over long temporal windows. Our findings suggest that disparate integration mechanisms contribute to the efficient processing of AV speech in background noise.
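
(Illustrative note: "neural tracking" of the speech envelope is often quantified with a lagged linear decoder that reconstructs the envelope from the EEG and correlates the reconstruction with the true envelope. The sketch below shows that generic stimulus-reconstruction approach with ridge regression; the data shapes, lag range and regularization value are assumptions, and the paper's exact pipeline may differ.)

    import numpy as np

    def lagged_design(eeg, max_lag):
        """Stack time-lagged copies of each EEG channel (lags 0..max_lag samples)."""
        n, c = eeg.shape
        X = np.zeros((n, c * (max_lag + 1)))
        for lag in range(max_lag + 1):
            X[lag:, lag * c:(lag + 1) * c] = eeg[: n - lag]
        return X

    def train_decoder(eeg, env, max_lag, lam=1.0):
        """Ridge regression from lagged EEG (n_samples x n_channels) to the envelope."""
        X = lagged_design(eeg, max_lag)
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ env)

    def tracking_score(eeg, env, w, max_lag):
        """Correlation between reconstructed and actual envelope: the tracking measure."""
        pred = lagged_design(eeg, max_lag) @ w
        return np.corrcoef(pred, env)[0, 1]

Tracking in quiet versus in noise, or for audio-only versus audiovisual speech, could then be compared by training and scoring such a decoder on the corresponding recordings.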


MC4: J Neurosci. 2016 Sep 14;36(37):9572-9.

Transitional Probabilities Are Prioritized over Stimulus/Pattern Probabilities in Auditory Deviance Detection: Memory Basis for Predictive Sound Processing.

Mittag M, Takegata R, Winkler I.

Representations encoding the probabilities of auditory events do not directly support predictive processing. In contrast, information about the probability with which a given sound follows another (transitional probability) allows predictions of upcoming sounds. We tested whether behavioral and cortical auditory deviance detection (the latter indexed by the mismatch negativity event-related potential) relies on probabilities of sound patterns or on transitional probabilities. We presented healthy adult volunteers with three types of rare tone-triplets among frequent standard triplets of high-low-high (H-L-H) or L-H-L pitch structure: proximity deviant (H-H-H/L-L-L), reversal deviant (L-H-L/H-L-H), and first-tone deviant (L-L-H/H-H-L). If deviance detection was based on pattern probability, reversal and first-tone deviants should be detected with similar latency because both differ from the standard at the first pattern position. If deviance detection was based on transitional probabilities, then reversal deviants should be the most difficult to detect because, unlike the other two deviants, they contain no low-probability pitch transitions. The data clearly showed that both behavioral and cortical auditory deviance detection uses transitional probabilities. Thus, the memory traces underlying cortical deviance detection may provide a link between stimulus probability-based change/novelty detectors operating at lower levels of the auditory system and higher auditory cognitive functions that involve predictive processing.
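
(Illustrative note: the contrast between the two candidate representations can be made concrete by counting, in a mostly-standard H-L-H block, how probable each whole triplet is versus how probable each within-triplet tone-to-tone transition is. The counts below are invented, and only within-triplet transitions are considered; the point is that all three deviants are equally rare as patterns, but only the reversal deviant contains no low-probability transition.)

    from collections import Counter

    standard = ('H', 'L', 'H')
    deviants = {'proximity':  ('H', 'H', 'H'),
                'reversal':   ('L', 'H', 'L'),
                'first-tone': ('L', 'L', 'H')}

    # A block of mostly standard triplets with a few rare deviants (illustrative counts)
    block = [standard] * 97 + list(deviants.values())

    # Pattern probabilities: relative frequency of each whole triplet
    pattern_p = {trip: cnt / len(block) for trip, cnt in Counter(block).items()}

    # Transitional probabilities: p(next tone | current tone), counted within triplets
    pair_counts = Counter((a, b) for trip in block for a, b in zip(trip, trip[1:]))
    from_counts = {t: sum(cnt for (a, _), cnt in pair_counts.items() if a == t) for t in 'HL'}
    trans_p = {(a, b): cnt / from_counts[a] for (a, b), cnt in pair_counts.items()}

    for name, trip in deviants.items():
        lowest = min(trans_p.get((a, b), 0.0) for a, b in zip(trip, trip[1:]))
        print(f"{name:10s} pattern p = {pattern_p[trip]:.2f}   "
              f"lowest transition p = {lowest:.2f}")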


MC5: Nat Neurosci. 2016 Jan;19(1):158-64.

Cortical tracking of hierarchical linguistic structures in connected speech.

Ding N, Melloni L, Zhang H, Tian X, Poeppel D.

The most critical attribute of human language is its unbounded combinatorial nature: smaller elements can be combined into larger structures on the basis of a grammatical system, resulting in a hierarchy of linguistic units, such as words, phrases and sentences. Mentally parsing and representing such structures, however, poses challenges for speech comprehension. In speech, hierarchical linguistic structures do not have boundaries that are clearly defined by acoustic cues and must therefore be internally and incrementally constructed during comprehension. We found that, during listening to connected speech, cortical activity of different timescales concurrently tracked the time course of abstract linguistic structures at different hierarchical levels, such as words, phrases and sentences. Notably, the neural tracking of hierarchical linguistic structures was dissociated from the encoding of acoustic cues and from the predictability of incoming words. Our results indicate that a hierarchy of neural processing timescales underlies grammar-based internal construction of hierarchical linguistic structure.
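
(Illustrative note: in this kind of paradigm, linguistic units of different sizes recur at different fixed rates, so cortical tracking of each hierarchical level appears as a distinct peak in the response spectrum. The sketch below fabricates such a signal and reads out the peaks; the rates, sampling rate and noise level are made-up values for illustration, not the study's parameters.)

    import numpy as np

    fs = 200                          # assumed neural sampling rate (Hz)
    t = np.arange(0, 60, 1 / fs)      # one minute of listening

    # Illustrative rates only: words fastest, sentences slowest
    rates = {'word': 4.0, 'phrase': 2.0, 'sentence': 1.0}

    # Toy "cortical" signal: weak oscillations at each linguistic rate buried in noise
    rng = np.random.default_rng(2)
    signal = sum(0.5 * np.sin(2 * np.pi * f * t) for f in rates.values())
    signal = signal + rng.standard_normal(t.size)

    # Frequency tagging: the power spectrum shows a discrete peak at each rate
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(t.size, 1 / fs)
    for name, f in rates.items():
        print(name, 'power at', f, 'Hz:', power[np.argmin(np.abs(freqs - f))])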


YB1: Neuron. 2012 Oct 18;76(2):435-49.

Discrete neocortical dynamics predict behavioral categorization of sounds.

Bathellier B, Ushakova L, Rumpel S.

The ability to group stimuli into perceptual categories is essential for efficient interaction with the environment. Discrete dynamics that emerge in brain networks are believed to be the neuronal correlate of category formation. Observations of such dynamics have recently been made; however, it is still unresolved whether they actually match perceptual categories. Using in vivo two-photon calcium imaging in the auditory cortex of mice, we show that local network activity evoked by sounds is constrained to a few response modes. Transitions between response modes are characterized by an abrupt switch, indicating attractor-like, discrete dynamics. Moreover, we show that local cortical responses quantitatively predict discrimination performance and spontaneous categorization of sounds in behaving mice. Our results therefore demonstrate that local nonlinear dynamics in the auditory cortex generate spontaneous sound categories which can be selected for behavioral or perceptual decisions.


YB2: Neuron. 2016 Apr 6;90(1):191-203.

Unmasking Latent Inhibitory Connections in Human Cortex to Reveal Dormant Cortical Memories.

Barron HC, Vogels TP, Emir UE, Makin TR, O'Shea J, Clare S, Jbabdi S, Dolan RJ, Behrens TE.

Balance of cortical excitation and inhibition (EI) is thought to be disrupted in several neuropsychiatric conditions, yet it is not clear how it is maintained in the healthy human brain. When EI balance is disturbed during learning and memory in animal models, it can be restabilized via formation of inhibitory replicas of newly formed excitatory connections. Here we assess evidence for such selective inhibitory rebalancing in humans. Using fMRI repetition suppression we measure newly formed cortical associations in the human brain. We show that expression of these associations reduces over time despite persistence in behavior, consistent with inhibitory rebalancing. To test this, we modulated excitation/inhibition balance with transcranial direct current stimulation (tDCS). Using ultra-high-field (7T) MRI and spectroscopy, we show that reducing GABA allows cortical associations to be re-expressed. This suggests that in humans associative memories are stored in balanced excitatory-inhibitory ensembles that lie dormant unless latent inhibitory connections are unmasked.


YB3: Proc Natl Acad Sci U S A. 2017 Sep 12;114(37):9972-9977.

Top-down modulation of sensory cortex gates perceptual learning.

Caras ML, Sanes DH.

Practice sharpens our perceptual judgments, a process known as perceptual learning. Although several brain regions and neural mechanisms have been proposed to support perceptual learning, formal tests of causality are lacking. Furthermore, the temporal relationship between neural and behavioral plasticity remains uncertain. To address these issues, we recorded the activity of auditory cortical neurons as gerbils trained on a sound detection task. Training led to improvements in cortical and behavioral sensitivity that were closely matched in terms of magnitude and time course. Surprisingly, the degree of neural improvement was behaviorally gated. During task performance, cortical improvements were large and predicted behavioral outcomes. In contrast, during nontask listening sessions, cortical improvements were weak and uncorrelated with perceptual performance. Targeted reduction of auditory cortical activity during training diminished perceptual learning while leaving psychometric performance largely unaffected. Collectively, our findings suggest that training facilitates perceptual learning by strengthening both bottom-up sensory encoding and top-down modulation of auditory cortex.


YB4: J Neurosci. 2006 May 3;26(18):4970-82.

Perceptual learning directs auditory cortical map reorganization through top-down influences.

Polley DB, Steinberg EE, Merzenich MM.

The primary sensory cortex is positioned at a confluence of bottom-up dedicated sensory inputs and top-down inputs related to higher-order sensory features, attentional state, and behavioral reinforcement. We tested whether topographic map plasticity in the adult primary auditory cortex and a secondary auditory area, the suprarhinal auditory field, was controlled by the statistics of bottom-up sensory inputs or by top-down task-dependent influences. Rats were trained to attend to independent parameters, either frequency or intensity, within an identical set of auditory stimuli, allowing us to vary task demands while holding the bottom-up sensory inputs constant. We observed a clear double-dissociation in map plasticity in both cortical fields. Rats trained to attend to frequency cues exhibited an expanded representation of the target frequency range within the tonotopic map but no change in sound intensity encoding compared with controls. Rats trained to attend to intensity cues expressed an increased proportion of nonmonotonic intensity response profiles preferentially tuned to the target intensity range but no change in tonotopic map organization relative to controls. The degree of topographic map plasticity within the task-relevant stimulus dimension was correlated with the degree of perceptual learning for rats in both tasks. These data suggest that enduring receptive field plasticity in the adult auditory cortex may be shaped by task-specific top-down inputs that interact with bottom-up sensory inputs and reinforcement-based neuromodulator release. Top-down inputs might confer the selectivity necessary to modify a single feature representation without affecting other spatially organized feature representations embedded within the same neural circuitry.


YB5: Nature. 2011 Dec 7;480(7377):331-5.

A disinhibitory microcircuit for associative fear learning in the auditory cortex.

Letzkus JJ, Wolff SB, Meyer EM, Tovote P, Courtin J, Herry C, Lüthi A.

Learning causes a change in how information is processed by neuronal circuits. Whereas synaptic plasticity, an important cellular mechanism, has been studied in great detail, we know much less about how learning is implemented at the level of neuronal circuits and, in particular, how interactions between distinct types of neurons within local networks contribute to the process of learning. Here we show that acquisition of associative fear memories depends on the recruitment of a disinhibitory microcircuit in the mouse auditory cortex. Fear-conditioning-associated disinhibition in auditory cortex is driven by foot-shock-mediated cholinergic activation of layer 1 interneurons, in turn generating inhibition of layer 2/3 parvalbumin-positive interneurons. Importantly, pharmacological or optogenetic block of pyramidal neuron disinhibition abolishes fear learning. Together, these data demonstrate that stimulus convergence in the auditory cortex is necessary for associative fear learning to complex tones, define the circuit elements mediating this convergence and suggest that layer-1-mediated disinhibition is an important mechanism underlying learning and information processing in neocortical circuits.


YB6: Nat Neurosci. 2011 Jan;14(1):108-14.

Auditory cortex spatial sensitivity sharpens during task performance.

Lee CC, Middlebrooks JC.

Activity in the primary auditory cortex (A1) is essential for normal sound localization behavior, but previous studies of the spatial sensitivity of neurons in A1 have found broad spatial tuning. We tested the hypothesis that spatial tuning sharpens when an animal engages in an auditory task. Cats performed a task that required evaluation of the locations of sounds and one that required active listening, but in which sound location was irrelevant. Some 26-44% of the units recorded in A1 showed substantially sharpened spatial tuning during the behavioral tasks as compared with idle conditions, with the greatest sharpening occurring during the location-relevant task. Spatial sharpening occurred on a scale of tens of seconds and could be replicated multiple times in ∼1.5-h test sessions. Sharpening resulted primarily from increased suppression of responses to sounds at least-preferred locations. That and an observed increase in latencies suggest an important role of inhibitory mechanisms.


YB7: Elife. 2016 Mar 4;5. pii: e12577.

The auditory representation of speech sounds in human motor cortex.

Cheung C, Hamilton LS, Johnson K, Chang EF.

In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different from those during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech, but rather, represents auditory vocal information.


YB8: Science. 2014 Feb 28;343(6174):1006-10.

Phonetic feature encoding in human superior temporal gyrus.

Mesgarani N, Cheung C, Johnson K, Chang EF.

During speech perception, linguistic elements such as consonants and vowels are extracted from a complex acoustic speech signal. The superior temporal gyrus (STG) participates in high-order auditory processing of speech, but how it encodes phonetic information is poorly understood. We used high-density direct cortical surface recordings in humans while they listened to natural, continuous speech to reveal the STG representation of the entire English phonetic inventory. At single electrodes, we found response selectivity to distinct phonetic features. Encoding of acoustic properties was mediated by a distributed population response. Phonetic features could be directly related to tuning for spectrotemporal acoustic cues, some of which were encoded in a nonlinear fashion or by integration of multiple cues. These findings demonstrate the acoustic-phonetic representation of speech in human STG.


YB9: Neuron. 2014 Jun 4;82(5):1157-70.

Neural correlates of task switching in prefrontal cortex and primary auditory cortex in a novel stimulus selection task for rodents.

Rodgers CC, DeWeese MR.

Animals can selectively respond to a target sound despite simultaneous distractors, just as humans can respond to one voice at a crowded cocktail party. To investigate the underlying neural mechanisms, we recorded single-unit activity in primary auditory cortex (A1) and medial prefrontal cortex (mPFC) of rats selectively responding to a target sound from a mixture. We found that prestimulus activity in mPFC encoded the selection rule: which sound from the mixture the rat should select. Moreover, electrically disrupting mPFC significantly impaired performance. Surprisingly, prestimulus activity in A1 also encoded the selection rule, a cognitive variable typically considered the domain of prefrontal regions. Prestimulus changes correlated with stimulus-evoked changes, but stimulus tuning was not strongly affected. We suggest a model in which anticipatory activation of a specific network of neurons underlies the selection of a sound from a mixture, giving rise to robust and widespread rule encoding in both brain regions.


YB10: J Neurosci. 2011 Aug 17;31(33):11867-78.

Extra-classical tuning predicts stimulus-dependent receptive fields in auditory neurons.

Schneider DM, Woolley SM.

The receptive fields of many sensory neurons are sensitive to statistical differences among classes of complex stimuli. For example, excitatory spectral bandwidths of midbrain auditory neurons and the spatial extent of cortical visual neurons differ during the processing of natural stimuli compared to the processing of artificial stimuli. Experimentally characterizing neuronal nonlinearities that contribute to stimulus-dependent receptive fields is important for understanding how neurons respond to different stimulus classes in multiple sensory modalities. Here we show that in the zebra finch, many auditory midbrain neurons have extra-classical receptive fields, consisting of sideband excitation and sideband inhibition. We also show that the presence, degree, and asymmetry of stimulus-dependent receptive fields during the processing of complex sounds are predicted by the presence, valence, and asymmetry of extra-classical tuning. Neurons for which excitatory bandwidth expands during the processing of song have extra-classical excitation. Neurons for which frequency tuning is static and for which excitatory bandwidth contracts during the processing of song have extra-classical inhibition. Simulation experiments further demonstrate that stimulus-dependent receptive fields can arise from extra-classical tuning with a static spike threshold nonlinearity. These findings demonstrate that a common neuronal nonlinearity can account for the stimulus dependence of receptive fields estimated from the responses of auditory neurons to stimuli with natural and non-natural statistics.
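
(Illustrative note: the simulation result mentioned in the final sentences can be caricatured with a toy linear-nonlinear neuron. A static spike threshold hides weak extra-classical sidebands when single frequencies are probed in isolation, but a broadband, song-like background recruits them, so the measured excitatory bandwidth depends on the stimulus class. All weights, thresholds and amplitudes below are invented, only the bandwidth-expansion and static cases are shown, and the published receptive-field estimation from responses to song and noise is not reproduced.)

    import numpy as np

    def response(weights, spectrum, theta=0.3):
        """LN neuron: linear spectral filter followed by a static spike threshold."""
        return max(0.0, float(weights @ spectrum) - theta)

    def excitatory_bandwidth(weights, background):
        """Count probe channels that raise the response above the background-alone level."""
        base = response(weights, background)
        count = 0
        for ch in range(len(weights)):
            probe = background.copy()
            probe[ch] += 1.0                     # unit-amplitude probe at this channel
            count += response(weights, probe) > base
        return count

    n_ch = 15
    center = n_ch // 2
    w = np.zeros(n_ch)
    w[center - 2:center + 3] = [0.4, 0.8, 1.0, 0.8, 0.4]      # classical excitatory center

    for label, sideband in [('extra-classical excitation', +0.2),
                            ('extra-classical inhibition', -0.6)]:
        weights = w.copy()
        weights[[center - 4, center + 4]] = sideband          # sidebands, subthreshold for tones
        silence = np.zeros(n_ch)                              # tone-like condition: probes alone
        broadband = np.full(n_ch, 0.05)                       # song-like broadband background
        print(label,
              '| bandwidth with tones:', excitatory_bandwidth(weights, silence),
              '| with broadband background:', excitatory_bandwidth(weights, broadband))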