[CogSci307 evaluation] Articles

Choose two articles from the list, from two different lecturers (indicated by their initials). BPC articles may only be chosen for the written exam.


AdC1: Hear Res. 2021 May;404:108213.

The perception of octave pitch affinity and harmonic fusion have a common origin

Laurent Demany, Guilherme Monteiro, Catherine Semal, Shihab Shamma, Robert P Carlyon

Musicians say that the pitches of tones with a frequency ratio of 2:1 (one octave) have a distinctive affinity, even if the tones do not have common spectral components. It has been suggested, however, that this affinity judgment has no biological basis and originates instead from an acculturation process ‒ the learning of musical rules unrelated to auditory physiology. We measured, in young amateur musicians, the perceptual detectability of octave mistunings for tones presented alternately (melodic condition) or simultaneously (harmonic condition). In the melodic condition, mistuning was detectable only by means of explicit pitch comparisons. In the harmonic condition, listeners could use a different and more efficient perceptual cue: in the absence of mistuning, the tones fused into a single sound percept; mistunings decreased fusion. Performance was globally better in the harmonic condition, in line with the hypothesis that listeners used a fusion cue in this condition; this hypothesis was also supported by results showing that an illusory simultaneity of the tones was much less advantageous than a real simultaneity. In the two conditions, mistuning detection was generally better for octave compressions than for octave stretchings. This asymmetry varied across listeners, but crucially the listener-specific asymmetries observed in the two conditions were highly correlated. Thus, the perception of the melodic octave appeared to be closely linked to the phenomenon of harmonic fusion. As harmonic fusion is thought to be determined by biological factors rather than factors related to musical culture or training, we argue that octave pitch affinity also has, at least in part, a biological basis.


AdC2: Trends Neurosci. 2020 Feb;43(2):88-102.

Diverse Mechanisms of Sound Frequency Discrimination in the Vertebrate Cochlea

Robert Fettiplace

Discrimination of different sound frequencies is pivotal to recognizing and localizing friend and foe. Here, I review the various hair cell-tuning mechanisms used among vertebrates. Electrical resonance, filtering of the receptor potential by voltage-dependent ion channels, is ubiquitous in all non-mammals, but has an upper limit of ~1 kHz. The frequency range is extended by mechanical resonance of the hair bundles in frogs and lizards, but may need active hair-bundle motion to achieve sharp tuning up to 5 kHz. Tuning in mammals uses somatic motility of outer hair cells, underpinned by the membrane protein prestin, to expand the frequency range. The bird cochlea may also use prestin at high frequencies, but hair cells <1 kHz show electrical resonance.


AdC3: J Acoust Soc Am. 2021 Apr;149(4):2644

On musical interval perception for complex tones at very high frequencies

Hedwig E Gockel, Robert P Carlyon

Listeners appear able to extract a residue pitch from high-frequency harmonics for which phase locking to the temporal fine structure is weak or absent. The present study investigated musical interval perception for high-frequency harmonic complex tones using the same stimuli as Lau, Mehta, and Oxenham [J. Neurosci. 37, 9013-9021 (2017)]. Nine young musically trained listeners with especially good high-frequency hearing adjusted various musical intervals using harmonic complex tones containing harmonics 6-10. The reference notes had fundamental frequencies (F0s) of 280 or 1400 Hz. Interval matches were possible, albeit markedly worse, even when all harmonic frequencies were above the presumed limit of phase locking. Matches showed significantly larger systematic errors and higher variability, and subjects required more trials to finish a match for the high than for the low F0. Additional absolute pitch judgments from one subject with absolute pitch, for complex tones containing harmonics 1-5 or 6-10 with a wide range of F0s, were perfect when the lowest frequency component was below about 7 kHz, but at least 50% of responses were incorrect when it was 8 kHz or higher. The results are discussed in terms of the possible effects of phase-locking information and familiarity with high-frequency stimuli on pitch.


AdC4: Sci Rep. 2019 Jul 18;9(1):10404.

Speech perception is similar for musicians and non-musicians across a wide range of conditions

Sara M K Madsen, Marton Marschall, Torsten Dau, Andrew J Oxenham

It remains unclear whether musical training is associated with improved speech understanding in a noisy environment, with different studies reaching differing conclusions. Even in those studies that have reported an advantage for highly trained musicians, it is not known whether the benefits measured in laboratory tests extend to more ecologically valid situations. This study aimed to establish whether musicians are better than non-musicians at understanding speech in a background of competing speakers or speech-shaped noise under more realistic conditions, involving sounds presented in space via a spherical array of 64 loudspeakers, rather than over headphones, with and without simulated room reverberation. The study also included experiments testing fundamental frequency discrimination limens (F0DLs), interaural time differences limens (ITDLs), and attentive tracking. Sixty-four participants (32 non-musicians and 32 musicians) were tested, with the two groups matched in age, sex, and IQ as assessed with Raven's Advanced Progressive matrices. There was a significant benefit of musicianship for F0DLs, ITDLs, and attentive tracking. However, speech scores were not significantly different between the two groups. The results suggest no musician advantage for understanding speech in background noise or talkers under a variety of conditions.


AdC5: J Acoust Soc Am. 2021 Jan;149(1):259.

Gradual decay and sudden death of short-term memory for pitch

Samuel R Mathias, Leonard Varghese, Christophe Micheyl, Barbara G Shinn-Cunningham

The ability to discriminate frequency differences between pure tones declines as the duration of the interstimulus interval (ISI) increases. The conventional explanation for this finding is that pitch representations gradually decay from auditory short-term memory. Gradual decay means that internal noise increases with increasing ISI duration. Another possibility is that pitch representations experience 'sudden death,' disappearing without a trace from memory. Sudden death means that listeners guess (respond at random) more often when the ISIs are longer. Since internal noise and guessing probabilities influence the shape of psychometric functions in different ways, they can be estimated simultaneously. Eleven amateur musicians performed a two-interval, two-alternative forced-choice frequency-discrimination task. The frequencies of the first tones were roved, and frequency differences and ISI durations were manipulated across trials. Data were analyzed using Bayesian models that simultaneously estimated internal noise and guessing probabilities. On average across listeners, internal noise increased monotonically as a function of increasing ISI duration, suggesting that gradual decay occurred. The guessing rate decreased with an increasing ISI duration between 0.5 and 2 s but then increased with further increases in ISI duration, suggesting that sudden death occurred but perhaps only at longer ISIs. Results are problematic for decay-only models of discrimination and contrast with those from a study on visual short-term memory, which found that over similar durations, visual representations experienced little gradual decay yet substantial sudden death.
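
Illustration (not the authors' Bayesian model; parameter values are hypothetical): a minimal 2AFC psychometric function in which internal noise and a guessing rate enter separately, which is why the two can be estimated simultaneously from the shape of the function.

    # Minimal sketch: internal noise (sigma) flattens the psychometric slope, while a
    # guessing rate (g) compresses the whole function toward chance (0.5).
    import numpy as np
    from scipy.stats import norm

    def p_correct(delta_f, sigma, guess_rate):
        """Probability of a correct 2AFC response for a frequency difference delta_f (Hz).
        sigma      -- internal noise (Hz), larger with gradual decay
        guess_rate -- probability of responding at random, larger with sudden death
        """
        d_prime = delta_f / sigma
        p_discriminate = norm.cdf(d_prime / np.sqrt(2))   # standard 2AFC link
        return guess_rate * 0.5 + (1 - guess_rate) * p_discriminate

    deltas = np.linspace(0, 20, 5)
    print(p_correct(deltas, sigma=5.0, guess_rate=0.0))   # pure gradual decay
    print(p_correct(deltas, sigma=5.0, guess_rate=0.3))   # with added "sudden death"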


AdC6: Nat Hum Behav. 2022 Mar;6(3):455-469.

Multiscale temporal integration organizes hierarchical computation in human auditory cortex

Norman-Haignere SV, Long LK, Devinsky O, Doyle W, Irobunda I, Merricks EM, Feldstein NA, McKhann GM, Schevon CA, Flinker A, Mesgarani N.

To derive meaning from sound, the brain must integrate information across many timescales. What computations underlie multiscale integration in human auditory cortex? Evidence suggests that auditory cortex analyses sound using both generic acoustic representations (for example, spectrotemporal modulation tuning) and category-specific computations, but the timescales over which these putatively distinct computations integrate remain unclear. To answer this question, we developed a general method to estimate sensory integration windows-the time window when stimuli alter the neural response-and applied our method to intracranial recordings from neurosurgical patients. We show that human auditory cortex integrates hierarchically across diverse timescales spanning from ~50 to 400 ms. Moreover, we find that neural populations with short and long integration windows exhibit distinct functional properties: short-integration electrodes (less than ~200 ms) show prominent spectrotemporal modulation selectivity, while long-integration electrodes (greater than ~200 ms) show prominent category selectivity. These findings reveal how multiscale integration organizes auditory computation in the human brain.


AdC7: J Assoc Res Otolaryngol. 2022 Feb;23(1):17-25

Whistling While it Works: Spontaneous Otoacoustic Emissions and the Cochlear Amplifier


Perhaps the most striking evidence for active processes operating within the inner ears of mammals and non-mammals alike is their ability to spontaneously produce sound. Predicted by Thomas Gold in 1948, some 30 years prior to their discovery, the narrow-band sounds now known as spontaneous otoacoustic emissions (SOAEs) remain incompletely understood, their origins controversial. Without a single equation in the main text, we review the essential concepts underlying the local- and global-oscillator frameworks for understanding SOAE generation. Comparing their key assumptions and predictions, we relate the two frameworks to unresolved questions about the biophysical mechanisms of cochlear amplification.


BPC1: PLoS One. 2019 May 16;14(5):e0216874.

Music training with Démos program positively influences cognitive functions in children from low socio-economic backgrounds.

Barbaroux M, Dittinger E, Besson M

This study aimed at evaluating the impact of a classical music training program (Démos) on several aspects of the cognitive development of children from low socio-economic backgrounds. We were specifically interested in general intelligence, phonological awareness and reading abilities, and in other cognitive abilities that may be improved by music training such as auditory and visual attention, working and short-term memory and visuomotor precision. We used a longitudinal approach with children presented with standardized tests before the start and after 18 months of music training. To test for pre-to-post training improvements while discarding maturation and developmental effects, raw scores for each child and for each test were normalized relative to their age group. Results showed that Démos music training improved musicality scores, total IQ and Symbol Search scores as well as concentration abilities and reading precision. In line with previous results, these findings demonstrate the positive impact of an ecologically valid music training program on the cognitive development of children from low socio-economic backgrounds and strongly encourage the broader implementation of such programs in disadvantaged school settings.


BPC2: Sci Rep. 2020 Jul 8;10(1):11222.

Music, Language, and The N400: ERP Interference Patterns Across Cognitive Domains

Nicole Calma-Roddin, John E Drury

Studies of the relationship of language and music have suggested these two systems may share processing resources involved in the computation/maintenance of abstract hierarchical structure (syntax). One type of evidence comes from ERP interference studies involving concurrent language/music processing showing interaction effects when both processing streams are simultaneously perturbed by violations (e.g., syntactically incorrect words paired with incongruent completion of a chord progression). Here, we employ this interference methodology to target the mechanisms supporting long term memory (LTM) access/retrieval in language and music. We used melody stimuli from previous work showing out-of-key or unexpected notes may elicit a musical analogue of language N400 effects, but only for familiar melodies, and not for unfamiliar ones. Target notes in these melodies were time-locked to visually presented target words in sentence contexts manipulating lexical/conceptual semantic congruity. Our study succeeded in eliciting expected N400 responses from each cognitive domain independently. Among several new findings we argue to be of interest, these data demonstrate that: (i) language N400 effects are delayed in onset by concurrent music processing only when melodies are familiar, and (ii) double violations with familiar melodies (but not with unfamiliar ones) yield a sub-additive N400 response. In addition: (iii) early negativities (RAN effects), which previous work has connected to musical syntax, along with the music N400, were together delayed in onset for familiar melodies relative to the timing of these effects reported in the previous music-only study using these same stimuli, and (iv) double violation cases involving unfamiliar/novel melodies also delayed the RAN effect onset. These patterns constitute the first demonstration of N400 interference effects across these domains and together contribute previously undocumented types of interactions to the available pool of findings relevant to understanding whether language and music may rely on shared underlying mechanisms.


BPC3: Front Neurosci. 2019 Mar 13;13:142.

Electrical Neuroimaging of Music Processing in Pianists With and Without True Absolute Pitch

Coll SY, Vuichoud N, Grandjean D, James CE.

True absolute pitch (AP), labeling of pitches with semitone precision without a reference, is classically studied using isolated tones. However, AP is acquired and has its function within complex dynamic musical contexts. Here we examined event-related brain responses and underlying cerebral sources to endings of short expressive string quartets, investigating a homogeneous population of young highly trained pianists with half of them possessing true-AP. The pieces ended regularly or contained harmonic transgressions at closure that participants appraised. Given the millisecond precision of ERP analyses, this experimental plan allowed examining whether AP alters music processing at an early perceptual, or later cognitive level, or both, and which cerebral sources underlie differences with non-AP musicians. We also investigated the impact of AP on general auditory cognition. Remarkably, harmonic transgression sensitivity did not differ between AP and non-AP participants, and differences for auditory cognition were only marginal. The key finding of this study is the involvement of a microstate peaking around 60 ms after musical closure, characterizing AP participants. Concurring sources were estimated in secondary auditory areas, comprising the planum temporale, all transgression conditions collapsed. These results suggest that AP is not a panacea to become a proficient musician, but a rare perceptual feature.


BPC4: Cortex. 2019 Apr;113:229-238.

The co-occurrence of pitch and rhythm disorders in congenital amusia.

Lagrois ME, Peretz I.

The most studied form of congenital amusia is characterized by a difficulty with detecting pitch anomalies in melodies, also referred to as pitch deafness. Here, we tested for the presence of associated deficits in rhythm processing, beat in particular, in pitch deafness. In Experiment 1, participants performed beat perception and production tasks with musical excerpts of various genres. The results show a beat finding disorder in six of the ten assessed pitch-deaf participants. In order to remove a putative interference of pitch variations with beat extraction, the same participants were tested with percussive rhythms in Experiment 2 and showed a similar impairment. Furthermore, musical pitch and beat processing abilities were correlated. These new results highlight the tight connection between melody and rhythm in music processing that can nevertheless dissociate in some individuals.


BPC5: Brain Cogn. 2022 Aug;161:105881.

Tonal structures benefit short-term memory for real music: Evidence from non-musicians and individuals with congenital amusia

Lévêque Y, Lalitte P, Fornoni L, Pralus A, Albouy P, Bouchet P, Caclin A, Tillmann B.

Congenital amusia is a neurodevelopmental disorder of music processing, which includes impaired pitch memory, associated with abnormalities in the right fronto-temporal network. Previous research has shown that tonal structures (as defined by the Western musical system) improve short-term memory performance for short tone sequences (in comparison to atonal versions) in non-musician listeners, but the tonal structures only benefited response times in amusic individuals. We here tested the potential benefit of tonal structures for short-term memory with more complex musical material. Congenital amusics and their matched non-musician controls were required to indicate whether two excerpts were the same or different. Results confirmed impaired performance of amusic individuals in this short-term memory task. However, most importantly, both groups of participants showed better memory performance for tonal material than for atonal material. These results revealed that even amusics' impaired short-term memory for pitch shows classical characteristics of short-term memory, that is, the mnemonic benefit of structure in the to-be-memorized material. The findings show that amusic individuals have acquired some implicit knowledge of regularities of their culture, allowing for implicit processing of tonal structures, which benefits memory even for complex material.


BPC6: Psychomusicology: Music, Mind, and Brain. 2018 Vol. 28, No. 3, 178–188

A Cross-Cultural Comparison of Tonality Perception in Japanese, Chinese, Vietnamese, Indonesian, and American Listeners

Rie Matsunaga, Toshinori Yasuda, Michelle Johnson-Motoyama, Pitoyo Hartono, Koichi Yokosawa and Jun-ichi Abe

We investigated tonal perception of melodies from 2 cultures (Western and traditional Japanese) by 5 different cultural groups (44 Japanese, 25 Chinese, 16 Vietnamese, 18 Indonesians, and 25 U.S. citizens). Listeners rated the degree of melodic completeness of the final tone (a tonic vs. a nontonic) and happiness–sadness in the mode (major vs. minor, YOH vs. IN) of each melody. When Western melodies were presented, American and Japanese listeners responded similarly, such that they reflected implicit tonal knowledge of Western music. By contrast, the responses of Chinese, Vietnamese, and Indonesian listeners were different from those of American and Japanese listeners. When traditional Japanese melodies were presented, Japanese listeners exhibited responses that reflected implicit tonal knowledge of traditional Japanese music. American listeners also showed responses that were like the Japanese; however, the pattern of responses differed between the 2 groups. In contrast, Chinese, Vietnamese, and Indonesian listeners exhibited different responses from the Japanese. These results show large differences between the Chinese/Vietnamese/Indonesian group and the American/Japanese group. Furthermore, the differences in responses to Western melodies between Americans and Japanese were less pronounced than those among Chinese, Vietnamese, and Indonesians. These findings imply that cultural differences in tonal perception are more diverse and distinctive than previously believed.


BPC7: Behav Brain Res. 2020 Jul 15;390:112662.

Musicians use speech-specific areas when processing tones: The key to their superior linguistic competence?

Mariacristina Musso, Hannah Fürniss, Volkmar Glauche, Horst Urbach, Cornelius Weiller, Michel Rijntjes

It is known that musicians, compared to non-musicians, have some superior speech and language competence, yet the mechanisms by which musical training leads to this advantage are not well specified. This event-related fMRI study confirmed that musicians outperformed non-musicians in processing not only musical tones but also syllables, and identified a network differentiating musicians from non-musicians during processing of linguistic sounds. Within this network, the activation of bilateral superior temporal gyrus was shared with all subjects during processing of the acoustically well-matched musical and linguistic sounds, and with the activation distinguishing tones with a complex harmonic spectrum (bowed tone) from a simpler one (plucked tone). These results confirm that better speech processing in musicians relies on improved cross-domain spectral analysis. Activation of left posterior superior temporal sulcus (pSTS), premotor cortex, inferior frontal and fusiform gyrus (FG), also distinguishing musicians from non-musicians during syllable processing, overlapped with the activation segregating linguistic from musical sounds in all subjects. Since these brain regions were not involved during tone processing in non-musicians, they could code for functions which are specialized for speech. Musicians recruited pSTS and FG during tone processing; thus, these speech-specialized brain areas processed musical sounds in the presence of musical training. This study shows that the linguistic advantage of musicians is linked not only to improved cross-domain spectral analysis, but also to the functional adaptation of brain resources that are specialized for speech, but accessible to the domain of music in the presence of musical training.


BPC8: Neuropsychologia. 2018 Aug;117:67-74.

Seeing music: The perception of melodic 'ups and downs' modulates the spatial processing of visual stimuli.

Romero-Rivas C, Vera-Constán F, Rodríguez-Cuadrado S, Puigcerver L, Fernández-Prieto I, Navarra J.

Musical melodies have peaks and valleys. Although the vertical component of pitch and music is well-known, the mechanisms underlying its mental representation still remain elusive. We show evidence regarding the importance of previous experience with melodies for crossmodal interactions to emerge. The impact of these crossmodal interactions on other perceptual and attentional processes was also studied. Melodies including two tones with different frequency (e.g., E4 and D3) were repeatedly presented during the study. These melodies could either generate strong predictions (e.g., E4-D3-E4-D3-E4-[D3]) or not (e.g., E4-D3-E4-E4-D3-[?]). After the presentation of each melody, the participants had to judge the colour of a visual stimulus that appeared in a position that was, according to the traditional vertical connotations of pitch, either congruent (e.g., high-low-high-low-[up]), incongruent (high-low-high-low-[down]) or unpredicted with respect to the melody. Behavioural and electroencephalographic responses to the visual stimuli were obtained. Congruent visual stimuli elicited faster responses at the end of the experiment than at the beginning. Additionally, incongruent visual stimuli that broke the spatial prediction generated by the melody elicited larger P3b amplitudes (reflecting 'surprise' responses). Our results suggest that the passive (but repeated) exposure to melodies elicits spatial predictions that modulate the processing of other sensory events.


BPC9: Front Psychol. 2019 Sep 11;10:1990.

Implicit Processing of Pitch in Postlingually Deafened Cochlear Implant Users

Tillmann B, Poulin-Charronnat B, Gaudrain E, Akhoun I, Delbé C, Truy E, Collet L.

Cochlear implant (CI) users can only access limited pitch information through their device, which hinders music appreciation. Poor music perception may not only be due to CI technical limitations; lack of training or negative attitudes toward the electric sound might also contribute to it. Our study investigated with an implicit (indirect) investigation method whether poorly transmitted pitch information, presented as musical chords, can activate listeners' knowledge about musical structures acquired prior to deafness. Seven postlingually deafened adult CI users participated in a musical priming paradigm investigating pitch processing without explicit judgments. Sequences made of eight sung-chords that ended on either a musically related (expected) target chord or a less-related (less-expected) target chord were presented. The use of a priming task based on linguistic features allowed CI patients to perform fast judgments on target chords in the sung music. If listeners' musical knowledge is activated and allows for tonal expectations (as in normal-hearing listeners), faster response times were expected for related targets than less-related targets. However, if the pitch percept is too different and does not activate musical knowledge acquired prior to deafness, storing pitch information in a short-term memory buffer predicts the opposite pattern. If transmitted pitch information is too poor, no difference in response times should be observed. Results showed that CI patients were able to perform the linguistic task on the sung chords, but correct response times indicated sensory priming, with faster response times observed for the less-related targets: CI patients processed at least some of the pitch information of the musical sequences, which was stored in an auditory short-term memory and influenced chord processing. This finding suggests that the signal transmitted via electric hearing led to a pitch percept that was too different from that based on acoustic hearing, so that it did not automatically activate listeners' previously acquired musical structure knowledge. However, the transmitted signal seems sufficiently informative to lead to sensory priming. These findings are encouraging for the development of pitch-related training programs for CI patients, despite the current technological limitations of the CI coding.


CL1: Nat Biomed Eng. 2022 Jun;6(6):717-730.

Compression and amplification algorithms in hearing aids impair the selectivity of neural responses to speech

Armstrong AG, Lam CC, Sabesan S, Lesica NA.

In quiet environments, hearing aids improve the perception of low-intensity sounds. However, for high-intensity sounds in background noise, the aids often fail to provide a benefit to the wearer. Here, using large-scale single-neuron recordings from hearing-impaired gerbils-an established animal model of human hearing-we show that hearing aids restore the sensitivity of neural responses to speech, but not their selectivity. Rather than reflecting a deficit in supra-threshold auditory processing, the low selectivity is a consequence of hearing-aid compression (which decreases the spectral and temporal contrasts of incoming sound) and amplification (which distorts neural responses, regardless of whether hearing is impaired). Processing strategies that avoid the trade-off between neural sensitivity and selectivity should improve the performance of hearing aids.
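
Illustration (not the authors' hearing-aid simulation; the knee point and compression ratio are hypothetical): how a static compressive input-output function shrinks the level contrast of an envelope, the mechanism invoked here for the loss of neural selectivity.

    # Toy wide-dynamic-range compression: levels above a knee point are compressed,
    # so the contrast between loud and soft portions of the envelope is reduced.
    import numpy as np

    def compress_db(level_db, threshold_db=45.0, ratio=3.0):
        """Simple compressive input-output function above the knee point."""
        return np.where(level_db > threshold_db,
                        threshold_db + (level_db - threshold_db) / ratio,
                        level_db)

    env_db = np.array([30.0, 50.0, 70.0, 40.0, 65.0])   # envelope levels of a speech-like signal
    out_db = compress_db(env_db)
    print("input contrast (dB): ", env_db.max() - env_db.min())   # 40 dB
    print("output contrast (dB):", out_db.max() - out_db.min())   # ~23 dB: contrast is reduced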


CL2: J Acoust Soc Am. 2012 May;131(5):4030-41.

Across-site patterns of modulation detection: relation to speech recognition.

Garadat SN, Zwolan TA, Pfingst BE.

The aim of this study was to identify across-site patterns of modulation detection thresholds (MDTs) in subjects with cochlear implants and to determine if removal of sites with the poorest MDTs from speech processor programs would result in improved speech recognition. Five hundred millisecond trains of symmetric-biphasic pulses were modulated sinusoidally at 10 Hz and presented at a rate of 900 pps using monopolar stimulation. Subjects were asked to discriminate a modulated pulse train from an unmodulated pulse train for all electrodes in quiet and in the presence of an interleaved unmodulated masker presented on the adjacent site. Across-site patterns of masked MDTs were then used to construct two 10-channel MAPs such that one MAP consisted of sites with the best masked MDTs and the other MAP consisted of sites with the worst masked MDTs. Subjects' speech recognition skills were compared when they used these two different MAPs. Results showed that MDTs were variable across sites and were elevated in the presence of a masker by various amounts across sites. Better speech recognition was observed when the processor MAP consisted of sites with best masked MDTs, suggesting that temporal modulation sensitivity has important contributions to speech recognition with a cochlear implant.
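
Illustration (not the clinical research-interface code used in the study; the modulation depth is a hypothetical value): constructing the per-pulse amplitudes of a 500-ms, 900-pps pulse train sinusoidally modulated at 10 Hz, the type of stimulus used to measure MDTs.

    # Per-pulse amplitudes of a sinusoidally amplitude-modulated pulse train.
    import numpy as np

    rate_pps, dur_s, fm_hz, depth = 900, 0.5, 10.0, 0.2
    t_pulses = np.arange(0, dur_s, 1.0 / rate_pps)                 # pulse onset times (s)
    amps = 1.0 + depth * np.sin(2 * np.pi * fm_hz * t_pulses)      # modulated amplitudes (arbitrary units)
    print(len(t_pulses), "pulses; amplitude range:", amps.min(), "to", amps.max())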


CL3: J Acoust Soc Am. 2019 Mar;145(3):1493.

Comparison of effects on subjective intelligibility and quality of speech in babble for two algorithms: A deep recurrent neural network and spectral subtraction

Mahmoud Keshavarzi, Tobias Goehring, Richard E Turner, Brian C J Moore

The effects on speech intelligibility and sound quality of two noise-reduction algorithms were compared: a deep recurrent neural network (RNN) and spectral subtraction (SS). The RNN was trained using sentences spoken by a large number of talkers with a variety of accents, presented in babble. Different talkers were used for testing. Participants with mild-to-moderate hearing loss were tested. Stimuli were given frequency-dependent linear amplification to compensate for the individual hearing losses. A paired-comparison procedure was used to compare all possible combinations of three conditions. The conditions were: speech in babble with no processing (NP) or processed using the RNN or SS. In each trial, the same sentence was played twice using two different conditions. The participants indicated which one was better and by how much in terms of speech intelligibility and (in separate blocks) sound quality. Processing using the RNN was significantly preferred over NP and over SS processing for both subjective intelligibility and sound quality, although the magnitude of the preferences was small. SS processing was not significantly preferred over NP for either subjective intelligibility or sound quality. Objective computational measures of speech intelligibility predicted better intelligibility for RNN than for SS or NP.
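
Illustration (a generic textbook form of spectral subtraction, not the exact algorithm or parameters evaluated in the paper): subtract an estimated noise magnitude spectrum from the noisy short-time spectra and resynthesize with the noisy phase.

    # Generic spectral subtraction sketch.
    import numpy as np
    from scipy.signal import stft, istft

    def spectral_subtraction(noisy, noise_only, fs, nperseg=512):
        f, t, X = stft(noisy, fs, nperseg=nperseg)
        _, _, N = stft(noise_only, fs, nperseg=nperseg)
        noise_mag = np.abs(N).mean(axis=1, keepdims=True)       # average noise spectrum
        clean_mag = np.maximum(np.abs(X) - noise_mag, 0.0)      # subtract and half-wave rectify
        _, enhanced = istft(clean_mag * np.exp(1j * np.angle(X)), fs, nperseg=nperseg)
        return enhanced

    fs = 16000
    rng = np.random.default_rng(0)
    speech = np.sin(2 * np.pi * 300 * np.arange(fs) / fs)       # stand-in "speech" tone
    noise = 0.5 * rng.standard_normal(fs)
    enhanced = spectral_subtraction(speech + noise, noise, fs)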


CL4: Trends Hear. Jan-Dec 2021;25:23312165211014437.

Measuring the Influence of Noise Reduction on Listening Effort in Hearing-Impaired Listeners Using Response Times to an Arithmetic Task in Noise

Ilja Reinten, Inge De Ronde-Brons, Rolph Houben, Wouter Dreschler

Single microphone noise reduction (NR) in hearing aids can provide a subjective benefit even when there is no objective improvement in speech intelligibility. A possible explanation lies in a reduction of listening effort. Previously, we showed that response times (a proxy for listening effort) to an auditory-only dual-task were reduced by NR in normal-hearing (NH) listeners. In this study, we investigate if the results from NH listeners extend to the hearing-impaired (HI), the target group for hearing aids. In addition, we assess the relevance of the outcome measure for studying and understanding listening effort. Twelve HI subjects were asked to sum two digits of a digit triplet in noise. We measured response times to this task, as well as subjective listening effort and speech intelligibility. Stimuli were presented at three signal-to-noise ratios (SNR; -5, 0, +5 dB) and in quiet. Stimuli were processed with ideal or nonideal NR, or unprocessed. The effect of NR on response times in HI listeners was significant only in conditions where speech intelligibility was also affected (-5 dB SNR). This is in contrast to the previous results with NH listeners. There was a significant effect of SNR on response times for HI listeners. The response time measure was reasonably correlated (R² = 0.54) to subjective listening effort and showed a sufficient test-retest reliability. This study thus presents an objective, valid, and reliable measure for evaluating an aspect of listening effort of HI listeners.


CL5: Neuroscience. 2019 May 21;407:8-20.

Primary Neural Degeneration in the Human Cochlea: Evidence for Hidden Hearing Loss in the Aging Ear

P. Z. Wu, L. D. Liberman, K. Bennett, V. de Gruttola, J. T. O’Malley and M. C. Liberman

The noise-induced and age-related loss of synaptic connections between auditory-nerve fibers and cochlear hair cells is well-established from histopathology in several mammalian species; however, its prevalence in humans, as inferred from electrophysiological measures, remains controversial. Here we look for cochlear neuropathy in a temporal-bone study of "normal-aging" humans, using autopsy material from 20 subjects aged 0–89 yrs, with no history of otologic disease. Cochleas were immunostained to allow accurate quantification of surviving hair cells in the organ of Corti and peripheral axons of auditory-nerve fibers. Mean loss of outer hair cells was 30–40% throughout the audiometric frequency range (0.25–8.0 kHz) in subjects over 60 yrs, with even greater losses at both apical (low-frequency) and basal (high-frequency) ends. In contrast, mean inner hair cell loss across audiometric frequencies was rarely >15%, at any age. Neural loss greatly exceeded inner hair cell loss, with 7/11 subjects over 60 yrs showing >60% loss of peripheral axons re the youngest subjects, and with the age-related slope of axonal loss outstripping the age-related loss of inner hair cells by almost 3:1. The results suggest that a large number of auditory neurons in the aging ear are disconnected from their hair cell targets. This primary neural degeneration would not affect the audiogram, but likely contributes to age-related hearing impairment, especially in noisy environments. Thus, therapies designed to regrow peripheral axons could provide clinically meaningful improvement in the aged ear.


DP1: Curr Biol. 2022 Jun 6;32(11):2548-2555

Simultaneous mnemonic and predictive representations in the auditory cortex

Cappotto D, Kang H, Li K, Melloni L, Schnupp J, Auksztulewicz R.

Recent studies have shown that stimulus history can be decoded via the use of broadband sensory impulses to reactivate mnemonic representations.1-4 However, memories of previous stimuli can also be used to form sensory predictions about upcoming stimuli.5,6 Predictive mechanisms allow the brain to create a probable model of the outside world, which can be updated when errors are detected between the model predictions and external inputs.7-10 Direct recordings in the auditory cortex of awake mice established neural mechanisms for how encoding mechanisms might handle working memory and predictive processes without overwriting recent sensory events in instances where predictive mechanisms are triggered by oddballs within a sequence.11 However, it remains unclear whether mnemonic and predictive information can be decoded from cortical activity simultaneously during passive, implicit sequence processing, even in anesthetized models. Here, we recorded neural activity elicited by repeated stimulus sequences using electrocorticography (ECoG) in the auditory cortex of anesthetized rats, where events within the sequence (referred to henceforth as vowels, for simplicity) were occasionally replaced with a broadband noise burst or omitted entirely. We show that both stimulus history and predicted stimuli can be decoded from neural responses to broadband impulses, at overlapping latencies but based on independent and uncorrelated data features. We also demonstrate that predictive representations are dynamically updated over the course of stimulation.


DP2: Curr Biol. 2021 Oct 11;31(19):4367-4372.e4

Frequency modulation of rattlesnake acoustic display affects acoustic distance perception in humans

Michael Forsthofer, Michael Schutte, Harald Luksch, Tobias Kohl, Lutz Wiegrebe, Boris P Chagnaud

The estimation of one's distance to a potential threat is essential for any animal's survival. Rattlesnakes inform about their presence by generating acoustic broadband rattling sounds.1 Rattlesnakes generate their acoustic signals by clashing a series of keratinous segments onto each other, which are located at the tip of their tails.1-3 Each tail shake results in a broadband sound pulse that merges into a continuous acoustic signal with fast-repeating tail shakes. This acoustic display is readily recognized by other animals4,5 and serves as an aposematic threat and warning display, likely to avoid being preyed upon.1,6 The spectral properties of the rattling sound1,3 and its dependence on the morphology and size of the rattle have been investigated for decades7-9 and carry relevant information for different receivers, including ground squirrels that encounter rattlesnakes regularly.10,11 Combining visual looming stimuli with acoustic measurements, we show that rattlesnakes increase their rattling rate (up to about 40 Hz) with decreasing distance of a potential threat, reminiscent of the acoustic signals of sensors while parking a car. Rattlesnakes then abruptly switch to a higher and less variable rate of 60-100 Hz. In a virtual reality experiment, we show that this behavior systematically affects distance judgments by humans: the abrupt switch in rattling rate generates a sudden, strong percept of decreased distance which, together with the low-frequency rattling, acts as a remarkable interspecies communication signal.


DP3: PLoS Comput Biol. 2022 Mar 3;18(3):e1009889.

Human discrimination and modeling of high-frequency complex tones shed light on the neural codes for pitch

Guest DR, Oxenham AJ.

Accurate pitch perception of harmonic complex tones is widely believed to rely on temporal fine structure information conveyed by the precise phase-locked responses of auditory-nerve fibers. However, accurate pitch perception remains possible even when spectrally resolved harmonics are presented at frequencies beyond the putative limits of neural phase locking, and it is unclear whether residual temporal information, or a coarser rate-place code, underlies this ability. We addressed this question by measuring human pitch discrimination at low and high frequencies for harmonic complex tones, presented either in isolation or in the presence of concurrent complex-tone maskers. We found that concurrent complex-tone maskers impaired performance at both low and high frequencies, although the impairment introduced by adding maskers at high frequencies relative to low frequencies differed between the tested masker types. We then combined simulated auditory-nerve responses to our stimuli with ideal-observer analysis to quantify the extent to which performance was limited by peripheral factors. We found that the worsening of both frequency discrimination and F0 discrimination at high frequencies could be well accounted for (in relative terms) by optimal decoding of all available information at the level of the auditory nerve. A Python package is provided to reproduce these results, and to simulate responses to acoustic stimuli from the three previously published models of the human auditory nerve used in our analyses.
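
Illustration (my own toy sketch, not the published auditory-nerve models or the paper's ideal-observer analysis; all rate and tuning parameters are hypothetical): a rate-place ideal observer that converts simulated Poisson firing-rate profiles for two F0s into an approximate d'.

    # Simplified rate-place ideal observer: each "fiber" contributes a Poisson spike count;
    # sensitivity for discriminating two F0s is approximated from the difference in mean
    # counts relative to Poisson variability.
    import numpy as np

    def mean_rates(f0, cfs, dur=0.1):
        """Toy rate-place profile: fibers respond when a harmonic falls near their CF."""
        harmonics = f0 * np.arange(1, 21)
        dist = np.abs(cfs[:, None] - harmonics[None, :]) / (0.1 * cfs[:, None])
        return dur * 100.0 * np.exp(-0.5 * dist.min(axis=1) ** 2)   # expected spike counts

    cfs = np.geomspace(200.0, 16000.0, 500)                          # fiber characteristic frequencies
    lam0, lam1 = mean_rates(1400.0, cfs), mean_rates(1400.0 * 1.01, cfs)
    d_prime = np.sqrt(np.sum((lam1 - lam0) ** 2 / (0.5 * (lam1 + lam0) + 1e-9)))
    print("approximate rate-place d' for a 1% F0 change:", round(d_prime, 2))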


DP4: J Assoc Res Otolaryngol. 2021 Dec;22(6):693-702.

Infant Pitch and Timbre Discrimination in the Presence of Variation in the Other Dimension

Lau BK, Oxenham AJ, Werner LA.

Adult listeners perceive pitch with fine precision, with many adults capable of discriminating less than a 1 % change in fundamental frequency (F0). Although there is variability across individuals, this precise pitch perception is an ability ascribed to cortical functions that are also important for speech and music perception. Infants display neural immaturity in the auditory cortex, suggesting that pitch discrimination may improve throughout infancy. In two experiments, we tested the limits of F0 (pitch) and spectral centroid (timbre) perception in 66 infants and 31 adults. Contrary to expectations, we found that infants at both 3 and 7 months were able to reliably detect small changes in F0 in the presence of random variations in spectral content, and vice versa, to the extent that their performance matched that of adults with musical training and exceeded that of adults without musical training. The results indicate high fidelity of F0 and spectral-envelope coding in infants, implying that fully mature cortical processing is not necessary for accurate discrimination of these features. The surprising difference in performance between infants and musically untrained adults may reflect a developmental trajectory for learning natural statistical covariations between pitch and timbre that improves coding efficiency but results in degraded performance in adults without musical training when expectations for such covariations are violated.


DP5: Nat Commun. 2019 Mar 21;10(1):1302.

Optimal features for auditory categorization

Shi Tong Liu, Pilar Montes-Lourido, Xiaoqin Wang, Srivatsun Sadagopan

Humans and vocal animals use vocalizations to communicate with members of their species. A necessary function of auditory perception is to generalize across the high variability inherent in vocalization production and classify them into behaviorally distinct categories ('words' or 'call types'). Here, we demonstrate that detecting mid-level features in calls achieves production-invariant classification. Starting from randomly chosen marmoset call features, we use a greedy search algorithm to determine the most informative and least redundant features necessary for call classification. High classification performance is achieved using only 10-20 features per call type. Predictions of tuning properties of putative feature-selective neurons accurately match some observed auditory cortical responses. This feature-based approach also succeeds for call categorization in other species, and for other complex classification tasks such as caller identification. Our results suggest that high-level neural representations of sounds are based on task-dependent features optimized for specific computational goals.
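
Illustration (generic greedy forward selection; the paper's criterion is information-based and operates on call-template features, which are not reproduced here): at each step, add the candidate feature that most improves cross-validated classification.

    # Generic greedy forward feature selection on simulated data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def greedy_select(X, y, n_features=10):
        chosen, remaining = [], list(range(X.shape[1]))
        while remaining and len(chosen) < n_features:
            scores = []
            for j in remaining:
                cols = chosen + [j]
                acc = cross_val_score(LogisticRegression(max_iter=1000),
                                      X[:, cols], y, cv=3).mean()
                scores.append((acc, j))
            best_acc, best_j = max(scores)        # feature giving the largest gain
            chosen.append(best_j)
            remaining.remove(best_j)
        return chosen

    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 40))                 # 40 candidate call features
    y = (X[:, 3] + 0.5 * X[:, 17] > 0).astype(int)     # labels depend on two of them
    print(greedy_select(X, y, n_features=3))           # informative columns should appear early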


DP6: J Acoust Soc Am. 2016 Oct;140(4):2542.

Measuring time-frequency importance functions of speech with bubble noise

Michael I Mandel, Sarah E Yoho, Eric W Healy

Listeners can reliably perceive speech in noisy conditions, but it is not well understood what specific features of speech they use to do this. This paper introduces a data-driven framework to identify the time-frequency locations of these features. Using the same speech utterance mixed with many different noise instances, the framework is able to compute the importance of each time-frequency point in the utterance to its intelligibility. The mixtures have approximately the same global signal-to-noise ratio at each frequency, but very different recognition rates. The difference between these intelligible vs unintelligible mixtures is the alignment between the speech and spectro-temporally modulated noise, providing different combinations of "glimpses" of speech in each mixture. The current results reveal the locations of these important noise-robust phonetic features in a restricted set of syllables. Classification models trained to predict whether individual mixtures are intelligible based on the location of these glimpses can generalize to new conditions, successfully predicting the intelligibility of novel mixtures. They are able to generalize to novel noise instances, novel productions of the same word by the same talker, novel utterances of the same word spoken by different talkers, and, to some extent, novel consonants.
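
Illustration (not the published bubble-noise toolbox; the glimpse threshold and the "critical region" are hypothetical): estimating a time-frequency importance map by relating, across many simulated mixtures, whether each time-frequency point was glimpsed to whether the mixture was recognized.

    # Toy time-frequency importance map via a per-point correlation.
    import numpy as np

    rng = np.random.default_rng(0)
    n_mix, n_freq, n_time = 500, 32, 40
    glimpse = rng.random((n_mix, n_freq, n_time)) > 0.7      # True where speech was audible
    # toy ground truth: recognition requires glimpses in one critical region
    correct = (glimpse[:, 10:14, 15:20].mean(axis=(1, 2)) > 0.3).astype(float)

    # correlate audibility at each TF point with the recognition outcome
    g = glimpse.reshape(n_mix, -1).astype(float)
    g_c = g - g.mean(0)
    c_c = correct - correct.mean()
    importance = (g_c * c_c[:, None]).mean(0) / (g.std(0) * correct.std() + 1e-9)
    importance = importance.reshape(n_freq, n_time)
    print("most important TF point:", np.unravel_index(importance.argmax(), importance.shape))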


DP7: Nat Commun. 2020 Jun 3;11(1):2786.

Perceptual fusion of musical notes by native Amazonians suggests universal representations of musical intervals

Malinda J McPherson, Sophia E Dolan, Alex Durango, Tomas Ossandon, Joaquin Valdés, Eduardo A Undurraga, Nori Jacoby, Ricardo A Godoy, Josh H McDermott

Music perception is plausibly constrained by universal perceptual mechanisms adapted to natural sounds. Such constraints could arise from our dependence on harmonic frequency spectra for segregating concurrent sounds, but evidence has been circumstantial. We measured the extent to which concurrent musical notes are misperceived as a single sound, testing Westerners as well as native Amazonians with limited exposure to Western music. Both groups were more likely to mistake note combinations related by simple integer ratios as single sounds ('fusion'). Thus, even with little exposure to Western harmony, acoustic constraints on sound segregation appear to induce perceptual structure on note combinations. However, fusion did not predict aesthetic judgments of intervals in Westerners, or in Amazonians, who were indifferent to consonance/dissonance. The results suggest universal perceptual mechanisms that could help explain cross-cultural regularities in musical systems, but indicate that these mechanisms interact with culture-specific influences to produce musical phenomena such as consonance.


DP8: Proc Natl Acad Sci U S A. 2013 Jun 11;110(24)

Perceptual basis of evolving Western musical styles

Rodriguez Zivic PH, Shifres F, Cecchi GA.

The brain processes temporal statistics to predict future events and to categorize perceptual objects. These statistics, called expectancies, are found in music perception, and they span a variety of different features and time scales. Specifically, there is evidence that music perception involves strong expectancies regarding the distribution of a melodic interval, namely, the distance between two consecutive notes within the context of another. The recent availability of a large Western music dataset, consisting of the historical record condensed as melodic interval counts, has opened new possibilities for data-driven analysis of musical perception. In this context, we present an analytical approach that, based on cognitive theories of music expectation and machine learning techniques, recovers a set of factors that accurately identifies historical trends and stylistic transitions between the Baroque, Classical, Romantic, and Post-Romantic periods. We also offer a plausible musicological and cognitive interpretation of these factors, allowing us to propose them as data-driven principles of melodic expectation.


DP9: Psychon Bull Rev. 2019 Apr;26(2):583-590.

There is music in repetition: Looped segments of speech and nonspeech induce the perception of music in a time-dependent manner

Jess Rowland, Anna Kasdan, David Poeppel

While many techniques are known to music creators, the technique of repetition is one of the most commonly deployed. The mechanism by which repetition is effective as a music-making tool, however, is unknown. Building on the speech-to-song illusion (Deutsch, Henthorn, & Lapidis in Journal of the Acoustical Society of America, 129(4), 2245-2252, 2011), we explore a phenomenon in which perception of musical attributes is elicited from repeated, or 'looped,' auditory material usually perceived as nonmusical, such as speech and environmental sounds. We assessed whether this effect holds true for speech stimuli of different lengths; nonspeech sounds (water dripping); and speech signals decomposed into their rhythmic and spectral components. Participants listened to looped stimuli (from 700 to 4,000 ms) and provided continuous as well as discrete perceptual ratings. We show that the regularizing effect of repetition generalizes to nonspeech auditory material and is strongest for shorter clip lengths in the speech and environmental cases. We also find that deconstructed pitch and rhythmic speech components independently elicit a regularizing effect, though the effect across segment duration is different than that for intact speech and environmental sounds. Taken together, these experiments suggest repetition may invoke active internal mechanisms that bias perception toward musical structure.


DP10: Sci Rep. 2021 Nov 2;11(1):21456.

Adaptive auditory brightness perception

Kai Siedenburg, Feline Malin Barg, Henning Schepker

Perception adapts to the properties of prior stimulation, as illustrated by phenomena such as visual color constancy or speech context effects. In the auditory domain, only little is known about adaptive processes when it comes to the attribute of auditory brightness. Here, we report an experiment that tests whether listeners adapt to spectral colorations imposed on naturalistic music and speech excerpts. Our results indicate consistent contrastive adaptation of auditory brightness judgments on a trial-by-trial basis. The pattern of results suggests that these effects tend to grow with an increase in the duration of the adaptor context but level off after around 8 trials of 2 s duration. A simple model of the response criterion yields a correlation of r = .97 with the measured data and corroborates the notion that brightness perception adapts on timescales that fall in the range of auditory short-term memory. Effects turn out to be similar for spectral filtering based on linear spectral filter slopes and filtering based on a measured transfer function from a commercially available hearing device. Overall, our findings demonstrate the adaptivity of auditory brightness perception under realistic acoustical conditions.
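
Illustration (not the authors' model; the adaptation rate and slope values are hypothetical): a leaky-average response criterion that tracks recent spectral slopes, producing contrastive brightness judgments of the kind reported.

    # Toy criterion-adaptation model: the brightness criterion is a leaky average of recent
    # spectral slopes, so a neutral probe after bright adaptors is judged darker, and vice versa.
    import numpy as np

    def simulate(adaptor_slopes, probe_slope=0.0, alpha=0.3):
        criterion = 0.0
        for s in adaptor_slopes:                 # criterion drifts toward the adaptor context
            criterion = (1 - alpha) * criterion + alpha * s
        return probe_slope - criterion           # judged brightness relative to the criterion

    bright_context = [2.0] * 8                   # eight bright (positively tilted) adaptor trials
    dark_context = [-2.0] * 8
    print("probe after bright context:", round(simulate(bright_context), 2))   # negative: sounds darker
    print("probe after dark context:  ", round(simulate(dark_context), 2))     # positive: sounds brighter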


DP11: Psychophysiology. 2022 Aug;59(8):e14028


Suzuki Y, Liao HI, Furukawa S.

A dynamic neural network change, accompanied by cognitive shifts such as internal perceptual alternation in bistable stimuli, is reconciled by the discharge of noradrenergic locus coeruleus neurons. Transient pupil dilation as a consequence of the reconciliation with the neural network in bistable perception has been reported to precede the reported perceptual alternation. Here, we found that baseline pupil size, an index of temporal fluctuation of arousal level over a longer range of timescales than that for the transient pupil changes, relates to the frequency of perceptual alternation in auditory bistability. Baseline pupil size was defined as the mean pupil diameter over a period of 1 s prior to the task requirement (i.e., before the observation period for counting the perceptual alternations in Experiment 1 and reporting whether participants experienced the perceptual alternations in Experiment 2). The results showed that the baseline pupil size monotonically increased with an increasing number of perceptual alternations and its occurrence probability. Furthermore, a cross-correlation analysis indicates that baseline pupil size predicted perceptual alternation at least 35 s before the behavioral response and that the overall correspondence between pupil size and perceptual alternation was maintained over a sustained time window of 45 s at minimum. The overall results suggest that variability of baseline pupil size reflects the stochastic dynamics of arousal fluctuation in the brain related to bistable perception.


MC1: Science. 2020 Feb 28;367(6481):1043-1047.

Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody

Philippe Albouy, Lucas Benjamin, Benjamin Morillon, Robert J Zatorre

Does brain asymmetry for speech and music emerge from acoustical cues or from domain-specific neural networks? We selectively filtered temporal or spectral modulations in sung speech stimuli for which verbal and melodic content was crossed and balanced. Perception of speech decreased only with degradation of temporal information, whereas perception of melodies decreased only with spectral degradation. Functional magnetic resonance imaging data showed that the neural decoding of speech and melodies depends on activity patterns in left and right auditory regions, respectively. This asymmetry is supported by specific sensitivity to spectrotemporal modulation rates within each region. Finally, the effects of degradation on perception were paralleled by their effects on neural classification. Our results suggest a match between acoustical properties of communicative signals and neural specializations adapted to that purpose.
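
Illustration (not the authors' filtering pipeline; cutoff values and the stand-in spectrogram are arbitrary): degrading temporal or spectral modulations of a spectrogram by zeroing modulation rates above a cutoff in its 2D Fourier transform.

    # Illustrative modulation filtering of a (frequency x time) spectrogram.
    import numpy as np

    def filter_modulations(spec, dt, df, temporal_cutoff=None, spectral_cutoff=None):
        M = np.fft.fft2(spec)
        w_t = np.fft.fftfreq(spec.shape[1], d=dt)       # temporal modulation rates (Hz)
        w_f = np.fft.fftfreq(spec.shape[0], d=df)       # spectral modulation rates (cyc per freq unit)
        if temporal_cutoff is not None:
            M[:, np.abs(w_t) > temporal_cutoff] = 0.0
        if spectral_cutoff is not None:
            M[np.abs(w_f) > spectral_cutoff, :] = 0.0
        return np.real(np.fft.ifft2(M))

    spec = np.random.default_rng(0).random((128, 200))  # stand-in spectrogram
    degraded_temporal = filter_modulations(spec, dt=0.01, df=0.1, temporal_cutoff=2.0)
    degraded_spectral = filter_modulations(spec, dt=0.01, df=0.1, spectral_cutoff=1.0)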


MC2: PLoS Biol. 2016 Nov 15;14(11):e1002577

Prediction Errors but Not Sharpened Signals Simulate Multivoxel fMRI Patterns during Speech Perception

Helen Blank, Matthew H Davis

Successful perception depends on combining sensory input with prior knowledge. However, the underlying mechanism by which these two sources of information are combined is unknown. In speech perception, as in other domains, two functionally distinct coding schemes have been proposed for how expectations influence representation of sensory evidence. Traditional models suggest that expected features of the speech input are enhanced or sharpened via interactive activation (Sharpened Signals). Conversely, Predictive Coding suggests that expected features are suppressed so that unexpected features of the speech input (Prediction Errors) are processed further. The present work is aimed at distinguishing between these two accounts of how prior knowledge influences speech perception. By combining behavioural, univariate, and multivariate fMRI measures of how sensory detail and prior expectations influence speech perception with computational modelling, we provide evidence in favour of Prediction Error computations. Increased sensory detail and informative expectations have additive behavioural and univariate neural effects because they both improve the accuracy of word report and reduce the BOLD signal in lateral temporal lobe regions. However, sensory detail and informative expectations have interacting effects on speech representations shown by multivariate fMRI in the posterior superior temporal sulcus. When prior knowledge was absent, increased sensory detail enhanced the amount of speech information measured in superior temporal multivoxel patterns, but with informative expectations, increased sensory detail reduced the amount of measured information. Computational simulations of Sharpened Signals and Prediction Errors during speech perception could both explain these behavioural and univariate fMRI observations. However, the multivariate fMRI observations were uniquely simulated by a Prediction Error and not a Sharpened Signal model. The interaction between prior expectation and sensory detail provides evidence for a Predictive Coding account of speech perception. Our work establishes methods that can be used to distinguish representations of Prediction Error and Sharpened Signals in other perceptual domains.
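
Illustration (a deliberately minimal caricature, not the paper's simulations): the two coding schemes reduced to a few lines, showing why an informative prior that matches the input shrinks a prediction-error representation while leaving a sharpened representation peaked.

    # Toy contrast between Sharpened Signals and Prediction Errors.
    import numpy as np

    def sharpened(input_pattern, prior):
        s = input_pattern * prior                     # expected features enhanced
        return s / s.sum()

    def prediction_error(input_pattern, prior):
        return input_pattern - prior                  # expected features suppressed; residual passed on

    speech_input = np.array([0.1, 0.6, 0.2, 0.1])     # evidence for 4 candidate words
    expectation = np.array([0.05, 0.7, 0.15, 0.10])   # informative prior matching the input
    print("sharpened:       ", np.round(sharpened(speech_input, expectation), 3))
    print("prediction error:", np.round(prediction_error(speech_input, expectation), 3))
    # With a matching prior, the prediction-error representation shrinks toward zero while the
    # sharpened representation stays peaked -- the signature the multivoxel analysis exploits.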


MC3: PLoS Biol. 2020 Oct 22;18(10):e3000883.

Neural speech restoration at the cocktail party: Auditory cortex recovers masked speech of both attended and ignored speakers

Christian Brodbeck, Alex Jiao, L Elliot Hong, Jonathan Z Simon

Humans are remarkably skilled at listening to one speaker out of an acoustic mixture of several speech sources. Two speakers are easily segregated, even without binaural cues, but the neural mechanisms underlying this ability are not well understood. One possibility is that early cortical processing performs a spectrotemporal decomposition of the acoustic mixture, allowing the attended speech to be reconstructed via optimally weighted recombinations that discount spectrotemporal regions where sources heavily overlap. Using human magnetoencephalography (MEG) responses to a 2-talker mixture, we show evidence for an alternative possibility, in which early, active segregation occurs even for strongly spectrotemporally overlapping regions. Early (approximately 70-millisecond) responses to nonoverlapping spectrotemporal features are seen for both talkers. When competing talkers' spectrotemporal features mask each other, the individual representations persist, but they occur with an approximately 20-millisecond delay. This suggests that the auditory cortex recovers acoustic features that are masked in the mixture, even if they occurred in the ignored speech. The existence of such noise-robust cortical representations, of features present in attended as well as ignored speech, suggests an active cortical stream segregation process, which could explain a range of behavioral effects of ignored background speech.


MC4: J Neurosci. 2021 Nov 10;41(45):9374-9391.

Broadband Dynamics Rather than Frequency-Specific Rhythms Underlie Prediction Error in the Primate Auditory Cortex

Andrés Canales-Johnson, Ana Filipa Teixeira Borges, Misako Komatsu, Naotaka Fujii, Johannes J Fahrenfort, Kai J Miller, Valdas Noreika

Detection of statistical irregularities, measured as a prediction error response, is fundamental to the perceptual monitoring of the environment. We studied whether prediction error response is associated with neural oscillations or asynchronous broadband activity. Electrocorticography was conducted in three male monkeys, who passively listened to the auditory roving oddball stimuli. Local field potentials (LFPs) recorded over the auditory cortex underwent spectral principal component analysis, which decoupled broadband and rhythmic components of the LFP signal. We found that the broadband component captured the prediction error response, whereas none of the rhythmic components were associated with statistical irregularities of sounds. The broadband component displayed more stochastic, asymmetrical multifractal properties than the rhythmic components, which revealed more self-similar dynamics. We thus conclude that the prediction error response is captured by neuronal populations generating asynchronous broadband activity, defined by irregular dynamic states, which, unlike oscillatory rhythms, appear to enable the neural representation of auditory prediction error response.
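
As a rough illustration of the spectral principal component analysis step (a sketch in the spirit of broadband/rhythm decoupling, written here with placeholder data rather than taken from the paper): PCA is run on the log power spectra of short LFP epochs; a component with near-uniform loadings across frequencies tracks broadband shifts, while components with peaked loadings track rhythms.

```python
# Hedged sketch of spectral PCA on LFP epochs; the data below are random placeholders.
import numpy as np
from scipy.signal import welch

fs = 1000                                   # Hz, assumed sampling rate
rng = np.random.default_rng(1)
epochs = rng.normal(size=(200, fs))         # placeholder: 200 one-second LFP epochs

# Power spectrum of every epoch, then log-transform and demean per frequency.
freqs, psd = welch(epochs, fs=fs, nperseg=256, axis=-1)
log_psd = np.log(psd)
log_psd -= log_psd.mean(axis=0)

# PCA across frequencies: eigenvectors of the frequency-by-frequency covariance.
cov = np.cov(log_psd, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
loadings = eigvecs[:, order]                # columns = spectral components

# Near-uniform loadings ~ broadband component; peaked loadings ~ rhythmic components.
scores = log_psd @ loadings                 # per-epoch expression of each component
print("first component loading range:", loadings[:, 0].min(), loadings[:, 0].max())
```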


MC5: Proc Natl Acad Sci U S A. 2016 Jun 14;113(24):6755-60.

Hierarchy of prediction errors for auditory events in human temporal and frontal cortex

Stefan Dürschmid, Erik Edwards, Christoph Reichert, Callum Dewar, Hermann Hinrichs, Hans-Jochen Heinze, Heidi E Kirsch, Sarang S Dalal, Leon Y Deouell, Robert T Knight

Predictive coding theories posit that neural networks learn statistical regularities in the environment for comparison with actual outcomes, signaling a prediction error (PE) when sensory deviation occurs. PE studies in audition have capitalized on low-frequency event-related potentials (LF-ERPs), such as the mismatch negativity. However, local cortical activity is well-indexed by higher-frequency bands [high-γ band (Hγ): 80-150 Hz]. We compared patterns of human Hγ and LF-ERPs in deviance detection using electrocorticographic recordings from subdural electrodes over frontal and temporal cortices. Patients listened to trains of task-irrelevant tones in two conditions differing in the predictability of a deviation from repetitive background stimuli (fully predictable vs. unpredictable deviants). We found deviance-related responses in both frequency bands over lateral temporal and inferior frontal cortex, with an earlier latency for Hγ than for LF-ERPs. Critically, frontal Hγ activity but not LF-ERPs discriminated between fully predictable and unpredictable changes, with frontal cortex sensitive to unpredictable events. The results highlight the role of frontal cortex and Hγ activity in deviance detection and PE generation.


MC6: J Neurosci. 2021 Sep 22;41(38):8023-8039.

Cortical Processing of Arithmetic and Simple Sentences in an Auditory Attention Task

Joshua P Kulasingham, Neha H Joshi, Mohsen Rezaeizadeh, Jonathan Z Simon

Cortical processing of arithmetic and of language relies on both shared and task-specific neural mechanisms, which should also be dissociable from the particular sensory modality used to probe them. Here, spoken arithmetical and non-mathematical statements were employed to investigate neural processing of arithmetic, compared with general language processing, in an attention-modulated cocktail party paradigm. Magnetoencephalography (MEG) data were recorded from 22 human subjects listening to audio mixtures of spoken sentences and arithmetic equations while selectively attending to one of the two speech streams. Short sentences and simple equations were presented diotically at fixed and distinct word/symbol and sentence/equation rates. Critically, this allowed neural responses to acoustics, words, and symbols to be dissociated from responses to sentences and equations. Indeed, the simultaneous neural processing of the acoustics of words and symbols was observed in auditory cortex for both streams. Neural responses to sentences and equations, however, were predominantly to the attended stream, originating primarily from left temporal and parietal areas, respectively. Additionally, these neural responses were correlated with behavioral performance in a deviant detection task. Source-localized temporal response functions (TRFs) revealed distinct cortical dynamics of responses to sentences in left temporal areas and equations in bilateral temporal, parietal, and motor areas. Finally, the target of attention could be decoded from MEG responses, especially in left superior parietal areas. In short, the neural responses to arithmetic and language are especially well segregated during the cocktail party paradigm, and the correlation with behavior suggests that they may be linked to successful comprehension or calculation.
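
The temporal response function (TRF) analyses mentioned above boil down to regularized regression from time-lagged stimulus features to the neural signal. The sketch below (synthetic data, my own simplification, not the authors' pipeline) shows the core computation for a single feature and a single response channel.

```python
# Minimal TRF fit: ridge regression from a lagged onset train to a neural response.
import numpy as np

fs = 100                                    # Hz, assumed analysis rate
rng = np.random.default_rng(2)
n_samples = 6000
stimulus = (rng.random(n_samples) < 0.02).astype(float)    # sparse onset train
true_trf = np.exp(-np.arange(30) / 8.0)                     # hypothetical kernel
response = np.convolve(stimulus, true_trf)[:n_samples]
response += rng.normal(0, 0.5, n_samples)                   # measurement noise

# Design matrix: one column per lag from 0 to 290 ms.
lags = np.arange(30)
X = np.column_stack([np.roll(stimulus, lag) for lag in lags])
for i, lag in enumerate(lags):
    X[:lag, i] = 0                          # remove wrap-around from np.roll

# Ridge solution: trf = (X'X + lambda*I)^-1 X'y
lam = 1.0
trf = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ response)
print("recovered peak lag (ms):", lags[np.argmax(trf)] * 1000 / fs)
```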


MC7: Neuron. 2019 Dec 18;104(6):1195-1209.e3.

Hierarchical Encoding of Attended Auditory Objects in Multi-talker Speech Perception

James O'Sullivan, Jose Herrero, Elliot Smith, Catherine Schevon, Guy M McKhann, Sameer A Sheth, Ashesh D Mehta, Nima Mesgarani

Humans can easily focus on one speaker in a multi-talker acoustic environment, but how different areas of the human auditory cortex (AC) represent the acoustic components of mixed speech is unknown. We obtained invasive recordings from the primary and nonprimary AC in neurosurgical patients as they listened to multi-talker speech. We found that neural sites in the primary AC responded to individual speakers in the mixture and were relatively unchanged by attention. In contrast, neural sites in the nonprimary AC were less discerning of individual speakers but selectively represented the attended speaker. Moreover, the encoding of the attended speaker in the nonprimary AC was invariant to the degree of acoustic overlap with the unattended speaker. Finally, this emergent representation of attended speech in the nonprimary AC was linearly predictable from the primary AC responses. Our results reveal the neural computations underlying the hierarchical formation of auditory objects in human AC during multi-talker speech perception.


MC8: J Neurosci. 2021 Nov 3;41(44):9192-9209.

Memory Specific to Temporal Features of Sound Is Formed by Cue-Selective Enhancements in Temporal Coding Enabled by Inhibition of an Epigenetic Regulator

Elena K Rotondo, Kasia M Bieszczad

Recent investigations of memory-related functions in the auditory system have capitalized on the use of memory-modulating molecules to probe the relationship between memory and substrates of memory in auditory system coding. For example, epigenetic mechanisms, which regulate gene expression necessary for memory consolidation, are powerful modulators of learning-induced neuroplasticity and long-term memory (LTM) formation. Inhibition of the epigenetic regulator histone deacetylase 3 (HDAC3) promotes LTM, which is highly specific for spectral features of sound. The present work demonstrates for the first time that HDAC3 inhibition also enables memory for temporal features of sound. Adult male rats trained in an amplitude modulation (AM) rate discrimination task and treated with a selective inhibitor of HDAC3 formed memory that was highly specific to the AM rate paired with reward. Sound-specific memory revealed behaviorally was associated with a signal-specific enhancement in temporal coding in the auditory system; stronger phase locking that was specific to the rewarded AM rate was revealed in both the surface-recorded frequency following response and auditory cortical multiunit activity in rats treated with the HDAC3 inhibitor. Furthermore, HDAC3 inhibition increased trial-to-trial cortical response consistency (relative to naive and trained vehicle-treated rats), which generalized across different AM rates. Stronger signal-specific phase locking correlated with individual behavioral differences in memory specificity for the AM signal. These findings support the view that epigenetic mechanisms regulate activity-dependent processes that enhance discriminability of sensory cues encoded into LTM in both spectral and temporal domains, which may be important for remembering spectrotemporal features of sounds, for example, as in human voices and speech.


MC9: Cognition. 2022 Jan;218:104949.

Musical instrument familiarity affects statistical learning of tone sequences

Stephen C Van Hedger, Ingrid S Johnsrude, Laura J Batterink

Most listeners have an implicit understanding of the rules that govern how music unfolds over time. This knowledge is acquired in part through statistical learning, a robust learning mechanism that allows individuals to extract regularities from the environment. However, it is presently unclear how this prior musical knowledge might facilitate or interfere with the learning of novel tone sequences that do not conform to familiar musical rules. In the present experiment, participants listened to novel, statistically structured tone sequences composed of pitch intervals not typically found in Western music. Between participants, the tone sequences had the timbre of either artificial, computerized instruments or familiar instruments (piano or violin). Knowledge of the statistical regularities was measured by a two-alternative forced-choice recognition task, requiring discrimination between novel sequences that followed versus violated the statistical structure, assessed at three time points (immediately post-training, as well as one day and one week post-training). Compared to artificial instruments, training on familiar instruments resulted in reduced accuracy. Moreover, sequences from familiar instruments - but not artificial instruments - were more likely to be judged as grammatical when they contained intervals that approximated those commonly used in Western music, even though this cue was non-informative. Overall, these results demonstrate that instrument familiarity can interfere with the learning of novel statistical regularities, presumably through biasing memory representations to be aligned with Western musical structures. These results demonstrate that real-world experience influences statistical learning in a non-linguistic domain, supporting the view that statistical learning involves the continuous updating of existing representations, rather than the establishment of entirely novel ones.


YB1: Neuron. 2012 Oct 18;76(2):435-49.

Discrete neocortical dynamics predict behavioral categorization of sounds.

Bathellier B, Ushakova L, Rumpel S.

The ability to group stimuli into perceptual categories is essential for efficient interaction with the environment. Discrete dynamics that emerge in brain networks are believed to be the neuronal correlate of category formation. Observations of such dynamics have recently been made; however, it is still unresolved if they actually match perceptual categories. Using in vivo two-photon calcium imaging in the auditory cortex of mice, we show that local network activity evoked by sounds is constrained to few response modes. Transitions between response modes are characterized by an abrupt switch, indicating attractor-like, discrete dynamics. Moreover, we show that local cortical responses quantitatively predict discrimination performance and spontaneous categorization of sounds in behaving mice. Our results therefore demonstrate that local nonlinear dynamics in the auditory cortex generate spontaneous sound categories which can be selected for behavioral or perceptual decisions.


YB2: Proc Natl Acad Sci U S A. 2017 Sep 12;114(37):9972-9977.

Top-down modulation of sensory cortex gates perceptual learning.

Caras ML, Sanes DH.

Practice sharpens our perceptual judgments, a process known as perceptual learning. Although several brain regions and neural mechanisms have been proposed to support perceptual learning, formal tests of causality are lacking. Furthermore, the temporal relationship between neural and behavioral plasticity remains uncertain. To address these issues, we recorded the activity of auditory cortical neurons as gerbils trained on a sound detection task. Training led to improvements in cortical and behavioral sensitivity that were closely matched in terms of magnitude and time course. Surprisingly, the degree of neural improvement was behaviorally gated. During task performance, cortical improvements were large and predicted behavioral outcomes. In contrast, during nontask listening sessions, cortical improvements were weak and uncorrelated with perceptual performance. Targeted reduction of auditory cortical activity during training diminished perceptual learning while leaving psychometric performance largely unaffected. Collectively, our findings suggest that training facilitates perceptual learning by strengthening both bottom-up sensory encoding and top-down modulation of auditory cortex.


YB3: Elife. 2016 Mar 4;5. pii: e12577.

The auditory representation of speech sounds in human motor cortex.

Cheung C, Hamilton LS, Johnson K, Chang EF.

In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different than during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech, but rather, represents auditory vocal information.


YB4: Cell. 2021 Sep 2;184(18):4626-4639.e13.

Parallel and distributed encoding of speech across human auditory cortex

Liberty S Hamilton, Yulia Oganian, Jeffery Hall, Edward F Chang

Speech perception is thought to rely on a cortical feedforward serial transformation of acoustic into linguistic representations. Using intracranial recordings across the entire human auditory cortex, electrocortical stimulation, and surgical ablation, we show that cortical processing across areas is not consistent with a serial hierarchical organization. Instead, response latency and receptive field analyses demonstrate parallel and distinct information processing in the primary and nonprimary auditory cortices. This functional dissociation was also observed with electrocortical stimulation: stimulation of the primary auditory cortex evokes auditory hallucinations but does not distort or interfere with speech perception. Opposite effects were observed during stimulation of nonprimary cortex in superior temporal gyrus. Ablation of the primary auditory cortex does not affect speech perception. These results establish a distributed functional organization of parallel information processing throughout the human auditory cortex and demonstrate an essential independent role for nonprimary auditory cortex in speech processing.


YB5: Neuron. 2018 May 2;98(3):630-644.

A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy.

Kell AJE, Yamins DLK, Shook EN, Norman-Haignere SV, McDermott JH.

A core goal of auditory neuroscience is to build quantitative models that predict cortical responses to natural sounds. Reasoning that a complete model of auditory cortex must solve ecologically relevant tasks, we optimized hierarchical neural networks for speech and music recognition. The best-performing network contained separate music and speech pathways following early shared processing, potentially replicating human cortical organization. The network performed both tasks as well as humans and exhibited human-like errors despite not being optimized to do so, suggesting common constraints on network and human performance. The network predicted fMRI voxel responses substantially better than traditional spectrotemporal filter models throughout auditory cortex. It also provided a quantitative signature of cortical representational hierarchy-primary and non-primary responses were best predicted by intermediate and late network layers, respectively. The results suggest that task optimization provides a powerful set of tools for modeling sensory systems.
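
The branched architecture is easy to picture in code. The fragment below is a schematic PyTorch sketch of the idea (shared early layers feeding separate word-recognition and genre-recognition heads), not the published network; all layer sizes and task dimensions are invented.

```python
# Schematic sketch of a branched, task-optimized audio network (illustrative only).
import torch
import torch.nn as nn

class BranchedAudioNet(nn.Module):
    def __init__(self, n_words=500, n_genres=40):
        super().__init__()
        self.shared = nn.Sequential(                 # early processing shared by both tasks
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
        )
        def head(n_out):                             # task-specific pathway
            return nn.Sequential(
                nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_out),
            )
        self.speech_head = head(n_words)
        self.music_head = head(n_genres)

    def forward(self, cochleagram):
        z = self.shared(cochleagram)
        return self.speech_head(z), self.music_head(z)

net = BranchedAudioNet()
dummy = torch.randn(2, 1, 128, 200)                  # batch of fake cochleagram inputs
word_logits, genre_logits = net(dummy)
print(word_logits.shape, genre_logits.shape)         # (2, 500) and (2, 40)
```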


YB6: Nat Commun. 2019 Jun 7;10(1):2509.

Adaptation of the human auditory cortex to changing background noise.

Khalighinejad B, Herrero JL, Mehta AD, Mesgarani N.

Speech communication in real-world environments requires adaptation to changing acoustic conditions. How the human auditory cortex adapts as a new noise source appears in or disappears from the acoustic scene remains unclear. Here, we directly measured neural activity in the auditory cortex of six human subjects as they listened to speech with abruptly changing background noises. We report rapid and selective suppression of acoustic features of noise in the neural responses. This suppression results in enhanced representation and perception of speech acoustic features. The degree of adaptation to different background noises varies across neural sites and is predictable from the tuning properties and speech specificity of the sites. Moreover, adaptation to background noise is unaffected by the attentional focus of the listener. The convergence of these neural and perceptual effects reveals the intrinsic dynamic mechanisms that enable a listener to filter out irrelevant sound sources in a changing acoustic scene.


YB7: Cereb Cortex. 2018 Dec 1;28(12):4222-4233.

Neural Encoding of Auditory Features during Music Perception and Imagery.

Martin S, Mikutta C, Leonard MK, Hungate D, Koelsch S, Shamma S, Chang EF, Millán JDR, Knight RT, Pasley BN.

Despite many behavioral and neuroimaging investigations, it remains unclear how the human cortex represents spectrotemporal sound features during auditory imagery, and how this representation compares to auditory perception. To assess this, we recorded electrocorticographic signals from an epileptic patient with proficient music ability in 2 conditions. First, the participant played 2 piano pieces on an electronic piano with the sound volume of the digital keyboard on. Second, the participant replayed the same piano pieces, but without auditory feedback, and the participant was asked to imagine hearing the music in his mind. In both conditions, the sound output of the keyboard was recorded, thus allowing precise time-locking between the neural activity and the spectrotemporal content of the music imagery. This novel task design provided a unique opportunity to apply receptive field modeling techniques to quantitatively study neural encoding during auditory mental imagery. In both conditions, we built encoding models to predict high gamma neural activity (70-150 Hz) from the spectrogram representation of the recorded sound. We found robust spectrotemporal receptive fields during auditory imagery with substantial, but not complete overlap in frequency tuning and cortical location compared to receptive fields measured during auditory perception.
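
The encoding-model logic is, at its core, a regularized regression from lagged spectrogram bins to the high-gamma envelope. The sketch below (a toy version with synthetic data, not the authors' code) fits such a spectrotemporal receptive field separately for a "perception" and an "imagery" signal and then correlates the two fits, mirroring the overlap comparison described in the abstract.

```python
# Toy spectrotemporal receptive field (STRF) fits and comparison between conditions.
import numpy as np

rng = np.random.default_rng(4)
n_time, n_freq, n_lags = 5000, 32, 20

def fit_strf(spectrogram, high_gamma, lam=10.0):
    # Design matrix: every (frequency, lag) pair is one predictor.
    cols = []
    for lag in range(n_lags):
        shifted = np.roll(spectrogram, lag, axis=0)
        shifted[:lag] = 0                              # remove wrap-around
        cols.append(shifted)
    X = np.concatenate(cols, axis=1)                   # time x (freq * lags)
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ high_gamma)
    return w.reshape(n_lags, n_freq)

spec = rng.random((n_time, n_freq))                    # placeholder spectrogram
hg_perception = spec[:, 10] + 0.5 * rng.normal(size=n_time)   # toy "electrode"
hg_imagery = spec[:, 10] + 1.5 * rng.normal(size=n_time)      # noisier during imagery

strf_p = fit_strf(spec, hg_perception)
strf_i = fit_strf(spec, hg_imagery)
print("perception/imagery STRF correlation:",
      np.corrcoef(strf_p.ravel(), strf_i.ravel())[0, 1])
```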


YB8: Science. 2014 Feb 28;343(6174):1006-10.

Phonetic feature encoding in human superior temporal gyrus.

Mesgarani N, Cheung C, Johnson K, Chang EF.

During speech perception, linguistic elements such as consonants and vowels are extracted from a complex acoustic speech signal. The superior temporal gyrus (STG) participates in high-order auditory processing of speech, but how it encodes phonetic information is poorly understood. We used high-density direct cortical surface recordings in humans while they listened to natural, continuous speech to reveal the STG representation of the entire English phonetic inventory. At single electrodes, we found response selectivity to distinct phonetic features. Encoding of acoustic properties was mediated by a distributed population response. Phonetic features could be directly related to tuning for spectrotemporal acoustic cues, some of which were encoded in a nonlinear fashion or by integration of multiple cues. These findings demonstrate the acoustic-phonetic representation of speech in human STG.


YB9: Curr Biol. 2022 Apr 11;32(7):1470-1484.e12.

A neural population selective for song in human auditory cortex

Norman-Haignere SV, Feather J, Boebinger D, Brunner P, Ritaccio A, McDermott JH, Schalk G, Kanwisher N.

How is music represented in the brain? While neuroimaging has revealed some spatial segregation between responses to music versus other sounds, little is known about the neural code for music itself. To address this question, we developed a method to infer canonical response components of human auditory cortex using intracranial responses to natural sounds, and further used the superior coverage of fMRI to map their spatial distribution. The inferred components replicated many prior findings, including distinct neural selectivity for speech and music, but also revealed a novel component that responded nearly exclusively to music with singing. Song selectivity was not explainable by standard acoustic features, was located near speech- and music-selective responses, and was also evident in individual electrodes. These results suggest that representations of music are fractionated into subpopulations selective for different types of music, one of which is specialized for the analysis of song.
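
As a very loose illustration of the component-inference idea (decomposing a sound-by-electrode response matrix into a small number of shared response profiles), the snippet below uses scikit-learn's non-negative matrix factorization as a generic stand-in; the authors' actual statistical method, stimuli, and data differ.

```python
# Generic matrix-factorization stand-in for inferring shared response components.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(3)
n_sounds, n_electrodes, n_components = 165, 80, 10
responses = rng.random((n_sounds, n_electrodes))      # placeholder response magnitudes

model = NMF(n_components=n_components, init="nndsvda", max_iter=500)
components = model.fit_transform(responses)           # sound x component profiles
weights = model.components_                           # component x electrode weights

# A "song-selective" component would show high values in `components` for sung music
# and low values for other natural sounds; `weights` maps it onto electrodes and,
# combined with fMRI, onto cortical locations.
print(components.shape, weights.shape)
```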