[P2 evaluation] Articles

Choose two articles from the list, from two different instructors (indicated by their initials). The BPC articles can only be chosen for the written evaluation.


AdC1: Neuroimage. 2018 Feb 1;166:60-70.

Encoding of natural timbre dimensions in human auditory cortex.

Allen EJ, Moerel M, Lage-Castellanos A, De Martino F, Formisano E, Oxenham AJ.

Timbre, or sound quality, is a crucial but poorly understood dimension of auditory perception that is important in describing speech, music, and environmental sounds. The present study investigates the cortical representation of different timbral dimensions. Encoding models have typically incorporated the physical characteristics of sounds as features when attempting to understand their neural representation with functional MRI. Here we test an encoding model that is based on five subjectively derived dimensions of timbre to predict cortical responses to natural orchestral sounds. Results show that this timbre model can outperform other models based on spectral characteristics, and can perform as well as a complex joint spectrotemporal modulation model. In cortical regions at the medial border of Heschl's gyrus, bilaterally, and regions at its posterior adjacency in the right hemisphere, the timbre model outperforms even the complex joint spectrotemporal modulation model. These findings suggest that the responses of cortical neuronal populations in auditory cortex may reflect the encoding of perceptual timbre dimensions.
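
For context on the analysis approach, here is a minimal sketch of an fMRI encoding-model comparison, given as an assumed illustration rather than the authors' pipeline: voxel responses to each sound are predicted from a stimulus-by-feature matrix using cross-validated ridge regression, and competing feature spaces (for example, five perceptual timbre dimensions versus a spectral model) are compared by held-out prediction accuracy. The helper name encoding_score, the array sizes, and the random data are placeholders.

    # Illustrative encoding-model comparison (assumed workflow, not the authors' exact analysis).
    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import KFold

    def encoding_score(features, voxels, n_splits=5):
        """Mean held-out correlation between predicted and measured voxel responses."""
        scores = []
        for train, test in KFold(n_splits, shuffle=True, random_state=0).split(features):
            model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(features[train], voxels[train])
            pred = model.predict(features[test])
            r = [np.corrcoef(pred[:, v], voxels[test, v])[0, 1] for v in range(voxels.shape[1])]
            scores.append(np.nanmean(r))
        return float(np.mean(scores))

    rng = np.random.default_rng(0)
    timbre = rng.normal(size=(42, 5))     # 42 orchestral sounds x 5 perceptual timbre dimensions
    spectral = rng.normal(size=(42, 32))  # the same sounds x 32 spectral features
    voxels = rng.normal(size=(42, 100))   # fMRI response amplitudes of 100 voxels
    print(encoding_score(timbre, voxels), encoding_score(spectral, voxels))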


AdC2: Hear Res. 2021 May;404:108213.

The perception of octave pitch affinity and harmonic fusion have a common origin

Laurent Demany, Guilherme Monteiro, Catherine Semal, Shihab Shamma, Robert P Carlyon

Musicians say that the pitches of tones with a frequency ratio of 2:1 (one octave) have a distinctive affinity, even if the tones do not have common spectral components. It has been suggested, however, that this affinity judgment has no biological basis and originates instead from an acculturation process ‒ the learning of musical rules unrelated to auditory physiology. We measured, in young amateur musicians, the perceptual detectability of octave mistunings for tones presented alternately (melodic condition) or simultaneously (harmonic condition). In the melodic condition, mistuning was detectable only by means of explicit pitch comparisons. In the harmonic condition, listeners could use a different and more efficient perceptual cue: in the absence of mistuning, the tones fused into a single sound percept; mistunings decreased fusion. Performance was globally better in the harmonic condition, in line with the hypothesis that listeners used a fusion cue in this condition; this hypothesis was also supported by results showing that an illusory simultaneity of the tones was much less advantageous than a real simultaneity. In the two conditions, mistuning detection was generally better for octave compressions than for octave stretchings. This asymmetry varied across listeners, but crucially the listener-specific asymmetries observed in the two conditions were highly correlated. Thus, the perception of the melodic octave appeared to be closely linked to the phenomenon of harmonic fusion. As harmonic fusion is thought to be determined by biological factors rather than factors related to musical culture or training, we argue that octave pitch affinity also has, at least in part, a biological basis.


AdC3: Elife. 2021 Jun 14;10:e62183

Glycinergic axonal inhibition subserves acute spatial sensitivity to sudden increases in sound intensity

Tom P Franken, Brian J Bondy, David B Haimes, Joshua H Goldwyn, Nace L Golding, Philip H Smith , Philip X Joris

Locomotion generates adventitious sounds which enable detection and localization of predators and prey. Such sounds contain brisk changes or transients in amplitude. We investigated the hypothesis that ill-understood temporal specializations in binaural circuits subserve lateralization of such sound transients, based on different time of arrival at the ears (interaural time differences, ITDs). We find that Lateral Superior Olive (LSO) neurons show exquisite ITD-sensitivity, reflecting extreme precision and reliability of excitatory and inhibitory postsynaptic potentials, in contrast to Medial Superior Olive neurons, traditionally viewed as the ultimate ITD-detectors. In vivo, inhibition blocks LSO excitation over an extremely short window, which, in vitro, required synaptically evoked inhibition. Light and electron microscopy revealed inhibitory synapses on the axon initial segment as the structural basis of this observation. These results reveal a neural vetoing mechanism with extreme temporal and spatial precision and establish the LSO as the primary nucleus for binaural processing of sound transients.


AdC4: J Acoust Soc Am. 2021 Apr;149(4):2644

On musical interval perception for complex tones at very high frequencies

Hedwig E Gockel, Robert P Carlyon

Listeners appear able to extract a residue pitch from high-frequency harmonics for which phase locking to the temporal fine structure is weak or absent. The present study investigated musical interval perception for high-frequency harmonic complex tones using the same stimuli as Lau, Mehta, and Oxenham [J. Neurosci. 37, 9013-9021 (2017)]. Nine young musically trained listeners with especially good high-frequency hearing adjusted various musical intervals using harmonic complex tones containing harmonics 6-10. The reference notes had fundamental frequencies (F0s) of 280 or 1400 Hz. Interval matches were possible, albeit markedly worse, even when all harmonic frequencies were above the presumed limit of phase locking. Matches showed significantly larger systematic errors and higher variability, and subjects required more trials to finish a match for the high than for the low F0. Additional absolute pitch judgments from one subject with absolute pitch, for complex tones containing harmonics 1-5 or 6-10 with a wide range of F0s, were perfect when the lowest frequency component was below about 7 kHz, but at least 50% of responses were incorrect when it was 8 kHz or higher. The results are discussed in terms of the possible effects of phase-locking information and familiarity with high-frequency stimuli on pitch.


AdC5: Comp Cogn Behav Rev. 2017;12:5-18.

Animal Pitch Perception: Melodies and Harmonies

Marisa Hoeschele

Pitch is a percept of sound that is based in part on fundamental frequency. Although pitch can be defined in a way that is clearly separable from other aspects of musical sounds, such as timbre, the perception of pitch is not a simple topic. Despite this, studying pitch separately from other aspects of sound has led to some interesting conclusions about how humans and other animals process acoustic signals. It turns out that pitch perception in humans is based on an assessment of pitch height, pitch chroma, relative pitch, and grouping principles. How pitch is broken down depends largely on the context. Most, if not all, of these principles appear to also be used by other species, but when and how accurately they are used varies across species and context. Studying how other animals compare to humans in their pitch abilities is partially a reevaluation of what we know about humans by considering ourselves in a biological context.


AdC6: Isis, volume 111, number 3.

The Unmusical Ear: Georg Simon Ohm and the Mathematical Analysis of Sound

Melle Jan Kromhout

This essay presents a detailed analysis of Georg Simon Ohm's acoustical research between 1839 and 1844. Because of its importance in Hermann von Helmholtz's subsequent study of sound and hearing, this work is rarely considered on its own terms. A thorough assessment of Ohm's articles, however, can greatly enrich our understanding of later developments. Based on study of Ohm's published writings, as well as a lengthy unpublished manuscript, the essay argues that his acoustical research foreshadows an important paradigmatic shift at a time of discursive instability prior to Helmholtz's influential contributions. Using Ohm's own dismissal of his supposedly 'unmusical ears' as a conceptual frame, the essay describes this shift as a move away from understanding sound primarily in a musical context and toward an increasingly mathematical approach to sound and hearing. As such, Ohm's work also anticipates a more general change in the role of the senses in nineteenth-century scientific research.


AdC7: Neuroimage. 2020 Aug 21;222:117291

High gamma cortical processing of continuous speech in younger and older listeners

Joshua P Kulasingham, Christian Brodbeck, Alessandro Presacco, Stefanie E Kuchinsky, Samira Anderson, Jonathan Z Simon

Neural processing along the ascending auditory pathway is often associated with a progressive reduction in characteristic processing rates. For instance, the well-known frequency-following response (FFR) of the auditory midbrain, as measured with electroencephalography (EEG), is dominated by frequencies from ~100 Hz to several hundred Hz, phase-locking to the acoustic stimulus at those frequencies. In contrast, cortical responses, whether measured by EEG or magnetoencephalography (MEG), are typically characterized by frequencies of a few Hz to a few tens of Hz, time-locking to acoustic envelope features. In this study we investigated a crossover case, cortically generated responses time-locked to continuous speech features at FFR-like rates. Using MEG, we analyzed responses in the high gamma range of 70-200 Hz to continuous speech using neural source-localized reverse correlation and the corresponding temporal response functions (TRFs). Continuous speech stimuli were presented to 40 subjects (17 younger, 23 older adults) with clinically normal hearing and their MEG responses were analyzed in the 70-200 Hz band. Consistent with the relative insensitivity of MEG to many subcortical structures, the spatiotemporal profile of these response components indicated a cortical origin with ~40 ms peak latency and a right hemisphere bias. TRF analysis was performed using two separate aspects of the speech stimuli: a) the 70-200 Hz carrier of the speech, and b) the 70-200 Hz temporal modulations in the spectral envelope of the speech stimulus. The response was dominantly driven by the envelope modulation, with a much weaker contribution from the carrier. Age-related differences were also analyzed to investigate a reversal previously seen along the ascending auditory pathway, whereby older listeners show weaker midbrain FFR responses than younger listeners, but, paradoxically, have stronger cortical low frequency responses. In contrast to both these earlier results, this study did not find clear age-related differences in high gamma cortical responses to continuous speech. Cortical responses at FFR-like frequencies shared some properties with midbrain responses at the same frequencies and with cortical responses at much lower frequencies.
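
As background on the method, the sketch below shows temporal response function (TRF) estimation as time-lagged ridge regression (the regression form of reverse correlation). It is a single-channel toy example with assumed parameters (sampling rate, lag range, ridge value) and does not reproduce the source-localized, multivariate analysis used in the paper.

    # Minimal TRF estimator via time-lagged ridge regression (illustrative sketch).
    import numpy as np

    def estimate_trf(stimulus, response, fs, tmin=-0.02, tmax=0.06, ridge=1.0):
        """Map a 1-D stimulus feature (e.g., high-gamma envelope modulation) to a 1-D neural signal."""
        lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
        X = np.column_stack([np.roll(stimulus, lag) for lag in lags])  # lagged copies of the stimulus
        w = np.linalg.solve(X.T @ X + ridge * np.eye(len(lags)), X.T @ response)
        return lags / fs, w

    # Placeholder usage: a response that lags the stimulus by 40 ms yields a TRF peak near 0.04 s.
    fs = 1000
    stim = np.random.randn(10 * fs)
    resp = np.roll(stim, int(0.04 * fs)) + np.random.randn(10 * fs)
    lags_s, w = estimate_trf(stim, resp, fs)
    print(lags_s[np.argmax(np.abs(w))])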


AdC8: Sci Rep. 2019 Jul 18;9(1):10404.

Speech perception is similar for musicians and non-musicians across a wide range of conditions

Sara M K Madsen, Marton Marschall, Torsten Dau, Andrew J Oxenham

It remains unclear whether musical training is associated with improved speech understanding in a noisy environment, with different studies reaching differing conclusions. Even in those studies that have reported an advantage for highly trained musicians, it is not known whether the benefits measured in laboratory tests extend to more ecologically valid situations. This study aimed to establish whether musicians are better than non-musicians at understanding speech in a background of competing speakers or speech-shaped noise under more realistic conditions, involving sounds presented in space via a spherical array of 64 loudspeakers, rather than over headphones, with and without simulated room reverberation. The study also included experiments testing fundamental frequency discrimination limens (F0DLs), interaural time differences limens (ITDLs), and attentive tracking. Sixty-four participants (32 non-musicians and 32 musicians) were tested, with the two groups matched in age, sex, and IQ as assessed with Raven's Advanced Progressive matrices. There was a significant benefit of musicianship for F0DLs, ITDLs, and attentive tracking. However, speech scores were not significantly different between the two groups. The results suggest no musician advantage for understanding speech in background noise or talkers under a variety of conditions.


AdC9: J Acoust Soc Am. 2021 Jan;149(1):259.

Gradual decay and sudden death of short-term memory for pitch

Samuel R Mathias, Leonard Varghese, Christophe Micheyl, Barbara G Shinn-Cunningham

The ability to discriminate frequency differences between pure tones declines as the duration of the interstimulus interval (ISI) increases. The conventional explanation for this finding is that pitch representations gradually decay from auditory short-term memory. Gradual decay means that internal noise increases with increasing ISI duration. Another possibility is that pitch representations experience 'sudden death,' disappearing without a trace from memory. Sudden death means that listeners guess (respond at random) more often when the ISIs are longer. Since internal noise and guessing probabilities influence the shape of psychometric functions in different ways, they can be estimated simultaneously. Eleven amateur musicians performed a two-interval, two-alternative forced-choice frequency-discrimination task. The frequencies of the first tones were roved, and frequency differences and ISI durations were manipulated across trials. Data were analyzed using Bayesian models that simultaneously estimated internal noise and guessing probabilities. On average across listeners, internal noise increased monotonically as a function of increasing ISI duration, suggesting that gradual decay occurred. The guessing rate decreased with an increasing ISI duration between 0.5 and 2 s but then increased with further increases in ISI duration, suggesting that sudden death occurred but perhaps only at longer ISIs. Results are problematic for decay-only models of discrimination and contrast with those from a study on visual short-term memory, which found that over similar durations, visual representations experienced little gradual decay yet substantial sudden death.
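
To make the gradual-decay versus sudden-death contrast concrete, here is one standard way to write a two-interval forced-choice psychometric function in which internal noise and guessing enter separately; the parameter names and values are illustrative, and the authors' hierarchical Bayesian models are not reproduced here.

    # Simplified psychometric model separating internal noise (gradual decay) from guessing (sudden death).
    import numpy as np
    from scipy.stats import norm

    def p_correct(delta_f, sigma, g):
        """Probability correct in a 2I-2AFC frequency-discrimination trial.
        delta_f : frequency difference between the tones
        sigma   : internal noise (grows with ISI under gradual decay)
        g       : guessing probability (grows with ISI under sudden death)"""
        p_discriminate = norm.cdf(delta_f / (sigma * np.sqrt(2)))  # noise-limited performance
        return g * 0.5 + (1 - g) * p_discriminate                  # random guesses are correct half the time

    # Same frequency difference under two hypothetical ISI conditions.
    print(p_correct(5.0, sigma=4.0, g=0.05))  # short ISI: little noise, little guessing
    print(p_correct(5.0, sigma=6.0, g=0.20))  # long ISI: more noise and more guessing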


AdC10: Proc Biol Sci. 2020 Jan 29;287(1919):20192001.

Mice tune out not in: violation of prediction drives auditory saliency

Meike M Rogalla, Inga Rauser, Karsten Schulze, Lasse Osterhagen, K Jannis Hildebrandt

Successful navigation in complex acoustic scenes requires focusing on relevant sounds while ignoring irrelevant distractors. It has been argued that the ability to track stimulus statistics and generate predictions supports the choice of what to attend and what to ignore. However, the role of these predictions about future auditory events in drafting decisions remains elusive. While most psychophysical studies in humans indicate that expected stimuli are more easily detected, most work studying physiological auditory processing in animals highlights the detection of unexpected, surprising stimuli. Here, we tested whether in the mouse, high target probability results in enhanced detectability or whether detection is biased towards low-probability deviants using an auditory detection task. We implemented a probabilistic choice model to investigate whether a possible dependence on stimulus statistics arises from short-term serial correlations or from integration over longer periods. Our results demonstrate that target detectability in mice decreases with increasing probability, contrary to humans. We suggest that mice indeed track probability over a timescale of at least several minutes but do not use this information in the same way as humans do: instead of maximizing reward by focusing on high-probability targets, the saliency of a target is determined by surprise.


BPC1: PLoS One. 2019 May 16;14(5):e0216874.

Music training with Démos program positively influences cognitive functions in children from low socio-economic backgrounds.

Barbaroux M, Dittinger E, Besson M

This study aimed at evaluating the impact of a classic music training program (Démos) on several aspects of the cognitive development of children from low socio-economic backgrounds. We were specifically interested in general intelligence, phonological awareness and reading abilities, and in other cognitive abilities that may be improved by music training such as auditory and visual attention, working and short-term memory and visuomotor precision. We used a longitudinal approach with children presented with standardized tests before the start and after 18 months of music training. To test for pre-to-post training improvements while discarding maturation and developmental effects, raw scores for each child and for each test were normalized relative to their age group. Results showed that Démos music training improved musicality scores, total IQ and Symbol Search scores as well as concentration abilities and reading precision. In line with previous results, these findings demonstrate the positive impact of an ecologically-valid music training program on the cognitive development of children from low socio-economic backgrounds and strongly encourage the broader implementation of such programs in disadvantaged school-settings.


BPC2: Sci Rep. 2020 Jul 8;10(1):11222.

Music, Language, and The N400: ERP Interference Patterns Across Cognitive Domains

Nicole Calma-Roddin, John E Drury

Studies of the relationship of language and music have suggested these two systems may share processing resources involved in the computation/maintenance of abstract hierarchical structure (syntax). One type of evidence comes from ERP interference studies involving concurrent language/music processing showing interaction effects when both processing streams are simultaneously perturbed by violations (e.g., syntactically incorrect words paired with incongruent completion of a chord progression). Here, we employ this interference methodology to target the mechanisms supporting long term memory (LTM) access/retrieval in language and music. We used melody stimuli from previous work showing out-of-key or unexpected notes may elicit a musical analogue of language N400 effects, but only for familiar melodies, and not for unfamiliar ones. Target notes in these melodies were time-locked to visually presented target words in sentence contexts manipulating lexical/conceptual semantic congruity. Our study succeeded in eliciting expected N400 responses from each cognitive domain independently. Among several new findings we argue to be of interest, these data demonstrate that: (i) language N400 effects are delayed in onset by concurrent music processing only when melodies are familiar, and (ii) double violations with familiar melodies (but not with unfamiliar ones) yield a sub-additive N400 response. In addition: (iii) early negativities (RAN effects), which previous work has connected to musical syntax, along with the music N400, were together delayed in onset for familiar melodies relative to the timing of these effects reported in the previous music-only study using these same stimuli, and (iv) double violation cases involving unfamiliar/novel melodies also delayed the RAN effect onset. These patterns constitute the first demonstration of N400 interference effects across these domains and together contribute previously undocumented types of interactions to the available pool of findings relevant to understanding whether language and music may rely on shared underlying mechanisms.


BPC3: Cortex. 2019 Apr;113:229-238.

The co-occurrence of pitch and rhythm disorders in congenital amusia.

Lagrois ME, Peretz I.

The most studied form of congenital amusia is characterized by a difficulty with detecting pitch anomalies in melodies, also referred to as pitch deafness. Here, we tested for the presence of associated deficits in rhythm processing, beat in particular, in pitch deafness. In Experiment 1, participants performed beat perception and production tasks with musical excerpts of various genres. The results show a beat finding disorder in six of the ten assessed pitch-deaf participants. In order to remove a putative interference of pitch variations with beat extraction, the same participants were tested with percussive rhythms in Experiment 2 and showed a similar impairment. Furthermore, musical pitch and beat processing abilities were correlated. These new results highlight the tight connection between melody and rhythm in music processing that can nevertheless dissociate in some individuals.


BPC4: Psychomusicology: Music, Mind, and Brain. 2018 Vol. 28, No. 3, 178–188

A Cross-Cultural Comparison of Tonality Perception in Japanese, Chinese, Vietnamese, Indonesian, and American Listeners

Rie Matsunaga, Toshinori Yasuda, Michelle Johnson-Motoyama, Pitoyo Hartono, Koichi Yokosawa and Jun-ichi Abe

We investigated tonal perception of melodies from 2 cultures (Western and traditional Japanese) by 5 different cultural groups (44 Japanese, 25 Chinese, 16 Vietnamese, 18 Indonesians, and 25 U.S. citizens). Listeners rated the degree of melodic completeness of the final tone (a tonic vs. a nontonic) and happiness–sadness in the mode (major vs. minor, YOH vs. IN) of each melody. When Western melodies were presented, American and Japanese listeners responded similarly, such that they reflected implicit tonal knowledge of Western music. By contrast, the responses of Chinese, Vietnamese, and Indonesian listeners were different from those of American and Japanese listeners. When traditional Japanese melodies were presented, Japanese listeners exhibited responses that reflected implicit tonal knowledge of traditional Japanese music. American listeners also showed responses that were like the Japanese; however, the pattern of responses differed between the 2 groups. Alternatively, Chinese, Vietnamese, and Indonesian listeners exhibited different responses from the Japanese. These results show large differences between the Chinese/Vietnamese/Indonesian group and the American/Japanese group. Furthermore, the differences in responses to Western melodies between Americans and Japanese were less pronounced than that between Chinese, Vietnamese, and Indonesians. These findings imply that cultural differences in tonal perception are more diverse and distinctive than previously believed.


BPC5: Behav Brain Res. 2020 Jul 15;390:112662.

Musicians use speech-specific areas when processing tones: The key to their superior linguistic competence?

Mariacristina Musso, Hannah Fürniss, Volkmar Glauche, Horst Urbach, Cornelius Weiller, Michel Rijntjes

It is known that musicians compared to non-musicians have some superior speech and language competence, yet the mechanisms by which musical training leads to this advantage are not well specified. This event-related fMRI study confirmed that musicians outperformed non-musicians in processing not only musical tones but also syllables and identified a network differentiating musicians from non-musicians during processing of linguistic sounds. Within this network, the activation of bilateral superior temporal gyrus was shared with all subjects during processing of the acoustically well-matched musical and linguistic sounds, and with the activation distinguishing tones with a complex harmonic spectrum (bowed tone) from a simpler one (plucked tone). These results confirm that better speech processing in musicians relies on improved cross-domain spectral analysis. Activation of left posterior superior temporal sulcus (pSTS), premotor cortex, inferior frontal and fusiform gyrus (FG) also distinguishing musicians from non-musicians during syllable processing overlapped with the activation segregating linguistic from musical sounds in all subjects. Since these brain-regions were not involved during tone processing in non-musicians, they could code for functions which are specialized for speech. Musicians recruited pSTS and FG during tone processing, thus these speech-specialized brain-areas processed musical sounds in the presence of musical training. This study shows that the linguistic advantage of musicians is linked not only to improved cross-domain spectral analysis, but also to the functional adaptation of brain resources that are specialized for speech, but accessible to the domain of music in the presence of musical training.


BPC6: Neuropsychologia. 2018 Aug;117:67-74.

Seeing music: The perception of melodic 'ups and downs' modulates the spatial processing of visual stimuli.

Romero-Rivas C, Vera-Constán F, Rodríguez-Cuadrado S, Puigcerver L, Fernández-Prieto I, Navarra J.

Musical melodies have peaks and valleys. Although the vertical component of pitch and music is well-known, the mechanisms underlying its mental representation still remain elusive. We show evidence regarding the importance of previous experience with melodies for crossmodal interactions to emerge. The impact of these crossmodal interactions on other perceptual and attentional processes was also studied. Melodies including two tones with different frequency (e.g., E4 and D3) were repeatedly presented during the study. These melodies could either generate strong predictions (e.g., E4-D3-E4-D3-E4-[D3]) or not (e.g., E4-D3-E4-E4-D3-[?]). After the presentation of each melody, the participants had to judge the colour of a visual stimulus that appeared in a position that was, according to the traditional vertical connotations of pitch, either congruent (e.g., high-low-high-low-[up]), incongruent (high-low-high-low-[down]) or unpredicted with respect to the melody. Behavioural and electroencephalographic responses to the visual stimuli were obtained. Congruent visual stimuli elicited faster responses at the end of the experiment than at the beginning. Additionally, incongruent visual stimuli that broke the spatial prediction generated by the melody elicited larger P3b amplitudes (reflecting 'surprise' responses). Our results suggest that the passive (but repeated) exposure to melodies elicits spatial predictions that modulate the processing of other sensory events.


BPC7: Neurosci Biobehav Rev. 2019 Dec;107:104-114.

Neural architectures of music - Insights from acquired amusia

Aleksi J Sihvonen, Teppo Särkämö, Antoni Rodríguez-Fornells, Pablo Ripollés, Thomas F Münte, Seppo Soinila

The ability to perceive and produce music is a quintessential element of human life, present in all known cultures. Modern functional neuroimaging has revealed that music listening activates a large-scale bilateral network of cortical and subcortical regions in the healthy brain. Even the most accurate structural studies do not reveal which brain areas are critical and causally linked to music processing. Such questions may be answered by analysing the effects of focal brain lesions on patients' ability to perceive music. In this sense, acquired amusia after stroke provides a unique opportunity to investigate the neural architectures crucial for normal music processing. Based on the first large-scale longitudinal studies on stroke-induced amusia using modern multi-modal magnetic resonance imaging (MRI) techniques, such as advanced lesion-symptom mapping, grey and white matter morphometry, tractography and functional connectivity, we discuss neural structures critical for music processing, consider music processing in light of the dual-stream model in the right hemisphere, and propose a neural model for acquired amusia.


CL1: Nat Commun. 2018 Oct 16;9(1):4298.

Hidden hearing loss selectively impairs neural adaptation to loud sound environments.

Bakay WMH, Anderson LA, Garcia-Lazaro JA, McAlpine D, Schaette R

Exposure to even a single episode of loud noise can damage synapses between cochlear hair cells and auditory nerve fibres, causing hidden hearing loss (HHL) that is not detected by audiometry. Here we investigate the effects of noise-induced HHL on functional hearing by measuring the ability of neurons in the auditory midbrain of mice to adapt to sound environments containing quiet and loud periods. Neurons from noise-exposed mice show less capacity for adaptation to loud environments, convey less information about sound intensity in those environments, and adaptation to the longer-term statistical structure of fluctuating sound environments is impaired. Adaptation comprises a cascade of both threshold and gain adaptation. Although noise exposure only impairs threshold adaptation directly, the preserved function of gain adaptation surprisingly aggravates coding deficits for loud environments. These deficits might help to understand why many individuals with seemingly normal hearing struggle to follow a conversation in background noise.


CL2: J Acoust Soc Am. 2012 May;131(5):4030-41.

Across-site patterns of modulation detection: relation to speech recognition.

Garadat SN, Zwolan TA, Pfingst BE.

The aim of this study was to identify across-site patterns of modulation detection thresholds (MDTs) in subjects with cochlear implants and to determine if removal of sites with the poorest MDTs from speech processor programs would result in improved speech recognition. Five hundred millisecond trains of symmetric-biphasic pulses were modulated sinusoidally at 10 Hz and presented at a rate of 900 pps using monopolar stimulation. Subjects were asked to discriminate a modulated pulse train from an unmodulated pulse train for all electrodes in quiet and in the presence of an interleaved unmodulated masker presented on the adjacent site. Across-site patterns of masked MDTs were then used to construct two 10-channel MAPs such that one MAP consisted of sites with the best masked MDTs and the other MAP consisted of sites with the worst masked MDTs. Subjects' speech recognition skills were compared when they used these two different MAPs. Results showed that MDTs were variable across sites and were elevated in the presence of a masker by various amounts across sites. Better speech recognition was observed when the processor MAP consisted of sites with best masked MDTs, suggesting that temporal modulation sensitivity has important contributions to speech recognition with a cochlear implant.


CL3: J Acoust Soc Am. 2019 Mar;145(3):1493.

Comparison of effects on subjective intelligibility and quality of speech in babble for two algorithms: A deep recurrent neural network and spectral subtraction

Mahmoud Keshavarzi, Tobias Goehring, Richard E Turner, Brian C J Moore

The effects on speech intelligibility and sound quality of two noise-reduction algorithms were compared: a deep recurrent neural network (RNN) and spectral subtraction (SS). The RNN was trained using sentences spoken by a large number of talkers with a variety of accents, presented in babble. Different talkers were used for testing. Participants with mild-to-moderate hearing loss were tested. Stimuli were given frequency-dependent linear amplification to compensate for the individual hearing losses. A paired-comparison procedure was used to compare all possible combinations of three conditions. The conditions were: speech in babble with no processing (NP) or processed using the RNN or SS. In each trial, the same sentence was played twice using two different conditions. The participants indicated which one was better and by how much in terms of speech intelligibility and (in separate blocks) sound quality. Processing using the RNN was significantly preferred over NP and over SS processing for both subjective intelligibility and sound quality, although the magnitude of the preferences was small. SS processing was not significantly preferred over NP for either subjective intelligibility or sound quality. Objective computational measures of speech intelligibility predicted better intelligibility for RNN than for SS or NP.
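
For readers unfamiliar with the SS baseline, the sketch below implements textbook spectral subtraction: a noise magnitude estimate is subtracted from the noisy magnitude spectrogram, with a spectral floor to limit artifacts. The noise-estimation rule and all parameters are assumptions and do not reproduce the paper's SS configuration or the trained RNN.

    # Generic spectral-subtraction sketch (not the paper's exact implementation).
    import numpy as np
    from scipy.signal import stft, istft

    def spectral_subtraction(noisy, fs, noise_frames=10, floor=0.05):
        """Subtract a noise magnitude estimate (taken from the first frames) in the STFT domain."""
        f, t, X = stft(noisy, fs, nperseg=512)
        mag, phase = np.abs(X), np.angle(X)
        noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
        clean_mag = np.maximum(mag - noise_mag, floor * mag)  # flooring limits "musical noise"
        _, enhanced = istft(clean_mag * np.exp(1j * phase), fs, nperseg=512)
        return enhanced

    # Placeholder usage with a synthetic signal standing in for speech in babble.
    fs = 16000
    noisy = np.random.randn(2 * fs)
    enhanced = spectral_subtraction(noisy, fs)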


CL4: Trends Hear. Jan-Dec 2021;25:23312165211014437.

Measuring the Influence of Noise Reduction on Listening Effort in Hearing-Impaired Listeners Using Response Times to an Arithmetic Task in Noise

Ilja Reinten, Inge De Ronde-Brons, Rolph Houben, Wouter Dreschler

Single microphone noise reduction (NR) in hearing aids can provide a subjective benefit even when there is no objective improvement in speech intelligibility. A possible explanation lies in a reduction of listening effort. Previously, we showed that response times (a proxy for listening effort) to an auditory-only dual-task were reduced by NR in normal-hearing (NH) listeners. In this study, we investigate if the results from NH listeners extend to the hearing-impaired (HI), the target group for hearing aids. In addition, we assess the relevance of the outcome measure for studying and understanding listening effort. Twelve HI subjects were asked to sum two digits of a digit triplet in noise. We measured response times to this task, as well as subjective listening effort and speech intelligibility. Stimuli were presented at three signal-to-noise ratios (SNR; -5, 0, +5 dB) and in quiet. Stimuli were processed with ideal or nonideal NR, or unprocessed. The effect of NR on response times in HI listeners was significant only in conditions where speech intelligibility was also affected (-5 dB SNR). This is in contrast to the previous results with NH listeners. There was a significant effect of SNR on response times for HI listeners. The response time measure was reasonably correlated (R142 = 0.54) to subjective listening effort and showed a sufficient test-retest reliability. This study thus presents an objective, valid, and reliable measure for evaluating an aspect of listening effort of HI listeners.


CL5: Neuroscience. 2019 May 21;407:8-20.

Primary Neural Degeneration in the Human Cochlea: Evidence for Hidden Hearing Loss in the Aging Ear

P. Z. Wu, L. D. Liberman, K. Bennett, V. de Gruttola, J. T. O’Malley and M. C. Liberman

The noise-induced and age-related loss of synaptic connections between auditory-nerve fibers and cochlear hair cells is well-established from histopathology in several mammalian species; however, its prevalence in humans, as inferred from electrophysiological measures, remains controversial. Here we look for cochlear neuropathy in a temporal-bone study of "normal-aging" humans, using autopsy material from 20 subjects aged 0–89 yrs, with no history of otologic disease. Cochleas were immunostained to allow accurate quantification of surviving hair cells in the organ of Corti and peripheral axons of auditory-nerve fibers. Mean loss of outer hair cells was 30–40% throughout the audiometric frequency range (0.25–8.0 kHz) in subjects over 60 yrs, with even greater losses at both apical (low-frequency) and basal (high-frequency) ends. In contrast, mean inner hair cell loss across audiometric frequencies was rarely >15%, at any age. Neural loss greatly exceeded inner hair cell loss, with 7/11 subjects over 60 yrs showing >60% loss of peripheral axons re the youngest subjects, and with the age-related slope of axonal loss outstripping the age-related loss of inner hair cells by almost 3:1. The results suggest that a large number of auditory neurons in the aging ear are disconnected from their hair cell targets. This primary neural degeneration would not affect the audiogram, but likely contributes to age-related hearing impairment, especially in noisy environments. Thus, therapies designed to regrow peripheral axons could provide clinically meaningful improvement in the aged ear.


DP1: Atten Percept Psychophys. 2019 Jan;81(1):253-269.

Listening back in time: Does attention to memory facilitate word-in-noise identification?

T M Vanessa Chan, Claude Alain

The ephemeral nature of spoken words creates a challenge for oral communications where incoming speech sounds must be processed in relation to representations of just-perceived sounds stored in short-term memory. This can be particularly taxing in noisy environments where perception of speech is often impaired or initially incorrect. Usage of prior contextual information (e.g., a semantically related word) has been shown to improve speech in noise identification. In three experiments, we demonstrate a comparable effect of a semantically related cue word placed after an energetically masked target word in improving accuracy of target-word identification. This effect persisted irrespective of cue modality (visual or auditory cue word) and, in the case of cues after the target, lasted even when the cue word was presented up to 4 seconds after the target. The results are framed in the context of an attention to memory model that seeks to explain the cognitive and neural mechanisms behind processing of items in auditory memory.


DP2: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics

Beyond Laurel/Yanny: An Autoencoder-Enabled Search for Polyperceivable Audio

Kartik Chandra, Chuma Kabaghe, Gregory Valiant

The famous 'laurel/yanny' phenomenon references an audio clip that elicits dramatically different responses from different listeners. For the original clip, roughly half the population hears the word 'laurel,' while the other half hears 'yanny.' How common are such polyperceivable audio clips? In this paper we apply ML techniques to study the prevalence of polyperceivability in spoken language. We devise a metric that correlates with polyperceivability of audio clips, use it to efficiently find new laurel/yanny-type examples, and validate these results with human experiments. Our results suggest that polyperceivable examples are surprisingly prevalent in natural language, existing for >2% of English words.


DP3: Curr Biol. 2021 Oct 11;31(19):4367-4372.e4

Frequency modulation of rattlesnake acoustic display affects acoustic distance perception in humans

Michael Forsthofer, Michael Schutte, Harald Luksch, Tobias Kohl, Lutz Wiegrebe, Boris P Chagnaud

The estimation of one's distance to a potential threat is essential for any animal's survival. Rattlesnakes inform about their presence by generating acoustic broadband rattling sounds [1]. Rattlesnakes generate their acoustic signals by clashing a series of keratinous segments onto each other, which are located at the tip of their tails [1-3]. Each tail shake results in a broadband sound pulse that merges into a continuous acoustic signal with fast-repeating tail shakes. This acoustic display is readily recognized by other animals [4,5] and serves as an aposematic threat and warning display, likely to avoid being preyed upon [1,6]. The spectral properties of the rattling sound [1,3] and its dependence on the morphology and size of the rattle have been investigated for decades [7-9] and carry relevant information for different receivers, including ground squirrels that encounter rattlesnakes regularly [10,11]. Combining visual looming stimuli with acoustic measurements, we show that rattlesnakes increase their rattling rate (up to about 40 Hz) with decreasing distance of a potential threat, reminiscent of the acoustic signals of sensors while parking a car. Rattlesnakes then abruptly switch to a higher and less variable rate of 60-100 Hz. In a virtual reality experiment, we show that this behavior systematically affects distance judgments by humans: the abrupt switch in rattling rate generates a sudden, strong percept of decreased distance which, together with the low-frequency rattling, acts as a remarkable interspecies communication signal.


DP4: Nat Commun. 2019 Mar 21;10(1):1302.

Optimal features for auditory categorization

Shi Tong Liu, Pilar Montes-Lourido, Xiaoqin Wang, Srivatsun Sadagopan

Humans and vocal animals use vocalizations to communicate with members of their species. A necessary function of auditory perception is to generalize across the high variability inherent in vocalization production and classify them into behaviorally distinct categories ('words' or 'call types'). Here, we demonstrate that detecting mid-level features in calls achieves production-invariant classification. Starting from randomly chosen marmoset call features, we use a greedy search algorithm to determine the most informative and least redundant features necessary for call classification. High classification performance is achieved using only 10-20 features per call type. Predictions of tuning properties of putative feature-selective neurons accurately match some observed auditory cortical responses. This feature-based approach also succeeds for call categorization in other species, and for other complex classification tasks such as caller identification. Our results suggest that high-level neural representations of sounds are based on task-dependent features optimized for specific computational goals.
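
The greedy search can be illustrated with a generic forward-selection loop that, at each step, adds the candidate feature giving the largest gain in cross-validated classification accuracy; the classifier, the feature matrix, and the labels below are placeholders rather than the authors' call features or selection criterion.

    # Generic greedy forward feature selection (illustrative sketch).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def greedy_select(X, y, n_keep=10):
        """Pick n_keep columns of X that incrementally maximize cross-validated accuracy on labels y."""
        selected, remaining = [], list(range(X.shape[1]))
        for _ in range(n_keep):
            gains = []
            for j in remaining:
                acc = cross_val_score(LogisticRegression(max_iter=1000), X[:, selected + [j]], y, cv=3).mean()
                gains.append((acc, j))
            best_acc, best_j = max(gains)  # feature with the largest marginal gain
            selected.append(best_j)
            remaining.remove(best_j)
        return selected

    # Placeholder usage: 200 calls x 50 candidate features, binary call-type labels.
    rng = np.random.default_rng(1)
    X, y = rng.normal(size=(200, 50)), rng.integers(0, 2, size=200)
    print(greedy_select(X, y, n_keep=5))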


DP5: J Acoust Soc Am. 2016 Oct;140(4):2542.

Measuring time-frequency importance functions of speech with bubble noise

Michael I Mandel, Sarah E Yoho, Eric W Healy

Listeners can reliably perceive speech in noisy conditions, but it is not well understood what specific features of speech they use to do this. This paper introduces a data-driven framework to identify the time-frequency locations of these features. Using the same speech utterance mixed with many different noise instances, the framework is able to compute the importance of each time-frequency point in the utterance to its intelligibility. The mixtures have approximately the same global signal-to-noise ratio at each frequency, but very different recognition rates. The difference between these intelligible vs unintelligible mixtures is the alignment between the speech and spectro-temporally modulated noise, providing different combinations of "glimpses" of speech in each mixture. The current results reveal the locations of these important noise-robust phonetic features in a restricted set of syllables. Classification models trained to predict whether individual mixtures are intelligible based on the location of these glimpses can generalize to new conditions, successfully predicting the intelligibility of novel mixtures. They are able to generalize to novel noise instances, novel productions of the same word by the same talker, novel utterances of the same word spoken by different talkers, and, to some extent, novel consonants.
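
As a simplified illustration of the framework, a time-frequency importance map can be estimated by correlating, across many mixtures of the same utterance with different noise instances, whether each spectrogram point was audible ("glimpsed") with whether that mixture was correctly recognized. The function and data shapes below are assumptions; the paper additionally trains classification models on such data.

    # Sketch of a time-frequency importance map from bubble-noise style data.
    import numpy as np

    def importance_map(glimpse_masks, correct):
        """glimpse_masks : (n_mixtures, n_freq, n_time) binary audibility of each T-F point
        correct       : (n_mixtures,) 1 if the mixture was correctly identified
        Returns an (n_freq, n_time) correlation map: large values mark points whose
        audibility predicts intelligibility."""
        g = glimpse_masks.reshape(len(correct), -1).astype(float)
        c = (correct - correct.mean()) / (correct.std() + 1e-12)
        gz = (g - g.mean(0)) / (g.std(0) + 1e-12)
        return (gz * c[:, None]).mean(0).reshape(glimpse_masks.shape[1:])

    # Placeholder data: 500 mixtures, 64 frequency channels, 80 time frames.
    rng = np.random.default_rng(2)
    masks = rng.integers(0, 2, size=(500, 64, 80))
    correct = rng.integers(0, 2, size=500)
    imp = importance_map(masks, correct)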


DP6: Nat Commun. 2020 Jun 3;11(1):2786.

Perceptual fusion of musical notes by native Amazonians suggests universal representations of musical intervals

Malinda J McPherson, Sophia E Dolan, Alex Durango, Tomas Ossandon, Joaquin Valdés, Eduardo A Undurraga, Nori Jacoby, Ricardo A Godoy, Josh H McDermott

Music perception is plausibly constrained by universal perceptual mechanisms adapted to natural sounds. Such constraints could arise from our dependence on harmonic frequency spectra for segregating concurrent sounds, but evidence has been circumstantial. We measured the extent to which concurrent musical notes are misperceived as a single sound, testing Westerners as well as native Amazonians with limited exposure to Western music. Both groups were more likely to mistake note combinations related by simple integer ratios as single sounds ('fusion'). Thus, even with little exposure to Western harmony, acoustic constraints on sound segregation appear to induce perceptual structure on note combinations. However, fusion did not predict aesthetic judgments of intervals in Westerners, or in Amazonians, who were indifferent to consonance/dissonance. The results suggest universal perceptual mechanisms that could help explain cross-cultural regularities in musical systems, but indicate that these mechanisms interact with culture-specific influences to produce musical phenomena such as consonance.


DP7: Front Neurosci. 2020 Apr 15;14:362.

Effect of Auditory Predictability on the Human Peripheral Auditory System

Lars Riecke, Irina-Andreea Marianu, Federico De Martino

Auditory perception is facilitated by prior knowledge about the statistics of the acoustic environment. Predictions about upcoming auditory stimuli are processed at various stages along the human auditory pathway, including the cortex and midbrain. Whether such auditory predictions are also processed at hierarchically lower stages, in the peripheral auditory system, is unclear. To address this question, we assessed outer hair cell (OHC) activity in response to isochronous tone sequences and varied the predictability and behavioral relevance of the individual tones (by manipulating tone-to-tone probabilities and the human participants' task, respectively). We found that predictability alters the amplitude of distortion-product otoacoustic emissions (DPOAEs, a measure of OHC activity) in a manner that depends on the behavioral relevance of the tones. Simultaneously recorded cortical responses showed a significant effect of both predictability and behavioral relevance of the tones, indicating that their experimental manipulations were effective in central auditory processing stages. Our results provide evidence for a top-down effect on the processing of auditory predictability in the human peripheral auditory system, in line with previous studies showing peripheral effects of auditory attention.


DP8: Psychon Bull Rev. 2019 Apr;26(2):583-590.

There is music in repetition: Looped segments of speech and nonspeech induce the perception of music in a time-dependent manner

Jess Rowland, Anna Kasdan, David Poeppel

While many techniques are known to music creators, the technique of repetition is one of the most commonly deployed. The mechanism by which repetition is effective as a music-making tool, however, is unknown. Building on the speech-to-song illusion (Deutsch, Henthorn, & Lapidis in Journal of the Acoustical Society of America, 129(4), 2245-2252, 2011), we explore a phenomenon in which the perception of musical attributes is elicited from repeated, or 'looped,' auditory material usually perceived as nonmusical, such as speech and environmental sounds. We assessed whether this effect holds true for speech stimuli of different lengths; nonspeech sounds (water dripping); and speech signals decomposed into their rhythmic and spectral components. Participants listened to looped stimuli (from 700 to 4,000 ms) and provided continuous as well as discrete perceptual ratings. We show that the regularizing effect of repetition generalizes to nonspeech auditory material and is strongest for shorter clip lengths in the speech and environmental cases. We also find that deconstructed pitch and rhythmic speech components independently elicit a regularizing effect, though the effect across segment duration is different than that for intact speech and environmental sounds. Taken together, these experiments suggest repetition may invoke active internal mechanisms that bias perception toward musical structure.


DP9: Neuroscience. 2019 May 21;407:213-228.

Tinnitus: Does Gain Explain?

William Sedley

Many, or most, tinnitus models rely on increased central gain in the auditory pathway as all or part of the explanation, in that central auditory neurones deprived of their usual sensory input maintain homeostasis by increasing the rate at which they fire in response to any given strength of input, including amplifying spontaneous firing which forms the basis of tinnitus. However, dramatic gain changes occur in response to damage to the auditory periphery, irrespective of whether tinnitus occurs. This article considers gain in its broadest sense, summarizes its contributory processes, neural manifestations, behavioral effects, techniques for its measurement, pitfalls in attributing gain changes to tinnitus, a discussion of the minimum evidential requirements to implicate gain as a necessary and/or sufficient basis to explain tinnitus, and the extent of existing evidence in this regard. Overall there is compelling evidence that peripheral auditory insults induce changes in neuronal firing rates, synchrony and neurochemistry and thus increase gain, but specific attribution of these changes to tinnitus is generally hampered by the absence of hearing-matched human control groups or insult-exposed non-tinnitus animals. A few studies show changes specifically attributable to tinnitus at group level, but the limited attempts so far to classify individual subjects based on gain metrics have not proven successful. If gain turns out to be unnecessary or insufficient to cause tinnitus, candidate additional mechanisms include focused attention, resetting of sensory predictions, failure of sensory gating, altered sensory predictions, formation of pervasive memory traces and/or entry into global perceptual networks.


DP10: Sci Rep. 2021 Nov 2;11(1):21456.

Adaptive auditory brightness perception

Kai Siedenburg, Feline Malin Barg, Henning Schepker

Perception adapts to the properties of prior stimulation, as illustrated by phenomena such as visual color constancy or speech context effects. In the auditory domain, only little is known about adaptive processes when it comes to the attribute of auditory brightness. Here, we report an experiment that tests whether listeners adapt to spectral colorations imposed on naturalistic music and speech excerpts. Our results indicate consistent contrastive adaptation of auditory brightness judgments on a trial-by-trial basis. The pattern of results suggests that these effects tend to grow with an increase in the duration of the adaptor context but level off after around 8 trials of 2 s duration. A simple model of the response criterion yields a correlation of r = .97 with the measured data and corroborates the notion that brightness perception adapts on timescales that fall in the range of auditory short-term memory. Effects turn out to be similar for spectral filtering based on linear spectral filter slopes and filtering based on a measured transfer function from a commercially available hearing device. Overall, our findings demonstrate the adaptivity of auditory brightness perception under realistic acoustical conditions.
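
The abstract does not spell out the criterion model, but one minimal candidate form, given purely as an assumed illustration, is a leaky (exponentially weighted) criterion that drifts toward recent spectral colorations, which produces the contrastive, context-dependent brightness judgments described above.

    # Assumed toy criterion model for contrastive brightness adaptation (not the paper's model).
    import numpy as np

    def simulate_judgments(colorations, leak=0.8):
        """colorations : per-trial spectral slope of the stimuli (e.g., dB per octave).
        Returns brightness judgments made relative to a slowly adapting criterion."""
        criterion, judgments = 0.0, []
        for c in colorations:
            judgments.append(c - criterion)                # contrastive: judged against recent context
            criterion = leak * criterion + (1 - leak) * c  # criterion drifts toward recent trials
        return np.array(judgments)

    # A run of bright (positive-slope) adaptors makes a subsequent neutral excerpt sound duller.
    print(simulate_judgments([3, 3, 3, 3, 0]))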


DP11: PLoS Biol. 2018 Oct 15;16(10):e2005164

High-resolution frequency tuning but not temporal coding in the human cochlea

Eric Verschooten, Christian Desloovere, Philip X Joris

Frequency tuning and phase-locking are two fundamental properties generated in the cochlea, enabling but also limiting the coding of sounds by the auditory nerve (AN). In humans, these limits are unknown, but high resolution has been postulated for both properties. Electrophysiological recordings from the AN of normal-hearing volunteers indicate that human frequency tuning, but not phase-locking, exceeds the resolution observed in animal models.


DP12: Trends Hear. Jan-Dec 2018;22:2331216518777174

The Pupil Dilation Response to Auditory Stimuli: Current State of Knowledge

Adriana A Zekveld, Thomas Koelewijn, Sophia E Kramer

The measurement of cognitive resource allocation during listening, or listening effort, provides valuable insight into the factors influencing auditory processing. In recent years, many studies inside and outside the field of hearing science have measured the pupil response evoked by auditory stimuli. The aim of the current review was to provide an exhaustive overview of these studies. The 146 studies included in this review originated from multiple domains, including hearing science and linguistics, but the review also covers research into motivation, memory, and emotion. The present review provides a unique overview of these studies and is organized according to the components of the Framework for Understanding Effortful Listening. A summary table presents the sample characteristics, an outline of the study design, stimuli, the pupil parameters analyzed, and the main findings of each study. The results indicate that the pupil response is sensitive to various task manipulations as well as interindividual differences. Many of the findings have been replicated. Frequent interactions between the independent factors affecting the pupil response have been reported, which indicates complex processes underlying cognitive resource allocation. This complexity should be taken into account in future studies that should focus more on interindividual differences, also including older participants. This review facilitates the careful design of new studies by indicating the factors that should be controlled for. In conclusion, measuring the pupil dilation response to auditory stimuli has been demonstrated to be a sensitive method applicable to numerous research questions. The sensitivity of the measure calls for carefully designed stimuli.


MC1: Science. 2020 Feb 28;367(6481):1043-1047.

Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody

Philippe Albouy, Lucas Benjamin, Benjamin Morillon, Robert J Zatorre

Does brain asymmetry for speech and music emerge from acoustical cues or from domain-specific neural networks? We selectively filtered temporal or spectral modulations in sung speech stimuli for which verbal and melodic content was crossed and balanced. Perception of speech decreased only with degradation of temporal information, whereas perception of melodies decreased only with spectral degradation. Functional magnetic resonance imaging data showed that the neural decoding of speech and melodies depends on activity patterns in left and right auditory regions, respectively. This asymmetry is supported by specific sensitivity to spectrotemporal modulation rates within each region. Finally, the effects of degradation on perception were paralleled by their effects on neural classification. Our results suggest a match between acoustical properties of communicative signals and neural specializations adapted to that purpose.


MC2: PLoS Biol. 2016 Nov 15;14(11):e1002577

Prediction Errors but Not Sharpened Signals Simulate Multivoxel fMRI Patterns during Speech Perception

Helen Blank, Matthew H Davis

Successful perception depends on combining sensory input with prior knowledge. However, the underlying mechanism by which these two sources of information are combined is unknown. In speech perception, as in other domains, two functionally distinct coding schemes have been proposed for how expectations influence representation of sensory evidence. Traditional models suggest that expected features of the speech input are enhanced or sharpened via interactive activation (Sharpened Signals). Conversely, Predictive Coding suggests that expected features are suppressed so that unexpected features of the speech input (Prediction Errors) are processed further. The present work is aimed at distinguishing between these two accounts of how prior knowledge influences speech perception. By combining behavioural, univariate, and multivariate fMRI measures of how sensory detail and prior expectations influence speech perception with computational modelling, we provide evidence in favour of Prediction Error computations. Increased sensory detail and informative expectations have additive behavioural and univariate neural effects because they both improve the accuracy of word report and reduce the BOLD signal in lateral temporal lobe regions. However, sensory detail and informative expectations have interacting effects on speech representations shown by multivariate fMRI in the posterior superior temporal sulcus. When prior knowledge was absent, increased sensory detail enhanced the amount of speech information measured in superior temporal multivoxel patterns, but with informative expectations, increased sensory detail reduced the amount of measured information. Computational simulations of Sharpened Signals and Prediction Errors during speech perception could both explain these behavioural and univariate fMRI observations. However, the multivariate fMRI observations were uniquely simulated by a Prediction Error and not a Sharpened Signal model. The interaction between prior expectation and sensory detail provides evidence for a Predictive Coding account of speech perception. Our work establishes methods that can be used to distinguish representations of Prediction Error and Sharpened Signals in other perceptual domains.


MC3: PLoS Biol. 2020 Oct 22;18(10):e3000883.

Neural speech restoration at the cocktail party: Auditory cortex recovers masked speech of both attended and ignored speakers

Christian Brodbeck, Alex Jiao, L Elliot Hong, Jonathan Z Simon

Humans are remarkably skilled at listening to one speaker out of an acoustic mixture of several speech sources. Two speakers are easily segregated, even without binaural cues, but the neural mechanisms underlying this ability are not well understood. One possibility is that early cortical processing performs a spectrotemporal decomposition of the acoustic mixture, allowing the attended speech to be reconstructed via optimally weighted recombinations that discount spectrotemporal regions where sources heavily overlap. Using human magnetoencephalography (MEG) responses to a 2-talker mixture, we show evidence for an alternative possibility, in which early, active segregation occurs even for strongly spectrotemporally overlapping regions. Early (approximately 70-millisecond) responses to nonoverlapping spectrotemporal features are seen for both talkers. When competing talkers' spectrotemporal features mask each other, the individual representations persist, but they occur with an approximately 20-millisecond delay. This suggests that the auditory cortex recovers acoustic features that are masked in the mixture, even if they occurred in the ignored speech. The existence of such noise-robust cortical representations, of features present in attended as well as ignored speech, suggests an active cortical stream segregation process, which could explain a range of behavioral effects of ignored background speech.


MC4: J Neurosci. 2021 Nov 10;41(45):9374-9391.

Broadband Dynamics Rather than Frequency-Specific Rhythms Underlie Prediction Error in the Primate Auditory Cortex

Andrés Canales-Johnson, Ana Filipa Teixeira Borges, Misako Komatsu, Naotaka Fujii, Johannes J Fahrenfort, Kai J Miller, Valdas Noreika

Detection of statistical irregularities, measured as a prediction error response, is fundamental to the perceptual monitoring of the environment. We studied whether prediction error response is associated with neural oscillations or asynchronous broadband activity. Electrocorticography was conducted in three male monkeys, who passively listened to the auditory roving oddball stimuli. Local field potentials (LFPs) recorded over the auditory cortex underwent spectral principal component analysis, which decoupled broadband and rhythmic components of the LFP signal. We found that the broadband component captured the prediction error response, whereas none of the rhythmic components were associated with statistical irregularities of sounds. The broadband component displayed more stochastic, asymmetrical multifractal properties than the rhythmic components, which revealed more self-similar dynamics. We thus conclude that the prediction error response is captured by neuronal populations generating asynchronous broadband activity, defined by irregular dynamic states, which, unlike oscillatory rhythms, appear to enable the neural representation of auditory prediction error response.
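
"Spectral principal component analysis" here refers to decomposing trial-by-trial variation of the LFP power spectrum into components, some broadband and some band-limited. The sketch below illustrates that idea on synthetic data in which a broadband level and a 10-Hz rhythm vary independently across trials; it is not the authors' pipeline, and every parameter is an assumption.

    import numpy as np
    from scipy.signal import welch

    rng = np.random.default_rng(2)
    fs, n_trials, n_samples = 1000, 200, 2000
    t = np.arange(n_samples) / fs

    # Toy LFP trials: 1/f-like broadband noise with a trial-varying level,
    # plus a 10-Hz rhythm whose amplitude varies independently across trials.
    trials = []
    for _ in range(n_trials):
        broadband = rng.uniform(0.5, 1.5) * np.cumsum(rng.normal(size=n_samples))
        rhythm = rng.uniform(0.0, 5.0) * np.sin(2 * np.pi * 10 * t)
        trials.append(broadband + rhythm)

    freqs, psd = welch(np.array(trials), fs=fs, nperseg=512, axis=-1)
    log_psd = np.log(psd)

    # PCA (via SVD) of the trial-by-frequency matrix of log power spectra.
    centered = log_psd - log_psd.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # A broadband component loads diffusely across frequencies, whereas a rhythmic
    # component loads narrowly (here, near 10 Hz). Print the peak loading frequencies.
    print(freqs[np.argmax(np.abs(vt[0]))], freqs[np.argmax(np.abs(vt[1]))])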


MC5: Proc Natl Acad Sci U S A. 2016 Jun 14;113(24):6755-60.

Hierarchy of prediction errors for auditory events in human temporal and frontal cortex

Stefan Dürschmid, Erik Edwards, Christoph Reichert, Callum Dewar, Hermann Hinrichs, Hans-Jochen Heinze, Heidi E Kirsch, Sarang S Dalal, Leon Y Deouell, Robert T Knight

Predictive coding theories posit that neural networks learn statistical regularities in the environment for comparison with actual outcomes, signaling a prediction error (PE) when sensory deviation occurs. PE studies in audition have capitalized on low-frequency event-related potentials (LF-ERPs), such as the mismatch negativity. However, local cortical activity is well-indexed by higher-frequency bands [high-γ band (Hγ): 80-150 Hz]. We compared patterns of human Hγ and LF-ERPs in deviance detection using electrocorticographic recordings from subdural electrodes over frontal and temporal cortices. Patients listened to trains of task-irrelevant tones in two conditions differing in the predictability of a deviation from repetitive background stimuli (fully predictable vs. unpredictable deviants). We found deviance-related responses in both frequency bands over lateral temporal and inferior frontal cortex, with an earlier latency for Hγ than for LF-ERPs. Critically, frontal Hγ activity but not LF-ERPs discriminated between fully predictable and unpredictable changes, with frontal cortex sensitive to unpredictable events. The results highlight the role of frontal cortex and Hγ activity in deviance detection and PE generation.
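
For reference, the high-γ band signal analyzed here is conventionally obtained by band-pass filtering in the 80-150 Hz range and taking the amplitude envelope of the analytic signal. The sketch below shows one standard way to do this with SciPy; it is not necessarily the exact preprocessing used in the paper, and the toy data are invented.

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def high_gamma_envelope(x, fs, band=(80.0, 150.0), order=4):
        """Band-pass the signal in the high-gamma range and return its amplitude envelope."""
        b, a = butter(order, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="bandpass")
        return np.abs(hilbert(filtfilt(b, a, x)))

    # Toy example: a burst of 100-Hz activity embedded in noise.
    fs = 1000
    t = np.arange(0, 2, 1 / fs)
    x = np.random.default_rng(3).normal(size=t.size)
    x[500:700] += 2 * np.sin(2 * np.pi * 100 * t[500:700])
    env = high_gamma_envelope(x, fs)
    print(env[500:700].mean() > env[:500].mean())    # envelope is larger during the burst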


MC6: J Neurosci. 2021 Sep 22;41(38):8023-8039.

Cortical Processing of Arithmetic and Simple Sentences in an Auditory Attention Task

Joshua P Kulasingham, Neha H Joshi, Mohsen Rezaeizadeh, Jonathan Z Simon

Cortical processing of arithmetic and of language rely on both shared and task-specific neural mechanisms, which should also be dissociable from the particular sensory modality used to probe them. Here, spoken arithmetical and non-mathematical statements were employed to investigate neural processing of arithmetic, compared with general language processing, in an attention-modulated cocktail party paradigm. Magnetoencephalography (MEG) data were recorded from 22 human subjects listening to audio mixtures of spoken sentences and arithmetic equations while selectively attending to one of the two speech streams. Short sentences and simple equations were presented diotically at fixed and distinct word/symbol and sentence/equation rates. Critically, this allowed neural responses to acoustics, words, and symbols to be dissociated from responses to sentences and equations. Indeed, the simultaneous neural processing of the acoustics of words and symbols was observed in auditory cortex for both streams. Neural responses to sentences and equations, however, were predominantly to the attended stream, originating primarily from left temporal, and parietal areas, respectively. Additionally, these neural responses were correlated with behavioral performance in a deviant detection task. Source-localized temporal response functions (TRFs) revealed distinct cortical dynamics of responses to sentences in left temporal areas and equations in bilateral temporal, parietal, and motor areas. Finally, the target of attention could be decoded from MEG responses, especially in left superior parietal areas. In short, the neural responses to arithmetic and language are especially well segregated during the cocktail party paradigm, and the correlation with behavior suggests that they may be linked to successful comprehension or calculation.
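
Temporal response functions (TRFs) of the kind reported here are commonly estimated as a regularized regression from time-lagged stimulus features (acoustic envelope, word or symbol onsets, sentence or equation onsets) to the neural response. The ridge-regression sketch below illustrates the idea on synthetic data; the paper's actual estimation method and parameters may differ, and every value here is made up.

    import numpy as np

    def estimate_trf(stimulus, response, fs, tmin=0.0, tmax=0.5, alpha=1.0):
        """Estimate a TRF by ridge regression on time-lagged copies of the stimulus."""
        lags = np.arange(int(tmin * fs), int(tmax * fs))
        X = np.column_stack([np.roll(stimulus, lag) for lag in lags])
        X[:lags.max()] = 0                     # discard wrap-around samples
        w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ response)
        return lags / fs, w

    # Toy data: the response is a delayed, scaled copy of a sparse onset train plus noise.
    fs = 100
    rng = np.random.default_rng(4)
    stim = (rng.random(6000) < 0.04).astype(float)    # sparse "word onsets"
    resp = 0.8 * np.roll(stim, 15) + 0.1 * rng.normal(size=stim.size)
    times, trf = estimate_trf(stim, resp, fs)
    print(times[np.argmax(trf)])                      # peak latency ~0.15 s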


MC7: Neuron. 2019 Dec 18;104(6):1195-1209.e3

Hierarchical Encoding of Attended Auditory Objects in Multi-talker Speech Perception

James O'Sullivan, Jose Herrero, Elliot Smith, Catherine Schevon, Guy M McKhann, Sameer A Sheth, Ashesh D Mehta, Nima Mesgarani

Humans can easily focus on one speaker in a multi-talker acoustic environment, but how different areas of the human auditory cortex (AC) represent the acoustic components of mixed speech is unknown. We obtained invasive recordings from the primary and nonprimary AC in neurosurgical patients as they listened to multi-talker speech. We found that neural sites in the primary AC responded to individual speakers in the mixture and were relatively unchanged by attention. In contrast, neural sites in the nonprimary AC were less discerning of individual speakers but selectively represented the attended speaker. Moreover, the encoding of the attended speaker in the nonprimary AC was invariant to the degree of acoustic overlap with the unattended speaker. Finally, this emergent representation of attended speech in the nonprimary AC was linearly predictable from the primary AC responses. Our results reveal the neural computations underlying the hierarchical formation of auditory objects in human AC during multi-talker speech perception.
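
The statement that the attended-speech representation in nonprimary AC is "linearly predictable" from primary AC responses corresponds to fitting a linear mapping between the two sets of recordings and testing it on held-out data. The sketch below is a toy version with simulated recording sites; it is not the authors' analysis, and the data and dimensions are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(5)
    n_time, n_primary, n_nonprimary = 5000, 20, 8

    # Toy data: nonprimary sites approximate a sparse weighted readout of primary sites.
    primary = rng.normal(size=(n_time, n_primary))
    readout = rng.normal(size=(n_primary, n_nonprimary)) * (rng.random((n_primary, n_nonprimary)) < 0.3)
    nonprimary = primary @ readout + 0.5 * rng.normal(size=(n_time, n_nonprimary))

    # Fit the linear mapping on the first half of the data, test on the second half.
    half = n_time // 2
    W, *_ = np.linalg.lstsq(primary[:half], nonprimary[:half], rcond=None)
    pred = primary[half:] @ W
    r = [np.corrcoef(pred[:, i], nonprimary[half:, i])[0, 1] for i in range(n_nonprimary)]
    print(np.round(r, 2))    # held-out prediction accuracy per nonprimary site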


MC8: J Neurosci. 2021 Nov 3;41(44):9192-9209.

Memory Specific to Temporal Features of Sound Is Formed by Cue-Selective Enhancements in Temporal Coding Enabled by Inhibition of an Epigenetic Regulator

Elena K Rotondo, Kasia M Bieszczad

Recent investigations of memory-related functions in the auditory system have capitalized on the use of memory-modulating molecules to probe the relationship between memory and substrates of memory in auditory system coding. For example, epigenetic mechanisms, which regulate gene expression necessary for memory consolidation, are powerful modulators of learning-induced neuroplasticity and long-term memory (LTM) formation. Inhibition of the epigenetic regulator histone deacetylase 3 (HDAC3) promotes LTM, which is highly specific for spectral features of sound. The present work demonstrates for the first time that HDAC3 inhibition also enables memory for temporal features of sound. Adult male rats trained in an amplitude modulation (AM) rate discrimination task and treated with a selective inhibitor of HDAC3 formed memory that was highly specific to the AM rate paired with reward. Sound-specific memory revealed behaviorally was associated with a signal-specific enhancement in temporal coding in the auditory system; stronger phase locking that was specific to the rewarded AM rate was revealed in both the surface-recorded frequency following response and auditory cortical multiunit activity in rats treated with the HDAC3 inhibitor. Furthermore, HDAC3 inhibition increased trial-to-trial cortical response consistency (relative to naive and trained vehicle-treated rats), which generalized across different AM rates. Stronger signal-specific phase locking correlated with individual behavioral differences in memory specificity for the AM signal. These findings support that epigenetic mechanisms regulate activity-dependent processes that enhance discriminability of sensory cues encoded into LTM in both spectral and temporal domains, which may be important for remembering spectrotemporal features of sounds, for example, as in human voices and speech.
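
Phase locking to an amplitude-modulation rate, as in the frequency following response and multiunit analyses described here, is commonly quantified with vector strength (1 = perfect locking, 0 = none). The sketch below is a minimal illustration on invented event times and an arbitrary AM rate, not the paper's analysis.

    import numpy as np

    def vector_strength(event_times, am_rate):
        """Phase locking of event times to an AM rate: 0 = no locking, 1 = perfect locking."""
        phases = 2 * np.pi * am_rate * np.asarray(event_times)
        return np.abs(np.mean(np.exp(1j * phases)))

    rng = np.random.default_rng(6)
    am_rate = 18.0                                    # illustrative AM rate (Hz)
    cycle_starts = np.arange(0, 5, 1 / am_rate)

    locked = cycle_starts + rng.normal(0, 0.003, cycle_starts.size)   # tightly locked events
    unlocked = rng.uniform(0, 5, cycle_starts.size)                   # random event times

    print(vector_strength(locked, am_rate))      # close to 1
    print(vector_strength(unlocked, am_rate))    # close to 0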


MC9: Cognition. 2022 Jan;218:104949.

Musical instrument familiarity affects statistical learning of tone sequences

Stephen C Van Hedger, Ingrid S Johnsrude, Laura J Batterink

Most listeners have an implicit understanding of the rules that govern how music unfolds over time. This knowledge is acquired in part through statistical learning, a robust learning mechanism that allows individuals to extract regularities from the environment. However, it is presently unclear how this prior musical knowledge might facilitate or interfere with the learning of novel tone sequences that do not conform to familiar musical rules. In the present experiment, participants listened to novel, statistically structured tone sequences composed of pitch intervals not typically found in Western music. Between participants, the tone sequences either had the timbre of artificial, computerized instruments or familiar instruments (piano or violin). Knowledge of the statistical regularities was measured by a two-alternative forced choice recognition task, requiring discrimination between novel sequences that followed versus violated the statistical structure, assessed at three time points (immediately post-training, as well as one day and one week post-training). Compared to artificial instruments, training on familiar instruments resulted in reduced accuracy. Moreover, sequences from familiar instruments - but not artificial instruments - were more likely to be judged as grammatical when they contained intervals that approximated those commonly used in Western music, even though this cue was non-informative. Overall, these results demonstrate that instrument familiarity can interfere with the learning of novel statistical regularities, presumably through biasing memory representations to be aligned with Western musical structures. These results demonstrate that real-world experience influences statistical learning in a non-linguistic domain, supporting the view that statistical learning involves the continuous updating of existing representations, rather than the establishment of entirely novel ones.


YB1: Neuron. 2012 Oct 18;76(2):435-49.

Discrete neocortical dynamics predict behavioral categorization of sounds.

Bathellier B, Ushakova L, Rumpel S.

The ability to group stimuli into perceptual categories is essential for efficient interaction with the environment. Discrete dynamics that emerge in brain networks are believed to be the neuronal correlate of category formation. Observations of such dynamics have recently been made; however, it is still unresolved if they actually match perceptual categories. Using in vivo two-photon calcium imaging in the auditory cortex of mice, we show that local network activity evoked by sounds is constrained to few response modes. Transitions between response modes are characterized by an abrupt switch, indicating attractor-like, discrete dynamics. Moreover, we show that local cortical responses quantitatively predict discrimination performance and spontaneous categorization of sounds in behaving mice. Our results therefore demonstrate that local nonlinear dynamics in the auditory cortex generate spontaneous sound categories which can be selected for behavioral or perceptual decisions.
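
The notion that population activity is "constrained to few response modes" can be illustrated by clustering sound-evoked population vectors. The k-means sketch below is only a stand-in for the clustering analysis used in the paper, with fully simulated data; cluster counts, neuron numbers, and noise levels are arbitrary assumptions.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score

    rng = np.random.default_rng(7)
    n_neurons, n_sounds, n_modes = 50, 40, 3

    # Toy data: each sound evokes one of a few population "response modes", plus noise.
    modes = rng.normal(size=(n_modes, n_neurons))
    mode_of_sound = rng.integers(0, n_modes, n_sounds)
    responses = modes[mode_of_sound] + 0.3 * rng.normal(size=(n_sounds, n_neurons))

    # Cluster the sound-evoked population vectors; a few clusters capture the structure.
    km = KMeans(n_clusters=n_modes, n_init=10, random_state=0).fit(responses)
    print(adjusted_rand_score(mode_of_sound, km.labels_))   # ~1.0: modes recovered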


YB2: Proc Natl Acad Sci U S A. 2017 Sep 12;114(37):9972-9977.

Top-down modulation of sensory cortex gates perceptual learning.

Caras ML, Sanes DH.

Practice sharpens our perceptual judgments, a process known as perceptual learning. Although several brain regions and neural mechanisms have been proposed to support perceptual learning, formal tests of causality are lacking. Furthermore, the temporal relationship between neural and behavioral plasticity remains uncertain. To address these issues, we recorded the activity of auditory cortical neurons as gerbils trained on a sound detection task. Training led to improvements in cortical and behavioral sensitivity that were closely matched in terms of magnitude and time course. Surprisingly, the degree of neural improvement was behaviorally gated. During task performance, cortical improvements were large and predicted behavioral outcomes. In contrast, during nontask listening sessions, cortical improvements were weak and uncorrelated with perceptual performance. Targeted reduction of auditory cortical activity during training diminished perceptual learning while leaving psychometric performance largely unaffected. Collectively, our findings suggest that training facilitates perceptual learning by strengthening both bottom-up sensory encoding and top-down modulation of auditory cortex.
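
Comparing "cortical and behavioral sensitivity" on the same footing implies computing the same sensitivity index, typically d', from neural and behavioral hit and false-alarm rates. The sketch below only shows that computation on invented numbers; it is not the paper's data or its full neurometric procedure.

    import numpy as np
    from scipy.stats import norm

    def d_prime(hit_rate, false_alarm_rate):
        """Sensitivity index usable for both psychometric and neurometric detection."""
        return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

    # Hypothetical values across training days: hits rise while false alarms stay flat,
    # so d' (behavioral or neural) increases over training.
    hits = np.array([0.55, 0.65, 0.75, 0.85, 0.92])
    false_alarms = np.full(5, 0.20)
    print(np.round(d_prime(hits, false_alarms), 2))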


YB3: Elife. 2016 Mar 4;5. pii: e12577.

The auditory representation of speech sounds in human motor cortex.

Cheung C, Hamilton LS, Johnson K, Chang EF.

In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different than during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech, but rather, represents auditory vocal information.


YB4: Cell. 2021 Sep 2;184(18):4626-4639.e13.

Parallel and distributed encoding of speech across human auditory cortex

Liberty S Hamilton, Yulia Oganian, Jeffery Hall, Edward F Chang

Speech perception is thought to rely on a cortical feedforward serial transformation of acoustic into linguistic representations. Using intracranial recordings across the entire human auditory cortex, electrocortical stimulation, and surgical ablation, we show that cortical processing across areas is not consistent with a serial hierarchical organization. Instead, response latency and receptive field analyses demonstrate parallel and distinct information processing in the primary and nonprimary auditory cortices. This functional dissociation was also observed where stimulation of the primary auditory cortex evokes auditory hallucination but does not distort or interfere with speech perception. Opposite effects were observed during stimulation of nonprimary cortex in superior temporal gyrus. Ablation of the primary auditory cortex does not affect speech perception. These results establish a distributed functional organization of parallel information processing throughout the human auditory cortex and demonstrate an essential independent role for nonprimary auditory cortex in speech processing.


YB5: Nat Neurosci. 2017 Jan;20(1):62-71.

Parallel processing by cortical inhibition enables context-dependent behavior

Kishore V Kuchibhotla, Jonathan V Gill, Grace W Lindsay, Eleni S Papadoyannis, Rachel E Field, Tom A Hindmarsh Sten, Kenneth D Miller, Robert C Froemke

Physical features of sensory stimuli are fixed, but sensory perception is context dependent. The precise mechanisms that govern contextual modulation remain unknown. Here, we trained mice to switch between two contexts: passively listening to pure tones and performing a recognition task for the same stimuli. Two-photon imaging showed that many excitatory neurons in auditory cortex were suppressed during behavior, while some cells became more active. Whole-cell recordings showed that excitatory inputs were affected only modestly by context, but inhibition was more sensitive, with PV+, SOM+, and VIP+ interneurons balancing inhibition and disinhibition within the network. Cholinergic modulation was involved in context switching, with cholinergic axons increasing activity during behavior and directly depolarizing inhibitory cells. Network modeling captured these findings, but only when modulation coincidently drove all three interneuron subtypes, ruling out either inhibition or disinhibition alone as sole mechanism for active engagement. Parallel processing of cholinergic modulation by cortical interneurons therefore enables context-dependent behavior.


YB6: J Neurosci. 2018 Nov 14;38(46):9955-9966.

Implicit Memory for Complex Sounds in Higher Auditory Cortex of the Ferret.

Lu K, Liu W, Zan P, David SV, Fritz JB, Shamma SA.

Responses of auditory cortical neurons encode sound features of incoming acoustic stimuli and also are shaped by stimulus context and history. Previous studies of mammalian auditory cortex have reported a variable time course for such contextual effects ranging from milliseconds to minutes. However, in secondary auditory forebrain areas of songbirds, long-term stimulus-specific neuronal habituation to acoustic stimuli can persist for much longer periods of time, ranging from hours to days. Such long-term habituation in the songbird is a form of long-term auditory memory that requires gene expression. Although such long-term habituation has been demonstrated in avian auditory forebrain, this phenomenon has not previously been described in the mammalian auditory system. Utilizing a similar version of the avian habituation paradigm, we explored whether such long-term effects of stimulus history also occur in auditory cortex of a mammalian auditory generalist, the ferret. Following repetitive presentation of novel complex sounds, we observed significant response habituation in secondary auditory cortex, but not in primary auditory cortex. This long-term habituation appeared to be independent for each novel stimulus and often lasted for at least 20 min. These effects could not be explained by simple neuronal fatigue in the auditory pathway, because time-reversed sounds induced undiminished responses similar to those elicited by completely novel sounds. A parallel set of pupillometric response measurements in the ferret revealed long-term habituation effects similar to observed long-term neural habituation, supporting the hypothesis that habituation to passively presented stimuli is correlated with implicit learning and long-term recognition of familiar sounds.


YB7: Cereb Cortex. 2018 Dec 1;28(12):4222-4233.

Neural Encoding of Auditory Features during Music Perception and Imagery.

Martin S, Mikutta C, Leonard MK, Hungate D, Koelsch S, Shamma S, Chang EF, Millán JDR, Knight RT, Pasley BN.

Despite many behavioral and neuroimaging investigations, it remains unclear how the human cortex represents spectrotemporal sound features during auditory imagery, and how this representation compares to auditory perception. To assess this, we recorded electrocorticographic signals from an epileptic patient with proficient music ability in 2 conditions. First, the participant played 2 piano pieces on an electronic piano with the sound volume of the digital keyboard on. Second, the participant replayed the same piano pieces, but without auditory feedback, and the participant was asked to imagine hearing the music in his mind. In both conditions, the sound output of the keyboard was recorded, thus allowing precise time-locking between the neural activity and the spectrotemporal content of the music imagery. This novel task design provided a unique opportunity to apply receptive field modeling techniques to quantitatively study neural encoding during auditory mental imagery. In both conditions, we built encoding models to predict high gamma neural activity (70-150 Hz) from the spectrogram representation of the recorded sound. We found robust spectrotemporal receptive fields during auditory imagery with substantial, but not complete overlap in frequency tuning and cortical location compared to receptive fields measured during auditory perception.
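
The encoding models described here regress neural activity (high-gamma amplitude) on time-lagged spectrogram features of the sound and are evaluated by how well they predict held-out data; the fitted weights form the spectrotemporal receptive field. The ridge-regression sketch below uses synthetic audio and a synthetic "high-gamma" response; it is a generic illustration, not the authors' method, data, or parameters.

    import numpy as np
    from scipy.signal import spectrogram

    rng = np.random.default_rng(8)
    fs_audio = 16000
    audio = rng.normal(size=fs_audio * 20)            # stand-in for the recorded sound

    # Log-compressed spectrogram features (time x frequency).
    freqs, times, spec = spectrogram(audio, fs=fs_audio, nperseg=256, noverlap=128)
    features = np.log(spec + 1e-10).T

    # Toy "high-gamma" response: a few frequency bands drive it at a short lag, plus noise.
    true_w = np.zeros(features.shape[1])
    true_w[20:30] = 1.0
    response = np.roll(features @ true_w, 3) + rng.normal(size=features.shape[0])

    # Ridge-regression encoding model over stimulus lags, tested on held-out data.
    lags = np.arange(0, 8)
    X = np.column_stack([np.roll(features, lag, axis=0) for lag in lags])
    X = X - X.mean(axis=0)
    y = response - response.mean()
    half = X.shape[0] // 2
    w = np.linalg.solve(X[:half].T @ X[:half] + 10.0 * np.eye(X.shape[1]),
                        X[:half].T @ y[:half])
    print(np.corrcoef(X[half:] @ w, y[half:])[0, 1])  # held-out prediction accuracy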


YB8: Science. 2014 Feb 28;343(6174):1006-10.

Phonetic feature encoding in human superior temporal gyrus.

Mesgarani N, Cheung C, Johnson K, Chang EF.

During speech perception, linguistic elements such as consonants and vowels are extracted from a complex acoustic speech signal. The superior temporal gyrus (STG) participates in high-order auditory processing of speech, but how it encodes phonetic information is poorly understood. We used high-density direct cortical surface recordings in humans while they listened to natural, continuous speech to reveal the STG representation of the entire English phonetic inventory. At single electrodes, we found response selectivity to distinct phonetic features. Encoding of acoustic properties was mediated by a distributed population response. Phonetic features could be directly related to tuning for spectrotemporal acoustic cues, some of which were encoded in a nonlinear fashion or by integration of multiple cues. These findings demonstrate the acoustic-phonetic representation of speech in human STG.