[P2 evaluation] Articles

Choose two articles from the list, from two different lecturers. Don't forget to tell me which one is for the oral exam and which is for the written one. Since Barbara Tillmann will not be able to attend the oral, it would be preferable to choose her articles for the written exam.


AdC1 : Nature. 2008 Jan 10;451(7175):197-201.

Ultra-fine frequency tuning revealed in single neurons of human auditory cortex.

Bitterman Y, Mukamel R, Malach R, Fried I, Nelken I.

Just-noticeable differences of physical parameters are often limited by the resolution of the peripheral sensory apparatus. Thus, two-point discrimination in vision is limited by the size of individual photoreceptors. Frequency selectivity is a basic property of neurons in the mammalian auditory pathway. However, just-noticeable differences of frequency are substantially smaller than the bandwidth of the peripheral sensors. Here we report that frequency tuning in single neurons recorded from human auditory cortex in response to random-chord stimuli is far narrower than that typically described in any other mammalian species (besides bats), and substantially exceeds that attributed to the human auditory periphery. Interestingly, simple spectral filter models failed to predict the neuronal responses to natural stimuli, including speech and music. Thus, natural sounds engage additional processing mechanisms beyond the exquisite frequency tuning probed by the random-chord stimuli.
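
As a concrete illustration of the random-chord method mentioned above, here is a minimal Python sketch that generates a dense sequence of multi-tone "chords" with random frequencies, of the kind used to probe frequency tuning. All parameter values (chord duration, tone density, frequency range) are illustrative assumptions, not those of the paper.

```python
import numpy as np

def random_chord_sequence(fs=44100, n_chords=50, chord_dur=0.05,
                          fmin=200.0, fmax=8000.0, tones_per_chord=4,
                          rng=None):
    """Concatenate short multi-tone chords with random log-spaced
    frequencies, each gated by 5-ms linear on/off ramps."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(fs * chord_dur)
    t = np.arange(n) / fs
    ramp = np.minimum(1.0, np.minimum(t, t[::-1]) / 0.005)
    chords = []
    for _ in range(n_chords):
        freqs = np.exp(rng.uniform(np.log(fmin), np.log(fmax),
                                   tones_per_chord))
        chord = sum(np.sin(2 * np.pi * f * t) for f in freqs)
        chords.append(ramp * chord / tones_per_chord)
    return np.concatenate(chords)

stim = random_chord_sequence()  # ~2.5 s of random chords
```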


AdC2 : J Acoust Soc Am. 2001 Sep;110(3 Pt 1):1498-504.

Melody recognition using three types of dichotic-pitch stimulus.

Akeroyd MA, Moore BC, Moore GA.

The recognition of 10 different 16-note melodies, constructed using either dichotic-pitch stimuli or diotic pure-tone stimuli, was measured. The dichotic pitches were created by placing a frequency-dependent transition in the interaural phase of a noise burst. Three different configurations for the transition were used in order to give Huggins pitch, binaural-edge pitch, and binaural-coherence-edge pitch. Forty-nine inexperienced listeners participated. The melodies evoked by the dichotic stimuli were consistently identified well in the first block of trials, indicating that the sensation of dichotic pitch was relatively immediate and did not require prolonged listening experience. There were only small improvements across blocks of trials. The mean scores were 97% (pure tones), 93% (Huggins pitch), 89% (binaural-edge pitch), and 77% (binaural-coherence-edge pitch). All pairwise differences were statistically significant, indicating that Huggins pitch was the most salient of the dichotic pitches and binaural-coherence-edge pitch was weakest. To account for these differences in salience, a simulation of lateral inhibition was applied to the recovered spectrum generated by the modified equalization cancellation model [J. F. Culling, A. Q. Summerfield, and D. H. Marshall, J. Acoust. Soc. Am. 103, 3509-3526 (1998)]. The height of the peak in the resulting "edge-enhanced" recovered spectrum reflected the relative strength of the different dichotic pitches.
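
For readers unfamiliar with dichotic pitch, the Huggins-pitch construction can be sketched as follows: the two ears receive identical noise except in a narrow band around the nominal pitch frequency, across which the interaural phase advances through a full cycle. A minimal Python sketch, with an illustrative 16% transition bandwidth (the paper's exact parameters may differ):

```python
import numpy as np

def huggins_pitch(f0=600.0, dur=0.5, fs=44100, rel_bw=0.16, rng=None):
    """Stereo noise that is interaurally identical except for a narrow
    band around f0, where the interaural phase advances from 0 to 2*pi."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(fs * dur)
    spec = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    lo, hi = f0 * (1 - rel_bw / 2), f0 * (1 + rel_bw / 2)
    phase = np.zeros_like(freqs)
    band = (freqs >= lo) & (freqs <= hi)
    phase[band] = 2 * np.pi * (freqs[band] - lo) / (hi - lo)
    # Above the band the shift is a full cycle (2*pi = 0), so the two
    # ears differ only inside the transition band.
    left = np.fft.irfft(spec, n)
    right = np.fft.irfft(spec * np.exp(1j * phase), n)
    return np.stack([left, right]) / np.abs(left).max()

stereo = huggins_pitch()  # over headphones, evokes a faint ~600-Hz pitch in noise
```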


AdC3 : Hear Res. 2007 Nov;233(1-2):108-16. Epub 2007 Sep 5.

Temporal integration in absolute identification of musical pitch.

Hsieh IH, Saberi K.

The effect of stimulus duration on absolute identification of musical pitch was measured in a single-interval 12-alternative forced-choice task. Stimuli consisted of pure tones selected randomly on each trial from a set of 60 logarithmically spaced musical note frequencies from 65.4 to 1975.5 Hz (C2-B6). Stimulus durations were 5, 10, 25, 50, 100, and 1000 ms. Six absolute-pitch musicians identified the pitch of pure tones without feedback, reference sounds, or practice trials. Results showed that a 5 ms stimulus is sufficient for producing statistically significant above-chance performance. Performance monotonically increased up to the longest duration tested (1000 ms). Higher octave stimuli produced better performance, though the rate of improvement declined with increasing octave number. Normalization by the number of waveform cycles showed that 4 cycles are sufficient for absolute-pitch identification. Restricting stimuli to a fixed-cycle waveform instead of a fixed duration still produced monotonic improvements in performance as a function of stimulus octave, demonstrating that better performance at higher frequencies does not exclusively result from a larger number of waveform cycles. Several trends in the data were well predicted by an autocorrelation model of pitch extraction, though the model outperformed observed performance at short durations, suggesting an inability to make optimal use of available periodicity information in very brief tones.
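
The autocorrelation model invoked here belongs to a standard class of pitch extractors: compute the signal's autocorrelation and take the lag of the largest peak within the plausible period range as the pitch period. A minimal sketch of that model class (not the specific model fitted in the paper):

```python
import numpy as np

def autocorr_pitch(x, fs, fmin=60.0, fmax=2000.0):
    """Pitch estimate: lag of the largest autocorrelation peak within
    the period range [1/fmax, 1/fmin]."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

fs = 44100
t = np.arange(int(0.01 * fs)) / fs              # 10 ms, ~4 cycles at 440 Hz
print(autocorr_pitch(np.sin(2 * np.pi * 440 * t), fs))  # ~440
```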


AdC4 : J Acoust Soc Am. 2006 Dec;120(6):3907-15.

Individual differences in the sensitivity to pitch direction.

Semal C, Demany L.

It is commonly assumed that one can always assign a direction (upward or downward) to a percept of pitch change. The present study shows that this is true for some, but not all, listeners. Frequency difference limens (FDLs, in cents) for pure tones roved in frequency were measured in two conditions. In one condition, the task was to detect frequency changes; in the other condition, the task was to identify the direction of frequency changes. For three listeners, the identification FDL was about 1.5 times smaller than the detection FDL, as predicted (counterintuitively) by signal detection theory under the assumption that performance in the two conditions was limited by one and the same internal noise. For three other listeners, however, the identification FDL was much larger than the detection FDL. The latter listeners had relatively high detection FDLs. They had no difficulty in identifying the direction of just-detectable changes in intensity, or in the frequency of amplitude modulation. Their difficulty in perceiving the direction of small frequency/pitch changes showed up not only when the task required absolute judgments of direction, but also when the directions of two successive frequency changes had to be judged as identical or different.


AdC5 : J Acoust Soc Am. 2006 Aug;120(2):957-65.

Effect of noise on the detectability and fundamental frequency discrimination of complex tones.

Gockel H, Moore BC, Plack CJ, Carlyon RP.

Percent correct performance for discrimination of the fundamental frequency (F0) of a complex tone was measured as a function of the level of a background pink noise (using fixed values of the difference in F0, ΔF0) and compared with percent correct performance for detection of the complex tone in noise, again as a function of noise level. The tone included some low, resolvable components, but not the fundamental component. The results were used to test the hypothesis that the worsening in F0 discrimination with increasing noise level was caused by the reduced detectability of the tone rather than by reduced precision of the internal representation of F0. For small values of ΔF0, the hypothesis was rejected because measured performance fell below that predicted by the hypothesis. However, this was true only for high noise levels, within 2-4.5 dB of the level required for masked threshold. The results indicate that the mechanism for extracting the F0 of a complex tone with resolved harmonics is remarkably robust. They also indicate that adding a background noise to a complex tone containing resolved harmonics is not a good means for equating its pitch salience with that of a complex tone containing only unresolved harmonics.


AdC6 : PLoS Biol. 2008 Jan;6(1):e16.

Sparse representation of sounds in the unanesthetized auditory cortex.

Hromádka T, Deweese MR, Zador AM.

How do neuronal populations in the auditory cortex represent acoustic stimuli? Although sound-evoked neural responses in the anesthetized auditory cortex are mainly transient, recent experiments in the unanesthetized preparation have emphasized subpopulations with other response properties. To quantify the relative contributions of these different subpopulations in the awake preparation, we have estimated the representation of sounds across the neuronal population using a representative ensemble of stimuli. We used cell-attached recording with a glass electrode, a method for which single-unit isolation does not depend on neuronal activity, to quantify the fraction of neurons engaged by acoustic stimuli (tones, frequency modulated sweeps, white-noise bursts, and natural stimuli) in the primary auditory cortex of awake head-fixed rats. We find that the population response is sparse, with stimuli typically eliciting high firing rates (>20 spikes/second) in less than 5% of neurons at any instant. Some neurons had very low spontaneous firing rates (<0.01 spikes/second). At the other extreme, some neurons had driven rates in excess of 50 spikes/second. Interestingly, the overall population response was well described by a lognormal distribution, rather than the exponential distribution that is often reported. Our results represent, to our knowledge, the first quantitative evidence for sparse representations of sounds in the unanesthetized auditory cortex. Our results are compatible with a model in which most neurons are silent much of the time, and in which representations are composed of small dynamic subsets of highly active neurons.


BT1 : Proc Natl Acad Sci U S A. 2005 Aug 30;102(35):12639-43.

Tuning in to musical rhythms: infants learn more readily than adults.

Hannon EE, Trehub SE.

Domain-general tuning processes may guide the acquisition of perceptual knowledge in infancy. Here, we demonstrate that 12-month-old infants show an adult-like, culture-specific pattern of responding to musical rhythms, in contrast to the culture-general responding that is evident at 6 months of age. Nevertheless, brief exposure to foreign music enables 12-month-olds, but not adults, to perceive rhythmic distinctions in foreign musical contexts. These findings may indicate a sensitive period early in life for acquiring rhythm in particular or socially and biologically important structures more generally.


BT2 : J Cogn Neurosci. 2005 Oct;17(10):1565-77.

Interaction between syntax processing in language and in music: an ERP study.

Koelsch S, Gunter TC, Wittfoth M, Sammler D.

The present study investigated simultaneous processing of language and music using visually presented sentences and auditorily presented chord sequences. Music-syntactically regular and irregular chord functions were presented synchronously with syntactically correct or incorrect words, or with words that had either a high or a low semantic cloze probability. Music-syntactically irregular chords elicited an early right anterior negativity (ERAN). Syntactically incorrect words elicited a left anterior negativity (LAN). The LAN was clearly reduced when words were presented simultaneously with music-syntactically irregular chord functions. Processing of high and low cloze-probability words as indexed by the N400 was not affected by the presentation of irregular chord functions. In a control experiment, the LAN was not affected by physically deviant tones that elicited a mismatch negativity (MMN). Results demonstrate that processing of musical syntax (as reflected in the ERAN) interacts with the processing of linguistic syntax (as reflected in the LAN), and that this interaction is not due to a general effect of deviance-related negativities that precede an LAN. Findings thus indicate a strong overlap of neural resources involved in the processing of syntax in language and music.


BT3 : Cognition. 2000 Jul 14;76(1):13-58.

Cross-cultural music cognition: cognitive methodology applied to North Sami yoiks.

Krumhansl CL, Toivanen P, Eerola T, Toiviainen P, Jarvinen T, Louhivuori J.

This article is a study of melodic expectancy in North Sami yoiks, a style of music quite distinct from Western tonal music. Three different approaches were taken. The first approach was a statistical style analysis of tones in a representative corpus of 18 yoiks. The analysis determined the relative frequencies of tone onsets and two- and three-tone transitions. It also identified style characteristics, such as pentatonic orientation, the presence of two reference pitches, the frequency of large consonant intervals, and a relatively large set of possible melodic continuations. The second approach was a behavioral experiment in which listeners made judgments about melodic continuations. Three groups of listeners participated. One group was from the Sami culture, the second group consisted of Finnish music students who had learned some yoiks, and the third group consisted of Western musicians unfamiliar with yoiks. Expertise was associated with stronger veridical expectations (for the correct next tone) than schematic expectations (based on general style characteristics). Familiarity with the particular yoiks was found to compensate for lack of experience with the musical culture. The third approach simulated melodic expectancy with neural network models of the self-organizing map (SOM) type (Kohonen, T. (1997). Self-organizing maps (2nd ed.). Berlin: Springer). One model was trained on the excerpts of yoiks used in the behavioral experiment including the correct continuation tone, while another was trained with a set of Finnish folk songs and Lutheran hymns. The convergence of the three approaches showed that both listeners and the SOM model are influenced by the statistical distributions of tones and tone sequences. The listeners and SOM models also provided evidence supporting a core set of psychological principles underlying melody formation whose relative weights appear to differ across musical styles.
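
The first, statistical approach (relative frequencies of tone onsets and two- and three-tone transitions) is easy to make concrete. A minimal sketch for unigram and bigram statistics, with made-up example melodies rather than actual yoik data:

```python
from collections import Counter

def onset_and_transition_frequencies(melodies):
    """Relative frequencies of tones (unigrams) and two-tone transitions
    (bigrams) in a corpus of melodies given as lists of note labels."""
    unigrams, bigrams = Counter(), Counter()
    for mel in melodies:
        unigrams.update(mel)
        bigrams.update(zip(mel, mel[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    return ({k: v / n_uni for k, v in unigrams.items()},
            {k: v / n_bi for k, v in bigrams.items()})

corpus = [['D', 'F', 'G', 'A', 'G', 'F'], ['A', 'G', 'A', 'D', 'F']]  # toy data
uni, bi = onset_and_transition_frequencies(corpus)
```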


BT4 : Psychophysiology. 2004 May;41(3):341-9.

The music of speech: music training facilitates pitch processing in both music and language.

Schon D, Magne C, Besson M.

The main aim of the present experiment was to determine whether extensive musical training facilitates pitch contour processing not only in music but also in language. We used a parametric manipulation of final notes' or words' fundamental frequency (F0), and we recorded behavioral and electrophysiological data to examine the precise time course of pitch processing. We compared professional musicians and nonmusicians. Results revealed that within both domains, musicians detected weak F0 manipulations better than nonmusicians. Moreover, F0 manipulations within both music and language elicited similar variations in brain electrical potentials, with overall shorter onset latency for musicians than for nonmusicians. Finally, the scalp distribution of an early negativity in the linguistic task varied with musical expertise, being largest over temporal sites bilaterally for musicians and largest centrally and over left temporal sites for nonmusicians. These results are taken as evidence that extensive musical training influences the perception of pitch contour in spoken language.


CL1 : J Acoust Soc Am. 2006 Nov;120(5 Pt 1):2908-25.

Speech recognition in normal hearing and sensorineural hearing loss as a function of the number of spectral channels.

Baskent D.

Speech recognition by normal-hearing listeners improves as a function of the number of spectral channels when tested with a noiseband vocoder simulating cochlear implant signal processing. Speech recognition by the best cochlear implant users, however, saturates around eight channels and does not improve when more electrodes are activated, presumably due to reduced frequency selectivity caused by channel interactions. Listeners with sensorineural hearing loss may also have reduced frequency selectivity due to cochlear damage and the resulting reduction in the nonlinear cochlear mechanisms. The present study investigates whether such a limitation in spectral information transmission would be observed with hearing-impaired listeners, similar to implant users. To test the hypothesis, hearing-impaired subjects were selected from a population of patients with moderate hearing loss of cochlear origin, where the frequency selectivity would be expected to be poorer compared to normal hearing. Hearing-impaired subjects were tested for vowel and consonant recognition in steady-state background noise of varying levels using a noiseband vocoder and as a function of the number of spectral channels. For comparison, normal-hearing subjects were tested with the same stimuli at different presentation levels. In quiet and low background noise, performance by normal-hearing and hearing-impaired subjects was similar. In higher background noise, performance by hearing-impaired subjects saturated around eight channels, while performance by normal-hearing subjects continued to increase up to 12-16 channels with vowels, and 10-12 channels with consonants. A similar trend was observed for most of the presentation levels at which the normal-hearing subjects were tested. Therefore, it is unlikely that the effects observed with hearing-impaired subjects were due to insufficient audibility or high presentation levels. Consequently, the results with hearing-impaired subjects were similar to previous results obtained with implant users, but only for background noise conditions.
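
The noiseband vocoder used in such cochlear-implant simulations works roughly as follows: split the signal into a small number of analysis bands, extract each band's temporal envelope, and reimpose the envelopes on band-limited noise carriers. A minimal Python sketch; the filter orders, band edges, and 400-Hz envelope cutoff are illustrative choices, not the paper's processing parameters:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noiseband_vocode(x, fs, n_channels=8, fmin=100.0, fmax=6000.0):
    """Noise-excited vocoder: log-spaced analysis bands, envelopes from
    rectification + low-pass filtering, envelopes applied to band noise."""
    edges = np.geomspace(fmin, fmax, n_channels + 1)
    env_lp = butter(2, 400.0, btype='lowpass', fs=fs, output='sos')
    noise = np.random.default_rng(0).standard_normal(len(x))
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        env = sosfiltfilt(env_lp, np.abs(sosfiltfilt(band, x)))
        out += np.clip(env, 0.0, None) * sosfiltfilt(band, noise)
    return out / np.abs(out).max()

fs = 16000
t = np.arange(fs) / fs
test = np.sin(2 * np.pi * 300 * t) * (1 + np.sin(2 * np.pi * 4 * t))  # stand-in for speech
vocoded = noiseband_vocode(test, fs)
```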


CL2 : J Acoust Soc Am. 1994 Oct;96(4):2048-54.

Low-pass filtering in amplitude modulation detection associated with vowel and consonant identification in subjects with cochlear implants.

Cazals Y, Pelizzone M, Saudan O, Boex C.

Temporal auditory analysis of acoustic events in various frequency channels is influenced by the ability to detect amplitude modulations which for normal hearing involves low-pass filtering with a cutoff frequency around 100 Hz and a rejection slope of about 10 dB per decade. These characteristics were established in previous studies measuring modulation transfer functions. For cochlear implant subjects, the delivery of detailed amplitude modulation information has been recently shown to result in very significant improvements in speech understanding. Several previous studies on cochlear implant subjects have reported capacities for temporal resolution rather equivalent to those of normally hearing subjects but with some notable individual differences. Recently two studies on some cochlear implant subjects indicated modulation transfer functions often quite similar to those of normal hearing but exhibiting marked individual differences in shape and absolute sensitivity. The present study compared amplitude modulation detection and phonetic recognition in a group of cochlear implant subjects to determine the extent to which the two tasks are correlated. Nine individuals who had been implanted with an Ineraid device and who demonstrated open speech understanding ranging from excellent to poor were chosen and tested in the present study. For each subject modulation transfer functions were measured at the most apical electrode and phonetic recognition of isolated vowels and intervocalic consonants was assessed. Results showed a strong correlation between the depth of high-frequency rejection in modulation transfer functions and success in vowel and consonant intelligibility. These results emphasize the importance of temporal speech features and offer perspectives for customizing signal processing in cochlear implants.


CL3 : J Acoust Soc Am. 2005 Oct;118(4):2519-26.

Consequences of cochlear damage for the detection of interaural phase differences.

Lacher-Fougere S, Demany L.

Thresholds for detecting interaural phase differences (IPDs) in sinusoidally amplitude-modulated pure tones were measured in seven normal-hearing listeners and nine listeners with bilaterally symmetric hearing losses of cochlear origin. The IPDs were imposed either on the carrier signal alone (not the amplitude modulation) or vice versa. The carrier frequency was 250, 500, or 1000 Hz, the modulation frequency 20 or 50 Hz, and the sound pressure level was fixed at 75 dB. A three-interval two-alternative forced choice paradigm was used. For each type of IPD (carrier or modulation), thresholds were on average higher for the hearing-impaired than for the normal listeners. However, the impaired listeners' detection deficit was markedly larger for carrier IPDs than for modulation IPDs. This was not predictable from the effect of hearing loss on the sensation level of the stimuli since, for normal listeners, large reductions of sensation level appeared to be more deleterious to the detection of modulation IPDs than to the detection of carrier IPDs. The results support the idea that one consequence of cochlear damage is a deterioration in the perceptual sensitivity to the temporal fine structure of sounds.
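
The stimulus manipulation described here is straightforward to synthesize: a sinusoidally amplitude-modulated tone in which an interaural phase difference is imposed on either the carrier or the modulator. A minimal sketch with illustrative parameter values:

```python
import numpy as np

def sam_tone_ipd(fc=500.0, fm=20.0, m=1.0, dur=0.5, fs=44100,
                 carrier_ipd=0.0, mod_ipd=0.0):
    """SAM tone, returned as a (2, n) stereo array; the right ear's
    carrier and/or modulator phase is offset by the requested IPD."""
    t = np.arange(int(fs * dur)) / fs
    def ear(phi_c, phi_m):
        return (1 + m * np.cos(2 * np.pi * fm * t + phi_m)) \
               * np.sin(2 * np.pi * fc * t + phi_c)
    return np.stack([ear(0.0, 0.0), ear(carrier_ipd, mod_ipd)]) / 2.0

carrier_condition = sam_tone_ipd(carrier_ipd=np.pi / 8)   # IPD on the carrier only
modulation_condition = sam_tone_ipd(mod_ipd=np.pi / 8)    # IPD on the modulator only
```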


CL4 : J Acoust Soc Am. 1992 May;91(5):2881-93.

Pitch discrimination and phase sensitivity in young and elderly subjects and its relationship to frequency selectivity.

Moore BC, Peters RW.

Frequency difference limens for pure tones (DLFs) and for complex tones (DLCs) were measured for four groups of subjects: young normal hearing, young hearing impaired, elderly with near-normal hearing, and elderly hearing impaired. The auditory filters of the subjects had been measured in earlier experiments using the notched-noise method, for center frequencies (fc) of 100, 200, 400, and 800 Hz. The DLFs for both impaired groups were higher than for the young normal group at all fc's (50-4000 Hz). The DLFs at a given fc were generally only weakly correlated with the sharpness of the auditory filter at that fc, and some subjects with broad filters had near-normal DLFs at low frequencies. Some subjects in the elderly normal group had very large DLFs at low frequencies in spite of near-normal auditory filters. These results suggest a partial dissociation of frequency selectivity and frequency discrimination of pure tones. The DLCs for the two impaired groups were higher than those for the young normal group at all fundamental frequencies (fo) tested (50, 100, 200, and 400 Hz); the DLCs for the elderly normal group were intermediate. At fo = 50 Hz, DLCs for a complex tone containing only low harmonics (1-5) were markedly higher than for complex tones containing higher harmonics, for all subject groups, suggesting that pitch was conveyed largely by the higher, unresolved harmonics. For the elderly impaired group, and some subjects in the elderly normal group, DLCs were larger for a complex tone with lower harmonics (1-12) than for tones without lower harmonics (4-12 and 6-12) for fo's up to 200 Hz. Some elderly normal subjects had markedly larger-than-normal DLCs in spite of near-normal auditory filters. The DLCs tended to be larger for complexes with components added in alternating sine/cosine phase than for complexes with components added in cosine phase. Phase effects were significant for all groups, but were small for the young normal group. The results are not consistent with place-based models of the pitch perception of complex tones; rather, they suggest that pitch is at least partly determined by temporal mechanisms.


CL5 : J Assoc Res Otolaryngol. 2005 Mar;6(1):19-27. Epub 2005 Apr 22.

Noise susceptibility of cochlear implant users: the role of spectral resolution and smearing.

Fu QJ, Nogaki G.

The latest-generation cochlear implant devices provide many deaf patients with good speech recognition in quiet listening conditions. However, speech recognition deteriorates rapidly as the level of background noise increases. Previous studies have shown that, for cochlear implant users, the absence of fine spectro-temporal cues may contribute to poorer performance in noise, especially when the noise is dynamic (e.g., competing speaker or modulated noise). Here we report on sentence recognition by cochlear implant users and by normal-hearing subjects listening to an acoustic simulation of a cochlear implant, in the presence of steady or square-wave modulated speech-shaped noise. Implant users were tested using their everyday, clinically assigned speech processors. In the acoustic simulation, normal-hearing listeners were tested for different degrees of spectral resolution (16, eight, or four channels) and spectral smearing (carrier filter slopes of -24 or -6 dB/octave). For modulated noise, normal-hearing listeners experienced significant release from masking when the original, unprocessed speech was presented (which preserved the spectro-temporal fine structure), while cochlear implant users experienced no release from masking. As the spectral resolution was reduced, normal-hearing listeners' release from masking gradually diminished. Release from masking was further reduced as the degree of spectral smearing increased. Interestingly, the mean speech recognition thresholds of implant users were very close to those of normal-hearing subjects listening to four-channel spectrally smeared noise-band speech. Also, the best cochlear implant listeners performed like normal-hearing subjects listening to eight- to 16-channel spectrally smeared noise-band speech. These findings suggest that implant users' susceptibility to noise may be caused by the reduced spectral resolution and the high degree of spectral smearing associated with channel interaction. Efforts to improve the effective number of spectral channels as well as reduce channel interactions may improve implant performance in noise, especially for temporally modulated noise.


DP1 : J Acoust Soc Am. 1994 Jun;95(6):3529-40.

The role of resolved and unresolved harmonics in pitch perception and frequency modulation discrimination.

Shackleton TM, Carlyon RP.

A series of experiments investigated the influence of harmonic resolvability on the pitch of, and the discriminability of differences in fundamental frequency (F0) between, frequency-modulated (FM) harmonic complexes. Both F0 (62.5 to 250 Hz) and spectral region (LOW: 125-625 Hz, MID: 1375-1875 Hz, and HIGH: 3900-5400 Hz) were varied orthogonally. The harmonics that comprised each complex could be summed in either sine (0°) phase (SINE) or alternating sine-cosine (0°-90°) phase (ALT). Stimuli were presented in a continuous pink-noise background. Pitch-matching experiments revealed that the pitch of ALT-phase stimuli, relative to SINE-phase stimuli, was increased by an octave in the HIGH region, for all F0's, but was the same as that of SINE-phase stimuli when presented in the LOW region. In the MID region, the pitch of ALT-phase relative to SINE-phase stimuli depended on F0, being an octave higher at low F0's, equal at high F0's, and unclear at intermediate F0's. The same stimuli were then used in three measures of discriminability: FM detection thresholds (FMTs), frequency difference limens (FDLs), and FM direction discrimination thresholds (FMDDTs, defined as the minimum FM depth necessary for listeners to discriminate between two complexes modulated 180° out of phase with each other). For all three measures, at all F0's, thresholds were low (< 4% for FMTs, < 5% for FMDDTs, and < 1.5% for FDLs) when stimuli were presented in the LOW region, and high (> 10% for FMTs, > 7% for FMDDTs, and > 2.5% for FDLs) when presented in the HIGH region. When stimuli were presented in the MID region, thresholds were low for low F0's, and high for high F0's. Performance was not markedly affected by the phase relationship between the components of a complex, except for stimuli with intermediate F0's in the MID spectral region, where FDLs and FMDDTs were much higher for ALT-phase stimuli than for SINE-phase stimuli, consistent with their unclear pitch. This difference was much smaller when FMTs were measured. The interaction between F0 and spectral region for both sets of experiments can be accounted for by a single definition of resolvability.
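
The core stimulus manipulation (sine vs. alternating sine-cosine phase) can be sketched as follows, using one common convention in which odd-numbered components are shifted by 90°. The band edges below match the paper's MID region; the other values are illustrative:

```python
import numpy as np

def harmonic_complex(f0=125.0, flo=1375.0, fhi=1875.0, dur=0.4,
                     fs=44100, phase='sine'):
    """Bandpass harmonic complex; 'sine' puts every component at 0 deg,
    'alt' shifts odd-numbered components by 90 deg (one ALT convention)."""
    t = np.arange(int(fs * dur)) / fs
    x = np.zeros_like(t)
    for k in range(int(np.ceil(flo / f0)), int(fhi // f0) + 1):
        phi = (np.pi / 2) if (phase == 'alt' and k % 2) else 0.0
        x += np.sin(2 * np.pi * k * f0 * t + phi)
    return x / np.abs(x).max()

sine_stim = harmonic_complex(phase='sine')  # MID-region complex, F0 = 125 Hz
alt_stim = harmonic_complex(phase='alt')
```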


DP2 : Ear Hear. 2004 Apr;25(2):173-85.

Music perception with temporal cues in acoustic and electric hearing.

Kong YY, Cruz R, Jones JA, Zeng FG.

OBJECTIVE: The first specific aim of the present study is to compare the ability of normal-hearing and cochlear implant listeners to use temporal cues in three music perception tasks: tempo discrimination, rhythmic pattern identification, and melody identification. The second aim is to identify the relative contribution of temporal and spectral cues to melody recognition in acoustic and electric hearing. DESIGN: Both normal-hearing and cochlear implant listeners participated in the experiments. Tempo discrimination was measured in a two-interval forced-choice procedure in which subjects were asked to choose the faster tempo at four standard tempo conditions (60, 80, 100, and 120 beats per minute). For rhythmic pattern identification, seven different rhythmic patterns were created and subjects were asked to read and choose the musical notation displayed on the screen that corresponded to the rhythmic pattern presented. Melody identification was evaluated with two sets of 12 familiar melodies. One set contained both rhythm and melody information (rhythm condition), whereas the other set contained only melody information (no-rhythm condition). Melody stimuli were also processed to extract the slowly varying temporal envelope from 1, 2, 4, 8, 16, 32, and 64 frequency bands, to create cochlear implant simulations. Subjects listened to a melody and had to respond by choosing one of the 12 names corresponding to the melodies displayed on a computer screen. RESULTS: In tempo discrimination, the cochlear implant listeners performed similarly to the normal-hearing listeners with rate discrimination difference limens obtained at 4-6 beats per minute. In rhythmic pattern identification, the cochlear implant listeners performed 5-25 percentage points poorer than the normal-hearing listeners. The normal-hearing listeners achieved perfect scores in melody identification with and without the rhythmic cues. However, the cochlear implant listeners performed significantly poorer than the normal-hearing listeners in both rhythm and no-rhythm conditions. The simulation results from normal-hearing listeners showed a relatively high level of performance for all numbers of frequency bands in the rhythm condition but required as many as 32 bands in the no-rhythm condition. CONCLUSIONS: Cochlear-implant listeners performed normally in tempo discrimination, but significantly poorer than normal-hearing listeners in rhythmic pattern identification and melody recognition. While both temporal (rhythmic) and spectral (pitch) cues contribute to melody recognition, cochlear-implant listeners mostly relied on the rhythmic cues for melody recognition. Without the rhythmic cues, high spectral resolution with as many as 32 bands was needed for melody recognition for normal-hearing listeners. This result indicates that the present cochlear implants provide sufficient spectral cues to support speech recognition in quiet, but they are not adequate to support music perception. Increasing the number of functional channels and improved encoding of the fine structure information are necessary to improve music perception for cochlear implant listeners.


DP3 : Psychol Sci. 2008 Jan;19(1):85-91.

Auditory change detection: simple sounds are not memorized better than complex sounds.

Demany L, Trost W, Serman M, Semal C.

Previous research has shown that the detectability of a local change in a visual image is essentially independent of the complexity of the image when the interstimulus interval (ISI) is very short, but is limited by a low-capacity memory system when the ISI exceeds 100 ms. In the study reported here, listeners made same/different judgments on pairs of successive "chords" (sums of pure tones with random frequencies). The change to be detected was always a frequency shift in one of the tones, and which tone would change was unpredictable. Performance worsened as the number of tones increased, but this effect was not larger for 2-s ISIs than for 0-ms ISIs. Similar results were obtained when a chord was followed by a single tone that had to be judged as higher or lower than the closest component of the chord. Overall, our data suggest that change detection is based on different mechanisms in audition and vision.
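
The trial structure described above is simple to reproduce: draw a random chord, then shift one randomly chosen component. A minimal sketch of one same/different trial pair; the tone count, shift size, and frequency range are illustrative, not the study's values:

```python
import numpy as np

def change_trial(n_tones=5, shift_semitones=1.0, dur=0.3, fs=44100,
                 fmin=250.0, fmax=4000.0, rng=None):
    """One same/different trial: a random chord, then the same chord
    with one randomly chosen component shifted up or down in frequency."""
    rng = np.random.default_rng() if rng is None else rng
    freqs = np.exp(rng.uniform(np.log(fmin), np.log(fmax), n_tones))
    shifted = freqs.copy()
    i = rng.integers(n_tones)
    shifted[i] *= 2.0 ** (rng.choice([-1.0, 1.0]) * shift_semitones / 12)
    t = np.arange(int(fs * dur)) / fs
    synth = lambda fr: sum(np.sin(2 * np.pi * f * t) for f in fr) / len(fr)
    return synth(freqs), synth(shifted)

first, second = change_trial()  # present with a 0-ms or 2-s ISI
```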


DP4 : J Acoust Soc Am. 2008 Sep;124(3):1653-67.

Harmonic segregation through mistuning can improve fundamental frequency discrimination.

Bernstein JG, Oxenham AJ.

This study investigated the relationship between harmonic frequency resolution and fundamental frequency (f0) discrimination. Consistent with earlier studies, f0 discrimination of a diotic bandpass-filtered harmonic complex deteriorated sharply as the f0 decreased to the point where only harmonics above the tenth were presented. However, when the odd harmonics were mistuned by 3%, performance improved dramatically, such that performance nearly equaled that found with only even harmonics present. Mistuning also improved performance when alternating harmonics were presented to opposite ears (dichotic condition). In a task involving frequency discrimination of individual harmonics within the complexes, mistuning the odd harmonics yielded no significant improvement in the resolution of individual harmonics. Pitch matches to the mistuned complexes suggested that the even harmonics dominated the pitch for f0's at which a benefit of mistuning was observed. The results suggest that f0 discrimination performance can benefit from perceptual segregation based on inharmonicity, and that poor performance when only high-numbered harmonics are present is not due to limited peripheral harmonic resolvability. Taken together with earlier results, the findings suggest that f0 discrimination may depend on auditory filter bandwidths, but that spectral resolution of individual harmonics is neither necessary nor sufficient for accurate f0 discrimination.
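
The key manipulation (mistuning the odd harmonics by 3%) can be sketched as below. This toy version omits the bandpass filtering, level randomization, and masking noise that such experiments typically use:

```python
import numpy as np

def mistuned_complex(f0=100.0, n_harm=20, mistune=0.03, dur=0.5, fs=44100):
    """Harmonic complex whose odd-numbered components are raised by a
    fixed proportion (3% here), leaving the even harmonics in tune."""
    t = np.arange(int(fs * dur)) / fs
    x = np.zeros_like(t)
    for k in range(1, n_harm + 1):
        f = k * f0 * ((1 + mistune) if k % 2 else 1.0)
        x += np.sin(2 * np.pi * f * t)
    return x / np.abs(x).max()

stim = mistuned_complex()
```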


DP5 : Psychol Sci., in press.

Is Relative Pitch Specific to Pitch?

McDermott JH, Lehr AJ, Oxenham AJ.

Melodies, speech, and other stimuli that vary in pitch are processed largely in terms of the relative pitch differences between sounds. Relative representations permit recognition of pitch patterns despite variations in overall pitch level between instruments or speakers. A key component of relative pitch is the sequence of pitch increases and decreases from note to note, known as the melodic contour. Here we report that contour representations are also produced by patterns in loudness and brightness (an aspect of timbre), and that contours in one dimension can be readily recognized in other dimensions, implicating similar or common representations. Most surprisingly, contours in loudness and brightness are nearly as useful as pitch contours for recognizing familiar melodies that are normally conveyed via pitch. Our results indicate that relative representations via contour extraction are a general feature of the auditory system, and may have a common central locus.


DP6 : PLoS Biol. 2008 May 20;6(5):e126.

Low-level information and high-level perception: the case of speech in noise.

Nahum M, Nelken I, Ahissar M.

Auditory information is processed in a fine-to-crude hierarchical scheme, from low-level acoustic information to high-level abstract representations, such as phonological labels. We now ask whether fine acoustic information, which is not retained at high levels, can still be used to extract speech from noise. Previous theories suggested either full availability of low-level information or availability that is limited by task difficulty. We propose a third alternative, based on the Reverse Hierarchy Theory (RHT), originally derived to describe the relations between the processing hierarchy and visual perception. RHT asserts that only the higher levels of the hierarchy are immediately available for perception. Direct access to low-level information requires specific conditions, and can be achieved only at the cost of concurrent comprehension. We tested the predictions of these three views in a series of experiments in which we measured the benefit from utilizing low-level binaural information for speech perception, and compared it to that predicted from a model of the early auditory system. Only auditory RHT could account for the full pattern of the results, suggesting that similar defaults and tradeoffs underlie the relations between hierarchical processing and perception in the visual and auditory modalities.


DP7 : J Exp Psychol Hum Percept Perform. 2008 Aug;34(4):1007-16.

Effects of context on auditory stream segregation.

Snyder JS, Carter OL, Lee SK, Hannon EE, Alain C.

The authors examined the effect of preceding context on auditory stream segregation. Low tones (A), high tones (B), and silences (-) were presented in an ABA- pattern. Participants indicated whether they perceived 1 or 2 streams of tones. The A tone frequency was fixed, and the B tone was the same as the A tone or had 1 of 3 higher frequencies. Perception of 2 streams in the current trial increased with greater frequency separation between the A and B tones (Delta f). Larger Delta f in previous trials modified this pattern, causing less streaming in the current trial. This occurred even when listeners were asked to bias their perception toward hearing 1 stream or 2 streams. The effect of previous Delta f was not due to response bias because simply perceiving 2 streams in the previous trial did not cause less streaming in the current trial. Finally, the effect of previous Delta f was diminished, though still present, when the silent duration between trials was increased to 5.76 s. The time course of this context effect on streaming implicates the involvement of auditory sensory memory or neural adaptation.
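
The ABA- paradigm is easy to make concrete: repeating triplets of a low tone (A), a high tone (B), and a silence, with the A-B separation (Delta f) as the key variable. A minimal sketch with illustrative durations:

```python
import numpy as np

def aba_sequence(fa=500.0, df_semitones=6.0, tone_dur=0.1, n_reps=10,
                 fs=44100):
    """Repeating ABA- triplets: A tone, B tone (df semitones higher),
    A tone, silence; each tone gated with 10-ms linear ramps."""
    fb = fa * 2.0 ** (df_semitones / 12)
    n = int(fs * tone_dur)
    t = np.arange(n) / fs
    ramp = np.minimum(1.0, np.minimum(t, t[::-1]) / 0.01)
    tone = lambda f: np.sin(2 * np.pi * f * t) * ramp
    triplet = np.concatenate([tone(fa), tone(fb), tone(fa), np.zeros(n)])
    return np.tile(triplet, n_reps)

seq = aba_sequence(df_semitones=6.0)  # larger Delta f tends to split into two streams
```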


DP8 : Hear Res. 2007 Jul;229(1-2):3-13. Epub 2007 Jan 24.

The distributed auditory cortex.

Winer JA, Lee CC.

A synthesis of cat auditory cortex (AC) organization is presented in which the extrinsic and intrinsic connections interact to derive a unified profile of the auditory stream and use it to direct and modify cortical and subcortical information flow. Thus, the thalamocortical input provides essential sensory information about peripheral stimulus events, which AC redirects locally for feature extraction, and then conveys to parallel auditory, multisensory, premotor, limbic, and cognitive centers for further analysis. The corticofugal output influences areas as remote as the pons and the cochlear nucleus, structures whose effects upon AC are entirely indirect, and it has diverse roles in the transmission of information through the medial geniculate body and inferior colliculus. The distributed AC is thus construed as a functional network in which the auditory percept is assembled for subsequent redistribution in sensory, premotor, and cognitive streams contingent on the derived interpretation of the acoustic events. The confluence of auditory and multisensory streams likely precedes cognitive processing of sound. The distributed AC constitutes the largest and arguably the most complete representation of the auditory world. Many facets of this scheme may apply in rodent and primate AC as well. We propose that the distributed auditory cortex contributes to local processing regimes in regions as disparate as the frontal pole and the cochlear nucleus to construct the acoustic percept.


DP9 : J Acoust Soc Am. 2005 Jun;117(6):3787-98.

A computer model of the auditory-nerve response to forward-masking stimuli.

Meddis R, O'Mard LP.

A computer model of the auditory periphery is used to study the involvement of auditory-nerve (AN) adaptation in forward-masking effects. An existing model is shown to simulate published AN recovery functions both qualitatively and quantitatively after appropriate parameter adjustments. It also simulates published data showing only small threshold shifts when a psychophysical forward-masking paradigm is applied to AN responses. The model is extended to simulate a simple but physiologically plausible mechanism for making threshold decisions based on coincidental firing of a number of AN fibers. When this is used, much larger threshold shifts are observed of a size consistent with published psychophysical observations. The problem of how stimulus-driven firing can be distinguished from spontaneous activity near threshold is also addressed by the same decision mechanism. Overall, the modeling results suggest that poststimulatory reductions in AN activity can make a substantial contribution to the raised thresholds observed in many psychophysical studies of forward masking.


DP10 : J Comp Psychol. 2008 Aug;122(3):235-51.

The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it?

Bee MA, Micheyl C.

Animals often use acoustic signals to communicate in groups or social aggregations in which multiple individuals signal within a receiver's hearing range. Consequently, receivers face challenges related to acoustic interference and auditory masking that are not unlike the human cocktail party problem, which refers to the problem of perceiving speech in noisy social settings. Understanding the sensory solutions to the cocktail party problem has been a goal of research on human hearing and speech communication for several decades. Despite a general interest in acoustic signaling in groups, animal behaviorists have devoted comparatively less attention toward understanding how animals solve problems equivalent to the human cocktail party problem. After illustrating how humans and nonhuman animals experience and overcome similar perceptual challenges in cocktail-party-like social environments, this article reviews previous psychophysical and physiological studies of humans and nonhuman animals to describe how the cocktail party problem can be solved. This review also outlines several basic and applied benefits that could result from studies of the cocktail party problem in the context of animal acoustic communication.


JME1 : Proc Natl Acad Sci U S A. 2003 Feb 4;100(3):1405-8.

Suppression of cortical representation through backward conditioning.

Bao S, Chan VT, Zhang LI, Merzenich MM.

Temporal stimulus reinforcement sequences have been shown to determine the directions of synaptic plasticity and behavioral learning. Here, we examined whether they also control the direction of cortical reorganization. Pairing ventral tegmental area stimulation with a sound in a backward conditioning paradigm specifically reduced representations of the paired sound in the primary auditory cortex (AI). This temporal sequence-dependent bidirectional cortical plasticity modulated by dopamine release hypothetically serves to prevent the over-representation of frequently occurring stimuli resulting from their random pairing with unrelated rewards.


JME2 : Nature. 2007 Nov 15;450(7168):425-9.

A synaptic memory trace for cortical receptive field plasticity.

Froemke RC, Merzenich MM, Schreiner CE.

Receptive fields of sensory cortical neurons are plastic, changing in response to alterations of neural activity or sensory experience. In this way, cortical representations of the sensory environment can incorporate new information about the world, depending on the relevance or value of particular stimuli. Neuromodulation is required for cortical plasticity, but it is uncertain how subcortical neuromodulatory systems, such as the cholinergic nucleus basalis, interact with and refine cortical circuits. Here we determine the dynamics of synaptic receptive field plasticity in the adult primary auditory cortex (also known as AI) using in vivo whole-cell recording. Pairing sensory stimulation with nucleus basalis activation shifted the preferred stimuli of cortical neurons by inducing a rapid reduction of synaptic inhibition within seconds, which was followed by a large increase in excitation, both specific to the paired stimulus. Although nucleus basalis was stimulated only for a few minutes, reorganization of synaptic tuning curves progressed for hours thereafter: inhibition slowly increased in an activity-dependent manner to rebalance the persistent enhancement of excitation, leading to a retuned receptive field with new preference for the paired stimulus. This restricted period of disinhibition may be a fundamental mechanism for receptive field plasticity, and could serve as a memory trace for stimuli or episodes that have acquired new behavioural significance.


JME3 : Eur J Neurosci. 2006 Aug;24(3):857-66.

Neonatal nicotine exposure impairs nicotinic enhancement of central auditory processing and auditory learning in adult rats.

Liang K, Poytress BS, Chen Y, Leslie FM, Weinberger NM, Metherate R.

Children of women who smoke cigarettes during pregnancy display cognitive deficits in the auditory-verbal domain. Clinical studies have implicated developmental exposure to nicotine, the main psychoactive ingredient of tobacco, as a probable cause of subsequent auditory deficits. To test for a causal link, we have developed an animal model to determine how neonatal nicotine exposure affects adult auditory function. In adult control rats, nicotine administered systemically (0.7 mg/kg, s.c.) enhanced the sensitivity to sound of neural responses recorded in primary auditory cortex. The effect was strongest in cortical layers 3 and 4, where there is a dense concentration of nicotinic acetylcholine receptors (nAChRs) that has been hypothesized to regulate thalamocortical inputs. In support of the hypothesis, microinjection into layer 4 of the nonspecific nAChR antagonist mecamylamine (10 microM) strongly reduced sound-evoked responses. In contrast to the effects of acute nicotine and mecamylamine in adult control animals, neither drug was as effective in adult animals that had been treated with 5 days of chronic nicotine exposure (CNE) shortly after birth. Neonatal CNE also impaired performance on an auditory-cued active avoidance task, while having little effect on basic auditory or motor functions. Thus, neonatal CNE impairs nicotinic regulation of cortical function, and auditory learning, in the adult. Our results provide evidence that developmental nicotine exposure is responsible for auditory-cognitive deficits in the offspring of women who smoke during pregnancy, and suggest a potential underlying mechanism, namely diminished function of cortical nAChRs.


MC1 : J Neurosci. 2004 Apr 7;24(14):3637-42.

Sensitivity to auditory object features in human temporal neocortex.

Zatorre RJ, Bouffard M, Belin P.

This positron emission tomography study examined the hemodynamic response of the human brain to auditory object feature processing. A continuum of object feature variation was created by combining different numbers of stimuli drawn from a diverse sample of 45 environmental sounds. In each 60 sec scan condition, subjects heard either a distinct individual sound on each trial or simultaneous combinations of sounds that varied systematically in their similarity or distinctiveness across conditions. As more stimuli are combined they become more similar and less distinct from one another; the limiting case is when all 45 are added together to form a noise that is repeated on each trial. Analysis of covariation of cerebral blood flow elicited by this parametric manipulation revealed a response in the upper bank of the right anterior superior temporal sulcus (STS): when sounds were identical across trials (i.e., a noise made up of 45 sounds), activity was at a minimum; when stimuli were different from one another, activity was maximal. A right inferior frontal area was also revealed. The results are interpreted as reflecting sensitivity of this region of temporal neocortex to auditory object features, as predicted by neurophysiological and anatomical models implicating an anteroventral functional stream in object processing. The findings also fit with evidence that voice processing may involve regions within the anterior STS. The data are discussed in light of these models and are related to the concept that this functional stream is sensitive to invariant sound features that characterize individual auditory objects.


MC2 : Neuron. 2007 Sep 20;55(6):985-96.

Cerebral responses to change in spatial location of unattended sounds.

Deouell LY, Heller AS, Malach R, D'Esposito M, Knight RT.

The neural basis of spatial processing in the auditory cortex has been controversial. Human fMRI studies suggest that a part of the planum temporale (PT) is involved in auditory spatial processing, but it was recently argued that this region is active only when the task requires voluntary spatial localization. If this is the case, then this region cannot harbor an ongoing spatial representation of the acoustic environment. In contrast, we show in three fMRI experiments that a region in the human medial PT is sensitive to background auditory spatial changes, even when subjects are not engaged in a spatial localization task, and in fact attend the visual modality. During such times, this area responded to rare location shifts, and even more so when spatial variation increased, consistent with spatially selective adaptation. Thus, acoustic space is represented in the human PT even when sound processing is not required by the ongoing task.


MC3 : J Cogn Neurosci. 2007 Oct;19(10):1721-33.

Feature- and object-based attentional modulation in the human auditory "where" pathway.

Krumbholz K, Eickhoff SB, Fink GR.

Attending to a visual stimulus feature, such as color or motion, enhances the processing of that feature in the visual cortex. Moreover, the processing of the attended object's other, unattended, features is also enhanced. Here, we used functional magnetic resonance imaging to show that attentional modulation in the auditory system may also exhibit such feature- and object-specific effects. Specifically, we found that attending to auditory motion increases activity in nonprimary motion-sensitive areas of the auditory cortical "where" pathway. Moreover, activity in these motion-sensitive areas was also increased when attention was directed to a moving rather than a stationary sound object, even when motion was not the attended feature. An analysis of effective connectivity revealed that the motion-specific attentional modulation was brought about by an increase in connectivity between the primary auditory cortex and nonprimary motion-sensitive areas, which, in turn, may have been mediated by the paracingulate cortex in the frontal lobe. The current results indicate that auditory attention can select both objects and features. The finding of feature-based attentional modulation implies that attending to one feature of a sound object does not necessarily entail an exhaustive processing of the object's unattended features.


MC4 : PLoS Biol. 2008 Jun 10;6(6):e138.

Neural correlates of auditory perceptual awareness under informational masking.

Gutschalk A, Micheyl C, Oxenham AJ.

Our ability to detect target sounds in complex acoustic backgrounds is often limited not by the ear's resolution, but by the brain's information-processing capacity. The neural mechanisms and loci of this "informational masking" are unknown. We combined magnetoencephalography with simultaneous behavioral measures in humans to investigate neural correlates of informational masking and auditory perceptual awareness in the auditory cortex. Cortical responses were sorted according to whether or not target sounds were detected by the listener in a complex, randomly varying multi-tone background known to produce informational masking. Detected target sounds elicited a prominent, long-latency response (50-250 ms), whereas undetected targets did not. In contrast, both detected and undetected targets produced equally robust auditory middle-latency, steady-state responses, presumably from the primary auditory cortex. These findings indicate that neural correlates of auditory awareness in informational masking emerge between early and late stages of processing within the auditory cortex.


MC5 : PLoS Biol. 2007 Oct 23;5(11):e288.

An information theoretic characterisation of auditory encoding.

Overath T, Cusack R, Kumar S, von Kriegstein K, Warren JD, Grube M, Carlyon RP, Griffiths TD.

The entropy metric derived from information theory provides a means to quantify the amount of information transmitted in acoustic streams like speech or music. By systematically varying the entropy of pitch sequences, we sought brain areas where neural activity and energetic demands increase as a function of entropy. Such a relationship is predicted to occur in an efficient encoding mechanism that uses less computational resource when less information is present in the signal: we specifically tested the hypothesis that such a relationship is present in the planum temporale (PT). In two convergent functional MRI studies, we demonstrated this relationship in PT for encoding, while furthermore showing that a distributed fronto-parietal network for retrieval of acoustic information is independent of entropy. The results establish PT as an efficient neural engine that demands less computational resource to encode redundant signals than those with high information content.
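
The entropy manipulation at the heart of this study reduces to a textbook quantity: the first-order Shannon entropy of the pitch sequence, H = -sum_i p_i log2 p_i. A minimal sketch with made-up sequences (a repeated note has zero entropy; eight equiprobable notes carry 3 bits):

```python
import numpy as np

def sequence_entropy(pitches):
    """First-order Shannon entropy (bits) of a sequence of pitch values."""
    _, counts = np.unique(pitches, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

print(sequence_entropy([60] * 8))                          # 0.0 bits (fully redundant)
print(sequence_entropy([60, 62, 64, 65, 67, 69, 71, 72]))  # 3.0 bits (maximal for 8 notes)
```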