Abstract
Humans perceive a harmonic series as a single auditory object with a pitch equivalent to the fundamental frequency (F0) of the series. When harmonics are presented to alternate ears, the envelope repetition rate at each ear doubles. If the harmonics are resolved, then the pitch perceived is still equivalent to F0, suggesting the stimulus is binaurally integrated before pitch is processed. However, unresolved harmonics give rise to the doubling of pitch which would be expected from monaural processing (Bernstein and Oxenham, J. Acoust. Soc. Am., 113:3323–3334, 2003). We used similar stimuli to record responses of multi-unit clusters in the central nucleus of the inferior colliculus (IC) of anesthetized guinea pigs (urethane supplemented by fentanyl/fluanisone) to determine the nature of the representation of harmonic stimuli and the extent of binaural integration. We examined both the temporal responses and the rate tuning of IC clusters and found no evidence for binaural integration. Stimuli comprised all harmonics below 10 kHz with fundamental frequencies (F0) from 50 to 400 Hz in half-octave steps. In diotic conditions, all the harmonics were presented to both ears. In dichotic conditions, odd harmonics were presented to one ear and even harmonics to the other. Neural characteristic frequencies (CF, n = 85) were from 0.2 to 14.7 kHz; 29 had CFs below 1 kHz. The majority of clusters responded predominantly to the contralateral ear, with the dominance of the contralateral ear increasing with CF. With diotic stimuli, over half of the clusters (58%) had peaked firing rate vs. F0 functions. The most common peak F0 was 141 Hz. Almost all (98%) clusters phase locked diotically to an F0 of 50 Hz, and approximately 40% of clusters still phase locked significantly (Rayleigh statistic > 13.8, p < 0.001) at the highest F0 tested (400 Hz). These results are consistent with previous reports of responses to amplitude-modulated stimuli.
Clusters phase locked significantly at a frequency equal to F0 for contralateral and diotic stimuli but at 2F0 for dichotic stimuli. We interpret these data as responses following the envelope periodicity in monaural channels rather than as a binaurally integrated representation.
Introduction
Many natural sounds are harmonic in nature. When an object, or the air column within it, vibrates, the spectrum of the sound it makes is characterized by a series of partials which are at, or near, integer multiples of a single frequency, called the fundamental (F0; Kinsler et al. 2000). Such harmonic series normally have a very strong perceptual pitch corresponding to the F0. These pitch cues are used by animals and humans for grouping and segregation of sounds (Bregman 1990; Darwin 1997; Darwin and Carlyon 1995). Although harmonic stimuli are the most ubiquitous pitch-generating natural sounds, other stimuli such as amplitude modulated tones and iterated ripple noise also generate weaker pitch percepts.
In this paper, we have two objectives. The first is to determine the extent to which harmonic stimuli are represented in the temporal response, and the extent to which they are represented in the rate response, of midbrain auditory neurons. The second is to determine the extent to which binaural signals are integrated before pitch cues are extracted.
We consider first the changes in coding of pitch stimuli between the auditory nerve and the inferior colliculus (IC). In general, below the IC, pitch information is encoded in the temporal patterns of responses but not in the mean discharge rate. The responses across the auditory nerve to pitch-generating stimuli preserve the temporal features required for human pitch perception (in Dial-anesthetized cats; Cariani and Delgutte 1996a,b). Individual auditory nerve fibers phase lock to components of the stimulus near their characteristic frequency (CF; Horst et al. 1986, 1990; Sinex et al. 2003); however, their poststimulus time histograms contain a component at the F0 of the stimulus because of amplitude modulation of the near-CF responses (Horst et al. 1986). Similar responses are found in the cochlear nucleus (e.g., Palmer and Winter 1992; Sinex 2008), with primary-like units responding most like auditory nerve fibers and chopper units responding less to individual stimulus components and more to the envelope (Sinex 2008). In the IC, Palmer et al. (1990) found an ongoing response that was phase locked to the fundamental frequency of synthetic vowels, and phase locking to amplitude modulation has been recorded up to 600 Hz (Batra et al. 1989; Heil et al. 1995; Krishna and Semple 2000; Langner and Schreiner 1988; Nelson and Carney 2007; Rees and Moller 1987; Rees and Palmer 1989). The phase locking of IC units to pure tones appears to be limited to 600 Hz in cat (Kuwada et al. 1984) and 1 kHz in guinea pig (Liu et al. 2006).
Up to this point, we have been considering temporal responses. However, in the IC, rate tuning to modulation frequency emerges (e.g., Krishna and Semple 2000; Langner and Schreiner 1988; Rees and Moller 1983, 1987; Rees and Palmer 1989; Schreiner and Langner 1988). It has even been suggested that a modulation map exists in the IC (Langner et al. 2002; Langner and Schreiner 1988), which would provide a place map of pitch (or at least of periodicity).
We consider now the evidence for binaural integration of stimuli before pitch processing. Humans can integrate harmonics that alternate between the ears into a single percept with a pitch corresponding to the fundamental of the entire complex if the harmonics are peripherally resolved (Bernstein and Oxenham 2003; Houtsma and Goldstein 1972). Additionally, it is possible to extract a pitch binaurally from dichotic stimuli that have no pitch when the signal to either ear is presented alone (e.g., Akeroyd and Summerfield 2000a,b). These data suggest that pitch percepts might be generated after integration of the information from both ears (cf. Bilsen et al. 1998; Zurek 1979). However, if the harmonics are peripherally unresolved, then the pitch corresponds to the repetition period of the monaural envelope (Bernstein and Oxenham 2003; Carlyon et al. 2001), which suggests the pitch may be determined before binaural integration.
Bernstein and Oxenham (2003) used dichotic harmonic stimuli whose expected periodicity, and hence pitch, differs depending on whether the harmonics are integrated binaurally before or after the fundamental is extracted. The dichotic stimuli had even harmonics played to one ear and odd harmonics to the other. By definition, in the region of resolved harmonics, there will be little peripheral interaction between harmonics, so some form of across-channel combination of information is required to determine a pitch (Goldstein 1973; Meddis and Hewitt 1991a,b; Terhardt 1974; Terhardt et al. 1982). If this combination happened monaurally, then a doubling of pitch might be expected, since the components at each ear are spaced at 2F0. However, if it is based upon a binaural combination of the harmonics from both ears, then a pitch equal to the fundamental would be expected, since the combined component spacing is F0. Psychophysically, the pitch was found to equal F0, suggesting that resolved harmonics were combined binaurally before pitch was computed (Bernstein and Oxenham 2003).
In the region of unresolved harmonics, there will be significant interaction between components, and the repetition period of the envelope will provide a pitch cue (Assmann and Summerfield 1990; Licklider 1951; Meddis and Hewitt 1991a,b; Schouten 1940, 1970). Since the components at each ear are spaced at 2F0, the envelope at each ear repeats at a rate of 2F0 (i.e., with a period of 1/(2F0)), so we would expect a pitch of 2F0 to be perceived whether or not binaural interaction occurred, as was found psychophysically (Bernstein and Oxenham 2003).
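The envelope argument can be checked numerically. The sketch below is illustrative only: it assumes Python with NumPy, an arbitrary F0 of 100 Hz, the 50-kHz sampling rate given in the Methods, and it uses the Hilbert envelope of the whole complex rather than the output of any auditory filter. Shifting each waveform by half a fundamental period, 1/(2F0), shows that the even harmonics alone repeat at that interval, that the odd harmonics alone merely invert (so their fine structure still repeats only every 1/F0), and that the odd-harmonic envelope nevertheless repeats every 1/(2F0), whereas the envelope of the complete complex does not.

```python
import numpy as np

FS = 50_000   # Hz; the stimulus sampling rate given in the Methods
F0 = 100.0    # Hz; illustrative fundamental (an assumption)
DUR = 0.1     # s

t = np.arange(int(round(FS * DUR))) / FS

def waveform(harmonics):
    """Equal-amplitude, sine-phase harmonics of F0 (no onset/offset gating)."""
    return sum(np.sin(2 * np.pi * n * F0 * t) for n in harmonics)

def envelope(harmonics):
    """Hilbert envelope of the same complex.

    The analytic signal of sin(w t) is -1j * exp(1j * w t); the common factor
    -1j does not change the magnitude, so it is omitted here.
    """
    return np.abs(sum(np.exp(2j * np.pi * n * F0 * t) for n in harmonics))

all_h = range(1, 101)       # every harmonic up to 10 kHz (diotic/contralateral)
odd_h = range(1, 101, 2)    # one ear in a dichotic condition
even_h = range(2, 101, 2)   # the other ear

half = int(round(FS / (2 * F0)))  # samples in half a fundamental period

def shifted_pair(x):
    """Return (x(t + 1/(2 F0)), x(t)) over the overlapping samples."""
    return x[half:], x[:-half]
```

For the odd harmonics, the first element of `shifted_pair(waveform(odd_h))` is the negative of the second, while `shifted_pair(envelope(odd_h))` returns two identical arrays.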
In this experiment, we compared the representation of diotic stimuli, comprising all harmonics played to both ears; dichotic stimuli, comprising even harmonics played to one ear and odd to the other; and alternating phase stimuli, where all harmonics were played to both ears, but the starting phase was alternated between sine and cosine. The perceived pitch of this last stimulus doubles as its constituent harmonics become unresolved (Shackleton and Carlyon 1994), so it provides a useful monaural comparison for any dichotic effect.
We studied processing in the IC because it is a site of convergence of lower pathways and thus a site at which integration of monaural and binaural pitch cues should be measurable. We found that peripherally unresolved harmonics tended to give a periodicity characteristic of processing of the envelope of the waveform at a single ear, consistent with the psychophysics of unresolved harmonics (Bernstein and Oxenham 2003; Carlyon et al. 2001). However, we cannot make any definitive statement about the processing of resolved harmonics.
Methods and stimuli
We recorded from the central nucleus of the right IC of six pigmented guinea pigs weighing 440 to 770 g. All experiments were carried out in accordance with the UK Animals (Scientific Procedures) Act 1986. Animals were anesthetized with urethane (1.3 g/kg i.p., as a 20% solution in 0.9% saline) and Hypnorm (Janssen: 0.2 ml i.m., comprising fentanyl citrate 0.315 mg/ml and fluanisone 10 mg/ml). To reduce bronchial secretions, atropine sulfate (0.06 mg/kg s.c.) was administered at the start of the experiment. Anesthesia was supplemented with further doses of Hypnorm (0.2 ml i.m.) when indicated by the pedal withdrawal reflex. A tracheotomy was performed, and core temperature was maintained at 38°C via a heating blanket and rectal probe. Heart rate was monitored using a pair of electrodes in the skin on either side of the thorax. Animals were artificially ventilated with pure oxygen to keep the end-tidal partial pressure of CO2 between 24 and 36 mmHg. The animals were placed inside a sound-attenuating room in a stereotaxic frame in which hollow plastic speculae replaced the ear bars, allowing sound presentation and direct visualization of the tympanic membrane. A craniotomy was performed over the position of the IC. The dura was reflected and the surface of the brain was covered by a solution of 1.5% agar in 0.9% saline. Recordings were made using a linear array of eight glass-insulated tungsten electrodes (Bullock et al. 1988), nominally spaced at 200 μm, advanced through the intact cerebral cortex by a piezoelectric motor (Burleigh Inchworm IW-700/710, Scientifica, Uckfield, UK). Extracellular action potentials were amplified and filtered between 300 Hz and 3 kHz (RA16AC, RA16PA, 4xRA16BA, Tucker-Davis Technologies, Alachua, FL, USA). Responses were collected using Brainware (v7.43, Jan Schnupp, Oxford University).
The location of recordings in the central nucleus of IC (ICC) was indicated by the combination of stereotaxic coordinates, the physiological profiles of recordings on the approach to ICC, and the physiological profile of the recordings within ICC (well tuned, short latency responses with a monotonically increasing CF as depth increased). In a number of experiments, location was also confirmed by the recovery of electrolytic lesions.
Stimuli were delivered to each ear through sealed acoustic systems comprising custom-modified Radioshack 40-1377 tweeters joined via a conical section to a damped 2.5-mm diameter, 34-mm long tube (M. Ravicz, Eaton Peabody Laboratory, Boston, MA), which fitted into the hollow speculum. The output was calibrated a few millimeters from the tympanic membrane using a Brüel and Kjær 4134 microphone fitted with a calibrated 1-mm probe tube. The maximum output was within 2 dB of 120 dB sound pressure level (SPL) up to 1 kHz, and then varied smoothly between 100 and 120 dB up to 50 kHz. Stimuli were not corrected for the variation in calibration across frequency.
Stimuli were digitally synthesized (RP2.1, Tucker-Davis Technologies) at a 50 kHz sampling rate and output through 24-bit sigma-delta digital-to-analog converters. Stimuli were of 100-ms duration, switched on and off simultaneously in the two ears with cosine-squared gates with 2 ms rise/fall times (10% to 90%). A response area was first obtained using tones (0 to 100 dB in 5 dB steps, 200 Hz to 20 kHz in 0.1-octave steps), followed by a rate vs. level function (0 to 100 dB in 5 dB steps) for harmonic complexes with a 100 Hz F0 presented to the left, right, and both ears. The characteristic frequency was estimated from the response area as the frequency that elicited a response at the lowest level, and the threshold as that lowest level.
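The relation between the quoted 10%-to-90% rise time and the full length of a cosine-squared ramp can be worked out directly. The Python sketch below assumes that the 10% and 90% points refer to the amplitude envelope and that the ramps lie within the 100-ms stimulus; both are assumptions, as the text does not say. Under those assumptions, a 2-ms 10%-to-90% rise corresponds to a full ramp of about 3.4 ms. The function names are hypothetical.

```python
import numpy as np

def cos2_ramp_duration(rise_10_90):
    """Full length (s) of a cos^2 ramp whose amplitude rises from 10% to 90% in rise_10_90 s.

    The ramp is g(t) = sin^2(pi * t / (2 * T)) for 0 <= t <= T, so g reaches a
    fraction p of full amplitude at t = T * arcsin(sqrt(p)) / (pi / 2).
    """
    th10, th90 = np.arcsin(np.sqrt(0.1)), np.arcsin(np.sqrt(0.9))
    return rise_10_90 * (np.pi / 2) / (th90 - th10)

def gate(n_samples, fs, rise_10_90=0.002):
    """Onset/offset gate with cos^2 ramps placed inside the stimulus (an assumption)."""
    n_ramp = int(round(cos2_ramp_duration(rise_10_90) * fs))
    ramp = np.sin(0.5 * np.pi * np.arange(n_ramp) / n_ramp) ** 2
    g = np.ones(n_samples)
    g[:n_ramp] = ramp                     # rise
    g[n_samples - n_ramp:] = ramp[::-1]   # fall (mirror image of the rise)
    return g

print(f"full ramp length: {cos2_ramp_duration(0.002) * 1e3:.2f} ms")
```

A 100-ms stimulus at 50 kHz would then be multiplied by `gate(5000, 50_000)`.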
Once these preliminary data had been obtained, the main experiment was run in a single block comprising 100 repeats of every combination of the seven fundamentals and six conditions, presented in random order. Stimuli consisted of harmonic series containing all of the harmonics up to 10 kHz with F0s from 50 to 400 Hz in half-octave steps. Harmonics were summed in sine phase, unless otherwise stated. The level of individual harmonics was 50 dB SPL, yielding a total level of between 61 and 73 dB SPL depending upon F0 and condition. Six conditions were presented (Fig. 1): (1) contralateral, all harmonics in the left ear; (2) ipsilateral, all harmonics in the right ear; (3) diotic, all harmonics in both ears; (4) dichotic 1, even harmonics in the left ear and odd harmonics in the right; (5) dichotic 2, odd harmonics in the left ear and even harmonics in the right; and (6) alternating phase, harmonics alternating between sine (even harmonics) and cosine (odd) phase presented to both ears. In general, there was little response to the ipsilateral stimulus alone, and the responses to the dichotic 2 stimuli were very similar to those to the dichotic 1 stimuli, so these responses are not shown in this paper.
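As a check on the quoted levels, the sketch below enumerates the harmonics delivered to each ear in each condition and sums their powers. It assumes that the harmonics add in power (so n harmonics at 50 dB SPL give 50 + 10 log10(n) dB SPL), that harmonic n is included when n·F0 ≤ 10 kHz, and that the left ear is contralateral to the recorded right IC, as stated above; the function names are hypothetical. On these assumptions, the per-ear level ranges from about 60.8 dB SPL (even harmonics of a 400-Hz F0) to about 73.0 dB SPL (all harmonics of a 50-Hz F0), in line with the 61 to 73 dB SPL quoted.

```python
import numpy as np

F0S = 50 * 2 ** (np.arange(7) / 2)   # 50, 70.7, 100, 141.4, 200, 282.8, 400 Hz
F_MAX = 10_000.0                     # Hz; highest harmonic frequency included
LEVEL_PER_HARMONIC = 50.0            # dB SPL

def harmonics_by_ear(f0, condition):
    """Harmonic numbers sent to the (left, right) ears; left is contralateral to the right IC."""
    all_h = np.arange(1, int(F_MAX // f0) + 1)
    odd, even, none = all_h[all_h % 2 == 1], all_h[all_h % 2 == 0], all_h[:0]
    return {
        "contralateral": (all_h, none),
        "ipsilateral": (none, all_h),
        "diotic": (all_h, all_h),
        "dichotic 1": (even, odd),
        "dichotic 2": (odd, even),
        "alternating phase": (all_h, all_h),  # same components as diotic; only phases differ
    }[condition]

def starting_phases(harmonics, condition):
    """Phase offset (rad) for a sine generator: 0 gives sine phase; pi/2 gives
    cosine phase, used for odd harmonics in the alternating-phase condition."""
    phases = np.zeros(len(harmonics))
    if condition == "alternating phase":
        phases[harmonics % 2 == 1] = np.pi / 2
    return phases

def ear_level(n_harmonics):
    """Overall level (dB SPL) of n equal-level harmonics, assuming their powers add."""
    return LEVEL_PER_HARMONIC + 10 * np.log10(n_harmonics)

levels = [ear_level(len(h))
          for f0 in F0S
          for c in ("contralateral", "diotic", "dichotic 1", "alternating phase")
          for h in harmonics_by_ear(f0, c) if len(h) > 0]
print(f"per-ear level range: {min(levels):.1f} to {max(levels):.1f} dB SPL")
```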
Data were analyzed using peristimulus time histograms (PSTHs) calculated between 5 and 105 ms after stimulus onset with 0.2 ms bins, and using the Fourier transform of the PSTH (yielding a frequency resolution of 10 Hz and a Nyquist frequency of 2,500 Hz). Rate information was obtained from the zero-frequency component of the Fourier transform, which, suitably scaled, equals the conventionally obtained spike count. Vector strength (Goldberg and Brown 1969) was also obtained from the Fourier transform of the PSTH by dividing the amplitude at each Fourier frequency by the amplitude at zero frequency. In this paper, we report only vector strengths that were statistically significant (p < 0.001; Rayleigh test of uniformity; Buunen and Rhode 1978; Mardia 1972).
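The PSTH-based measures described above can be sketched as follows, assuming Python with NumPy and spike times expressed in seconds relative to stimulus onset. A 100-ms window with 0.2-ms bins gives 500 bins, hence the 10-Hz frequency resolution and the 2,500-Hz Nyquist frequency. Several of the F0s used (e.g., 70.7 Hz) do not fall on that 10-Hz grid, so the sketch also evaluates the Fourier component directly at an arbitrary frequency; how the original analysis handled this is not stated. Significance uses the Rayleigh statistic 2nR², which is approximately chi-square distributed with 2 degrees of freedom for uniformly distributed phases; p < 0.001 then corresponds to -2 ln(0.001) ≈ 13.8, the value quoted in the Abstract. Here n is the total spike count and R the vector strength; all function names are hypothetical.

```python
import numpy as np

WIN_START, WIN_END = 0.005, 0.105   # s re: stimulus onset (analysis window from the text)
BIN = 0.0002                        # s; 0.2-ms PSTH bins
N_BINS = int(round((WIN_END - WIN_START) / BIN))   # 500 bins
FREQS = np.fft.rfftfreq(N_BINS, d=BIN)             # 0, 10, 20, ..., 2500 Hz
RAYLEIGH_CRIT = -2 * np.log(1e-3)   # about 13.8: criterion on 2*n*R^2 for p < 0.001

def psth(spike_trains):
    """Pool spike times (s) from all repeats into a PSTH; return counts and bin centres."""
    edges = np.linspace(WIN_START, WIN_END, N_BINS + 1)
    counts, _ = np.histogram(np.concatenate(spike_trains), bins=edges)
    return counts, edges[:-1] + BIN / 2

def vector_strength_spectrum(counts):
    """|X(f)| / X(0) on the 10-Hz FFT grid; X(0) is the total spike count."""
    spec = np.abs(np.fft.rfft(counts))
    return spec / spec[0] if spec[0] > 0 else np.zeros_like(spec)

def rate_and_locking(spike_trains, freq):
    """Mean rate (spikes/s), vector strength at freq, Rayleigh statistic, significance.

    The Fourier component is evaluated directly at freq so that F0s off the
    10-Hz grid can be handled (an assumption about the method).
    """
    counts, centres = psth(spike_trains)
    n = counts.sum()
    rate = n / (len(spike_trains) * (WIN_END - WIN_START))
    if n == 0:
        return rate, 0.0, 0.0, False
    vs = np.abs(np.sum(counts * np.exp(-2j * np.pi * freq * centres))) / n
    z = 2 * n * vs ** 2
    return rate, vs, z, bool(z > RAYLEIGH_CRIT)
```

For example, 100 hypothetical repeats of a train with one spike every 10 ms give a rate of 100 spikes/s and a vector strength close to 1 at 100 Hz, far above the Rayleigh criterion.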
The autocorrelation function (ACF) of the spike trains, i.e., the all-order interval histogram (e.g., Cariani and Delgutte 1996a,b), was calculated for spikes between 5 and 105 ms after stimulus onset with 0.2 ms bin widths. A shuffled autocorrelation function (SACF; Joris 2003; Joris et al. 2006; Louage et al. 2004) was computed over the same interval but with 0.04 ms bin widths. The SACF was normalized as suggested by Louage et al. (2004), so that a value of 1 would be expected in the absence of temporal structure. The SACF generally showed the same features as the ACF but was considerably smoother, despite its narrower bins, because many more intervals were used in its construction. Because of this similarity, only the SACF is shown in this paper.
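A minimal SACF sketch follows, assuming Python with NumPy. It follows the normalization of Louage et al. (2004) as we understand it: intervals are counted only between spikes from different repeats, and the histogram is divided by N(N-1)r²ΔτD, where N is the number of repeats, r the mean firing rate, Δτ the bin width, and D the analysis duration, so that a response without temporal structure gives values near 1. The lag bins are centered on zero, and the 20-ms maximum lag is an arbitrary choice; the function name is hypothetical.

```python
import numpy as np

def sacf(spike_trains, t0=0.005, t1=0.105, bin_width=4e-5, max_lag=0.02):
    """Normalized shuffled autocorrelogram (after Joris 2003; Louage et al. 2004).

    Only intervals between spikes from *different* repeats are counted, and the
    histogram is divided by N * (N - 1) * r**2 * bin_width * D.
    """
    trains = [np.asarray(s, dtype=float) for s in spike_trains]
    trains = [s[(s >= t0) & (s < t1)] for s in trains]       # analysis window
    n_rep, dur = len(trains), t1 - t0
    k = int(round(max_lag / bin_width))
    lags = np.arange(-k, k + 1) * bin_width                   # bins centred on 0, ±bin_width, ...
    edges = np.append(lags - bin_width / 2, lags[-1] + bin_width / 2)
    counts = np.zeros(len(lags))
    for i, a in enumerate(trains):
        for j, b in enumerate(trains):
            if i != j:                                        # skip within-repeat intervals
                intervals = (b[None, :] - a[:, None]).ravel()
                counts += np.histogram(intervals, bins=edges)[0]
    rate = sum(len(s) for s in trains) / (n_rep * dur)        # mean rate, spikes/s
    norm = n_rep * (n_rep - 1) * rate ** 2 * bin_width * dur
    return lags, (counts / norm if norm > 0 else counts)
```

For hypothetical, perfectly periodic trains with one spike every 10 ms, the normalized SACF is large at lags of 0 and ±10 ms and zero at 5 ms.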
Results
Complete data sets were obtained from 85 multi-unit clusters. Clusters ranged in CF from 0.2 to 14.7 kHz; 29 had CFs below 1 kHz. The responses of all clusters to these stimuli were remarkably homogeneous, so the most salient features are described below for a single cluster, which can be taken as representative of the population. Examples of the responses to the diotic stimulus for a cluster with a CF of 1.6 kHz are shown in Figure 2A–D. There was a sustained response at low fundamentals but more adaptation at higher fundamentals. At the higher fundamentals, clusters either showed a low sustained response, like that in Figure 2D, or responded only at the onset. During the sustained response, the cluster phase locked at F0. This is visible in the PSTHs of Figure 2 and is shown by the clear peaks at F0 in the Fourier spectra (arrows in Fig. 2E–H) and at 1/F0 in the SACF (thick arrows in Fig. 2I–L).
Many clusters also showed evidence of a temporal response at frequencies higher than F0. There are a great many components at multiples of F0 in the Fourier spectra (Fig. 2E–H), although these should be interpreted with care. Because of nonlinearities in the generation of the PSTH, the Fourier transform of the PSTH will contain components which are not in the stimulus, so even if the cluster only responded to a single stimulus component, we would expect harmonics in the Fourier transform of the response.
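The point about nonlinear distortion can be illustrated with a toy example, assuming Python with NumPy: half-wave rectifying a single 100-Hz sinusoid, a crude stand-in for spike generation, already produces a 200-Hz component whose magnitude is 4/(3π), about 0.42, of the 100-Hz component, although the input contained no energy at 200 Hz.

```python
import numpy as np

FS, F, DUR = 50_000, 100.0, 0.1         # Hz, Hz, s (illustrative values)
t = np.arange(int(round(FS * DUR))) / FS
x = np.sin(2 * np.pi * F * t)           # a single 100-Hz stimulus component
r = np.maximum(x, 0.0)                  # half-wave rectification
spec = np.abs(np.fft.rfft(r)) / len(r)  # 10-Hz bins, because DUR = 0.1 s

def bin_of(freq):
    """Index of the rfft bin at freq (Hz)."""
    return int(round(freq * DUR))

ratio_2f = spec[bin_of(2 * F)] / spec[bin_of(F)]   # theory: 4 / (3 * pi)
ratio_3f = spec[bin_of(3 * F)] / spec[bin_of(F)]   # odd harmonics above the first vanish
print(f"2F/F = {ratio_2f:.3f}")
```

A response that follows a single stimulus component therefore still shows components at multiples of that component's frequency, which is why harmonics in the spectrum of the PSTH must be interpreted with care.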
The responses of the example cluster to stimuli with a fundamental of 50 Hz are shown in Figure 3. The responses to the stimulus played to the contralateral ear alone are similar to those to the diotic stimulus (compare the PSTHs in Fig. 3A, B, the spectra in Fig. 3E, F, and the SACFs in Fig. 3I, J). The dichotic and alternating phase stimuli, however, generate PSTHs with peaks at half the period of those to the diotic stimulus (compare Fig. 3C, D with Fig. 3B). This behavior is illustrated more clearly in the spectra (Fig. 3F–H), where the response at F0 (black arrow) drops relative to the nearly constant response at 2F0 (grey arrow) in the dichotic and alternating phase conditions (Fig. 3G, H). The SACFs (Fig. 3J–L) show the same effect in a complementary form. The response to the diotic stimulus has many intervals corresponding to 1/F0 (black arrow, Fig. 3J) but none corresponding to 1/(2F0). However, the dichotic and alternating phase stimuli generate many intervals corresponding to 1/(2F0) and an approximately equal number corresponding to 1/F0. This behavior suggests that this cluster is responding primarily to the envelope of the stimulus at the contralateral ear.
Temporal properties within the population
The PSTHs in Figure 2 showed phase locking that was maintained up to 400 Hz. This was true for about 40% of clusters in the population as a whole; in the remainder, phase locking declined with increasing F0. The highest F0 at which there was significant phase locking is plotted as a function of CF in Figure 4. For contralateral and diotic stimuli (Fig. 4A), the majority of clusters phase locked significantly to F0s of 282.8 Hz or above, and only one did not phase lock to any F0 (plotted at zero F0). Many clusters phase locked significantly to the highest F0 we used, i.e., they were at ceiling. Although clusters reached this ceiling at all CFs, the lowest values of the upper limit of locking increased with increasing CF. For dichotic and alternating phase stimuli (Fig. 4B), the highest F0 locked to was generally lower, as indicated by far fewer points at the ceiling of 400 Hz.
The extent to which the envelope of the stimulus is represented in the PSTH can be determined from the Fourier transform of the PSTH. The spectra in Figures 2 and 3 showed strong components at both F0 and 2F0 for the monaural and diotic stimuli. A strong component at 2F0 would inevitably accompany a response at F0, because the Fourier transform represents the shape of the PSTH, which is, to some extent, a half-wave rectified version of the band-pass filtered stimulus. However, for the dichotic and alternating phase stimuli, there was little or no response at F0 and a strong component at 2F0. This absence of a response at F0 cannot be explained by distortions inherent in forming the spectrum of the PSTH and thus reflects a real effect. The vector strength of locking to these components is shown in Figure 5, with locking at F0 shown as filled circles and locking at 2F0 as open circles. There was a wide range of vector strengths, with the highest values and the greatest range occurring in response to contralateral and diotic stimuli at the lower F0s. The average vector strength decreased as F0 increased, and there was no noticeable trend for vector strength to change as a function of CF. In the contralateral and diotic conditions, the distributions of vector strength at F0 and 2F0 overlapped considerably, with no apparent difference in their means. However, for the dichotic and alternating phase conditions, the strength of locking at F0 (filled circles) was generally very low, while that at 2F0 (open circles) was generally higher except at the highest F0s. This is consistent with phase locking to the envelope at a single, dominant ear only, rather than to a binaurally integrated stimulus.
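Why the absence of an F0 component is informative can be shown with a toy envelope model. The sketch below, in Python with NumPy, assumes that a cluster follows the Hilbert envelope of the harmonics falling within a rectangular 1.5- to 2.5-kHz passband; the passband and the 100-Hz F0 are arbitrary, hypothetical choices, not a model of the recorded clusters. With all harmonics in the band, the envelope has a strong F0 component; with only odd harmonics, as at one ear in the dichotic conditions, the envelope repeats every 1/(2F0), so its F0 component vanishes while its 2F0 component remains.

```python
import numpy as np

FS, DUR, F0 = 50_000, 0.1, 100.0          # Hz, s, Hz (illustrative)
t = np.arange(int(round(FS * DUR))) / FS

def band_envelope(harmonics, lo, hi):
    """Hilbert envelope of the sine-phase harmonics of F0 lying in [lo, hi] Hz.

    The rectangular passband is a deliberately crude, hypothetical stand-in
    for a cochlear filter containing unresolved harmonics.
    """
    band = [n for n in harmonics if lo <= n * F0 <= hi]
    return np.abs(sum(np.exp(2j * np.pi * n * F0 * t) for n in band))

def relative_component(env, freq):
    """Fourier magnitude of env at freq (Hz), relative to its zero-frequency term."""
    spec = np.abs(np.fft.rfft(env))
    return spec[int(round(freq * DUR))] / spec[0]

LO, HI = 1500.0, 2500.0                              # hypothetical channel near 2 kHz
env_all = band_envelope(range(1, 101), LO, HI)       # contralateral/diotic: all harmonics
env_odd = band_envelope(range(1, 101, 2), LO, HI)    # one ear of a dichotic stimulus
for name, env in [("all harmonics", env_all), ("odd harmonics", env_odd)]:
    print(name, [round(relative_component(env, f), 3) for f in (F0, 2 * F0)])
```

Because half-wave rectification of such an envelope-following response would only add components at multiples of 2F0, it cannot reintroduce the missing F0 component.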
The Fourier transform of the PSTH shows the degree to which the envelope of the stimulus is represented in the envelope of the PSTH. However, a more direct measure of temporal coding is provided by the autocorrelation of the spike trains. The SACFs shown in Figures 2 and 3 have peaks at 1/F0 for monaural and diotic stimuli and at both 1/F0 and 1/(2F0) for dichotic and alternating phase stimuli. In an autocorrelation analysis of a response with period 1/f, we expect intervals at all integer multiples of the period (n/f), so the peak at 1/F0 in the dichotic and alternating phase conditions is the second-order interval of a response with period 1/(2F0) (i.e., 2/(2F0)). Thus, the Fourier and autocorrelation analyses are complementary. If the response is predominantly at F0, then we expect components at both F0 and 2F0 in the Fourier analysis but a peak only at 1/F0 (and its multiples) in the SACF, whereas if the response is predominantly at 2F0, then we expect a component only at 2F0 in the Fourier analysis but peaks at both 1/F0 and 1/(2F0) in the SACF. The magnitudes of the SACF peaks corresponding to F0 and 2F0 are plotted in Figure 6, in which a value of 1 corresponds to no temporal structure (Louage et al. 2004). For monaural and diotic stimuli, the peak corresponding to 2F0 (open circles) was generally very low, whereas for all F0s apart from 400 Hz, the peak corresponding to F0 (filled circles) was much higher. For dichotic and alternating phase stimuli, the two peak heights were more nearly equal. Taken together, all of these results are consistent with phase locking to the envelope at a single, dominant ear only rather than to a binaurally integrated stimulus.
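The complementary relationship between the two analyses can be illustrated with idealized spike trains. In this Python sketch, which assumes NumPy and perfectly regular, hypothetical spike times, a train locked at F0 has a vector strength of 1 at both F0 and 2F0 but all-order intervals only at 1/F0 (within a 12-ms window), whereas a train locked at 2F0 has zero vector strength at F0 but intervals at both 1/(2F0) and 1/F0.

```python
import numpy as np

F0 = 100.0  # Hz, illustrative

def vector_strength(spikes, freq):
    """Goldberg-Brown vector strength of spike times (s) at freq (Hz)."""
    return float(np.abs(np.mean(np.exp(2j * np.pi * freq * spikes))))

def all_order_intervals(spikes, max_lag):
    """Positive all-order inter-spike intervals (s) up to max_lag."""
    d = (spikes[None, :] - spikes[:, None]).ravel()
    return d[(d > 0) & (d <= max_lag)]

# Hypothetical, perfectly regular spike trains
locked_f0 = 0.0061 + np.arange(10) / F0          # one spike per 1/F0 (10 ms)
locked_2f0 = 0.0061 + np.arange(20) / (2 * F0)   # one spike per 1/(2F0) (5 ms)

for name, s in [("F0-locked", locked_f0), ("2F0-locked", locked_2f0)]:
    ivals_ms = np.unique(np.round(all_order_intervals(s, 0.012) * 1e3, 6))
    print(name, "VS(F0) =", round(vector_strength(s, F0), 3),
          "VS(2F0) =", round(vector_strength(s, 2 * F0), 3),
          "intervals (ms):", ivals_ms)
```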
Tuning of the discharge rate to F0
In the preceding section, we considered the temporal responses to harmonic stimuli and showed evidence for a doubling in the frequency of response for dichotic stimuli. However, as discussed in the Introduction, a rate representation of periodicity begins to emerge within the IC. In this section, we describe how the rate response of clusters changes with F0 and between diotic and dichotic stimuli. The rate responses of the example cluster described earlier are shown in Figure 7. The response to diotic stimuli had a peak at an F0 of 282.8 Hz (defined as the best F0). The responses to the dichotic and alternating phase stimuli also peaked, but the best F0 was shifted down an octave, to 141.4 Hz. These results are consistent with the response being determined by the envelope of the stimulus at the contralateral ear: if the rate response is determined by the envelope frequency rather than by F0, then, because the envelope frequency in the dichotic conditions is 2F0, the maximum of the rate response will be shifted down an octave when plotted against F0.
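The predicted octave shift can be sketched in Python under an explicitly hypothetical model: the cluster's rate depends only on the envelope repetition rate at its dominant ear, through a Gaussian tuning curve on a log-frequency axis peaking at 282.8 Hz (the best F0 of the example cluster). The tuning shape, width, and peak rate are invented for illustration and are not fitted to any data. Because the envelope at each ear repeats at F0 for diotic stimuli and at 2F0 for dichotic stimuli, the best F0 on the tested grid falls from 282.8 to 141.4 Hz.

```python
import numpy as np

F0S = 50 * 2 ** (np.arange(7) / 2)   # 50, 70.7, 100, 141.4, 200, 282.8, 400 Hz

def hypothetical_mtf(env_rate, best_rate=282.8, width_oct=0.75, peak=100.0):
    """Hypothetical rate tuning (spikes/s) to envelope repetition rate (Hz):
    a Gaussian on a log2 frequency axis. Not fitted to any data."""
    return peak * np.exp(-0.5 * (np.log2(env_rate / best_rate) / width_oct) ** 2)

rate_diotic = hypothetical_mtf(F0S)        # envelope at each ear repeats at F0
rate_dichotic = hypothetical_mtf(2 * F0S)  # envelope at each ear repeats at 2F0
best_diotic = F0S[np.argmax(rate_diotic)]
best_dichotic = F0S[np.argmax(rate_dichotic)]
print(f"best F0: diotic {best_diotic:.1f} Hz, dichotic {best_dichotic:.1f} Hz")
```

Any tuning curve that depends only on envelope rate would give the same one-octave shift; the Gaussian form is merely convenient.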
Figure 7 showed an example of tuning of the discharge rate to F0. Most clusters showed such clear tuning, with a best F0 between 70.7 and 282.8 Hz. The best F0 is plotted in Figure 8A for the contralateral and diotic stimuli and in Figure 8B for the dichotic and alternating phase stimuli. Clusters that had a best F0 at or below 50 Hz or at or above 400 Hz would not have been distinguishable from low-pass or high-pass clusters and are plotted at each end of the distribution. Some clusters responded best at both low and high F0s, with a dip between them; these band-reject clusters are plotted as BR in Figure 8. Some clusters showed negligible or nonsystematic modulation across F0; these are plotted as None. A number of clusters showed double peaks in the rate response systematically across conditions, with the higher frequency peak an octave above the lower. These clusters were plotted according to the lower frequency peak. For contralateral and diotic stimuli, the modal best F0 was at 141.4 Hz, whereas for dichotic and alternating phase stimuli, it was at 70.7 Hz. This finding demonstrates that, like the example in Figure 7, the best F0 for dichotic responses tended to be an octave lower than for diotic responses.
It is conceivable, although unlikely, that the octave relationship seen in the population did not hold for individual clusters. To check this, the dichotic best F0s are plotted against the diotic best F0s for each cluster individually in Figure 9A. The diagonal solid line represents equality, whereas the dashed line represents the dichotic best F0 being half the diotic best F0. The points fall mostly on the dashed line, with a few points scattered away from it. In other words, the population trend described above, with dichotic responses being tuned to half the F0 of diotic responses, also tends to hold for individual clusters. A similar plot for the alternating phase stimulus is shown in Figure 9B. Although more points fall away from the octave-relation line, the majority are still on or near it. In other words, the response to alternating phase stimuli also tends to peak an octave below that for diotic stimuli, for individual clusters as well as across the population. In both plots, the groups of points on the equality line at 50 and 400 Hz are potentially an artifact of the analysis, since testing for an octave relationship at these F0s would have required a dichotic or alternating phase stimulus with a 25 Hz F0 or a diotic stimulus with an 800 Hz F0, neither of which we used. There are no striking differences in best F0 as a function of CF (Fig. 10), although there is a very weak tendency for tuning to higher best F0s to occur at higher CFs (compare the density of points in Fig. 10A). Since these data are those of Figure 8 plotted in a different form, it is no surprise that Figure 10B looks very like Figure 10A shifted an octave lower in best F0, i.e., that tuning to lower F0s occurs more often for the dichotic and alternating phase stimuli.
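The per-cluster comparison, including the grid-edge caveat, can be expressed as a small classification rule. The Python function below is a hypothetical sketch, not the procedure used for Figure 9: it maps each best F0 to the nearest tested F0 and treats equal best F0s at the edges of the grid as untestable, since the F0 needed to reveal an octave relationship was not presented. Extending the caveat to 70.7 Hz, whose octave below (35.4 Hz) was also not tested, is our assumption.

```python
import numpy as np

F0S = 50 * 2 ** (np.arange(7) / 2)   # tested F0s, 50 to 400 Hz in half-octave steps

def octave_relation(best_diotic, best_dichotic, f0s=F0S):
    """Classify one cluster's (diotic, dichotic) pair of best F0s on the tested grid."""
    i = int(np.argmin(np.abs(np.log2(f0s / best_diotic))))    # nearest grid index
    j = int(np.argmin(np.abs(np.log2(f0s / best_dichotic))))
    if j == i - 2:                       # two half-octave steps = one octave lower
        return "octave"
    if i == j:
        # If i < 2, the octave below the diotic best F0 was not presented
        # dichotically; if i is the top F0, the octave above (800 Hz) was not
        # presented diotically. Equality there may be a floor/ceiling effect.
        if i < 2 or i == len(f0s) - 1:
            return "equal (untestable)"
        return "equal"
    return "other"
```

Counting the labels across clusters would give the proportion of points on the dashed line in Figure 9, with the untestable points reported separately.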
Monaural balance of responsiveness
It has already been mentioned that ipsilateral responses were generally weaker than those to contralateral stimuli. The ratios of the firing rate in response to ipsilateral stimuli relative to contralateral are shown in Figure 11. The contralateral response completely dominates for CFs greater than 2 kHz, whereas the balance is less extreme but still favoring the contralateral side for lower CFs. The marginal histogram in Figure 11 confirms that most clusters are more strongly driven by contralateral than ipsilateral stimuli.
Discussion
We have measured the responses of clusters of neurons in the IC to harmonic series in which all the harmonics are played to each ear or alternate harmonics are played to alternate ears. In the first section of the discussion, we compare these results to earlier studies that predominantly used amplitude modulated tones. In the second section, we contrast the rate and temporal representations of pitch cues in the IC. Finally, we compare the current results with the available psychophysics. We found no evidence for combination of binaural cues for pitch in the IC, but we did not sample enough conditions with resolved harmonics to determine whether there were differences in the processing of resolved and unresolved harmonics as seen in the psychophysics.
Comparison with earlier studies
Virtually all clusters phase locked diotically at F0, with some locking up to the highest F0 tested (400 Hz; Fig. 4). This is consistent with the ability of IC units to phase lock to pure tones up to 600 Hz in cat (Kuwada et al. 1984) and 1 kHz in guinea pig (Liu et al. 2006). It is also consistent with the frequency range of phase locking to the envelope of sinusoidally amplitude modulated (SAM) tones and noise (256–600 Hz; Batra et al. 1989; Heil et al. 1995; Krishna and Semple 2000; Langner and Schreiner 1988; Muller-Preuss et al. 1994; Nelson and Carney 2007; Rees and Moller 1987; Rees and Palmer 1989). We found that the best F0 in the rate tuning for contralateral and diotic presentation was 141 Hz. This is at the upper end of the range of best rate modulation frequency tuning found using SAM (30–160 Hz; Heil et al. 1995; Langner and Schreiner 1988; Nelson and Carney 2007; Rees and Moller 1987). The balance between low-pass and band-pass tuning changes as a function of level (Krishna and Semple 2000; Rees and Moller 1987; Rees and Palmer 1989) and age (Heil et al. 1995), so it is possible that a small disagreement between the modal best F0 for harmonic stimuli and SAM might be because of differences in effective level. Krishna and Semple (2000) argue that rate modulation transfer functions are shaped by inhibitory side bands, so the difference in number and amplitude of components between harmonic stimuli and amplitude modulation may account for the slight difference in modal best F0, although later modeling (Nelson and Carney 2007) downplays the role of inhibition in shaping rate modulation transfer functions.
Sinex et al. (2002, 2005) and Sinex and Li (2007) reported that IC units in chinchillas responded strongly only when one of the partials in a harmonic complex was mistuned or when two harmonic complexes with different F0s were presented. However, they reported few responses to purely harmonic series, and they used a fixed F0 of 250 Hz, an F0 to which many of the clusters in our sample phase locked less well than to lower F0s. Additionally, they used long stimuli (up to 500 ms) and tended to analyze only the last 400 ms. Because our stimuli were 100 ms long and were analyzed from 5 ms after onset, the two sets of studies analyzed different portions of the response.
Pitch cue representation in the IC
In this paper, we are primarily concerned with how cues to the pitch of harmonic series are represented in the IC and with evidence for their integration across ears. Historically, models of pitch perception polarized into those which concentrated on determining the pitch from the pattern of components resolved in the auditory periphery (so-called place models; e.g., Goldstein 1973; Terhardt 1974; Terhardt et al. 1982) and those which extracted the pitch from the temporal envelope of unresolved harmonics (so-called temporal models; e.g., Schouten 1940, 1970). More recently, models have been proposed which combine information about the timing of spikes across frequency from both resolved and unresolved harmonics (e.g., Assmann and Summerfield 1990; Licklider 1951; Meddis and Hewitt 1991a,b; Slaney and Lyon 1990). Since the fidelity of phase locking declines as the auditory system is ascended, it is likely that at some point such a temporal representation is converted into a place representation, with the spatial pattern of neuronal firing rates representing the pitch. It has been claimed that such a map exists within the IC (Langner et al. 2002; Langner and Schreiner 1988). We found clusters which showed tuning to F0, so they are candidates for such a map (Fig. 7); however, the maximum of the distribution was around 140 Hz for diotic stimuli (Fig. 8), so it is unlikely that they can be involved in the representation of all musical pitches (the A above middle C is 440 Hz, and the highest musical pitches are about 5 kHz). Most of the analyses we have presented are of the temporal intervals present within individual frequency channels. These analyses have shown that F0s can be represented in the firing pattern of individual clusters up to 400 Hz.
The representation may extend to higher F0s, but we have no data on this; even so, it is unlikely to extend beyond the upper limit of pure-tone phase locking in the IC, about 1,000 Hz (Liu et al. 2006 and previous section).
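Significant phase locking of this kind is commonly assessed with the Rayleigh test (described in, e.g., Mardia 1972). The minimal sketch below assumes that the test statistic is 2nR², where R is the vector strength of n spike times at the analysis frequency; for uniformly distributed phases this statistic is approximately chi-squared with two degrees of freedom, so the criterion of 13.8 quoted in the Abstract corresponds to p ≈ 0.001. The function names and spike times are hypothetical, and this is not the authors' analysis code.

```python
import numpy as np

def vector_strength(spike_times, freq):
    """Vector strength R: length of the mean resultant of spike phases at freq (Hz)."""
    phases = 2 * np.pi * freq * np.asarray(spike_times, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * phases))))

def rayleigh_statistic(spike_times, freq):
    """Rayleigh statistic 2 n R^2 (assumed definition; see lead-in)."""
    n = len(spike_times)
    return 2 * n * vector_strength(spike_times, freq) ** 2

def is_phase_locked(spike_times, freq, criterion=13.8):
    """True if the Rayleigh statistic exceeds the criterion (13.8 ~ p = 0.001)."""
    return rayleigh_statistic(spike_times, freq) > criterion

# Hypothetical example: 20 spikes, one per cycle of a 400-Hz F0 ...
f0 = 400.0
locked = np.arange(20) / f0
# ... versus 20 spikes spread evenly over a single cycle (uniform phases).
unlocked = np.arange(20) / (20 * f0)

print(round(rayleigh_statistic(locked, f0), 6))  # 40.0 (R = 1, n = 20)
print(is_phase_locked(unlocked, f0))             # False (R = 0)
```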
We have not analyzed the across-CF combination of activity, that is, combining temporal activity across clusters (e.g., as in a summary autocorrelation; Meddis and Hewitt 1991a,b). However, IC neurons have excitatory and inhibitory receptive fields that are more complex and wider than those of auditory nerve fibers (e.g., Egorova et al. 2001; Ehret and Schreiner 2004; Le Beau et al. 2001; Le Beau et al. 1996; Ramachandran et al. 1999), so there is scope for across-CF and across-ear interactions within individual IC clusters. The extent to which we found across-ear interactions is discussed in the following sections.
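For readers unfamiliar with the summary-autocorrelation idea, the minimal sketch below sums the autocorrelations of binned spike trains across channels and reads off the pitch from the largest peak within a range of candidate lags. It illustrates the general principle only: it is not the Meddis and Hewitt (1991a,b) model (which operates on simulated auditory nerve firing probabilities) and not an analysis performed in this study. The channel data, bin width, lag range, and function names are all hypothetical.

```python
import numpy as np

def summary_acf(channels, max_lag):
    """Sum across channels of the unnormalised autocorrelation at lags 0..max_lag (bins)."""
    total = np.zeros(max_lag + 1)
    for x in channels:
        x = np.asarray(x, dtype=float)
        full = np.correlate(x, x, mode="full")        # lags -(N-1)..(N-1)
        total += full[len(x) - 1 : len(x) + max_lag]  # keep lags 0..max_lag
    return total

def pitch_from_sacf(channels, fs, min_lag, max_lag):
    """Pitch estimate (Hz) = fs / lag of the largest summary-ACF value in [min_lag, max_lag]."""
    sacf = summary_acf(channels, max_lag)
    best_lag = min_lag + int(np.argmax(sacf[min_lag:]))
    return fs / best_lag

def pulse_train(period_bins, n_bins):
    """Hypothetical channel: one spike per period (the period may be non-integer)."""
    x = np.zeros(n_bins)
    idx = np.round(np.arange(0, n_bins, period_bins)).astype(int)
    x[idx[idx < n_bins]] = 1.0
    return x

fs = 10_000  # 0.1-ms bins
n = 1_000    # 100 ms of data
# Three channels locked to 100, 200 and 300 Hz (e.g. harmonics 1-3 of a 100-Hz F0).
channels = [pulse_train(fs / 100, n), pulse_train(fs / 200, n), pulse_train(fs / 300, n)]
print(pitch_from_sacf(channels, fs, min_lag=20, max_lag=200))  # 100.0
```

No single channel here fires at 100 Hz except the first, yet every channel contributes a peak at the 10-ms lag, which is why the summed function peaks at the common period.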
Evidence of monaural dominance
The most striking feature of our results is the contrast between responses to diotic and dichotic stimuli. The frequency of the temporal response of clusters to dichotic stimuli was consistently twice that of the response to diotic stimuli, and the best F0 of each cluster's rate tuning for dichotic stimuli was consistently half that for diotic stimuli. Both results are consistent with the cluster responding predominantly to the stimulus envelope at a single ear. This may be because the inputs to the IC cluster from the two ears are identical (see below for a discussion of this), or because the IC cluster receives input from only one ear. Figure 11 shows that the balance of excitation favors the contralateral ear, so it is perhaps not surprising that the responses are characteristic of single-ear input. However, there may also be inhibitory, facilitatory, or subthreshold inputs from the ipsilateral ear. The evidence for contralateral-ear dominance would be more convincing if it were strongly correlated with the evidence for envelope processing. Figure 12 plots the ratio of phase locking to 2F0 relative to phase locking to F0 against the ratio of ipsilateral to contralateral firing rates. Since a weak F0 response (and hence a large 2F0/F0 ratio) indicates strong locking to the envelope at a single ear, we would expect a strong correlation with the ipsilateral/contralateral firing-rate ratio if this effect could be explained purely by monaural dominance. The correlation is not strong, so our results cannot be explained purely in terms of contralateral-ear dominance.
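The comparison in Figure 12 can be sketched as follows. Assuming phase locking is quantified by vector strength, the hypothetical `envelope_ratio` divides the vector strength at 2F0 by that at F0, and `dominance_correlation` computes the Pearson correlation between this ratio and the ipsilateral/contralateral rate ratio across clusters. Whether the published analysis used vector strength or another locking measure is not stated in this section, and the spike times and rates below are invented purely to exercise the code.

```python
import numpy as np

def vector_strength(spike_times, freq):
    """Length of the mean resultant vector of spike phases at freq (Hz)."""
    phases = 2 * np.pi * freq * np.asarray(spike_times, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * phases))))

def envelope_ratio(spike_times, f0):
    """Locking at 2*F0 relative to locking at F0 (large => single-ear envelope)."""
    return vector_strength(spike_times, 2 * f0) / vector_strength(spike_times, f0)

def dominance_correlation(spike_trains, f0, ipsi_rates, contra_rates):
    """Pearson r between the 2F0/F0 locking ratio and the ipsi/contra firing-rate ratio."""
    lock_ratio = np.array([envelope_ratio(s, f0) for s in spike_trains])
    rate_ratio = np.asarray(ipsi_rates, dtype=float) / np.asarray(contra_rates, dtype=float)
    return float(np.corrcoef(lock_ratio, rate_ratio)[0, 1])

f0 = 100.0
trains = [
    np.arange(10) / f0,        # one spike per F0 period: ratio 1
    np.arange(21) / (2 * f0),  # one spike per 2F0 period; the odd count leaves
    np.arange(11) / (2 * f0),  # one unpaired half-cycle spike, so R(F0) = 1/n
]
print([round(envelope_ratio(s, f0), 6) for s in trains])  # [1.0, 21.0, 11.0]
r = dominance_correlation(trains, f0, ipsi_rates=[9, 1, 5], contra_rates=[10, 10, 10])
print(round(r, 6))  # -1.0
```

The invented rates were chosen to make the relation exactly linear, so r is -1; with real data, the point of the analysis is how far |r| falls short of 1.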
Relationship to human psychophysics and resolution of harmonics
If the components of a stimulus are unresolved, several components interact within the response area of an auditory nerve fiber, and the output is amplitude modulated. The frequency of this amplitude modulation will be F0 for our diotic stimuli and 2F0 for the dichotic and alternating-phase stimuli. To a crude approximation, the auditory nerve outputs will resemble band-pass-filtered, half-wave-rectified versions of the waveforms shown in Figure 1. Importantly, the envelopes of the even-harmonic and odd-harmonic stimuli will be very similar. It is therefore unlikely that any processing could reconstruct a binaural response that did not reflect the periodicity of the auditory nerve outputs. In other words, for unresolved harmonics, our results are largely predictable from basilar-membrane filtering and auditory nerve firing, whether or not there are binaural interactions. The question then becomes: to what extent do the responses reported here reflect resolved or unresolved harmonics?
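The envelope argument can be checked numerically. The sketch below, under simplifying assumptions, takes equal-amplitude, cosine-phase harmonics of a 100-Hz F0 that might fall within one high-CF auditory filter (harmonics 20 to 30, an arbitrary choice), computes the Hilbert envelope of all of them, of the odd ones, and of the even ones, and tests whether each envelope repeats after half an F0 period. For a sum of cosines the analytic signal is simply the corresponding sum of complex exponentials, so no filtering or FFT is needed. Peripheral filtering, rectification, the alternating-phase condition, and the actual stimulus phases are not modeled, and the function names are hypothetical.

```python
import numpy as np

def hilbert_envelope(harmonics, f0, t):
    """Envelope of sum_k cos(2*pi*k*f0*t) over the given harmonic numbers k.

    The analytic signal of a sum of cosines is the sum of the matching complex
    exponentials, so its magnitude is the Hilbert envelope."""
    k = np.asarray(harmonics, dtype=float)
    return np.abs(np.exp(2j * np.pi * f0 * np.outer(t, k)).sum(axis=1))

def repeats_after(harmonics, f0, shift, fs=100_000, dur=0.02):
    """True if the envelope is unchanged when time is shifted by `shift` seconds."""
    t = np.arange(int(fs * dur)) / fs
    return bool(np.allclose(hilbert_envelope(harmonics, f0, t),
                            hilbert_envelope(harmonics, f0, t + shift), atol=1e-9))

f0 = 100.0
band = range(20, 31)                      # harmonics assumed to share one filter
all_h = list(band)                        # diotic: every harmonic at each ear
odd_h = [k for k in band if k % 2 == 1]   # dichotic: odd harmonics at one ear
even_h = [k for k in band if k % 2 == 0]  # dichotic: even harmonics at the other

half_period = 1 / (2 * f0)
for name, h in [("all", all_h), ("odd", odd_h), ("even", even_h)]:
    print(name, repeats_after(h, f0, half_period))
# all False / odd True / even True: the odd- and even-harmonic envelopes repeat at 2F0
```

The odd-harmonic case works because shifting by half a period multiplies every odd component by -1, which flips the waveform but leaves its envelope unchanged.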
There are many definitions of harmonic resolution (see Bernstein and Oxenham 2003 and Shackleton and Carlyon 1994 for a discussion of several). For the purposes of this paper, however, if the individual components of a harmonic stimulus interact significantly within the response area of an auditory nerve fiber, so that its firing rate becomes amplitude modulated, then the stimulus is unresolved at the CF of that fiber. This definition is consistent with that of Shackleton and Carlyon (1994), who derived a rule for determining whether harmonics are resolved: stimuli are unresolved when more than 3.25 harmonics fall within the 10-dB bandwidth of an auditory filter (1.8 times the equivalent rectangular bandwidth, ERB) and resolved when two or fewer do, with a region of ambiguity in between. Guinea pig auditory nerve bandwidths and behavioral tuning curves (Evans 2001; Evans et al. 1992) are wider than those obtained psychophysically in humans, which complicates comparison of our data with human psychophysics. However, Evans (2001; Evans et al. 1992) derived a formula for the guinea pig ERB as a function of CF (ERB = 0.3 CF^0.56), which we can use in the rule above to estimate whether stimuli are resolved at the CFs of the clusters we recorded from. These resolution limits are plotted in Figures 4, 5, 6, and 10, which show that our data should encompass responses to both resolved and unresolved harmonics. There is no obvious difference in the results between the resolved and unresolved regions. This would not be expected from human psychophysics (Bernstein and Oxenham 2003), in which dichotic presentation doubled the pitch of unresolved but not resolved harmonics; this was attributed to binaural integration of resolved, but not unresolved, harmonics.
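The resolvability rule can be written as a short function. This is a sketch assuming that CF and ERB in the guinea pig formula are both in kHz (our reading; the units are not restated here) and that the number of harmonics in the filter is simply the 10-dB bandwidth (1.8 × ERB) divided by the harmonic spacing F0. The function names are hypothetical.

```python
def guinea_pig_erb_khz(cf_khz):
    """Guinea pig ERB = 0.3 * CF**0.56 (Evans 2001; units assumed to be kHz)."""
    return 0.3 * cf_khz ** 0.56

def harmonics_in_filter(cf_khz, f0_hz):
    """Number of harmonics spanned by the 10-dB bandwidth (1.8 x ERB) at this CF."""
    bw_10db_hz = 1.8 * guinea_pig_erb_khz(cf_khz) * 1000.0
    return bw_10db_hz / f0_hz

def classify_resolvability(cf_khz, f0_hz):
    """Shackleton and Carlyon (1994) rule: <= 2 resolved, > 3.25 unresolved."""
    n = harmonics_in_filter(cf_khz, f0_hz)
    if n <= 2.0:
        return "resolved"
    if n > 3.25:
        return "unresolved"
    return "ambiguous"

# At CF = 1 kHz the assumed 10-dB bandwidth is 540 Hz.
for f0 in (50, 200, 400):
    print(f0, round(harmonics_in_filter(1.0, f0), 2), classify_resolvability(1.0, f0))
# 50 10.8 unresolved
# 200 2.7 ambiguous
# 400 1.35 resolved
```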
The fact that we find no difference between resolved and unresolved harmonics suggests that our results may not reflect resolvability but may instead reflect a lack of binaural integration for pitch at the level of the auditory midbrain. However, only at an F0 of 400 Hz can this comparison be made over a substantial number of clusters, and at this F0 phase locking to the stimulus is weaker anyway. It is therefore unclear whether we have sufficient data to comment unequivocally on whether resolved and unresolved harmonics are processed differently within the IC.
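Inverting the same rule shows how the resolved region depends on F0. Under the same assumptions as above (guinea pig ERB = 0.3 CF^0.56, with CF and ERB in kHz), the sketch solves for the CF at which exactly 2 and exactly 3.25 harmonics fall within the 10-dB bandwidth, for the half-octave F0 series from 50 to 400 Hz; harmonics count as resolved below the first CF and unresolved above the second. The function name is hypothetical, and these values are illustrative and may differ from the limits plotted in the figures.

```python
import numpy as np

def boundary_cf_khz(f0_hz, n_harmonics):
    """CF (kHz) at which 1.8 * ERB(CF) spans n_harmonics of f0_hz.

    Solves 1.8 * 0.3 * CF**0.56 * 1000 = n_harmonics * f0_hz for CF."""
    return (n_harmonics * f0_hz / (1.8 * 0.3 * 1000.0)) ** (1 / 0.56)

f0s = 50 * 2 ** (np.arange(7) / 2)  # 50 to 400 Hz in half-octave steps
for f0 in f0s:
    resolved_below = boundary_cf_khz(f0, 2.0)
    unresolved_above = boundary_cf_khz(f0, 3.25)
    print(f"F0 {f0:6.1f} Hz: resolved below {resolved_below:5.2f} kHz, "
          f"unresolved above {unresolved_above:5.2f} kHz")
```

Under these assumptions the resolved region widens with F0, extending to about 2 kHz at 400 Hz, whereas at 50 Hz the harmonics are unresolved at every CF above about 0.12 kHz, that is, across the whole recorded CF range.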
We therefore have no definitive answer to the question of whether resolved harmonics are processed monaurally and then combined, or are integrated binaurally. The psychophysical results for resolved harmonics (Bernstein and Oxenham 2003; Houtsma and Goldstein 1972) and the fact that pitch can be perceived in stimuli whose pitch cues are generated by binaural processing (e.g., Akeroyd and Summerfield 2000b) all point to the conclusion that “binaural” and “monaural” pitch cues can be integrated into a single percept (Akeroyd and Summerfield 2000a). The data presented here provide no evidence that this integration occurs at the level of the IC. Further research is required to determine how and where it occurs within the auditory system.
References
Akeroyd MA, Summerfield AQ. Integration of monaural and binaural evidence of vowel formants. J. Acoust. Soc. Am. 107:3394–3406, 2000a.
Akeroyd MA, Summerfield AQ. The lateralization of simple dichotic pitches. J. Acoust. Soc. Am. 108:316–334, 2000b.
Assmann P, Summerfield Q. Modeling the perception of concurrent vowels: Vowels with different fundamental frequencies. J. Acoust. Soc. Am. 88:680–697, 1990.
Batra R, Kuwada S, Stanford TR. Temporal coding of envelopes and their interaural delays in the inferior colliculus of the unanesthetized rabbit. J. Neurophysiol. 61:257–268, 1989.
Bernstein JG, Oxenham AJ. Pitch discrimination of diotic and dichotic tone complexes: Harmonic resolvability or harmonic number? J. Acoust. Soc. Am. 113:3323–3334, 2003.
Bilsen FA, van der Meulen AP, Raatgever J. Salience and JND of pitch for dichotic noise stimuli with scattered harmonics: Grouping and the central spectrum theory. In: Palmer AR, Rees A, Summerfield AQ, Meddis R (eds) Psychophysical and Physiological Advances in Hearing. London, Whurr, pp. 403–411, 1998.
Bregman AS. Auditory Scene Analysis. Cambridge, MA, MIT Press, 1990.
Bullock DC, Palmer AR, Rees A. Compact and easy-to-use tungsten-in-glass microelectrode manufacturing workstation. Med. Biol. Eng. Comput. 26:669–672, 1988.
Buunen TJ, Rhode W. Responses of fibers in the cat’s auditory nerve to the cubic difference tone. J. Acoust. Soc. Am. 64:772–781, 1978.
Cariani PA, Delgutte B. Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. J. Neurophysiol. 76:1698–1716, 1996a.
Cariani PA, Delgutte B. Neural correlates of the pitch of complex tones. II. Pitch shift, pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the dominance region for pitch. J. Neurophysiol. 76:1717–1734, 1996b.
Carlyon RP, Demany L, Deeks J. Temporal pitch perception and the binaural system. J. Acoust. Soc. Am. 109:686–700, 2001.
Darwin CJ. Auditory grouping. Trends Cogn. Sci. 1:327–333, 1997.
Darwin CJ, Carlyon RP. Auditory Grouping. In: Moore BCJ (ed) Hearing. San Diego, Academic, pp. 387–424, 1995.
Egorova M, Ehret G, Vartanian I, Esser KH. Frequency response areas of neurons in the mouse inferior colliculus. I. Threshold and tuning characteristics. Exp. Brain Res. 140:145–161, 2001.
Ehret G, Schreiner C. Spectral and intensity coding in the auditory midbrain. In: Winer JA, Schreiner C (eds) The Inferior Colliculus. New York, Springer, pp. 312–345, 2004.
Evans EF. Latest comparisons between physiological and behavioural frequency selectivity. In: Houtsma AJM, Kohlrausch A, Prijs VF, Schoonhoven R (eds) Physiological and Psychophysical Bases of Auditory Function. Maastricht, Shaker Publishing BV, pp. 382–387, 2001.
Evans EF, Pratt SR, Spenner H, Cooper NP. Comparisons of physiological and behavioural properties: Auditory frequency selectivity. In: Cazals Y, Demany L, Horner K (eds) Auditory Physiology and Perception. Oxford, Pergamon, pp. 159–170, 1992.
Goldberg JM, Brown PB. Response of binaural neurons of dog superior olivary complex to dichotic tonal stimuli: Some physiological mechanisms of sound localization. J. Neurophysiol. 32:613–636, 1969.
Goldstein JL. An optimum processor theory for the central formation of the pitch of complex tones. J. Acoust. Soc. Am. 54:1496–1516, 1973.
Heil P, Schultz H, Langner G. Ontogenetic development of periodicity coding in the inferior colliculus of the Mongolian gerbil. Audit. Neurosci. 1:363–383, 1995.
Horst JW, Javel E, Farley GR. Coding of spectral fine structure in the auditory nerve. I. Fourier analysis of period and interspike interval histograms. J. Acoust. Soc. Am. 79:398–416, 1986.
Horst JW, Javel E, Farley GR. Coding of spectral fine structure in the auditory nerve. II: Level-dependent nonlinear responses. J. Acoust. Soc. Am. 88:2656–2681, 1990.
Houtsma AJM, Goldstein JL. The central origin of the pitch of complex tones: Evidence from musical interval recognition. J. Acoust. Soc. Am. 44:807–812, 1972.
Joris PX. Interaural time sensitivity dominated by cochlea-induced envelope patterns. J. Neurosci. 23:6345–6350, 2003.
Joris PX, Louage DHG, Cardoen L, Van der Heijden M. Correlation Index: A new metric to quantify temporal coding. Hear Res. 216–217:19–30, 2006.
Kinsler LE, Frey AR, Coppens AB, Sanders JV. Fundamentals of Acoustics. New York, Wiley, 2000.
Krishna BS, Semple MN. Auditory temporal processing: responses to sinusoidally amplitude-modulated tones in the inferior colliculus. J. Neurophysiol. 84:255–273, 2000.
Kuwada S, Yin TCT, Syka J, Buunen TJF, Wickesberg RE. Binaural interaction in low-frequency neurons in inferior colliculus of the cat. IV. Comparison of monaural and binaural response properties. J. Neurophysiol. 51:1306–1325, 1984.
Langner G, Albert M, Briede T. Temporal and spatial coding of periodicity information in the inferior colliculus of awake chinchilla (Chinchilla laniger). Hear Res. 168:110–130, 2002.
Langner G, Schreiner CE. Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms. J. Neurophysiol. 60:1799–1822, 1988.
Le Beau FEN, Malmierca MS, Rees A. Iontophoresis in vivo demonstrates a key role for GABAA and glycinergic inhibition in shaping frequency response areas in the inferior colliculus of guinea pig. J. Neurosci. 21:7303–7312, 2001.
Le Beau FEN, Rees A, Malmierca MS. Contribution of GABA- and glycine-mediated inhibition to the monaural temporal response properties of neurons in the inferior colliculus. J. Neurophysiol. 75:902–919, 1996.
Licklider JCR. A duplex theory of pitch perception. Experientia 7:128–133, 1951.
Liu L, Palmer AR, Wallace MN. Phase-locked responses to pure tones in the inferior colliculus. J. Neurophysiol. 95:1926–1935, 2006.
Louage DHG, van der Heijden M, Joris PX. Temporal properties of responses to broadband noise in the auditory nerve. J. Neurophysiol. 91:2051–2065, 2004.
Mardia KV. Statistics of Directional Data. New York, Academic, 1972.
Meddis R, Hewitt M. Virtual pitch and phase sensitivity studied using a computer model of the auditory periphery: Phase sensitivity. J. Acoust. Soc. Am. 89:2883–2894, 1991a.
Meddis R, Hewitt M. Virtual pitch and phase sensitivity studied using a computer model of the auditory periphery: Pitch identification. J. Acoust. Soc. Am. 89:2866–2882, 1991b.
Muller-Preuss P, Flachskamm C, Bieser A. Neural encoding of amplitude modulation within the auditory midbrain of squirrel monkeys. Hear Res. 80:197–208, 1994.
Nelson PC, Carney LH. Neural rate and timing cues for detection and discrimination of amplitude-modulated tones in the awake rabbit inferior colliculus. J. Neurophysiol. 97:522–539, 2007.
Palmer AR, Rees A, Caird DM. Interaural delay sensitivity to tones and broad band signals in the guinea-pig inferior colliculus. Hear Res. 50:71–86, 1990.
Palmer AR, Winter IM. Cochlear nerve and cochlear nucleus responses to the fundamental frequency of voiced speech sounds and harmonic complex tones. In: Cazals Y, Demany L, Horner K (eds) Auditory Physiology and Perception. Oxford, Pergamon, pp. 231–239, 1992.
Ramachandran R, Davis KA, May BJ. Single-unit responses in the inferior colliculus of decerebrate cats I. Classification based on frequency response maps. J. Neurophysiol. 82:152–163, 1999.
Rees A, Moller AR. Responses of neurons in the inferior colliculus of the rat to AM and FM tones. Hear Res. 10:301–330, 1983.
Rees A, Moller AR. Stimulus properties influencing the responses of inferior colliculus neurons to amplitude-modulated sounds. Hear Res. 27:129–143, 1987.
Rees A, Palmer AR. Neuronal responses to amplitude-modulated and pure-tone stimuli in the guinea pig inferior colliculus, and their modification by broadband noise. J. Acoust. Soc. Am. 85:1978–1994, 1989.
Schouten JF. The residue and the mechanism of hearing. Proc. Kon. Akad. Wetenschap. 43:991–999, 1940.
Schouten JF. The residue revisited. In: Plomp R, Smoorenburg GF (eds) Frequency Analysis and Periodicity Detection in Hearing. Leiden, Sijthoff, 1970.
Schreiner CE, Langner G. Periodicity coding in the inferior colliculus of the cat. II. Topographical organization. J. Neurophysiol. 60:1823–1840, 1988.
Shackleton TM, Carlyon RP. The role of resolved and unresolved harmonics in pitch perception and frequency-modulation discrimination. J. Acoust. Soc. Am. 95:3529–3540, 1994.
Sinex DG. Responses of cochlear nucleus neurons to harmonic and mistuned complex tones. Hear Res. 238:39–48, 2008.
Sinex DG, Guzik H, Li HZ, Sabes JH. Responses of auditory nerve fibers to harmonic and mistuned complex tones. Hear Res. 182:130–139, 2003.
Sinex DG, Sabes JH, Li H. Responses of inferior colliculus neurons to harmonic and mistuned complex tones. Hear Res. 168:150–162, 2002.
Sinex DG, Li H. Responses of inferior colliculus neurons to double harmonic tones. J. Neurophysiol. 98:3171–3184, 2007.
Sinex DG, Li H, Velenovsky DS. Prevalence of stereotypical responses to mistuned complex tones in the inferior colliculus. J. Neurophysiol. 94:3523–3537, 2005.
Slaney M, Lyon RF. A perceptual pitch detector. Proc. Int. Conf. Acoustics, Speech, and Signal Processing 357–360, 1990.
Terhardt E. Pitch, consonance, and harmony. J. Acoust. Soc. Am. 55:1061–1069, 1974.
Terhardt E, Stoll G, Seewann M. Algorithm for extraction of pitch salience from complex tonal signals. J. Acoust. Soc. Am. 71:679–688, 1982.
Zurek PM. Measurements of binaural echo suppression. J. Acoust. Soc. Am. 66:1750–1757, 1979.
Acknowledgments
We would like to thank the three anonymous reviewers and the Associate Editor, Philip Joris, for their many very helpful comments on earlier drafts of this manuscript.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Cite this article
Shackleton, T.M., Liu, Lf. & Palmer, A.R. Responses to Diotic, Dichotic, and Alternating Phase Harmonic Stimuli in the Inferior Colliculus of Guinea Pigs. JARO 10, 76–90 (2009). https://doi.org/10.1007/s10162-008-0149-4