Discrimination Measure of Correlations in a Population of Neurons by Using the Jensen‐Shannon Divergence

The significance of synchronized spikes fired by nearby neurons for perception is still unclear. To evaluate how reliably one can decide whether a given response relevant to the population coding of sensory information comes from the full distribution, or from the product of independent distributions from each cell, we used recorded responses of pairs of single neurons in the primary visual cortex (V1) of macaque monkey to stimuli of varying orientation. Both trial-to-trial variability and synchrony were found to depend on stimulus orientation and contrast in this data set (A. Kohn and M. A. Smith, J. Neurosci. 25 (2005) 3661). We used the Jensen-Shannon divergence at fixed stimuli as a measure of discrimination between pairs of correlated cells in V1. The Jensen-Shannon divergence can be considered a measure of distance between the probability distribution functions associated with the observed spike patterns. The Nemenman-Shafee-Bialek estimator was used in our entropy estimation in order to remove sample-size-dependent bias from our calculations. We found that the relative Jensen-Shannon divergence (measured relative to the case in which all cells fire completely independently) decreases with the difference in orientation preference between the receptive fields of each pair of cells. Our finding indicates that the Jensen-Shannon divergence may be used for characterizing the effective circuitry in a population of neurons.


INTRODUCTION
One of the main challenges in neuroscience during the last 30 years has been to demonstrate that distributed neural populations in the visual cortex process information in a cooperative way. The visual cortex is composed of a large number of areas, which contain neurons that are tuned to different visual features. Temporally correlated activity of individual neuronal pairs within the visual cortex has been investigated in many laboratories starting in the early 1980s [1,2,3,4,5,6,7,8,9,10], most often with the motivation of revealing structural coupling between cells. This approach to functional anatomy had been methodologically and conceptually outlined and successfully applied to the invertebrate nervous system [11,12,13,14,15,16]. Taken together, the results of these many different studies clearly demonstrated the presence of temporally precise correlations within the cat and macaque visual cortex. However, these correlations occurred preferentially between cells showing the same feature preference.
Cortical neurons with similar stimulus selectivities are found in close proximity to each other [17,18,19]. How neurons in the cerebral cortex represent a stimulus, given that the response variability is correlated across neurons, is still highly debated [20,21,22,23]. At another level, these correlations, when present, may have a significant effect on the population coding of sensory information. Several pieces of evidence point to correlations between V1 neurons being orientation and contrast dependent [24,25]. The importance of identifying appropriate measures of correlation and independence, and of quantifying their relation to the stimulus, has been pointed out in [26].
The Jensen-Shannon divergence measures how reliably one can decide whether a given response comes from the full distribution, $p(r_1, r_2)$, or from the product of independent distributions, $p(r_1)\,p(r_2)$. In the present work we use the Jensen-Shannon divergence at fixed stimuli as a measure of discrimination between pairs of correlated cells in the primary visual cortex of anesthetized macaque monkeys [24]. Our findings indicate that the Jensen-Shannon divergence may be used for characterizing the circuitry in a population of neurons.

Entropy estimations
The problem with empirical measures of entropy is that they depend on the limited number of samples provided by the experiment. It is therefore important to use a theoretical approach that can remove the sample-size-dependent bias from the entropy estimates. However, it is well known that the approaches usually employed to estimate entropy tend to underestimate it, since they are biased, and this effect can only be corrected by a perturbative expansion in the asymptotic regime [27]. The idea of calculating entropy by counting coincidences was proposed long ago by Ma [28] for physical systems in the micro-canonical ensemble, where the distribution is uniform over states of fixed energy. The Bayesian prior proposed by Nemenman, Shafee and Bialek (NSB) [29] extends this idea to arbitrarily complex distributions. The goal of this method is to construct a Bayesian prior that generates a nearly uniform distribution of entropies, in order to correct the sample-size-dependent bias at its source.
In the following we review the basic ideas of the NSB entropy estimator [29]. Let us consider the problem of estimating the Shannon entropy of a given probability distribution $p = \{p_i\}$, where the index $i$ runs over $K$ possibilities. Suppose $N$ samples (trials) were obtained from a given experiment, in which possibility $i$ occurred $n_i$ times. If $N$ is much bigger than $K$ we can approximate $p_i \approx f_i = n_i/N$; here $p$ might be, for example, the distribution of spike counts observed to be fired by a neuron.
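As a concrete illustration, the naive (maximum-likelihood) estimate described above can be sketched in a few lines of Python; the function name and the example counts are our own, not taken from the original analysis:

```python
import numpy as np

def naive_entropy(counts):
    """Maximum-likelihood ("naive") entropy estimate, in bits.

    counts: occurrence counts n_i over the K observed response bins.
    Uses p_i ~ f_i = n_i / N, which is reliable only when N >> K.
    """
    counts = np.asarray(counts, dtype=float)
    freqs = counts[counts > 0] / counts.sum()
    return float(-np.sum(freqs * np.log2(freqs)))

# Example: 100 trials spread over 4 observed spike-count patterns.
print(naive_entropy([40, 30, 20, 10]))
```

With few trials per bin this estimator is systematically biased downward, which is precisely the problem the corrections below address.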
This naive approach tends to underestimate the entropy, and attempts were made to solve this problem with asymptotic bias corrections by adding a term of order $O(K/N)$ [30,31]. This approach was developed further in [27], where a Bayesian prior was placed on the number of relevant bins and iteratively re-estimated.
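A minimal sketch of such a first-order bias correction (the $O(K/N)$ term of [30,31], often called the Miller-Madow correction), assuming bits and counting only the occupied bins:

```python
import numpy as np

def miller_madow_entropy(counts):
    """Naive entropy plus the first-order O(K/N) bias correction, in bits.

    Adds (K_observed - 1) / (2 N ln 2) to the naive estimate, which
    cancels the leading term of the downward sampling bias.
    """
    counts = np.asarray(counts, dtype=float)
    n_total = counts.sum()
    nonzero = counts[counts > 0]
    freqs = nonzero / n_total
    h_naive = float(-np.sum(freqs * np.log2(freqs)))
    return h_naive + (len(nonzero) - 1) / (2.0 * n_total * np.log(2))
```

The correction vanishes as $N \to \infty$ but is only the first term of the expansion, which is why fully Bayesian approaches such as NSB were developed.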
Bayes' theorem states

$$P(p \mid n) = \frac{P(n \mid p)\,P(p)}{P(n)},$$

where $P(p)$ is the prior distribution and $N$ is the total number of experimental trials, $N = \sum_i n_i$, with each observed event $i$ occurring $n_i$ times. In order to eliminate the bias it would be desirable to find a prior $P(p)$ that does not depend upon the number of trials. We can express $P(p)$ in terms of the Dirichlet family of priors [29],

$$P_\beta(p) = \frac{1}{Z(\beta)}\,\delta\!\left(1 - \sum_{i=1}^{K} p_i\right) \prod_{i=1}^{K} p_i^{\beta - 1},$$

which allows us to construct a prior that does not depend on inverse powers of $N$,
where $Z$ is a function which enforces the normalization, and the a priori expected entropy is specified by fixing a particular value of $\beta$ [32]. Ideally, we would like to control the whole a priori distribution of entropies. Defining $\xi(\beta)$ as the expected entropy under $P_\beta(p)$, one can fix a flat prior distribution of entropies $P(H)$. The idea of the NSB method is to build a family of priors $P_\beta(p)$ which result in $\delta$ functions over $H$, and to vary $\beta$ so that the whole range of entropies is covered uniformly, removing the bias at its source. The Bayesian prior proposed by Nemenman, Shafee and Bialek [29],

$$P(p) = \frac{1}{Z}\,\delta\!\left(1 - \sum_{i=1}^{K} p_i\right) \prod_{i=1}^{K} p_i^{\beta - 1}\,\frac{d\xi}{d\beta},$$

meets these requirements, generating a nearly uniform distribution of entropies in order to avoid bias at its origin ($Z$ is a normalizing factor, and $d\xi/d\beta$ ensures uniformity in the a priori expected entropy $\xi$). We computed the total mutual information conveyed by a single pair of cells in V1 [24] using the naive estimator, the Panzeri-Treves bias correction, and the NSB method. Figure 1 shows the effectiveness of the entropy estimator developed by Nemenman, Shafee and Bialek [29], for which good sampling was obtained with only 100 trials.
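To make the role of $\xi(\beta)$ concrete, the sketch below computes the a priori expected entropy of a symmetric Dirichlet prior using the standard result $\xi(\beta) = \psi(K\beta + 1) - \psi(\beta + 1)$ (in nats), with the digamma function $\psi$ approximated numerically; the function names are illustrative and not part of any NSB implementation.

```python
import math

def digamma(x, h=1e-6):
    # Numerical digamma via a central difference of log-gamma
    # (standard library only; adequate for illustration).
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

def expected_entropy_dirichlet(beta, K):
    """A priori expected entropy (nats) of a symmetric Dirichlet(beta)
    prior over K bins: xi(beta) = psi(K*beta + 1) - psi(beta + 1).
    NSB averages over beta with a prior chosen to be flat in xi."""
    return digamma(K * beta + 1.0) - digamma(beta + 1.0)
```

As $\beta \to 0$ the expected entropy approaches 0, and as $\beta \to \infty$ it approaches $\ln K$, so sweeping $\beta$ covers the whole range of entropies, which is what makes the flat-in-$\xi$ prior possible.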

Jensen-Shannon divergence
The Jensen-Shannon divergence (JSD) between two distributions $P$ and $Q$ is defined by [33,34]

$$D_{JS}(P, Q) = \frac{1}{2} D_{KL}\!\left(P \,\middle\|\, \frac{P+Q}{2}\right) + \frac{1}{2} D_{KL}\!\left(Q \,\middle\|\, \frac{P+Q}{2}\right),$$

where $D_{KL}$ is the Kullback-Leibler divergence between the two distributions.
The JSD allows us to decide how reliably one can determine whether a given response comes from one of two different distributions. Unlike the Kullback-Leibler divergence, it is symmetric, always well defined and bounded. In particular, we choose $P = P(r \mid s)$, the probability of getting response vector $r$ conditional upon the occurrence of stimulus $s$, and $Q = P^{ind}(r \mid s)$, the probability of getting independent population responses, where $P(r_1, r_2) = \sum_s p(s)\,P(r_1, r_2 \mid s)$ and $P^{ind}(r) = \sum_s p(s)\,P(r_1 \mid s)\,P(r_2 \mid s)$ are the averages of $P(r \mid s)$ and $P^{ind}(r \mid s)$ over all the possible stimuli. By fixing the stimulus we can use the Jensen-Shannon divergence to measure how reliably one can decide whether a given response comes from $P(r_1, r_2 \mid s_{fix})$ or from $P^{ind}(r_1, r_2 \mid s_{fix})$. Comparing the JSD between the correlated, $P(r \mid s)$, and uncorrelated, $P^{ind}(r \mid s)$, distributions at a fixed stimulus, one can ask how much information one can gain about the effective network between the cells.
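The fixed-stimulus comparison can be sketched in Python as follows; the joint distribution below is hypothetical, chosen only to illustrate how the JSD between a joint response distribution and the product of its marginals would be computed:

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D_KL(p || q), in bits."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def jsd(p, q):
    """Jensen-Shannon divergence: symmetric, well defined, bounded by 1 bit."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical joint distribution of (cell-1, cell-2) spike counts at one
# fixed stimulus, compared against the product of its own marginals.
joint = np.array([[0.30, 0.10],
                  [0.10, 0.50]])
p_full = joint.ravel()
p_ind = np.outer(joint.sum(axis=1), joint.sum(axis=0)).ravel()
print(jsd(p_full, p_ind))  # positive whenever the cells are correlated
```

Because the mixture $(P+Q)/2$ never vanishes where $P$ or $Q$ has mass, the JSD is always finite, which is what makes it preferable to $D_{KL}$ here.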

RESULTS
In order to evaluate how reliably one can decide whether a given response relevant to the population coding of sensory information comes from the full distribution, $P(r_1, r_2 \mid s_{fix})$, or from the product of independent distributions from each cell, we used the data of [24], where the experimental methods can be found. They collected data from single pairs of cells in primary visual cortex. For each neuron, they first determined the preferred orientation and direction. This was done quantitatively by measuring the responses to sinusoidal gratings drifting in different directions, centered on the receptive field as determined by initial mapping [24]. Figure 2 shows the Jensen-Shannon divergence for one single pair of cells in V1 [24], computed with the NSB entropy estimator [29] at different time windows. The Jensen-Shannon divergence was calculated taking the full orientation dependency of both probability distributions into account. A plateau-like behavior of the Jensen-Shannon divergence can be appreciated as the time window becomes larger.
In Figure 3 we present the Jensen-Shannon Fraction for 42 pairs of single neurons in V1 at fixed stimulus orientation, as a function of the difference in orientation preference between the tuning curves obtained for each pair of cells (time window fixed at 5 ms). We define the Jensen-Shannon Fraction as the maximum JS divergence obtained by fixing the stimuli for a given set, divided by its minimum, for a pair of single neurons. Pairs of cells with similar preferred orientations present a higher JS divergence than those that showed less similarity. This result is in agreement with the fact that cells which share similar preferred orientations are more likely to have a larger number of common inputs than those that are less similar. Our finding therefore indicates that the Jensen-Shannon divergence may be used for characterizing the effective circuitry in a population of neurons.
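Assuming the definition just given, the Jensen-Shannon Fraction for one pair of cells reduces to a one-line computation over the per-orientation JSD estimates (the input values below are illustrative):

```python
import numpy as np

def js_fraction(jsd_per_orientation):
    """Jensen-Shannon Fraction for one pair of cells: the maximum
    fixed-stimulus JS divergence across orientations divided by the
    minimum, as defined in the text. Inputs would be the per-orientation
    JSD estimates for that pair."""
    vals = np.asarray(jsd_per_orientation, dtype=float)
    return float(vals.max() / vals.min())

# Illustrative per-orientation JSD values for one pair of cells.
print(js_fraction([0.2, 0.5, 0.1]))
```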
In Figure 4, we show the calculation of the information rate for a typical pair of cells, by plotting the information at word length L divided by the total time window considered; the asymptote defines the information rate [35]. The figure shows the information versus L behaviour for bin widths of 5 ms, as used in this manuscript; note that it was not feasible to go beyond L = 6 (with pairs of cells, each of which can fire many spikes in an individual time bin), as the dimensionality explodes rapidly. The motivation for our use of 5 ms bins is clear: coarser temporal binning leads to a large decrease in the information rate achieved (see [36] for a review). Hence, we use a 5 ms time window as a relevant timescale for sensory processing.
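The rate computation described above is simply information per unit time; a minimal sketch, assuming the 5 ms bin width used in the text:

```python
def information_rate(info_bits, word_length, binwidth_s=0.005):
    """Information rate (bits/s): information (in bits) carried by words
    of `word_length` bins, divided by the total time window they span.
    The 5 ms default bin width follows the text; the asymptote of this
    quantity over increasing word length defines the rate."""
    return info_bits / (word_length * binwidth_s)

# e.g. 1.5 bits at word length L = 6 with 5 ms bins
print(information_rate(1.5, 6))
```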

CONCLUSION
We applied the NSB method to a relative Jensen-Shannon divergence for pairs of cells in V1 with orientation-dependent stimuli, considering the relevant timescale for sensory information. Our findings show that pairs of cells in V1 with similar preferred orientations present a higher relative JS divergence than those that are less similar. The use of a relative Jensen-Shannon divergence at a fixed stimulus provides us with a measure of how much information one can gain about the effective network between cells.