Abstract
A recent report estimates that the number of simultaneously recorded neurons is growing exponentially. A commonly employed statistical paradigm using discrete-time point process models of neural activity involves the computation of a maximum-likelihood estimate. The time to compute this estimate, per neuron, is proportional to the number of bins in a finely spaced discretization of time. By using continuous-time models of neural activity and optimally efficient Gaussian quadrature, memory requirements and computation times are dramatically decreased in the commonly encountered situation where the number of parameters p is much less than the number of time-bins n. In this regime, with q equal to the quadrature order, memory requirements are decreased from O(np) to O(qp), and the number of floating-point operations is decreased from O(np^2) to O(qp^2). Accuracy of the proposed estimates is assessed based upon physiological considerations, error bounds, and mathematical results describing the relation between numerical integration error and the numerical error affecting both parameter estimates and the observed Fisher information. A check is provided which is used to adapt the order of numerical integration. The procedure is verified in simulation and for hippocampal recordings. It is found that in 95% of hippocampal recordings a q of 60 yields numerical error negligible with respect to parameter estimate standard error. Statistical inference using the proposed methodology is a fast and convenient alternative to statistical inference performed using a discrete-time point process model of neural activity. It enables the employment of the statistical methodology available with discrete-time inference, but is faster, uses less memory, and avoids any error due to discretization.
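As a hedged illustration of the approach described above, the sketch below uses an arbitrary exponential intensity model and made-up spike times (not the paper's fitted models). It approximates the integral term of a continuous-time point-process log-likelihood with q Gauss-Legendre nodes and compares the result to a finely binned sum with n much greater than q:

```python
import numpy as np

# Hypothetical intensity model (an assumption for illustration):
# an inhomogeneous Poisson rate lambda(t) = exp(b0 + b1*t) on [0, T].
b0, b1, T = 1.5, 0.8, 2.0
lam = lambda t: np.exp(b0 + b1 * t)

# Continuous-time log-likelihood: sum of log-intensities at the spike
# times minus the integral of the intensity over the interval.
spikes = np.array([0.3, 0.9, 1.4, 1.8])  # illustrative spike times

# Gauss-Legendre quadrature with q nodes, mapped from (-1, 1) to (0, T).
q = 20
x, w = np.polynomial.legendre.leggauss(q)
t_nodes = 0.5 * T * (x + 1.0)
integral_q = 0.5 * T * np.dot(w, lam(t_nodes))
loglik_q = np.sum(np.log(lam(spikes))) - integral_q

# Reference: fine midpoint sum with n >> q bins (the discrete-time analogue).
n = 200_000
dt = T / n
t_bins = (np.arange(n) + 0.5) * dt
loglik_n = np.sum(np.log(lam(spikes))) - np.sum(lam(t_bins)) * dt
```

For this smooth intensity, the q-node approximation agrees with the n-bin sum to many digits while touching the intensity at only q points.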
Notes
Gaussian quadrature is a method of numerical integration first developed by Gauss. It is optimal for integrating polynomials (Davis and Rabinowitz 1967).
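A minimal numerical check of this optimality, assuming NumPy's leggauss routine: a q-point rule integrates a random polynomial of degree 2q−1 exactly, while a degree-2q monomial incurs a visible error.

```python
import numpy as np

q = 5
x, w = np.polynomial.legendre.leggauss(q)  # nodes and weights on (-1, 1)

# A random polynomial of degree 2q - 1 is integrated exactly.
rng = np.random.default_rng(0)
p = np.polynomial.Polynomial(rng.normal(size=2 * q))
quad = np.dot(w, p(x))                       # quadrature approximation
exact = p.integ()(1.0) - p.integ()(-1.0)     # analytic integral

# A monomial of degree 2q exceeds the exactness degree and incurs error.
quad_fail = np.dot(w, x ** (2 * q))
exact_fail = 2.0 / (2 * q + 1)
```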
Except for Clenshaw-Curtis quadrature. This quadrature scheme is a factor of 2 less computationally efficient at low quadrature orders q, and becomes as computationally efficient as Gaussian quadrature as q increases (Trefethen 2008).
This assumes that the Hessian has no special structure. When the Hessian has special structure, the FLOPs required to compute \(\hat {\beta }^{(i)}\) can, depending on the specific nature of the structure, be reduced below O(qp^2). The structure of the Hessian depends upon the model of the conditional intensity and cannot, in general, be guaranteed. Thus, while there may exist interesting circumstances where the required O(qp^2) FLOPs are reduced, emphasis in this work is on the more general setting.
Other fast methods of iteratively updating \(\hat {\beta }^{(i)}\) exist that can be effective (Shewchuk 1994). For example, in Shewchuk (1994), conjugate gradient iteration is reported to require \(O(m\sqrt {\kappa })\) operations to solve the problem Ax = b. Here the matrix A possesses m non-zero entries and has condition number κ. Note that m, in the context considered in this work, is equal to qp for the continuous-time case and np for the discrete-time case. Thus, without extra assumptions, conjugate gradient also requires either O(qp^2) or O(np^2) FLOPs.
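A plain conjugate gradient iteration of the kind Shewchuk (1994) analyzes can be sketched as follows. The matrix and right-hand side here are arbitrary stand-ins, and each iteration costs one matrix-vector product, i.e. O(m) FLOPs for a matrix with m non-zero entries.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient for a symmetric positive-definite A.
    Each iteration performs one mat-vec (the dominant cost)."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Arbitrary well-conditioned SPD test system (an illustrative stand-in).
rng = np.random.default_rng(1)
M = rng.normal(size=(30, 30))
A = M @ M.T + 30.0 * np.eye(30)
b = rng.normal(size=30)
x = conjugate_gradient(A, b)
```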
The parameter α is specified such that the time for the conditional intensity to be near zero following an action potential is 0.1 ms, 0.2 ms, and 0.3 ms.
The multiplication of a polynomial of order x with a polynomial of order y results in a polynomial of order x+y.
The results in this paper are computed using x equal to 7. When x is increased there are three effects. First, deviations of \(\hat {\lambda }_q^{(2q)}\) from zero are more accurately approximated by the expansion, Eq. (48). Second, the computation time of the expansion increases (though only slightly when using a single trial). Third, Eq. (59) becomes more numerically unstable, involving, in a sum, more terms over larger scales.
This scaling is included to reduce rounding errors incurred when adding quantities of greatly differing scales.
As mentioned in the introduction, in the situation where a minimum physiologically meaningful scale is known, \(\left ({\boldsymbol {\sigma }}_q \right )_k\) can be replaced by this value.
The MathWorks MATLAB glmfit() command is used to compute the discrete-time parameter estimates. This function uses a version of iteratively re-weighted least squares. All computations in this work are performed using a laptop equipped with an Intel P8600 Core2-Duo processor running at 2.4 GHz.
There is no quadrature error in the case where q equal to 100 is sufficient to exactly integrate the conditional intensity.
References
Barbieri, R., Frank, L.M., Nguyen, D.P., Quirk, M.C., Solo, V., Wilson, M.A., & Brown, E.N. (2004). Dynamic analyses of information encoding in neural ensembles. Neural Computation, 16(2), 277–307.
Brown, E., Barbieri, R., Ventura, V., Kass, R., & Frank, L. (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14(2), 325–346.
Citi, L., Ba, D., Brown, E.N., & Barbieri, R. (2014). Likelihood methods for point processes with refractoriness. Neural Computation, 26(2), 237–263.
Daley, D.J., & Vere-Jones, D. (2003). An introduction to the theory of point processes. Springer Series in Statistics.
Davis, P.J., & Rabinowitz, P. (1967). Numerical integration. Blaisdell Publishing Company, London.
Genz, A., & Kass, R.E. (1991). An application of subregion adaptive numerical integration to a Bayesian inference problem. Computing Science and Statistics, 23, 441–444.
Genz, A., & Kass, R.E. (1997). Subregion-adaptive integration of functions having a dominant peak. Journal of Computational and Graphical Statistics, 6(1), 92–111.
Golub, G.H., & Welsch, J.H. (1969a). Calculation of Gauss quadrature rules. Mathematics of Computation, 23(106), 221–230+s1–s10.
Golub, G.H., & Welsch, J.H. (1969b). Calculation of Gauss quadrature rules. Mathematics of Computation, 23(106), 221–230.
Hale, N., & Townsend, A. (2013). Fast and accurate computation of Gauss–Legendre and Gauss–Jacobi quadrature nodes and weights. SIAM Journal on Scientific Computing, 35(2), A652–A674.
Henze, D., & Buzsaki, G. (2001). Action potential threshold of hippocampal pyramidal cells in vivo is increased by recent spiking activity. Neuroscience, 105(1), 121–130.
Kass, R.E., Ventura, V., & Cai, C. (2003). Statistical smoothing of neuronal data. Network-Computation in Neural Systems, 14(1), 5–16.
Kesner, R.P., Hunsaker, M.R., & Gilbert, P.E. (2005). The role of CA1 in the acquisition of an object-trace-odor paired associate task. Behavioral Neuroscience, 119(3), 781–786.
Kuonen, D. (2003). Numerical integration in S-PLUS or R: A survey. Journal of Statistical Software, 8(13), 1–14.
Lepage, K.Q., Gregoriou, G.G., Kramer, M.A., Aoi, M., Gotts, S.J., Eden, U.T., & Desimone, R. (2013). A procedure for testing across-condition rhythmic spike-field association change. Journal of neuroscience methods, 213(1), 43–62.
Lepage, K.Q., MacDonald, C.J., Eichenbaum, H., & Eden, U.T. (2012). The statistical analysis of partially confounded covariates important to neural spiking. Journal of neuroscience methods, 205(2), 295–304.
MacDonald, C., Lepage, K., Eden, U., & Eichenbaum, H. (2011). Hippocampal “time cells” bridge the gap in memory for discontiguous events. Neuron, 71(4).
McCullagh, P., & Nelder, J.A. (1999). Generalized Linear Models (2nd ed.). Chapman & Hall/CRC.
Mena, G., & Paninski, L. (2014). On quadrature methods for refractory point process likelihoods. Neural Computation. In press.
Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4), 243–262.
Paninski, L., Ahmadian, Y., Ferreira, D.G., Koyama, S., Rad, K.R., Vidne, M., Vogelstein, J., & Wu, W. (2010). A new look at state-space models for neural data. Journal of Computational Neuroscience, 29(1-2), 107–126.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., & Flannery, B.P. (1992). Numerical recipes in C: the art of scientific computing (2nd ed.). NY, USA: Cambridge University Press.
Ramirez, A.A., & Paninski, L. (2013). Fast generalized linear model estimation via expected log-likelihoods. Journal of Computational Neuroscience. In press.
Shewchuk, J.R. (1994). An introduction to the conjugate gradient method without the agonizing pain.
Snyder, D.L. (1975). Random point processes.
Stevenson, I.H., & Kording, K.P. (2011). How advances in neural recording affect data analysis. Nature neuroscience, 14(2), 139–142.
Stoer, J., & Bulirsch, R. (2002). Introduction to numerical analysis (3rd ed., Vol. 12). Springer.
Trefethen, L.N. (2008). Is Gauss quadrature better than Clenshaw–Curtis? SIAM Review, 50(1), 67–87.
Truccolo, W., Eden, U.T., Fellows, M.R., Donoghue, J.P., & Brown, E.N. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93(2), 1074–1089.
Acknowledgments
The authors thank Howard Eichenbaum for his support. Thanks goes to Robert E. Kass for a discussion regarding the content of this paper and on the use of Gaussian quadrature in statistics, to Mikio Aoi for a useful comment regarding the scope of the paper, and to Sujith Vijayan for a useful discussion regarding the neural action potential and refractory effect. KQL is supported by NSF grant DMS-1042134.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Action Editor: Liam Paninski
Appendices
Appendix A: Higher-order derivatives of the refractory model
Assume \(D_j(t)\) specified in Eq. (23) is valid. Then, for j>0,
since \(g^{\prime }_k(t) = -k \alpha e^{\alpha t} g_{k+1}(t)\). Collecting terms,
The induction argument is completed by verifying Eq. (23) for j=1,2,3 by direct calculation.
Appendix B: Non-orderly discrete-time “Gaussian quadrature” process
Gaussian quadrature might be used to compute the MLE associated with a discrete-time process. Based upon the Gaussian quadrature nodes \(t_j\), j=1,…,q, consider the increment process, \(Y_j = N(t_j) - N(t_{j-1})\). Unlike in the previous discussion of a discrete-time point process, here the duration, \(\Delta_j\), of the j-th increment is not constant, but rather equals \(t_j - t_{j-1}\). This duration may be relatively large, and the orderliness property of the process is not guaranteed. For a given sample path of the process, let the number of observed counts in the j-th increment be \(y_j\). The associated log-likelihood, \(\ell_y\), can be shown to equal,
While Eq. (40) may approach the approximation Eq. (11) to the continuous-time log-likelihood Eq. (4) in certain situations, in general these approximations differ. To what extent and under what conditions equivalent inference can be conducted with either of the spike train models are questions appropriate for a future study.
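A hedged sketch of the increment process above, assuming an independent-Poisson count model for each increment (which Eq. (40) may refine); the intensity, spike times, and order q below are arbitrary illustrations:

```python
import numpy as np
from math import lgamma

# Hypothetical intensity and spike times for illustration only.
lam = lambda t: np.exp(1.0 + 0.5 * t)
T = 2.0

# Bin edges taken from the Gauss-Legendre nodes mapped onto (0, T);
# the resulting bin widths Delta_j are non-constant.
q = 10
x, _ = np.polynomial.legendre.leggauss(q)
edges = np.concatenate(([0.0], 0.5 * T * (np.sort(x) + 1.0)))
deltas = np.diff(edges)

spikes = np.array([0.2, 0.7, 1.1, 1.6])
y, _ = np.histogram(spikes, bins=edges)      # counts y_j per increment

# Assumed Poisson increment log-likelihood:
#   sum_j [ y_j log(mu_j) - mu_j - log(y_j!) ], mu_j = lambda(t) * Delta_j
mid = 0.5 * (edges[:-1] + edges[1:])
mu = lam(mid) * deltas
loglik = np.sum(y * np.log(mu) - mu - np.array([lgamma(k + 1) for k in y]))
```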
Appendix C: Approximate 2q th derivative of the conditional intensity
An approximation of \(\hat {\lambda }_q^{(2q)}\) is provided in the following derivation. Begin by expanding \(\hat {\lambda }_q(t | \hat {\beta }_q)\) in terms of the order \(q^{\prime } = 2q + x\) Legendre polynomials, where x is a user-defined quantity specified in Section 6. The choice of x is discussed in Footnote 5 (Section 6). That is, compute the coefficients \(c_j\), \(j = 0, \ \ldots , \ q^{\prime }-1\), using order \(q^{\prime }\) Gaussian quadrature:
Here j indexes the Legendre polynomials, while \(j^{\prime }\) indexes the roots, \(t_{j^{\prime }}\), of the \(q^{\prime th}\) order Legendre polynomial, \(L_{q^{\prime }}\). Then,
For j>2q−1, \(c_j L_j(t)\) is a polynomial that may not be exactly integrated by Gaussian quadrature of order q.
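The coefficient computation described above can be sketched as follows, with a smooth stand-in function in place of \(\hat{\lambda}_q\) and the standard (2j+1)/2 normalization assumed for the Legendre inner product on (−1,1):

```python
import numpy as np
from numpy.polynomial import legendre

# Stand-in smooth function on (-1, 1) playing the role of lambda_q-hat.
f = lambda t: np.exp(0.7 * t)

# Order q' Gauss-Legendre rule used to compute the expansion coefficients.
qprime = 30
x, w = np.polynomial.legendre.leggauss(qprime)

c = np.zeros(qprime)
for j in range(qprime):
    Lj = legendre.Legendre.basis(j)(x)
    # c_j = <f, L_j> / <L_j, L_j>, with <L_j, L_j> = 2 / (2j + 1)
    c[j] = (2 * j + 1) / 2.0 * np.dot(w, f(x) * Lj)

# Reconstruct f from the Legendre expansion and compare pointwise.
t = np.linspace(-0.9, 0.9, 7)
recon = legendre.legval(t, c)
```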
A direct computation of \(\hat {\lambda }_q^{(2q)}\) from Eq. (42) is often inaccurate with standard double-precision floating-point numbers. The sum in Eq. (42) often involves terms that span more than sixteen orders of magnitude for the case where q is equal to 40. When this occurs, numerical error results, often leading to unacceptable inaccuracy. The problem is mitigated by deriving an alternate expression for \(\hat {\lambda }_q^{(2q)}\). Proceed by Taylor expanding \(\hat {\lambda }_q(t|\hat {\beta }_q)\) about t=0 to order 2q+x:
From Eq. (43) the \(u^{\prime th}\) derivative is
the fact that \(\hat {\lambda }_q(t)\) is a polynomial is used to set the lower bound in the sum. From Eq. (42) the \(u^{\prime th}\) derivative evaluated at t=0, is,
Here \(L_j^{(u^{\prime })}\) is the \(u^{\prime }\)-th derivative of the order-j Legendre polynomial. Because it can be shown, beginning with Rodrigues' formula, that \(L_n^{(q)}\) is equal to,
\(L_n^{(q)}(0)\) can be determined analytically. At t=0 in Eq. (46), only the \(\ell + \ell ^{\prime } = 0\) terms contribute:
Equations (44), (45) and (47) combine to produce,
Continuing,
The terms in the last sum in Eq. (49) are symmetric about the center index (j−u even), or about the center indices (j−u odd). For example, the first and the last terms are equal. When j−u is odd, the signs alternate for terms that are identical in absolute value, and the sum is equal to zero. When j−u is even, there are an odd number of terms in the sum. Consider the case j−u equal to four:
Because j−u equals 4, the last term in Eq. (50) is,
which is identical to the first term in the sum. The terms in the sum are similar in absolute value (and are very small). The maximum-absolute contributing term is the center term. This term is equal to
Due to cancellation, it is useful to consider the approximation:
Substituting Eq. (52) into Eq. (49) results in
where \(\chi_{j-u\ \mathrm{even}}\) is zero if j−u is an odd integer and is 1 if j−u is an even integer. Using Stirling's approximate formula for \(\ln x!\):
some cancellations in Eq. (53) can be made. Specifically,
After cancellations, Equation (55) results in,
Substituting Eq. (56) into Eq. (53) yields another approximation for \(\hat {\lambda }_q^{(2q)}\),
The sums in Eq. (57) can be rearranged such that j progresses from 0 to x, and u progresses from 0 to j. Let \(u^{\prime }\) equal j−u. Then,
with the binomial coefficient \(\binom {a}{b} = \frac {a!}{(a-b)! \ b!}\). The coefficients multiplying the powers of t can be collected. This results in the following representation:
for \(g_j\) specified according to Eq. (58).
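The Stirling approximation to \(\ln x!\) invoked in the derivation above can be checked numerically against the exact log-factorial (math.lgamma):

```python
import math

# Stirling's approximation: ln x! ~ x ln x - x + 0.5 ln(2 pi x).
def stirling_ln_factorial(x):
    return x * math.log(x) - x + 0.5 * math.log(2.0 * math.pi * x)

exact = math.lgamma(40 + 1)            # ln 40! via the log-gamma function
approx = stirling_ln_factorial(40)
rel_err = abs(exact - approx) / exact  # relative error of the approximation
```

For x on the order of the quadrature orders used in the paper, the relative error is already far below the other error terms in play.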
Appendix D: Basic derivation of Gaussian quadrature
The polynomial, \(L_j\), satisfies, for \(j^{\prime } \neq j\),
and is a Legendre polynomial. Gaussian quadrature with the Legendre polynomials is sometimes referred to as "Gauss–Legendre" quadrature. The weights, \(w_j\), j=1,…,q, are chosen such that the vector, \(\textbf {w} = \left [ w_1 \ {\ldots } \ w_q \right ]^T\), is orthogonal to the q−1 vectors,
for k=1, 2, … ,q−1, with the further stipulation that
Here \(p_0\) is a vector with identical entries equal to the constant \(L_0\), and \(t_j\), j=1,2,…,q are chosen as the roots of \(L_q\):
Thus specified, Eq. (8) is exact when λ is a polynomial of order less than or equal to 2q−1. To see this, consider the integral of the order-(2q−1) polynomial \(z_{2q-1}\). Following Stoer and Bulirsch (2002), this polynomial can be expressed as,
where \(\tilde {q}(t)\) and r(t) are linear combinations of \(L_k(t)\), k<q. Then,
Similarly,
Here \(L_q(t_j)\) is zero due to Eq. (63), and the orthogonality property of w is exploited. For further details see Stoer and Bulirsch (2002) and Press et al. (1992, §4.5).
Specification of the order of integration, q, the roots \(t_j\), and the weights \(w_j\), j=1,…,q, completely specifies the Gaussian quadrature rule, Eq. (8). For the integrals approximated in this work, the nodes and weights are computed using the method specified in Golub and Welsch (1969a) for the domain of integration (−1,1). All integrals are transformed to this domain for approximation. The nodes, \(t_j\), and the weights, \(w_j\), can be computed in a number of ways; see Hale and Townsend (2013) for a fast alternative method capable of accurately determining nodes and weights for Gaussian quadrature orders exceeding 100.
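A sketch of the Golub and Welsch (1969a) construction for Gauss-Legendre nodes and weights, using the known three-term recurrence coefficients for Legendre polynomials; it also exhibits the weights as squared quantities, hence non-negative, as used in Lemma 1 below:

```python
import numpy as np

def golub_welsch(q):
    """Gauss-Legendre nodes and weights via the symmetric tridiagonal
    Jacobi matrix: nodes are its eigenvalues, and weights follow from
    the squared first components of the eigenvectors."""
    k = np.arange(1, q)
    beta = k / np.sqrt(4.0 * k**2 - 1.0)   # Legendre recurrence coefficients
    J = np.diag(beta, 1) + np.diag(beta, -1)
    nodes, V = np.linalg.eigh(J)
    weights = 2.0 * V[0, :] ** 2           # mu_0 = integral of 1 on (-1,1) = 2
    return nodes, weights

nodes, weights = golub_welsch(8)
ref_x, ref_w = np.polynomial.legendre.leggauss(8)  # library reference
```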
Appendix E: Well-behaved deviation
In the following, a sequence of lemmas is provided establishing the sense in which small quadrature error leads to small numerical error in the parameter estimates. Discussion is restricted to the univariate case (p=1), without loss of generality. Let \(\ell_q(\beta)\) be the log-likelihood computed with q-th order Gaussian quadrature and evaluated at the parameter value β.
Lemma 1
(Concavity of \(\ell_q\)) The second derivative of \(\ell_q(\beta)\) is negative for all \(\beta \in \mathbb {R}\).
Proof
The proof follows from calculation:
for a linear differentiable function f. The weights \(w_j\) are non-negative (they are squared quantities; see, e.g., Golub and Welsch (1969b)), guaranteeing the sign of the second derivative for \(\beta \in \mathbb {R}\). □
Let \(\hat {\beta }_q\) be the approximate maximum-likelihood estimate:
Let \(\zeta , \zeta ^{\prime } \in (a,b)\) be such that Eq. (9), evaluated for \(\hat {\beta }\) and \(\hat {\beta }_q\), is, respectively, \(\delta _{\hat {\beta }}\) and \(\delta _{\hat {\beta }_q}\):
Then there exists a δ for any quadrature order q,
such that
The following lemma can be proven.
Lemma 2
Log-Likelihood Approximation
Proof
Suppose, \(\ell _q(\hat {\beta }_q)\) is less than \(\ell (\hat {\beta }_q)\). By Eq. (72)
From Eq. (74) and concavity, the smallest that \(\ell (\hat {\beta })\) can be is \(\ell (\hat {\beta }_q)\). Then \(\ell _q(\hat {\beta }_q) - \ell (\hat \beta ) < \delta \). Similarly, by concavity, \(\ell _q(\hat {\beta } ) + \delta \) is less than \(\ell_q(\hat{\beta}_q)+\delta\). Then, by Eq. (72), \(\ell (\hat {\beta })\) is upper-bounded:
again implying \(\ell (\hat {\beta }) - \ell _q(\hat {\beta }_q) \leq \delta \).
If instead \(\ell _q(\hat {\beta }_q) > \ell (\hat {\beta }_q)\), then
By concavity we have,
From Eq. (72), \(| \ell _q(\hat {\beta }) - \ell (\hat {\beta }) | < \delta \), implying \(| \ell _q(\hat {\beta }_q) - \ell (\hat {\beta })| < \delta \), and the proof is complete. □
Having established the proximity between the log-likelihood at \(\hat {\beta }\) with the approximate log-likelihood at \(\hat {\beta }_{q} \), it remains to show that \(\hat {\beta }_{q} \) approximates \(\hat {\beta }\).
Lemma 3
\(\hat {\beta }_{q} \) approximates \(\hat {\beta }\)
Fix δ>0. If
then there exists an 𝜖>0,
such that
Proof
Taylor expanding ℓ q about \(\hat {\beta }_{q} \) and evaluating at \(\hat {\beta }\) yields:
with \(\eta \in \left (\hat {\beta }_{q} , \hat {\beta } \right )\). By the triangle inequality,
Then:
with \(\epsilon = \hat {\beta } - \hat {\beta }_q\). □
Appendix F: Maximum parameter estimate error
The δ in Lemma 3 is the larger of the two quadrature errors, \(\delta _{\hat {\beta }_q}\) and \(\delta _{\hat {\beta }}\); the former computed for the known \(\hat {\beta }_q\), and the latter for the unknowable \(\hat {\beta }\). If a bound is placed upon \(\lambda ^{(2q)} / k^2_q\), and the limit is taken as q tends to infinity, both of these error bounds tend to zero, and hence become close. Here, to obtain an estimate of \(\epsilon\), it is assumed that \(\delta = \delta _{\hat {\beta }_q} = \delta _{\hat {\beta }}\). With this specification, \(\epsilon\), the parameter estimate deviation from the true MLE, can be specified. From Eq. (9), set
Then, for η as specified after Eq. (81),
With this specification,
It is useful to set \(\epsilon^2\) equal to the bound in Eq. (86). Let
and
Then,
Appendix G: Accuracy of the observed Fisher information
Lemma 4
𝜖-Equivalence of Observed Fisher Information
Let \(\epsilon\) be as specified in Eq. (89). Introduce the unbiased estimators \(\tilde {\beta }\), \(\tilde {\beta }_q\), whose realizations are the estimates \(\hat {\beta }\) and \(\hat {\beta }_q\) of the parameters β and \(\beta_q\). Further, let the realization of the random variable X be the quadrature error for any given data set, and specify X to be independent of the true MLE estimator \(\tilde {\beta }\). Then the variance, \(\mathrm{var}\left \{ \tilde {\beta }_q\right \}\), satisfies:
Proof
The proof follows from direct calculation. Consider
For some realized quadrature error η, \(\left | \eta \right | < \epsilon \), the term,
Then
Similarly the contribution to Eq. (91) from the terms involving X can be bounded:
Cite this article
Lepage, K.Q., MacDonald, C.J. Fast maximum likelihood estimation using continuous-time neural point process models. J Comput Neurosci 38, 499–519 (2015). https://doi.org/10.1007/s10827-015-0551-y