
Fast maximum likelihood estimation using continuous-time neural point process models


Abstract

A recent report estimates that the number of simultaneously recorded neurons is growing exponentially. A commonly employed statistical paradigm using discrete-time point process models of neural activity involves the computation of a maximum-likelihood estimate. The time to compute this estimate, per neuron, is proportional to the number of bins in a finely spaced discretization of time. By using continuous-time models of neural activity together with optimally efficient Gaussian quadrature, memory requirements and computation times are dramatically decreased in the commonly encountered situation where the number of parameters p is much less than the number of time bins n. In this regime, with q equal to the quadrature order, memory requirements are decreased from O(np) to O(qp), and the number of floating-point operations is decreased from O(np^2) to O(qp^2). Accuracy of the proposed estimates is assessed based upon physiological considerations, error bounds, and mathematical results describing the relation between numerical integration error and the numerical error affecting both parameter estimates and the observed Fisher information. A check is provided that is used to adapt the order of numerical integration. The procedure is verified in simulation and for hippocampal recordings. It is found that in 95% of hippocampal recordings a q of 60 yields numerical error negligible with respect to parameter estimate standard error. Statistical inference using the proposed methodology is a fast and convenient alternative to statistical inference performed using a discrete-time point process model of neural activity. It enables the employment of the statistical methodology available with discrete-time inference, but is faster, uses less memory, and avoids any error due to discretization.


Notes

  1. Gaussian quadrature is a method of numerical integration first developed by Gauss. It is optimal for integrating polynomials (Davis and Rabinowitz 1967).

  2. Except for Clenshaw-Curtis quadrature. This quadrature scheme is a factor of 2 less computationally efficient at low quadrature orders q, and becomes as computationally efficient as Gaussian quadrature as q increases (Trefethen 2008).

  3. This assumes that the Hessian has no special structure. When the Hessian has special structure, the number of FLOPS required to compute \(\hat {\beta }^{(i)}\) can, depending on the specific nature of the structure, be reduced below O(qp^2). The structure of the Hessian depends upon the model of the conditional intensity and cannot, in general, be guaranteed. Thus, while there may exist interesting circumstances where the required O(qp^2) FLOPS is reduced, emphasis in this work is on the more general setting.

  4. Other fast methods of iteratively updating \(\hat {\beta }^{(i)}\) exist that can be effective (Shewchuk 1994). For example, in Shewchuk (1994) conjugate gradient iteration is reported to require \(O(m\sqrt {\kappa })\) operations for solving the problem A x=b, where the matrix A possesses m non-zero entries and has condition number κ (a minimal sketch of the iteration is given after these notes). Note that m, in the context considered in this work, is equal to qp for the continuous-time case and np for the discrete-time case. Thus, without extra assumptions, conjugate gradient also requires either O(qp^2) or O(np^2) FLOPS.

  5. The parameter α is specified such that the time for the conditional intensity to be near zero following an action potential is 0.1 ms, 0.2 ms, and 0.3 ms.

  6. The multiplication of a polynomial of order x with a polynomial of order y results in a polynomial of order x+y.

  7. The results in this paper are computed using x equal to 7. When x is increased there are three effects. First, deviations of \(\hat {\lambda }_q^{(2q)}\) from zero are more accurately approximated by the expansion, Eq. (48). Second, the computation time of the expansion increases (though only slightly when using a single trial). Third, Eq. (59) becomes more numerically unstable, involving a sum over more terms spanning larger scales.

  8. This scaling is included to reduce rounding errors incurred when adding quantities of greatly differing scales.

  9. As mentioned in the introduction, in the situation where a minimum physiologically meaningful scale is known, \(\left ({\boldsymbol {\sigma }}_q \right )_k\) can be replaced by this value.

  10. Using the MathWorks MATLAB glmfit() command to compute the discrete-time parameter estimates. This function uses a version of iterated re-weighted least squares. All computations in this work are performed using a laptop equipped with an Intel P8600 Core2-Duo processor running at 2.4 GHz.

  11. There is no quadrature error in this case because q equal to 100 is sufficient to exactly integrate the conditional intensity.
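
As a companion to Note 4, the following is a minimal Python sketch of the conjugate gradient iteration for a symmetric positive-definite system A x=b, after Shewchuk (1994). It is an illustration only, not code from this work: each iteration is dominated by a single product with A, whose cost is proportional to the number of non-zero entries m of A, and the number of iterations grows roughly like \(\sqrt{\kappa}\).

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A (Shewchuk 1994).

    Each iteration performs one product with A (about m FLOPs for m
    non-zero entries); the iteration count scales roughly with the
    square root of the condition number of A.
    """
    n = b.shape[0]
    if max_iter is None:
        max_iter = 10 * n
    x = np.zeros(n)
    r = b - A @ x              # residual
    d = r.copy()               # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ad = A @ d
        alpha = rs_old / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return x

# Toy usage on a small, hypothetical SPD system.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50.0 * np.eye(50)
b = rng.standard_normal(50)
print(np.allclose(A @ conjugate_gradient(A, b), b, atol=1e-6))
```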

References

  • Barbieri, R., Frank, L.M., Nguyen, D.P., Quirk, M.C., Solo, V., Wilson, M.A., & Brown, E.N. (2004). Dynamic analyses of information encoding in neural ensembles. Neural Computation, 16(2), 277–307.

  • Brown, E., Barbieri, R., Ventura, V., Kass, R., & Frank, L. (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14(2), 325–346.

  • Citi, L., Ba, D., Brown, E.N., & Barbieri, R. (2014). Likelihood methods for point processes with refractoriness. Neural Computation, 26(2), 237–263.

  • Daley, D.J., & Vere-Jones, D. (2003). An introduction to the theory of point processes. Springer Series in Statistics.

  • Davis, P.J., & Rabinowitz, P. (1967). Numerical integration. London: Blaisdell Publishing Company.

  • Genz, A., & Kass, R.E. (1991). An application of subregion adaptive numerical integration to a Bayesian inference problem. Computing Science and Statistics, 23, 441–444.

  • Genz, A., & Kass, R.E. (1997). Subregion-adaptive integration of functions having a dominant peak. Journal of Computational and Graphical Statistics, 6(1), 92–111.

  • Golub, G.H., & Welsch, J.H. (1969a). Calculation of Gauss quadrature rules. Mathematics of Computation, 23(106), 221–230+s1–s10.

  • Golub, G.H., & Welsch, J.H. (1969b). Calculation of Gauss quadrature rules. Mathematics of Computation, 23(106), 221–230.

  • Hale, N., & Townsend, A. (2013). Fast and accurate computation of Gauss–Legendre and Gauss–Jacobi quadrature nodes and weights. SIAM Journal on Scientific Computing, 35(2), A652–A674.

  • Henze, D., & Buzsaki, G. (2001). Action potential threshold of hippocampal pyramidal cells in vivo is increased by recent spiking activity. Neuroscience, 105(1), 121–130.

  • Kass, R.E., Ventura, V., & Cai, C. (2003). Statistical smoothing of neuronal data. Network: Computation in Neural Systems, 14(1), 5–16.

  • Kesner, R.P., Hunsaker, M.R., & Gilbert, P.E. (2005). The role of CA1 in the acquisition of an object-trace-odor paired associate task. Behavioral Neuroscience, 119(3), 781–786.

  • Kuonen, D. (2003). Numerical integration in S-PLUS or R: a survey. Journal of Statistical Software, 8(13), 1–14.

  • Lepage, K.Q., Gregoriou, G.G., Kramer, M.A., Aoi, M., Gotts, S.J., Eden, U.T., & Desimone, R. (2013). A procedure for testing across-condition rhythmic spike-field association change. Journal of Neuroscience Methods, 213(1), 43–62.

  • Lepage, K.Q., MacDonald, C.J., Eichenbaum, H., & Eden, U.T. (2012). The statistical analysis of partially confounded covariates important to neural spiking. Journal of Neuroscience Methods, 205(2), 295–304.

  • MacDonald, C., Lepage, K., Eden, U., & Eichenbaum, H. (2011). Hippocampal “time cells” bridge the gap in memory for discontiguous events. Neuron, 71(4).

  • McCullagh, P., & Nelder, J.A. (1999). Generalized linear models (2nd ed.). Chapman & Hall/CRC.

  • Mena, G., & Paninski, L. (2014). On quadrature methods for refractory point process likelihoods. Neural Computation, in press.

  • Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4), 243–262.

  • Paninski, L., Ahmadian, Y., Ferreira, D.G., Koyama, S., Rad, K.R., Vidne, M., Vogelstein, J., & Wu, W. (2010). A new look at state-space models for neural data. Journal of Computational Neuroscience, 29(1–2), 107–126.

  • Press, W.H., Teukolsky, S.A., Vetterling, W.T., & Flannery, B.P. (1992). Numerical recipes in C: the art of scientific computing (2nd ed.). New York, NY: Cambridge University Press.

  • Ramirez, A.A., & Paninski, L. (2013). Fast generalized linear model estimation via expected log-likelihoods. Journal of Computational Neuroscience, in press.

  • Shewchuk, J.R. (1994). An introduction to the conjugate gradient method without the agonizing pain.

  • Snyder, D.L. (1975). Random point processes.

  • Stevenson, I.H., & Kording, K.P. (2011). How advances in neural recording affect data analysis. Nature Neuroscience, 14(2), 139–142.

  • Stoer, J., & Bulirsch, R. (2002). Introduction to numerical analysis (3rd ed., Vol. 12). Springer.

  • Trefethen, L.N. (2008). Is Gauss quadrature better than Clenshaw–Curtis? SIAM Review, 50(1), 67–87.

  • Truccolo, W., Eden, U.T., Fellows, M.R., Donoghue, J.P., & Brown, E.N. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93(2), 1074–1089.


Acknowledgments

The authors thank Howard Eichenbaum for his support. Thanks go to Robert E. Kass for a discussion regarding the content of this paper and on the use of Gaussian quadrature in statistics, to Mikio Aoi for a useful comment regarding the scope of the paper, and to Sujith Vijayan for a useful discussion regarding the neural action potential and refractory effect. KQL is supported by NSF grant DMS-1042134.

Conflict of interests

The authors declare that they have no conflict of interest.

Author information

Corresponding author

Correspondence to Kyle Q. Lepage.

Additional information

Action Editor: Liam Paninski

Appendices

Appendix A: Higher-order derivatives of the refractory model

Assume \(D_j(t)\) specified in Eq. (23) is valid. Then, for j>0,

$$\begin{array}{@{}rcl@{}} D^{\prime}_j(t) &=& \frac{d}{dt}\left\{ 2 \lambda_0 \alpha^{j} e^{\alpha t} \sum\limits_{k=2}^{j+1} \alpha_{j,k} e^{(k-2)\alpha t} g_k(t) \right\} \ , \\ &=& 2 \lambda_0 \alpha^{j+1} e^{\alpha t} \sum\limits_{k=2}^{j+1} \alpha_{j,k} e^{(k-2)\alpha t} g_k(t)\\ &&+ 2 \lambda_0 \alpha^j e^{\alpha t} \sum\limits_{k=2}^{j+1} \alpha_{j,k} (k-2) \alpha e^{(k-2)\alpha t} g_k(t)\\ && + \alpha_{j,k} e^{(k-2)\alpha t} (-k) \alpha e^{\alpha t} g_{k+1}(t) \ , \end{array} $$
(38)

since \(g^{\prime }_k(t) = -k \alpha e^{\alpha t} g_{k+1}(t)\). Collecting terms,

$$\begin{array}{@{}rcl@{}} D^{\prime}_{j}(t) &=&2 \lambda_0 \alpha^{j+1} e^{\alpha t} \sum\limits_{k=2}^{j+1} (k-1) \alpha_{j,k} e^{(k-2) \alpha t} g_k(t)\\ && - k \alpha_{j,k} e^{(k-1) \alpha t} g_{k+1}(t) \ , \\ &=& 2 \lambda_0 \alpha^{j+1} e^{\alpha t} \left[ \sum\limits_{k=2}^{j+1} (k-1) \alpha_{j,k} e^{(k-2) \alpha t} g_k(t)\right.\\ && - \left. \sum\limits_{k^{\prime}= 3}^{j+2} (k^{\prime}-1) \alpha_{j,k^{\prime}-1} e^{(k^{\prime}-2) \alpha t} g_{k^{\prime}}(t) \right] \ , \\ &=& 2 \lambda_0 \alpha^{j+1} e^{\alpha t} \sum\limits_{k=2}^{j+2} \alpha_{j+1,k} e^{(k-2) \alpha t} g_k(t) \ , \\ &=& D_{j+1}(t) \ , \hspace{2.1cm} (j > 0) \ . \end{array} $$
(39)

The induction argument is completed by verifying Eq. (23) for j=1,2,3 by direct calculation.

Appendix B: Non-orderly discrete-time “Gaussian quadrature” process

Gaussian quadrature might be used to compute the MLE associated with a discrete-time process. Based upon the Gaussian quadrature nodes \(t_j\), j=1,…,q, consider the increment process \(Y_j = N(t_j) - N(t_{j-1})\). Unlike in the previous discussion of a discrete-time point process, here the duration, \({\Delta}_j\), of the j th increment is not constant, but rather equals \(t_j - t_{j-1}\). This duration may be relatively large, and the orderliness property of the process is not guaranteed. For a given sample path of the process, let the number of observed counts in the j th increment be \(y_j\). The associated log-likelihood, \(\ell_y\), can be shown to equal,

$$\begin{array}{@{}rcl@{}} \ell_y(\beta) &=& \sum\limits_{j=1}^q y_j \log({\Delta}_j \lambda(t_j|\beta)) - \sum\limits_{j=1}^q {\Delta}_j \lambda(t_j | \beta )\\ &&- \sum\limits_{j=1}^q \log(y_j! ) \ . \end{array} $$
(40)

While Eq. (40) may approach the approximation Eq. (11) to the continuous-time log-likelihood Eq. (4) in certain situations, in general these approximations differ. To what extent and under what conditions equivalent inference can be conducted with either of the spike train models are questions appropriate for a future study.
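
As an illustration only (not the authors' implementation), a minimal Python sketch of Eq. (40) follows. It assumes \(t_0 = 0\) as the left edge of the first increment and a log-linear conditional intensity \(\lambda(t_j|\beta) = \exp(x(t_j)^T\beta)\) evaluated at the quadrature nodes; the function and argument names are hypothetical.

```python
import numpy as np
from math import lgamma

def increment_log_likelihood(beta, spike_times, nodes, design):
    """Discrete-increment log-likelihood of Eq. (40).

    nodes  : quadrature nodes t_1 < ... < t_q (t_0 = 0 is assumed).
    design : q x p matrix whose j-th row is the covariate vector x(t_j);
             a log-linear model lambda(t_j|beta) = exp(x(t_j)' beta) is
             assumed purely for illustration.
    """
    edges = np.concatenate(([0.0], nodes))
    delta = np.diff(edges)                        # increment durations Delta_j
    y, _ = np.histogram(spike_times, bins=edges)  # counts y_j = N(t_j) - N(t_{j-1})
    lam = np.exp(design @ beta)                   # lambda(t_j | beta)
    log_y_factorial = np.array([lgamma(k + 1.0) for k in y])
    return (np.sum(y * np.log(delta * lam))
            - np.sum(delta * lam)
            - np.sum(log_y_factorial))
```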

Appendix C: Approximate 2q th derivative of the conditional intensity

An approximation of \(\hat {\lambda }_q^{(2q)}\) is provided in the following derivation. Begin by expanding \(\hat {\lambda }_q(t | \hat {\beta }_q)\) in terms of the Legendre polynomials up to order \(q^{\prime } = 2q + x\), where x is a user-defined quantity specified in Section 6. The choice of x is discussed in Footnote 7 (Section 6). That is, compute the coefficients \(c_j\), \(j = 0, \ \ldots , \ q^{\prime }-1\), using order \(q^{\prime }\) Gaussian quadrature:

$$\begin{array}{@{}rcl@{}} c_j &=& \left. \sum\limits_{j^{\prime}=0}^{q^{\prime}-1} w_{j^{\prime}} \hat{\lambda}_{q}\left( t_{j^{\prime}} | \hat{\beta}_q \right) L_j(t_{j^{\prime}}) \middle/ \sum\limits_{j^{\prime} =0}^{q^{\prime}-1} w_{j^{\prime}} L_j^2(t_{j^{\prime}} )\ ,\right.\\ &\approx& \left. {\int}_{0}^T \hat\lambda_{q}\left( t | \hat\beta_q \right) L_j(t ) \ dt \middle/ {\int}_0^T L^2_j(t) \ dt \right. \ . \end{array} $$
(41)

Here j indexes the Legendre polynomials, while \(j^{\prime }\) indexes the roots, \(t_{j^{\prime }}\), of the \(q^{\prime th}\) order Legendre polynomial, \(L_{q^{\prime }}\). Then,

$$ \hat{\lambda}_q(t | \hat{\beta}_q) \approx \sum\limits_{j=0}^{q^{\prime}} c_j L_j(t ) \ . $$
(42)

For j>2q−1, \(c_j L_j(t)\) is a polynomial that may not be exactly integrated by Gaussian quadrature of order q.
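
The computation in Eqs. (41)–(42) is a discrete Legendre projection. The following Python sketch illustrates it using NumPy's Legendre utilities rather than anything from the paper; it computes the coefficients \(c_j\) of a smooth function on (0, T) from its values at order-\(q^{\prime}\) Gauss–Legendre nodes and then reconstructs the expansion of Eq. (42).

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, Legendre

def legendre_expansion(f, T, q_prime):
    """Coefficients c_0, ..., c_{q'-1} of Eq. (41) for a function f on (0, T)."""
    x, w = leggauss(q_prime)       # nodes and weights on (-1, 1)
    t = 0.5 * T * (x + 1.0)        # map nodes to (0, T); the weight rescaling
                                   # cancels in the ratio of Eq. (41)
    f_vals = f(t)
    coeffs = np.empty(q_prime)
    for j in range(q_prime):
        L_j = Legendre.basis(j, domain=[0.0, T])   # Legendre polynomial on (0, T)
        Lj_vals = L_j(t)
        coeffs[j] = np.sum(w * f_vals * Lj_vals) / np.sum(w * Lj_vals ** 2)
    return coeffs

def evaluate_expansion(coeffs, T, t):
    """Evaluate the expansion of Eq. (42) at the points t."""
    return sum(c * Legendre.basis(j, domain=[0.0, T])(t)
               for j, c in enumerate(coeffs))

# Toy check: a smooth intensity-like function on (0, 1) is recovered accurately.
f = lambda t: 5.0 + 4.0 * np.sin(2.0 * np.pi * t)
c = legendre_expansion(f, 1.0, 24)
tt = np.linspace(0.0, 1.0, 7)
print(np.max(np.abs(evaluate_expansion(c, 1.0, tt) - f(tt))))
```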

A direct computation of \(\hat {\lambda }_q^{(2q)}\) from Eq. (42) is often inaccurate with standard double-precision floating point numbers. The sum in Eq. (42) often involves terms that span more than sixteen orders of magnitude for the case where q is equal to 40. When this occurs, numerical error results, and often leads to unacceptable inaccuracy. The problem is mitigated by deriving an alternate expression for \(\hat {\lambda }_q^{(2q)}\). Proceed by Taylor expanding \(\hat {\lambda }_q(t|\hat {\beta }_q)\) about t=0 to order 2q+x:

$$\begin{array}{@{}rcl@{}} \hat{\lambda}_q(t|\hat{\beta}_q) &=& \sum\limits_{u^{\prime}=0}^{2q+x} \frac{\hat{\lambda}_q^{(u^{\prime})}(0|\hat{\beta}_q )}{u^{\prime}!} t^{u^{\prime}} \ . \end{array} $$
(43)

From Eq. (43) the \(u^{\prime th}\) derivative is

$$\begin{array}{@{}rcl@{}} \hat{\lambda}_q^{(u^{\prime})}(t \left|\right. \hat{\beta}_q) &\approx &\sum\limits_{j^{\prime}=u^{\prime}}^{2q+x} \frac{ j^{\prime}!}{(j^{\prime} - u^{\prime})!}\frac{\hat{\lambda}_q^{(j^{\prime})}(0|\hat{\beta}_q )}{j^{\prime}!} t^{j^{\prime}-u^{\prime}} \ , \\ &=& t^{-u^{\prime}} \sum\limits_{j^{\prime}=u^{\prime}}^{2q+x} \frac{ j^{\prime}!}{(j^{\prime} - u^{\prime})!}\frac{\hat{\lambda}_q^{(j^{\prime})}(0|\hat{\beta}_q )}{j^{\prime}!} t^{j^{\prime}} \ , \end{array} $$
(44)

where the fact that \(\hat {\lambda }_q(t)\) is a polynomial is used to set the lower bound of the sum. From Eq. (42), the \(u^{\prime th}\) derivative evaluated at t=0 is,

$$ \hat{\lambda}_q^{(u^{\prime})}(0 | \hat{\beta}_q) \approx \sum\limits_{j=u^{\prime}}^{q^{\prime}-1} c_j L_j^{(u^{\prime})}(0 ) \ . $$
(45)

Here \(L_j^{(u^{\prime })}\) is the \(u^{\prime th}\) derivative of the order j Legendre polynomial. Because it can be shown, beginning with Rodrigues' formula, that \(L_n^{(q)}\) is equal to,

$$\begin{array}{@{}rcl@{}} L_n^{(q)}(t) &=& \frac{(n+q)! n!}{2^n} \sum\limits_{k=q}^n \sum\limits_{\ell=0}^{k-q} \sum\limits_{\ell^{\prime}=0}^{n-k} (-1)^{n-k-\ell^{\prime}}\\ && \times\frac{ t^{\ell+\ell^{\prime}}} {(n+q-k)! \ k! \ \ell! \ \ell^{\prime}! \ (k-q-\ell)! \ (n-k-\ell^{\prime})!} \ , \end{array} $$
(46)

\(L_n^{(q)}(0)\) can be determined analytically. At t=0 in Eq. (46), only terms with \(\ell + \ell ^{\prime } = 0\) contribute:

$$\begin{array}{@{}rcl@{}} L_n^{(q)}(0) &=&\! \frac{(n+q)! n!}{2^n} \sum\limits_{k=q}^n \sum\limits_{\ell=0}^{k-q} \sum\limits_{\ell^{\prime}=0}^{n-k} \delta_{\ell,0} \delta_{\ell^{\prime}, 0} (-1)^{n-k-\ell^{\prime}}\\ && \times\! \frac{1} { (n\! +\! q-k)! k! \ell! \ell^{\prime}! (k-q-\ell)! (n-k-\ell^{\prime})!} \ , \\ &=&\! \frac{(n+q)! n!}{2^n} \sum\limits_{k=q}^n \frac{ (-1)^{n-k} } { (n+q-k)! k! (k-q)! (n-k)!} \ . \\ \end{array} $$
(47)

Equations (44), (45) and (47) combine to produce,

$$\begin{array}{@{}rcl@{}} \hat{\lambda}_q^{(2q)}(t|\hat{\beta}_q) &=& t^{-2q} \sum\limits_{j^{\prime}=2q}^{2q+x} \frac{ j^{\prime}!}{(j^{\prime} - 2q)!}\frac{\hat{\lambda}_q^{(j^{\prime})}(0|\hat{\beta}_q )}{j^{\prime}!} t^{j^{\prime}} \ , \\ &=& t^{-2q} \sum\limits_{u=0}^{x} \frac{1} { u! } \hat{\lambda}_q^{(u+2q)}(0|\hat{\beta}_q) t^{u+2q} \ , \\ &=& \sum\limits_{u=0}^{x} \frac{t^u} { u!} \ \hat{\lambda}_q^{(u+2q)}(0|\hat{\beta}_q) \ , \end{array} $$
(48)

Continuing,

$$\begin{array}{@{}rcl@{}} \lefteqn{\hat{\lambda}_q^{(2q)}(t|\hat{\beta}_q)}&& \\ &=& \sum\limits_{u=0}^{x} \frac{1 } { u!} t^u \sum\limits_{j^{\prime\prime}=u+2q}^{2q+x} c_{j^{\prime\prime}} L_{j^{\prime\prime}}^{(u+2q)}(0) \ , \\ &=& \sum\limits_{u=0}^{x} \frac{1} { u!} t^u \sum\limits_{j=u}^{x} c_{j+2q} L_{j+2q}^{(u+2q)}(0) \ , \\ &=& \sum\limits_{u=0}^{x} \frac{1} { u!} t^u \sum\limits_{j=u}^{x} c_{j+2q} \frac{(j+u+4q)! (j+2q) !}{2^{j+2q}}\\ && \times \sum\limits_{k=u+2q}^{j+2q} \frac{ (-1)^{j+2q -k} } { (j+u+2q +2q-k)! k! (k-2q-u)! (j+2q -k)!} \\ &=& \sum\limits_{u=0}^{x} \frac{ t^u}{u!} \sum\limits_{j=u}^{x} c_{j+2q} \frac{(j+u+4q)! (j+2q) !}{2^{j+2q}}\\ && \times \sum\limits_{k^{\prime}=0}^{j-u} \frac{(-1)^{j - u - k^{\prime}} } { (j+ 2q -k^{\prime})!\ (k^{\prime}+u+2q)!\ (k^{\prime} )!\ (j-u-k^{\prime})!} \end{array} $$
(49)

The terms in the last sum in Eq. (49) are symmetrical about the center index (j−u even), or about the center indices (j−u odd). For example, the first and the last terms are equal. When j−u is odd, the signs alternate for terms that are identical in absolute value and the sum is equal to zero. When j−u is even, there are an odd number of terms in the sum. Consider the case j−u equal to four:

$$\begin{array}{@{}rcl@{}} &&\sum\limits_{k^{\prime}=0}^{4} \frac{ (-1)^{j - u - k^{\prime}} } { (j+ 2q -k^{\prime})!\ (k^{\prime}+u+2q)!\ (k^{\prime} )!\ (j-u-k^{\prime})!}\\ &=& \frac{1}{ (j+2q)!\ (u+2q)!\ 0!\ 4!}\\ &&- \frac{1}{ (j+2q-1)!\ (u+2q+1)!\ 1!\ 3!}\\ &&+ \frac{1}{ (j+2q-2)!\ (u+2q+2)!\ 2!\ 2!}\\ &&- \frac{1}{ (j+2q-3)!\ (u+2q+3)!\ 3!\ 1!}\\ &&+ \frac{1}{ (j+2q-4)!\ (u+2q+4)!\ 4!\ 0!}\ . \end{array} $$
(50)

Because j−u equals 4, the last term in Eq. (50) is,

$$ \frac{1}{ (u+2q )!\ (j+2q )!\ 4!\ 0!} \ , \\ $$

which is identical to the first term in the sum. The terms in the sum are similar in absolute value (and are very small). The term with the largest absolute value is the center term, which is equal to

$$ \frac{(-1)^{\frac{j-u}{2}}}{\left[ \left( \frac{j+u}{2} +2q \right)!\right]^2 \ \left[ \left( \frac{j-u}{2} \right)! \right]^2} \ . $$
(51)

Due to cancellation, it is useful to consider the approximation:

$$\begin{array}{@{}rcl@{}} \sum\limits_{k^{\prime}=0}^{j-u} \frac{ (-1)^{j - u - k^{\prime}} } { (j+ 2q -k^{\prime})!\ (k^{\prime}+u+2q)!\ (k^{\prime} )!\ (j-u-k^{\prime})!} &=& \\ \\ (-1)^{\frac{j-u}{2}} \left\{ \begin{array}{ccc} 0 & , & j - u \ odd \\ & & \\ \left[\left( \frac{j+u}{2} +2q \right)! \ \left( \frac{j-u}{2} \right)! \right]^{-2} & , & j - u \ even \end{array} \right.\,. \end{array} $$
(52)

Substituting Eq. (52) into Eq. (49) results in

$$\begin{array}{@{}rcl@{}} \hat{\lambda}_q^{(2q)}(t|\hat{\beta}_q) \approx && \sum\limits_{u=0}^{x} \frac{ t^u}{u!} \sum\limits_{j=u}^{x} c_{j+2q}\\ && \times \frac{(j+u+4q)! (j+2q) !}{2^{j+2q}} \frac{(-1)^{\frac{j-u}{2} } \ \ \chi_{j-u\ even}} {\left[\! \left(\! \frac{j+u}{2} +2q \right)! \right]^2 \ \left[\! \left(\! \frac{j-u}{2} \right)! \right]^2}\\ \end{array} $$
(53)

where \(\chi_{j-u\ even}\) is zero if j−u is an odd integer and 1 if j−u is an even integer. Using Stirling's approximation for \(\ln x!\):

$$ \ln x! \approx x \ln x - x \ , $$
(54)

some cancellations in Eq. (53) can be made. Specifically,

$$\begin{array}{@{}rcl@{}} \lefteqn{\ln{\left\{ \frac{(j+u+4q)!} {2^{j+2q}} \frac{1}{\left[ \left( \frac{j+u}{2} +2q \right)! \right]^2 \ \left[ \left( \frac{j-u}{2} \right)! \right]^2} \right\}} \approx}&&\\ &&(j+u+4q) \ln(j+u+4q) - (j+u+4q) \ +\\ && -(j+2q) \ln 2 \ +\\ && -2\left[ \left(\frac{j+u}{2}+2q\right) \ln \left(\frac{j+u}{2}+2q\right) - \left( \frac{j+u}{2}+2q\right) \right] \ +\\ && -2\left[\left(\frac{j-u}{2} \right)\ln\left(\frac{j-u}{2}\right) - \left(\frac{j-u}{2} \right)\right] \ . \\ \end{array} $$
(55)

After cancellations, Equation (55) results in,

$$\begin{array}{@{}rcl@{}} \lefteqn{\frac{(j+u+4q)!} {2^{j+2q}} \frac{1}{\left[ \left( \frac{j+u}{2} +2q \right)! \right]^2 \ \left[ \left( \frac{j-u}{2} \right)! \right]^2} \approx}&& \\ && \frac{2^{j + u + 4q}}{2^{j+2q}} \frac{1}{2^{u-j} (j-u)!} \ , \\ &=&\frac{2^{j+2q} }{ (j-u)!} \ . \end{array} $$
(56)

Substituting Eq. (56) into Eq. (53) yields another approximation for \(\hat {\lambda }_q^{(2q)}\),

$$\begin{array}{@{}rcl@{}} \lefteqn{\hat{\lambda}_q^{(2q)}(t|\hat{\beta}_q) \approx} \\ && \sum\limits_{u=0}^{x} \frac{ t^u}{u!} \sum\limits_{j=u}^{x} c_{j+2q} \frac{2^{j+2q} (j+2q)! \ (-1)^{\frac{j-u}{2} } \ \ \chi_{j-u\ even}} { (j-u)!} \ . \end{array} $$
(57)

The sums in Eq. (57) can be rearranged such that j progresses from 0 to x, and u progresses from 0 to j. Let \(u^{\prime }\) equal j−u. Then,

$$\begin{array}{@{}rcl@{}} \lefteqn{\hat{\lambda}_q^{(2q)}(t|\hat{\beta}_q) \approx} \\ && \sum\limits_{j=0}^x c_{j+2q} 2^{j+2q} (j+2q)! \sum\limits_{u^{\prime}=0}^{j} \frac{ t^{j-u^{\prime}} } { (j-u^{\prime})! \ u^{\prime}!} (-1)^{\frac{u^{\prime}}{2}} \ \chi_{u^{\prime}\ even} \ , \\ &=& 2^{2q} \sum\limits_{j=0}^x c_{j+2q} \frac{(j+2q)! \ (2t)^j}{j!} \sum\limits_{u^{\prime}=0}^{j} \binom{j}{u^{\prime}} t^{-u^{\prime}} \ (-1)^{\frac{u^{\prime}}{2}} \ \chi_{ u^{\prime}\ even} \ , \\ \end{array} $$
(58)

with the binomial coefficient \(\binom {a}{b} = \frac {a!}{(a-b)! \ b!}\). The coefficients multiplying the powers of t can be collected. This results in the following representation:

$$ \hat{\lambda}_q^{(2q)}(t|\hat{\beta}_q) \approx \sum\limits_{j=0}^x g_j t^j \ , $$
(59)

for \(g_j\) specified according to Eq. (58).

Appendix D: Basic derivation of Gaussian quadrature

The polynomial \(L_j\) satisfies, for \(j^{\prime } \neq j\),

$$ {\int}_0^T L_j(t^{\prime} ) \ L_{j^{\prime}}(t^{\prime}) \ dt^{\prime} = 0 \ , $$
(60)

and is a Legendre polynomial. Gauss quadrature with the Legendre polynomials is sometimes referred to as "Gauss-Legendre" quadrature. The weights \(w_j\), j=1, …, q, are chosen such that the vector \(\textbf {w} = \left [ w_1 \ {\ldots } \ w_q \right ]^T\) is orthogonal to the q−1 vectors,

$$ \textbf{p}_k = \left[ L_k(t_1 ) \ {\ldots} \ L_k(t_q ) \right]^T \ , $$
(61)

for k=1, 2, … ,q−1, with the further stipulation that

$$ \textbf{w}^T \textbf{p}_0 = {\int}_0^T L_0(t^{\prime} ) \ dt^{\prime} \ . $$
(62)

Here \(\textbf{p}_0\) is a vector with identical entries equal to the constant \(L_0\), and \(t_j\), j=1, 2, … ,q are chosen as the roots of \(L_q\):

$$ L_q(t_j ) = 0 \ , j = 1, \ {\ldots} \ , q \ . $$
(63)

Thus specified, Equation (8) is exact when λ is a polynomial of order less than or equal to 2q−1. To see this, consider the integral of an order 2q−1 polynomial \(z_{2q-1}\). Following Stoer and Bulirsch (2002), this polynomial can be expressed as,

$$ z_{2q-1}(t ) = L_q(t) \ \tilde{q}(t) + r(t) \ , $$
(64)

where \(\tilde {q}(t)\) and r(t) are linear combinations of \(L_k(t)\), k<q. Then,

$$\begin{array}{@{}rcl@{}} {\int}_0^T z_{2q-1}(t^{\prime}) \ dt^{\prime} &=& {\int}_0^T L_q(t^{\prime}) \ \tilde{q}(t^{\prime}) + r(t^{\prime}) \ dt^{\prime} \ , \\ &=& {\int}_0^T r(t^{\prime}) \ dt^{\prime} \ , \qquad\qquad\qquad\qquad\quad\hspace*{3pt} (\perp\ w.\ L_q) \\ &=& \sum\limits_{k=0}^{q-1} \beta_k {\int}_0^T L_k(t^{\prime} ) \ dt^{\prime} \ , \qquad\quad (Def.\ of\ r(t)) \\ &=& \beta_0 {\int}_0^T L_0(t^{\prime} ) \ dt^{\prime} \ . \ \qquad\qquad\quad\hspace*{5pt} (\perp\ w.\ L_0)\\ \end{array} $$
(65)

Similarly,

$$\begin{array}{@{}rcl@{}} \sum\limits_{j=1}^q w_j \ z_{2q-1}(t_j) &=& \sum\limits_{j=1}^q w_j \left( L_q(t_j) \ \tilde{q}(t_j) + r(t_j) \right) \ , \\ &=& \sum\limits_{j=1}^q w_j \ r(t_j ) \ , \hspace{2cm} (L_q(t_j) = 0) \\ &=& \sum\limits_{k=0}^{q-1} \beta_k \sum\limits_{j=1}^q w_j \ L_k(t_j ) \ , \hspace{2cm} (\perp) \\ & =& \beta_0 {\int}_0^T L_0(t^{\prime} ) \ dt^{\prime} \ . \end{array} $$
(66)

Here \(L_q(t_j)\) is zero due to Eq. (63), and the orthogonality property of \(\textbf{w}\) is exploited. For further details see Stoer and Bulirsch (2002) and Press et al. (1992, §4.5).

Specification of the order of integration q, the roots \(t_j\), and the weights \(w_j\), j=1, … ,q, completely specifies the Gaussian quadrature rule, Eq. (8). For the integrals approximated in this work, the \(t_j\) and \(w_j\) are computed using the method specified in Golub and Welsch (1969a) for the domain of integration (−1,1). All integrals are transformed to this domain for approximation. The nodes \(t_j\) and the weights \(w_j\) can be computed in a number of ways; see Hale and Townsend (2013) for a fast alternative method capable of accurately determining nodes and weights for Gaussian quadrature orders exceeding 100.
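
The construction above can be checked directly. The short Python sketch below (an illustration, not code from the paper) obtains the nodes and weights on (−1, 1) with NumPy's Gauss–Legendre routine, maps them to (0, T), and verifies that a polynomial of order 2q−1 is integrated to machine precision while an order-2q polynomial is not.

```python
import numpy as np

def gauss_legendre_on_interval(q, T):
    """q-point Gauss-Legendre nodes and weights mapped from (-1, 1) to (0, T)."""
    x, w = np.polynomial.legendre.leggauss(q)
    return 0.5 * T * (x + 1.0), 0.5 * T * w

q, T = 8, 2.0
t, w = gauss_legendre_on_interval(q, T)

for degree in (2 * q - 1, 2 * q):
    quad = np.sum(w * t ** degree)             # quadrature approximation
    exact = T ** (degree + 1) / (degree + 1)   # analytic integral of t^degree on (0, T)
    print(degree, abs(quad - exact) / exact)
# The relative error sits at machine precision for degree 2q - 1 and is
# visibly larger for degree 2q.
```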

Appendix E: Well-behaved deviation

In the following, a sequence of lemmas is provided establishing the sense in which small quadrature error leads to small numerical error in the parameter estimates. Discussion is restricted to the univariate case (p=1), without loss of generality. Let \(\ell_q(\beta)\) be the log-likelihood computed with the q th order Gaussian quadrature and evaluated at the parameter value β.

Lemma 1

(Concavity of \(\ell_q\)) The second derivative of \(\ell_q(\beta)\) is negative for all \(\beta \in \mathbb {R}\).

Proof

The proof follows from calculation:

$$\begin{array}{@{}rcl@{}} \frac{d^2 \ell_q(\beta)}{d\beta^2} &=&\frac{d^2}{d\beta^2} \left\{ \sum\limits_{t \in T_s} \log{(\lambda(t \ | \ \beta ))}\! -\! \sum\limits_{j=1}^q w_j \lambda(t_j \ | \ \beta ) \right\} \ , \\ &=& -\sum\limits_{j=1}^q w_j \frac{d^2\lambda(t_j \ | \ \beta )}{d\beta^2} \ , \\ &=& -\sum\limits_{j=1}^q w_j \left[ f^{\prime}(\beta) \right]^2 e^{f(\beta)} \ , \\ \end{array} $$
(67)

for a linear, differentiable function f. The weights \(w_j\) are non-negative (they are squared quantities; see, e.g., Golub and Welsch (1969b)), guaranteeing the sign of the second derivative for \(\beta \in \mathbb {R}\). □

Let \(\hat {\beta }_q\) be the approximate maximum-likelihood estimate:

$$ \hat{\beta}_q = \arg\max\limits_{\beta} \ \ell_q(\beta) \ . $$
(68)
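
To make the optimization in Eq. (68) concrete, the sketch below maximizes \(\ell_q\) by Newton's method under the assumption of a log-linear conditional intensity, \(\lambda(t|\beta) = \exp(x(t)^T\beta)\); it is an illustrative sketch for the multivariate case, not the authors' implementation, and the function and argument names are hypothetical. Assembling the information matrix from the covariates at the q quadrature nodes is the O(qp^2) step discussed in Note 3, and the concavity argument of Lemma 1 ensures the objective has no spurious local maxima.

```python
import numpy as np

def fit_continuous_time_glm(spike_design, quad_design, quad_weights,
                            n_iter=50, tol=1e-10):
    """Newton maximization of the quadrature log-likelihood
    l_q(beta) = sum_s log lambda(t_s|beta) - sum_j w_j lambda(t_j|beta),
    assuming lambda(t|beta) = exp(x(t)' beta) (log-linear model, for illustration).

    spike_design : n_spikes x p covariate rows x(t_s) at the spike times.
    quad_design  : q x p covariate rows x(t_j) at the quadrature nodes.
    quad_weights : length-q Gauss-Legendre weights on (0, T).
    """

    def observed_information(beta):
        # Negative Hessian of l_q; assembling it costs O(q p^2) FLOPs.
        lam = np.exp(quad_design @ beta)
        return quad_design.T @ (quad_design * (quad_weights * lam)[:, None])

    beta = np.zeros(quad_design.shape[1])
    for _ in range(n_iter):
        lam = np.exp(quad_design @ beta)
        grad = spike_design.sum(axis=0) - quad_design.T @ (quad_weights * lam)
        step = np.linalg.solve(observed_information(beta), grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, observed_information(beta)  # estimate and its observed Fisher information
```

In practice this would be paired with the Gauss–Legendre nodes and weights of Appendix D and a covariate function evaluated both at the spike times and at the quadrature nodes.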

Let \(\zeta , \zeta ^{\prime } \in (a,b)\) be such that Eq. (9), evaluated for \(\hat {\beta }\) and \(\hat {\beta }_q\), is, respectively, \(\delta _{\hat {\beta }}\) and \(\delta _{\hat {\beta }_q}\):

$$\begin{array}{@{}rcl@{}} \delta_{\hat{\beta}} &=& \left| \frac{ \lambda^{(2q)}(\zeta |\hat{\beta} )} { (2q)! \ k_q^2} \right| \ , \end{array} $$
(69)
$$\begin{array}{@{}rcl@{}} \delta_{\hat{\beta}_q} &=& \left| \frac{ \lambda^{(2q)}(\zeta^{\prime}|\hat{\beta}_q )} { (2q)! \ k_q^2} \right| \ . \end{array} $$
(70)

Then there exists a δ for any quadrature order q,

$$ \delta = \max{ \bigg\{ \delta_{\hat{\beta}}, \delta_{\hat{\beta}_q} \bigg\}}\, $$
(71)

such that

$$ \left| \ell(\beta ) - \ell_{q} (\beta ) \right| < \delta \ , \ \beta \in \left\{ \hat{\beta}, \hat{\beta}_q \right\} \ . $$
(72)

The following lemma can be proven.

Lemma 2

Log-Likelihood Approximation

$$ \left| \ell(\hat{\beta}) - \ell_{q} (\hat{\beta}_{q} ) \right| < \delta \ . $$
(73)

Proof

Suppose, \(\ell _q(\hat {\beta }_q)\) is less than \(\ell (\hat {\beta }_q)\). By Eq. (72)

$$ \ell(\hat{\beta}_q) < \ell_q(\hat{\beta}_q) + \delta \ . $$
(74)

From Eq. (74) and concavity, the smallest that \(\ell (\hat {\beta })\) can be is \(\ell (\hat {\beta }_q)\). Then \(\ell _q(\hat {\beta }_q) - \ell (\hat \beta ) < \delta \). Similarly, by concavity, \(\ell _q(\hat {\beta } ) + \delta \) is less than \(\ell_q(\hat{\beta}_q) + \delta\). Then, by Eq. (72), \(\ell (\hat {\beta })\) is upper-bounded:

$$ \ell(\hat{\beta}) \leq \ell_q(\hat{\beta}) + \delta \leq \ell_q(\hat{\beta}_q) + \delta \ , $$
(75)

again implying \(\ell (\hat {\beta }) - \ell _q(\hat {\beta }_q) \leq \delta \).

If instead \(\ell _q(\hat {\beta }_q) > \ell (\hat {\beta }_q)\), then

$$ \ell_q(\hat{\beta}_q) < \ell(\hat{\beta}_q) + \delta \ . $$
(76)

By concavity we have,

$$ \ell_q(\hat{\beta}) \leq \ell_q(\hat{\beta}_q) < \ell(\hat{\beta}_q) + \delta \leq \ell(\hat{\beta}) + \delta \ . $$
(77)

From Eq. (72), \(| \ell _q(\hat {\beta }) - \ell (\hat {\beta }) | < \delta \), implying \(| \ell _q(\hat {\beta }_q) - \ell (\hat {\beta })| < \delta \), and the proof is complete. □

Having established the proximity of the log-likelihood at \(\hat {\beta }\) to the approximate log-likelihood at \(\hat {\beta }_{q} \), it remains to show that \(\hat {\beta }_{q} \) approximates \(\hat {\beta }\).

Lemma 3

\(\hat {\beta }_{q} \) approximates \(\hat {\beta }\)

Fix δ>0. If

$$ \left| \ell(\hat{\beta}) - \ell_{q} (\hat{\beta}_{q} ) \right| < \delta \ , $$
(78)

then there exists an 𝜖>0,

$$ \left| \hat{\beta}_q - \hat{\beta} \right| < \epsilon \ , $$
(79)

such that

$$ \left| \epsilon^2 \frac{d^2 \ell_{q} (\hat{\beta}_{q} )} { 2 \ d\beta^2} + \epsilon^3 \frac{d^3 \ell_{q} (\eta )} { 6 \ d \beta^3} \right| < 2 \delta \ . $$
(80)

Proof

Taylor expanding q about \(\hat {\beta }_{q} \) and evaluating at \(\hat {\beta }\) yields:

$$\begin{array}{@{}rcl@{}} \ell_{q} (\hat{\beta} ) &=&\ell_{q} (\hat{\beta}_{q} ) + \frac{d^2\ell_{q} (\hat{\beta}_{q} )}{2\ d\beta^2} \left( \hat{\beta} - \hat{\beta}_{q} \right)^2\\ && + \frac{d^3\ell_{q} (\eta)}{6 \ d\beta^3} \left( \hat{\beta} - \hat{\beta}_{q} \right)^3 \, \end{array} $$
(81)

with \(\eta \in \left (\hat {\beta }_{q} , \hat {\beta } \right )\). By the triangle inequality,

$$\begin{array}{@{}rcl@{}} \left| \ell_q(\hat{\beta}) - \ell_q(\hat{\beta}_q) \right| &\leq& \left| \ell_q(\hat{\beta}) - \ell(\hat{\beta}) \right| + \left| \ell(\hat{\beta}) - \ell_q(\hat{\beta}_q) \right| \ , \\ &<& \delta + \delta \ , \\ &=& 2 \delta \ . \end{array} $$
(82)

Then:

$$ \left| \epsilon^2 \frac{d^2 \ell_{q} (\hat{\beta}_{q} )} { 2 \ d\beta^2} + \epsilon^3 \frac{d^3 \ell_{q} (\eta )} { 6\ d\beta^3} \right| < 2 \delta \ , $$
(83)

with \(\epsilon = \hat {\beta } - \hat {\beta }_q\). □

Appendix F: Maximum parameter estimate error

The δ in Lemma 3 is the larger of the two quadrature errors, \(\delta _{\hat {\beta }_q}\) and \(\delta _{\hat {\beta }}\); the former is computed for the known \(\hat {\beta }_q\), the latter for the unknowable \(\hat {\beta }\). If a bound is placed upon \(\lambda ^{(2q)} / k^2_q\), and the limit is taken as q tends to infinity, both of these error bounds tend to zero, and hence become close. Here, to obtain an estimate of 𝜖, it is assumed that \(\delta = \delta _{\hat {\beta }_q} = \delta _{\hat {\beta }}\). With this specification, 𝜖, the deviation of the parameter estimate from the true MLE, can be specified. From Eq. (9), set

$$ \left| \frac{\lambda^{(2q)}(\zeta | \ \hat{\beta}_{q} )}{\left(2q\right)! \ k_q^2} \right| = \delta \ . $$
(84)

Then, for η as specified after Eq. (81),

$$ \left| \epsilon^2 \frac{d^2 \ell_{q} (\hat{\beta}_{q} )} { 2 d\beta^2} + \epsilon^3 \frac{d^3 \ell_{q} (\eta )} { 6 \ d\beta^3} \right| < 2 \left| \frac{\lambda^{(2q)}(\zeta | \ \hat{\beta}_{q})}{\left(2q\right)! \ k_q^2} \right| \ . $$
(85)

With this specification,

$$\begin{array}{@{}rcl@{}} \epsilon^2 < -4 \left( \frac{d^2 \ell_{q} (\hat{\beta}_{q} )} { d\beta^2} \right)^{-1} \left| \frac{\lambda^{(2q)}(\zeta | \ \hat{\beta}_{q} )}{\left(2q\right)! \ k_q^2} \right| + O(\epsilon^3 ) \ . \\ \end{array} $$
(86)

It is useful to set \(\epsilon^2\) equal to the bound in Eq. (86). Let

$$ \sigma_{\beta} = \left| \frac{d^2 \ell_{q} (\hat{\beta}_{q} )} { d\beta^2} \right|^{-1/2} \ , $$
(87)

and

$$\begin{array}{@{}rcl@{}} x = 2 \sigma_{\beta} \sqrt{ \frac{ \left| \lambda^{(2q)}(\zeta | \ \hat{\beta}_{q} ) \right|} { \left(2q\right)! \ k_q^2} } \ . \end{array} $$
(88)

Then,

$$\begin{array}{@{}rcl@{}} \epsilon = x + O\left(\epsilon^{3} \right) \ . \end{array} $$
(89)
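
As a practical note (an illustration, not the authors' code), the estimate of Eqs. (87)–(89) is conveniently evaluated in the log domain, since (2q)! overflows double precision for quadrature orders near the q of 60 used with the hippocampal data; here \(k_q\) denotes the quantity appearing in Eq. (9) and, together with the magnitude of the 2q th derivative, is supplied by the caller.

```python
import math

def parameter_error_estimate(sigma_beta, lam_2q_abs, q, k_q):
    """x = 2 sigma_beta sqrt(|lambda^(2q)| / ((2q)! k_q^2)), per Eqs. (87)-(89).

    Evaluated in the log domain because (2q)! overflows floating point
    for the quadrature orders considered in the paper.
    """
    log_ratio = math.log(lam_2q_abs) - math.lgamma(2 * q + 1) - 2.0 * math.log(k_q)
    return 2.0 * sigma_beta * math.exp(0.5 * log_ratio)
```

Comparing the returned value with \(\sigma_{\beta}\) itself indicates whether the numerical error is negligible relative to the parameter estimate standard error, which is the basis of the adaptive check on the quadrature order described in the main text.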

Appendix G: Accuracy of observed Fisher information

Lemma 4

𝜖-Equivalence of Observed Fisher Information

Let 𝜖 be as specified in Eq. (89). Introduce the unbiased estimators \(\tilde {\beta }\), \(\tilde {\beta }_q\), whose realizations are the estimates \(\hat {\beta }\) and \(\hat {\beta }_q\) of the parameters β and \(\beta_q\). Further, let the realization of the random variable X be the quadrature error for any given data set, and specify X to be independent of the true MLE estimator \(\tilde {\beta }\). Then the variance \(var\left \{ \tilde {\beta }_q\right \}\) satisfies:

$$ \left| var\left\{ \tilde{\beta}_q\right\} - var\left\{ \tilde{\beta} \right\} \right| \leq 4 \epsilon^2 \ . $$
(90)

Proof

The proof follows from direct calculation. Consider

$$\begin{array}{@{}rcl@{}} var\left\{ \tilde{\beta}_q \right\} &=&E\left\{ \left( \tilde{\beta}_q - \beta_q \right)^2 \right\} \ , \\ &=& E\left\{ \left[ \left( \tilde{\beta} + X \right) - \beta_q \right]^2 \right\} \ , \\ &=& var\left\{ \tilde{\beta} \right\} + \left[ \beta^2 + \beta_q^2 - 2 \beta_q \beta \right]\\ &&+ 2 \ E\left\{ X \tilde{\beta} \right\} - 2 \beta_q E\left\{ X \right\} + E\left\{ X^2 \right\} \ . \end{array} $$
(91)

For some realized quadrature error η, \(\left | \eta \right | < \epsilon \), the term,

$$\begin{array}{@{}rcl@{}} \beta^2 + \beta_q^2 - 2 \beta_q \beta &=&\beta^2 + \left( \beta + \eta \right)^2 - 2 \left( \beta + \eta \right) \beta \ , \end{array} $$
(92)
$$\begin{array}{@{}rcl@{}} &=& \eta^2 \ . \end{array} $$
(93)

Then

$$\begin{array}{@{}rcl@{}} \beta^2 + \beta_q^2 - 2 \beta_q \beta \leq \epsilon^2 \ . \end{array} $$
(94)

Similarly the contribution to Eq. (91) from the terms involving X can be bounded:

$$\begin{array}{@{}rcl@{}} E\left\{ X^2 \right\} + 2E\left\{X\right\} \left( \beta - \beta_q \right) &=&E\left\{ X^2 \right\} - 2 \eta E\left\{X\right\} , \\ &\leq& E\left\{ X^2 \right\} + 2 \left| \eta E\left\{X\right\} \right| , \\ &\leq& \epsilon^2 + 2 \epsilon^2 \ . \end{array} $$
(95)

Equations (91), (94), and (95) imply Eq. (90). □


Cite this article

Lepage, K.Q., MacDonald, C.J. Fast maximum likelihood estimation using continuous-time neural point process models. J Comput Neurosci 38, 499–519 (2015). https://doi.org/10.1007/s10827-015-0551-y
