Abstract
Bayesian network structure learning algorithms with limited data are being used in domains such as systems biology and neuroscience to gain insight into the underlying processes that produce observed data. Learning reliable networks from limited data is difficult, but transfer learning can improve the robustness of learned networks by leveraging data from related tasks. Existing transfer learning algorithms for Bayesian network structure learning give a single maximum a posteriori estimate of network models. Yet many other models may be equally likely, so a more informative result is provided by Bayesian structure discovery, which estimates posterior probabilities of structural features such as edges. We present transfer learning for Bayesian structure discovery, which allows us to explore the shared and unique structural features among related tasks. Efficient computation requires that our transfer learning objective factor into local calculations, which we prove holds for a broad class of transfer biases. Theoretically, we show the efficiency of our approach. Empirically, we show that, compared to single-task learning, transfer learning is better able to positively identify true edges. We apply the method to whole-brain neuroimaging data.
References
Bailey WN (1935) Generalised hypergeometric series. University Press, Cambridge
Beinlich IA, Suermondt HJ, Chavez RM, Cooper GF (1989) The ALARM monitoring system: a case study with two probabilistic inference techniques for belief networks. In: Second European conference on artificial intelligence in medicine, vol 38, pp 247–256
Björklund A, Husfeldt T, Kaski P, Koivisto M (2007) Fourier meets Möbius: fast subset convolution. In: Proceedings of the thirty-ninth annual ACM symposium on theory of computing, ACM, pp 67–74
Buntine W (1991) Theory refinement on Bayesian networks. In: Seventh conference on uncertainty in artificial intelligence, pp 52–60
Caruana R (1997) Multitask learning. Mach Learn 28(1):41–75
Cooper GF, Herskovits E (1992) A Bayesian method for the induction of probabilistic networks from data. Mach Learn 9:309–347
Cooper G, Yoo C (1999) Causal discovery from a mixture of experimental and observational data. In: Conference on uncertainty in artificial intelligence, pp 116–125
Friedman N, Koller D (2003) Being Bayesian about network structure: a Bayesian approach to structure discovery in Bayesian networks. Mach Learn 50(1):95–125
Grzegorczyk M, Husmeier D (2008) Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move. Mach Learn 71(2–3):265–305
Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn 20(3):197–243
Koivisto M (2006) Advances in exact Bayesian structure discovery in Bayesian networks. In: Twenty-second annual conference on uncertainty in artificial intelligence, pp 241–248
Koivisto M, Sood K (2004) Exact Bayesian structure discovery in Bayesian networks. J Mach Learn Res 5:549–573
Lauritzen S, Spiegelhalter D (1988) Local computations with probabilities on graphical structures and their application to expert systems. J R Stat Soc B (Methodological) 50:157–224
Luis R, Sucar LE, Morales EF (2010) Inductive transfer for learning Bayesian networks. Mach Learn 79(1–2):227–255
Madigan D, York J, Allard D (1995) Bayesian graphical models for discrete data. Int Stat Rev 63(2):215–232
Niculescu-Mizil A, Caruana R (2007) Inductive transfer for Bayesian network structure learning. In: Eleventh international conference on artificial intelligence and statistics
Niinimaki T, Parviainen P, Koivisto M (2011) Partial order MCMC for structure discovery in Bayesian networks. In: Twenty-seventh annual conference on uncertainty in artificial intelligence, pp 557–564
Oyen D, Lane T (2012) Leveraging domain knowledge in multitask Bayesian network structure learning. In: Twenty-Sixth AAAI conference on artificial intelligence
Parviainen P, Koivisto M (2009) Exact structure discovery in Bayesian networks with less space. In: Twenty-fifth conference on uncertainty in artificial intelligence, pp 436–443
Pearl J, Bareinboim E (2011) Transportability of causal and statistical relations: a formal approach. In: Twenty-fifth national conference on artificial intelligence, pp 247–254
Richardson M, Domingos P (2003) Learning with knowledge from multiple experts. In: ICML, vol 20, pp 624–631
Thrun S (1996) Is learning the n-th thing any easier than learning the first? Adv Neural Inf Process Syst 8:640–646
Tong S, Koller D (2001) Active learning for structure in Bayesian networks. In: International joint conference on artificial intelligence, vol 17, pp 863–869
Werhli A, Husmeier D (2007) Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge. Stat Appl Genet Mol Biol 6(1)
Zhang J, Zhang C (2010) Multitask Bregman clustering. In: Twenty-fourth national conference on artificial intelligence
Acknowledgments
Thanks to Vincent P. Clark and the Mind Research Network for neuroimaging data. Thanks also to Eric Eaton and Paul Ruvolo for useful discussions and feedback. Work funded by the Office of Naval Research grant N00014-000-0000.
Appendix
1.1 Extended proof of Theorem
We derive the equality in Theorem starting from the joint probability of data and the feature conditioned on an order given in Eq. 9,
Modularity properties are used to substitute \(P(D|G) = \prod _{i=1}^n P(x_i | \pi _i)\), \(f(G) = \prod _{i=1}^n f_i(\pi _i)\), and \(P(G^{(t)}, G | \! \prec ) = \prod _{i=1}^n P(\pi _i^{(t)}, \pi _i | U_i)\), as follows:
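These modular forms license the standard order-based distributive factorization; as a sketch (summing over DAGs \(G\) consistent with the order \(\prec\)):

\[
\sum_{G \,\models\, \prec} \; \prod_{i=1}^n P(x_i \mid \pi_i)\, f_i(\pi_i)\, P(\pi_i^{(t)}, \pi_i \mid U_i)
\;=\;
\prod_{i=1}^n \; \sum_{\pi_i \subseteq U_i} P(x_i \mid \pi_i)\, f_i(\pi_i)\, P(\pi_i^{(t)}, \pi_i \mid U_i),
\]

which holds because, once the order fixes each predecessor set \(U_i\), the parent sets \(\pi_i\) can be chosen independently across nodes.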
The goal is to factor the terms so that the local calculation at each node \(i\) can be performed independently. First, we factor the terms in the structure bias (the part in round parentheses above) as follows:
Plugging the factored structure prior back in to the full equation, we now have:
If we perform a similar factoring to that done above, we achieve the desired result:
1.2 Normalization constant for structure bias
Calculation of the normalization constant requires summing over all possible combinations of parent sets \((\pi _i^{(t)}, \pi _i^{(k)})\), as follows:
We can simplify this by first fixing the parent set \(\pi _i^{(t)}\) and counting how many parent sets \(\pi _i^{(k)}\) give \(\Delta _{itk} = 0\) (\(\pi _i^{(k)}\) can contain any parents from the set \(\{U_i {\setminus } \pi _i^{(t)}\}\) but none from \(\pi _i^{(t)}\)); then how many \(\pi _i^{(k)}\) give \(\Delta _{itk} = 1\) (\(\pi _i^{(k)}\) can contain any parents from the set \(\{U_i {\setminus } \pi _i^{(t)}\}\) and exactly one from \(\pi _i^{(t)}\)); and so on, up to the maximum of \(\Delta _{itk} = |\pi _i^{(t)}|\). This sum turns out to be a binomial expansion, so we can write it in closed form. The expansion is demonstrated here:
Next, we perform a similar expansion of the sum over parent sets \(\pi _i^{(t)}\) of size \(|\pi _i^{(t)}| = 0\), then \(|\pi _i^{(t)}| = 1\), and so on up to the maximum \(|\pi _i^{(t)}| = |U_i|\). This sum is also a binomial expansion and can therefore be simplified into closed form, as demonstrated:
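Both expansions are instances of the binomial theorem. As a sketch, write \(m = |\pi_i^{(t)}|\) and let \(w\) stand for the weight the bias assigns per parent shared with \(\pi_i^{(t)}\) (\(w\) is our hypothetical placeholder for the actual bias weight). Counting as described above, a fixed \(\pi_i^{(t)}\) contributes

\[
\sum_{\pi_i^{(k)} \subseteq U_i} w^{\Delta_{itk}}
\;=\; 2^{\,|U_i| - m} \sum_{d=0}^{m} \binom{m}{d} w^{d}
\;=\; 2^{\,|U_i| - m}\, (1 + w)^{m},
\]

since the \(|U_i| - m\) non-parents of \(\pi_i^{(t)}\) may be included freely while exactly \(d\) of its \(m\) parents are matched. The outer sum over sizes of \(\pi_i^{(t)}\) then closes the same way:

\[
\sum_{m=0}^{|U_i|} \binom{|U_i|}{m}\, 2^{\,|U_i| - m}\, (1 + w)^{m} \;=\; (3 + w)^{|U_i|}.
\]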
1.3 Integration of Bayesian model average
We integrate over all values of the parameter \(\lambda \) to obtain the Bayesian model average over all possible structure priors, thus eliminating the need to select a point-estimate for \(\lambda \). We use an uninformative, uniform prior for \(\lambda \), i.e., \(p(\lambda |U_i) = 1\) for \(0 \le \lambda \le 1\).
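Written out, the model average is the marginalization (a sketch in the notation above):

\[
P(\pi_i^{(t)}, \pi_i \mid U_i)
= \int_0^1 P(\pi_i^{(t)}, \pi_i \mid U_i, \lambda)\, p(\lambda \mid U_i)\, d\lambda
= \int_0^1 P(\pi_i^{(t)}, \pi_i \mid U_i, \lambda)\, d\lambda .
\]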
Next, we use an identity given by Euler in 1748 [1]. If \(\beta \) is the beta function and \(_2\text {F}_{\!1}\) is the ordinary hypergeometric function, then
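The identity in question is Euler's integral representation of the Gauss hypergeometric function: for \(c > b > 0\),

\[
{}_2\text{F}_{\!1}(a, b; c; z)
= \frac{1}{\beta(b,\, c - b)} \int_0^1 x^{\,b-1} (1 - x)^{\,c-b-1} (1 - zx)^{-a}\, dx .
\]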
Let \(x = \lambda ,\,a = |U_i|,\,b = 1,\,c = \Delta _{itk} + 2\), and \(z = 1/4\). Then the condition, \(\Delta _{itk} + 2 > 1 > 0\), holds for any \(\Delta _{itk} \ge 0\) which is the valid range for \(\Delta _{itk}\) and:
giving the result:
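Under those substitutions (and omitting any constant factors contributed by the bias normalization), the integrand takes the form \((1-\lambda)^{\Delta_{itk}} (1 - \lambda/4)^{-|U_i|}\), so the integral evaluates as a sketch to

\[
\int_0^1 (1-\lambda)^{\Delta_{itk}} \left(1 - \tfrac{\lambda}{4}\right)^{-|U_i|} d\lambda
= \beta(1,\, \Delta_{itk}+1)\; {}_2\text{F}_{\!1}\!\left(|U_i|, 1; \Delta_{itk}+2; \tfrac14\right)
= \frac{{}_2\text{F}_{\!1}\!\left(|U_i|, 1; \Delta_{itk}+2; \tfrac14\right)}{\Delta_{itk}+1},
\]

using \(\beta(1, \Delta_{itk}+1) = 1/(\Delta_{itk}+1)\).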
We only need the solution to this formula for a limited number of combinations of integer values of \(\Delta _{itk}\) and \(|U_i|\), with \(0 \le \Delta _{itk} \le |U_i| < n\), where \(n\) is the number of variables in the network. Therefore, we pre-compute a lookup table of the necessary values using the GNU Scientific Library hypergeometric function solver.
Cite this article
Oyen, D., Lane, T. Transfer learning for Bayesian discovery of multiple Bayesian networks. Knowl Inf Syst 43, 1–28 (2015). https://doi.org/10.1007/s10115-014-0775-6