easyMF: A Web Platform for Matrix Factorization-Based Gene Discovery from Large-scale Transcriptome Data

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

With the development of high-throughput experimental technologies, large-scale RNA sequencing (RNA-Seq) data have been and continue to be produced, posing challenges for extracting the biological knowledge hidden in the resulting high-dimensional gene expression matrices. Here, we present easyMF (https://github.com/cma2015/easyMF), a web platform that facilitates functional gene discovery from large-scale transcriptome data using matrix factorization (MF) algorithms. Compared with existing MF-based software packages, easyMF offers greater functionality, flexibility and ease of use. The platform is built on the Big-Data-supported Galaxy system and provides user-friendly graphical user interfaces, allowing users with little programming experience to streamline transcriptome analysis from raw reads to gene expression, carry out multiple-scenario MF analysis, and perform multiple-way MF-based gene discovery. easyMF also uses advanced packaging technology to improve ease of use across different operating systems and computational environments. We illustrate the application of easyMF to seed gene discovery from temporal, spatial, and integrated RNA-Seq datasets of maize (Zea mays L.), which identified 3,167 seed stage-specific, 1,849 seed compartment-specific, and 774 seed-specific genes, respectively. The results also indicate that easyMF prioritizes seed-related genes with better prediction performance than the state-of-the-art network-based gene prioritization system MaizeNet. As a modular, containerized and open-source platform, easyMF can be further customized to meet users' specific demands for functional gene discovery and deployed as a web service for broad applications.
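As a rough illustration of what an MF algorithm does to a gene expression matrix, the sketch below factorizes a small genes-by-samples matrix V into a gene-loading matrix W and a sample-pattern matrix H, then ranks genes by their loading on each component. It is not easyMF's code: it uses non-negative matrix factorization with multiplicative updates, chosen here only as one widely used MF algorithm, and the function names (`nmf`, `frobenius_error`), the parameters, and the toy data are all hypothetical.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Frobenius norm of V - W*H, i.e. the reconstruction error."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Hypothetical helper: factorize non-negative V (genes x samples)
    into W (genes x k) and H (k x samples) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

# Made-up toy data: 4 genes (rows) x 4 samples (columns).
# Genes 0-1 are high in samples 0-1; genes 2-3 are high in samples 2-3.
V = [
    [9.0, 8.0, 0.5, 0.2],
    [7.0, 9.0, 0.3, 0.4],
    [0.2, 0.4, 6.0, 7.0],
    [0.3, 0.1, 8.0, 9.0],
]

W, H = nmf(V, k=2)
print("reconstruction error:", round(frobenius_error(V, W, H), 3))
for comp in range(2):
    # Genes with the largest loadings are candidates associated with this component.
    ranked = sorted(range(len(V)), key=lambda g: W[g][comp], reverse=True)
    print(f"component {comp}: genes ranked by loading -> {ranked}")
```

In this framing, each row of H describes a pattern across samples and each column of W says how strongly each gene follows that pattern. Ranking genes by loading is one simple way such a factorization can be turned into a candidate gene list.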

Acknowledgements

We thank High-Performance Computing (HPC) of Northwest Agriculture and Forestry University for providing computing resources.

Funding

This work was supported by the National Natural Science Foundation of China (31570371), the Youth 1000-Talent Program of China, the Hundred Talents Program of Shaanxi Province of China, Projects of Youth Technology New Star of Shaanxi Province (2017KJXX-67), and the Fundamental Research Funds for the Central Universities (2452020041).

Author information

Contributions

C.M. conceived the project; W.M. and S.C. developed the software; M.S., J.Z., T.Z. and S.X. tested the software; W.M. and S.C. performed the bioinformatics analysis; S.C., C.M., Y.Q. and G.W. interpreted the results of the bioinformatics analysis; W.M., S.C. and C.M. wrote the article.

Corresponding author

Correspondence to Chuang Ma.

Ethics declarations

Conflict of interest

The authors have declared no competing interests.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 4367 KB)

Supplementary file1 (XLSX 8642 KB)

Cite this article

Ma, W., Chen, S., Qi, Y. et al. easyMF: A Web Platform for Matrix Factorization-Based Gene Discovery from Large-scale Transcriptome Data. Interdiscip Sci Comput Life Sci 14, 746–758 (2022). https://doi.org/10.1007/s12539-022-00522-2

  • DOI: https://doi.org/10.1007/s12539-022-00522-2
