Background

The first known avidin was isolated from chicken (Gallus gallus) egg white in 1941 [1] as a minor protein component showing extremely high avidity to biotin (Kd ≈ 10⁻¹⁵ M), and it is a textbook example of a tight protein–ligand interaction [1, 2]. This, combined with avidin's compact tetrameric structure with four biotin-binding sites in each functional protein and the existing methods for biotinylating a vast variety of biomolecules, has made avidin an important biotechnological tool in protein purification, detection, and assay technologies, as well as in diagnostics and pharmaceuticals [3, 4].

The first bacterial avidin, streptavidin, was isolated from antibiotic-secreting Streptomyces avidinii bacteria in 1964 [5]. Since then, several new avidins have been experimentally verified from both eukaryotic and prokaryotic species. Ten avidin family members were identified in the chicken genome between the 1980s and the early 2000s [6, 7], and they were shown to resemble avidin structurally and functionally when expressed as recombinant proteins [8, 9]. Further eukaryotic avidins have been found in other avian species, reptiles, amphibians, sea urchin, fish, lancelet and fungi [10,11,12]. Several putative novel bacterial avidin genes have been detected in bacteria from a wide variety of environmental niches, including symbiotic, marine, and pathogenic species. However, none of these bacterial avidins, except streptavidin and the closely related streptavidin v1 and v2 from Streptomyces venezuelae [13], have been confirmed to be expressed in nature. Avidins are made of beta barrels, and their oligomeric state varies from a loose dimeric assembly to a very stable tetramer.

Avidin has been suggested to have antibiotic qualities, as it renders the vitamin biotin unavailable. In oviparous animals, avidins are theorized to protect the eggs from microbes [14]. Evidence that chicken oviductal tissue produces avidin in response to bacterial, viral, and environmental stress supports this hypothesis [14,15,16,17]. A recent study revealed that avidin is expressed in avian primary gut epithelial cells along with proinflammatory cytokines as acute phase proteins [18]. In line with these findings, two avidin genes, Bjavd 1 and 2 [

Enrichment analysis

The following bacterial genomes, representing different sub-branches of the phylogenetic trees, were chosen for enrichment analysis: Bradyrhizobium diazoefficiens (BA000040, GenBank), Ralstonia eutropha (CP000090–93), Rhizobium etli (CP001074–77), Methylobacterium extorquens (CP001298–1300), Catenulispora acidiphila (CP001700), M. mediterranea (CP002583), Ralstonia pickettii (CP00667–69), Legionella pneumophila (CR628336–38), and Xanthomonas fuscans (FO681494–97) [68]. The genomic features from these organisms and their assemblies were pooled together, and the vicinity of each avidin gene (putative or verified) was defined as 500 bp upstream and downstream from the gene's termini. Gene Ontology (GO) terms were searched for each feature. If a feature was not annotated with any GO term, its PFAM, IPR, or TIGRFAM annotations were mapped to the corresponding GO terms. Fisher's exact test was performed to evaluate whether features annotated with a certain GO term clustered with an avidin gene significantly more often than expected under a random distribution. Biopython was used for processing and analysing the data.
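The vicinity rule and the enrichment test described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the choice to count any overlap with the 500 bp flanks as "in the vicinity", and the one-sided form of Fisher's exact test are assumptions made for illustration, and the counts in the usage example are invented.

```python
from math import comb

FLANK_BP = 500  # window size stated in the text


def in_vicinity(feature, gene, flank=FLANK_BP):
    """Return True if a feature overlaps the gene or its flanking windows.

    Both arguments are (start, end) tuples on the same replicon, start <= end.
    Counting any overlap as "in the vicinity" is an assumption of this sketch.
    """
    f_start, f_end = feature
    g_start, g_end = gene
    return f_start <= g_end + flank and f_end >= g_start - flank


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test (enrichment) for the 2x2 table

                      with GO term   without GO term
        near avidin        a               b
        elsewhere          c               d

    Returns the probability of seeing at least `a` annotated features near
    avidin genes if annotations were distributed at random.
    """
    row1 = a + b          # features near an avidin gene
    col1 = a + c          # features carrying the GO term
    n = a + b + c + d     # all pooled features
    upper = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    numerator = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, row1)


if __name__ == "__main__":
    # Invented toy counts, for illustration only.
    p_value = fisher_exact_greater(3, 1, 1, 3)
    print(f"p = {p_value:.4f}")
```

Implementing the test with `math.comb` keeps the sketch free of third-party dependencies; in practice a statistics library would normally be used, and the per-term p-values would need correction for multiple testing.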

Visualization

The 3D structures obtained from the Protein Data Bank were visualized using VMD 1.9.3.

Homology modelling

The homology model of Oleiagrimonas soli protease-avidin fusion protein was generated with Modeller 9.25 [74]. Swine pepsin (PDB ID: 4PEP; [75]) was used as a template for the protease domain, and streptavidin (PDB ID: 3RY2; [76]) for the avidin domain.

Pairwise similarity and identity

Pairwise sequence identity and pairwise sequence similarity were calculated using the MatGAT 2.0 program (Matrix Global Alignment Tool) [77].
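To make the two quantities concrete, the following Python sketch computes percent identity and percent similarity from a pair of sequences that have already been globally aligned. It does not reproduce MatGAT: the residue groups used to define "similar" stand in for a substitution matrix, and dividing by the alignment length is only one of several possible normalisations. Both choices, and all names, are assumptions of this sketch.

```python
# Hypothetical groups of residues treated as "similar". A substitution matrix
# would normally decide this; the groups are a simplification for illustration.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Gaps are written as '-'. Columns containing a gap count towards the
    denominator but never towards identity or similarity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")

    identical = 0
    similar = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1

    length = len(aligned_a)  # assumed denominator: full alignment length
    return 100 * identical / length, 100 * similar / length


if __name__ == "__main__":
    # Toy alignment, invented for illustration.
    identity, similarity = identity_and_similarity("MKV-L", "MRVAL")
    print(identity, similarity)
```

Similarity is always at least as large as identity under this definition, because every identical position is also counted as similar.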

Signal peptide prediction

The presence of signal peptides was predicted using SignalP 5.0 [78].

Sequence logos

The sequence logos shown in Fig. 3f were built using the ggseqlogo package in R [79]. The logos were manually curated to show only residues with occurrence above 20%.
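The 20% occurrence cut-off can be expressed as a simple per-column frequency filter. The Python sketch below illustrates that rule only; it is not the ggseqlogo workflow, and the curation in the paper was done manually. The function names and the decision to count gap characters like any other symbol are assumptions of this sketch.

```python
from collections import Counter


def column_frequencies(aligned_seqs):
    """Return one {residue: frequency} dict per alignment column.

    All sequences must have the same length. Gap characters are counted like
    any other symbol (an assumption of this sketch).
    """
    if not aligned_seqs:
        raise ValueError("no sequences given")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    n = len(aligned_seqs)
    frequencies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned_seqs)
        frequencies.append({res: count / n for res, count in counts.items()})
    return frequencies


def keep_common_residues(frequencies, threshold=0.20):
    """Drop residues whose per-column frequency is not above the threshold."""
    return [
        {res: freq for res, freq in column.items() if freq > threshold}
        for column in frequencies
    ]


if __name__ == "__main__":
    # Toy alignment, invented for illustration.
    toy = ["WA", "WG", "WS", "WT", "YA", "WA"]
    for position, column in enumerate(keep_common_residues(column_frequencies(toy)), 1):
        print(position, sorted(column))
```

The filtered per-column dictionaries could then be passed to any logo-drawing tool that accepts custom frequency matrices.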