Introduction: principles of signaling

The harmonious development of living organisms requires the interplay of complex sets of regulatory systems ensuring both evolution and adaptation to surrounding environments that are subject to constantly changing conditions.

At the cellular level, multi-molecular systems are responsible for communication with the immediate environment (microenvironment) or with remote environments via signaling molecules. Autocrine signaling involves regulatory agents acting on receptors localized in or on the same cells as those from which they originate. In paracrine situations, the signaling factors produced by one cell type act on another category of surrounding or remote cells. Each step of the various developmental pathways leading from a zygote to an organized assembly of cells, tissues and organs relies on highly complex intercommunication systems involving both types of signaling.

For several decades, prior to the advent of sophisticated technologies addressing the biological behavior of single cells, studies aiming at deciphering the molecular events underlying these communication networks were limited to binary approaches in which a ligand and its target were considered a suitable pair to work with, as long as they appeared to elicit specific, reproducible activities.

With the development of spatial technical platforms that allow one to examine, simultaneously and at the most precise level, how cells respond to signaling cross-talk with surrounding cells in a normal or pathological context, the molecular biology of gene expression has become more precise, holistic and powerful. Establishing the spatiotemporal expression pattern of key signaling molecules is now possible in tissues that have conserved their structural integrity. Combining the information drawn from spatial transcriptomics with the corresponding genomic, metabolomic, and proteomic signatures offers a new realm of opportunities to address, in a comprehensive way, questions regarding the interconnections of key signaling pathways. These advances open the way to a wide 3D identification of targets ensuring a balanced cellular response to several regulatory molecules acting as biological modulators in normal and pathological conditions.

After briefly reviewing the basic physical and biochemical principles of signaling as a means of transporting information from one point to another, we will attempt to apply simple paradigms to the biological realm of CCN proteins and show how they may account for the temporal and spatial control of the set of intercommunicating biological pathways that permits the emergence of complex creatures able to grow, multiply, and adapt to their environment.

Basics of general signaling

Signaling in all its forms relies on the interactive communication of at least two elements designated transmitter and receptor (Fig. 1).

Fig. 1

Schematic representation of signaling. The information provided to the transmitter may be properly integrated (1), partially altered (2), or not sent (3). The transmission of the proper message may be disturbed and result in the delivery of a wrong message (4). Upon reception, the original message can be received and transmitted as originally sent (1), slightly modified (2), or altered (4). At that stage, the message can be misunderstood and sent with a wrong meaning (5). In the worst case, the message, which was not emitted, is lacking (6)

Upon receiving information meant to be spread, the transmitter can (i) properly integrate the message to emit [Fig. 1(1)], (ii) misunderstand or slightly alter the message [Fig. 1(2)], or (iii) fail to transmit the message [Fig. 1(3)]. Once the information has been sent [Fig. 1(2) or not (X)], it travels between the transmitter and the receptor. The means used to make the signals travel from one point to the other may be aerial, terrestrial, or underground. In any case, the receptor receives what has traveled from the transmitter, but the message can be blurred during its transport [Fig. 1(4)].

Once it has reached the receptor, the signal is processed through the receptor machinery and further decrypted by its targets. During the processing, the message can be damaged and acquire another meaning. Each modification of the message content during these stepwise events can translate into abnormal or misleading information. This schematic representation allows the identification of critical steps during the transmission and reception of the message.

Mechanistic considerations

We believe that the model described in Fig. 1 constitutes an excellent metaphor of the signaling machinery functioning in the biological realm both under normal and pathological conditions. Indeed, it has been established that biological signaling is also based on interdependent pathways.

In normal conditions, the setting up of functional developmental pathways obeys a stepwise organization, with each step calling for specific regulatory factors that “sense” both the modifications of the cellular microenvironment and the biological status of differentiating cells and tissues. It is also established that in multifactorial diseases such as cancers, alterations of the signal processing steps described herein are associated with tumor development.

From a mechanistic point of view, biological signaling uses a sequence of events very comparable to that required for the transmission of vocal, written, or digitized information.

The biochemical signaling molecules produced by organs, tissues and cells (signal emitters, SE) either act in the direct physical environment of the emitters or travel to other tissues before coming into contact with their targets (Fig. 2), which, in turn, are elicited to generate a biological response depending upon the nature and localization of the receptor system. The signal emitted by the source can be a simple or a complex molecule. Components generated by the anabolic reactions of ingested nutrients and by consumption of organic and inorganic compounds by the host body may also act as signaling factors.

Fig. 2

Different types of biological signaling. The signals produced by a signal emitter (SE) can be targeted to itself in an autocrine way (A), to the extracellular matrix, composed of the non-cellular portion of loose, dense, and specialized connective tissues (B), to distant target cells in epithelial, muscle, and nervous tissues (C), or to blood cells and lymph connective tissues (D)

Stepwise biological pathways can be thought of as working as schematized in Fig. 1. The information transmitted in the form of a polypeptide regulator, or molecular ligand (Footnote 1), needs to reach an appropriate target (T) that is present on a downstream member of the signaling chain.

Figure 3 illustrates different basic levels of regulation encountered in enzymology. In the simplest situation, a stable or labile liaison occurs between a substrate (Footnote 2) (or factor) and a target [Fig. 3(1)]. It may involve the binding of a factor showing a high (H) or low (L) affinity toward the target (T) [Fig. 3(2)]. To be efficient, this binding may require the assistance of a cofactor (Co) permitting the productive interaction of the regulatory factor with the target [Fig. 3(3)]. In the fourth example, a regulatory molecule or factor, called an effector (E), connects to the target in order to make it responsive to the binding of the substrate (S) [Fig. 3(4)]. In some cases, the binding of an activator effector (AF) is necessary for the reaction between S and its target T to proceed [Fig. 3(5, 6)]. The activator may bind to a part of the target that is distinct from the substrate binding site. Its liaison affects either the affinity of the target or its spatial conformation. This distant induction of the spatial modification required for the reaction to proceed is encountered in allosteric enzymatic processes (Monod et al. 1963). It may involve complex ligands and is not restricted to enzymes (Footnote 3).

Fig. 3

Different levels of signaling regulation in the enzymology context. See text for details; S: substrate, T: target, Co: co-activator, E: effector, AF: activator effector. PANEL 3–1 (1–4): considering that the number of possible ligands (substrates) is n, the number of possible associations in case 1 is [n + 1], the additional case (+1) accounting for the situation in which the ligands cannot bind for any external reason. For case 2, we consider two types of binding (weak or strong), hence the number of possibilities is twice that of case 1: [2 × (n + 1)] associations. Considering that n ligands may need a cofactor (Co) among the n − 1 ligands available (case 3), the number of possible associations is [n × (n − 1)/2]. If a co-effector (E) is required to activate the target, then the number of possible associations is also [n × (n − 1)/2] for case 4. PANEL 3–2 (5–6): similar predictions [n × (n − 1)/2] can be calculated for cases 5 and 6. PANEL 3–3 (7–8): if m ligands can synergize, the number of possible associations is [m × (m − 1)/2], and in the case of competitive action between l ligands, the number of associations becomes [l × (l − 1)/2]
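
The counts quoted in the legend of Fig. 3 can be checked with a few lines of Python. The sketch below is only an illustration and is not part of the original analysis: the function names and the example value n = 6 are assumptions chosen for the demonstration. It shows that n × (n − 1)/2 is the number of unordered pairs that can be formed among n ligands.

```python
from itertools import combinations


def simple_binding(n: int) -> int:
    """Case 1: one of n ligands binds, plus the case where none binds."""
    return n + 1


def two_affinity_binding(n: int) -> int:
    """Case 2: the case-1 count doubled for weak versus strong binding."""
    return 2 * (n + 1)


def pairwise_associations(n: int) -> int:
    """Cases 3 to 8: number of unordered pairs among n ligands."""
    return n * (n - 1) // 2


n = 6  # hypothetical number of ligands, chosen only for illustration
explicit_pairs = list(combinations(range(n), 2))  # enumerate every pair
print(simple_binding(n), two_affinity_binding(n),
      pairwise_associations(n), len(explicit_pairs))
```

Enumerating the pairs explicitly and comparing with the closed formula is a simple way to confirm that both describe the same quantity.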

When two different ligands can bind the same site [Fig. 3–3(7, 8)], their liaison will depend upon their relative affinity for the target, and their action can be synergistic, exclusive, or antagonistic. In many cases, the competition between the two substrates (S1 and S2) for binding the target may result in the displacement of a previously bound substrate (S1) by a second one (S2) showing a higher affinity for the target site. In other cases, not depicted here, an allosteric effect resulting from the liaison of S2 to another, distant site may induce a conformational modification of the target that translates into displacement of the previously bound S1.
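
The displacement of a bound substrate by a competitor of higher affinity can be illustrated with the textbook equilibrium model of two ligands competing for one site. The sketch below is not a model taken from this article: it assumes a single site, reversible binding at equilibrium, and free ligand in large excess over the target. The dissociation constants (kd1, kd2) and the concentrations are hypothetical values.

```python
def competitive_occupancy(s1, kd1, s2, kd2):
    """Fractional occupancy of one site by two competing ligands at equilibrium.

    s1, s2: free concentrations of S1 and S2.
    kd1, kd2: dissociation constants, in the same units as the concentrations.
    Returns (fraction of sites bound to S1, fraction of sites bound to S2).
    """
    a1 = s1 / kd1
    a2 = s2 / kd2
    denom = 1.0 + a1 + a2  # the 1.0 term stands for the unoccupied site
    return a1 / denom, a2 / denom


# Hypothetical values: S1 has a low affinity (Kd = 100), S2 a high affinity (Kd = 1).
s1_alone, _ = competitive_occupancy(s1=100.0, kd1=100.0, s2=0.0, kd2=1.0)
s1_with_s2, s2_bound = competitive_occupancy(s1=100.0, kd1=100.0, s2=10.0, kd2=1.0)
print(f"S1 alone: {s1_alone:.2f}; with S2 present: S1 {s1_with_s2:.2f}, S2 {s2_bound:.2f}")
```

With these assumed numbers, adding a tenfold lower concentration of the high-affinity ligand is enough to displace most of the bound S1. The allosteric case mentioned above would need a different model and is not covered by this sketch.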

Can these basic principles apply to complex biological signaling?

Receptors on cell membrane

Based on these mechanistic considerations, it is possible to recognize at the cell membrane various specialized receptor systems that are at the origin of different biological signaling paths. In the simplest models, one membrane receptor system interacts with specific ligands (from ions to complex polypeptides) showing strong or weak affinities for their cognate binding sites. Signaling cascades result from the labile or stable liaisons of the ligands with the receptor systems. Processes modulated by the involvement of co-activators and/or effectors can be antagonistic or synergistic, as described above.

The combinations of various individual components of the transmitter-receptor systems allow for a wide and flexible stepwise complexity that is suited to the strict needs of signaling pathways required to act in an orderly and coordinated fashion.

The paradigm of CCN proteins

Biological evolution rests on the interconnection of multiple biological pathways, up to the highest organizational levels of signaling and communication, that must be sequentially turned on and off by a tightly controlled machinery in order to achieve the impressive chain of events leading to the development of a complex organism from a single zygote. The CCN family of biological regulators is believed to play a master control role within such a series of intertwined events. The CCN acronym stands for “cellular communication network factor 1–6” (Perbal et al. 2018, Table 1) (Footnote 4), and could very well encompass members of other signaling families that might functionally interact with the original CCN proteins, as discussed hereinafter. New members are expected to join the family, as recently reported by Garrett et al. (2023).

Table 1 CCN proteins new nomenclature with previous names and localisation on human chromosomes

Soon after the discovery of the first three members of the CCN family of proteins, it became obvious that they fulfilled a very wide variety of functions in normal cellular signaling at all biological levels (for example, cell growth, differentiation, proliferation, extracellular matrix remodeling, skeletal development, chondrogenesis, angiogenesis, wound repair) and in pathological conditions (fibrogenesis, diabetes, retinopathy, cancer development and metastasis).

Apart from the absence of the typical CCN C-terminal module in one member of the family (CCN5), the CCN proteins show a striking tetramodular organization, with a highly conserved primary sequence (30–50% identity with 40–60% similarity) (Holbourn et al. 2009). They contain 38 cysteine residues that are strictly conserved both in position and number (Footnote 5) (Fig. 4). The specific features of the CCN family members go beyond their organization in four recognizable structural modules. The identification, in each of the four constitutive CCN structural modules (IGFBP, VWC, TSP, CT), of sequence blocks that are also present in several proteins belonging to major classes of biological regulators led to the attribution of putative biological functions to the CCN proteins (Lau and Lam 1999; Brigstock 1999; Perbal 2001). The structural modules were later shown to interact with a significant number of diverse ligands and partners that also act in a large number of signaling pathways (see below).

Fig. 4

Conserved primary structure of the CCN proteins. The figure shows the conservation of the four domains. Blue: IGFBP domain; Orange: VWC domain; Purple: TSP1 domain; Yellow: CT domain

Briefly, the CCN family appeared to be functionally “bipartite”, with members acting negatively and/or positively on cell proliferation and differentiation, and with evidence that their effects depend upon the site and time of their expression (Joliot et al. 1992; Rittié et al. 2011). From these considerations stemmed the concept of CCN biology based on spatiotemporal combinatorial events (Perbal 2001).

Thus, the interactions between the various CCN modules and their cognate ligands can be functionally compared to the target-substrate models developed above.

Phylogenetic combinations of functional targets

“Estimating rates of speciation and extinction, and understanding how and why they vary over evolutionary time, geographical space and species groups, is a key to understanding how ecological and evolutionary processes generate biological diversity” (Morlon 2014).

As stated by H. Morlon, phylogenetic approaches also offer new perspectives for the science of interaction networks. Indeed, recent in silico phylogenetic tracking of CCN protein modules among a variety of different species (Hu et al. 2019) revealed a few challenging insights that should be considered essential milestones in our studies of the CCN biology system.

Different combinations of individual CCN exons, resulting from different exon shuffling events, were identified in the genomes of invertebrates (Fig. 5), while all vertebrates showed a striking phylogenetic conservation of their CCN protein primary sequences (Fig. 6) (Footnote 6). Thus, the amino-acid sequences of NOV/CCN3 (chicken), FISP12/CCN2 (murine), and CEF10/CCN1 (chicken) showed a high degree of interspecies conservation of organizational structure, which led us to propose for the first time that “these proteins are likely to constitute a new family of secreted proteins” (Joliot et al. 1992), later designated CCN by Bork (1993).

Fig. 5

Evolutionary relationships of CCN domains in invertebrates and vertebrates. Organization of CCN domains among selected invertebrates (*): B. floridae (Florida lancelet, Branchiostoma floridae), C. intestinalis (ascidian, Ciona intestinalis), and D. melanogaster (fruit fly), as compared to the chordates’ organization of the same modules in the human (Hu) CCN family of proteins, and in zebrafish (D. rerio) (#)

Fig. 6

Phylogenetic conservation of the CCN proteins organization

While the vertebrate VWC and TSP1 modules are encoded by conserved exons 252 and 212 nucleotides long, the IGFBP and CT domains show more evolutionary divergence, suggesting that they were required for the switch from arthropods to chordates, in which they play essential functions (Hu et al. 2019).

The strict conservation of the genomic organization of CCN modules among vertebrates, as compared to the flexible situation encountered among invertebrates, sheds new light on their potential biological roles and on the evolutionary selective pressure that led to the transition between the different phyla. At the time that vertebrates arrived on the scene, some 500 million years ago, most invertebrates had been flourishing after appearing at least 600 million years ago, well back into Precambrian times (Wilson 1987).

The emergence and evolutionary conservation of an integrated architecture in which the various CCN modules are now part of a single polypeptide chain having sustained considerable selective pressure, are strong evidence in favor of functional advantages associated with this organization.

Several examples of truncated or rearranged CCN proteins were detected both in normal and pathological conditions (Fig. 7; Ball et al. 1998; Perbal 1999, 2004; Planque and Perbal 2003; Subramanian et al. 2008; Lazar et al. 2007; Perbal 2009). It would be interesting to establish whether the combinations of CCN modules represented in these variants can be functionally related to ancient CCN-related proteins.

Fig. 7

CCN domains of variant proteins expressed in tumors

Assembly: a common theme in evolution

The reunion in a compact multi-functional structure of independent CCN operational blocks involved in an interrelated chain of reactions is reminiscent of the situation encountered during the de novo pyrimidine biosynthesis pathway evolution (Fig. 8).

Fig. 8

Evolution of the pyrimidine biosynthetic pathway from bacteria to eukaryotes. In bacteria, the de novo pyrimidine biosynthetic pathway involves six sequential, independent steps. The first reaction, catalyzed by carbamoyl phosphate synthetase II (CPSII, encoded by the pyrA gene), produces carbamoyl phosphate (CP) from HCO3− (bicarbonate) and the amide group of glutamine [HCO3− (bicarbonate) + Glutamine (Gln) → Carbamoyl Phosphate + 2 ADP]. In the second reaction, the condensation of CP with aspartate to form carbamoyl aspartate (CASP) is catalyzed by aspartate transcarbamylase (ATCase, encoded by the pyrB gene) [Carbamoyl Phosphate + Aspartate → Carbamoyl Aspartate + H3PO4]. In the third reaction, CASP is converted into dihydroorotate (DHO) by the enzyme dihydroorotase (encoded by the pyrC gene) [Carbamoyl Aspartate → Dihydroorotate + H2O]. The dihydroorotate is then irreversibly oxidized into orotate by dihydroorotate dehydrogenase (encoded by the pyrD gene) [Dihydroorotate + Quinone → Orotate + Reduced Quinone]. In the next step, orotate phosphoribosyltransferase (OPRTase, encoded by the pyrE gene) catalyzes the addition of the ribose-phosphate moiety to the orotate, to form orotidine 5’-monophosphate (OMP) [Orotate + PRPP → OMP + PPi]. The decarboxylation of OMP into UMP (uridine monophosphate) is catalyzed by OMP decarboxylase (encoded by the pyrF gene) [OMP → UMP + CO2]. In mammals, the first three steps are performed by a large protein complex (CAD) harboring the three enzymatic activities leading to the formation of DHO, the fourth step requires the activity of a separate dihydroorotate dehydrogenase, and the last two steps in the biosynthesis of UMP are under the control of a two-enzyme complex designated UMPS (see text for details)

Studies of de novo pyrimidine nucleotide pathways in various bacteria and fungi showed that the biosynthetic steps are achieved by a series of enzymes encoded by physically separate genes. In yeast, all the biochemical steps have been found to be the same as in bacteria, except for the first one, in which the two enzymes, carbamoyl phosphate synthetase and aspartate transcarbamylase, form a single enzymatic complex encoded by a unique genetic region (ura2) that is subject to the same feedback control as in E. coli (Lacroute 1968). The emergence of yeast, millions of years ago, can be considered as a first step in the evolution from unicellular microorganisms to mammals, which are believed to have appeared 66 million years ago after an asteroid impacted the earth.

Quite interestingly, the de novo pyrimidine nucleotide pathway of mammals shows a spectacular evolutionary integration of the first three bacterial enzymatic activities within CAD, a 1.5 MDa trifunctional particle formed by the hexameric association of a 250 kDa protein divided into different enzymatic domains, each catalyzing one of the initial reactions of the de novo biosynthesis of pyrimidine nucleotides: glutaminase-dependent carbamoyl phosphate synthetase, aspartate transcarbamoylase, and dihydroorotase (Del Caño-Ochoa and Ramon-Maiques 2021). The last two enzymatic steps are catalyzed by UMP synthase (UMPS), which harbors the enzymatic activities for both the penultimate and the last step of pyrimidine nucleotide biosynthesis on a single polypeptide chain (Wittmann et al. 2008).

The phylogenetic reunion of the CCN modules on one single polypeptide, resulting from the shuffling of exons originally localized on different chromosomes, might serve the same purpose as grouping individual genes of the pyrimidine pathway that encode proteins having complementary roles in a chain of biological functions.

This situation strongly suggests that the physical liaison of the four CCN binding sites permits a topographical synchronization of a cascade of regulatory activities occurring either sequentially or simultaneously. Since the classical prokaryotic operon does not exist in eukaryotes, the expression of functionally related genes transcribed into a unique spliced mRNA message would depend on a common set of regulatory sequences and factors. It is therefore highly probable that the logic for the strict evolutionary conservation of the organization of the CCN modules stems from the critical signaling functions of the CCN proteins. The absence of the CT module in CCN5 may constitute an additional potential tool for regulating the biological activities of the CCN proteins.

Ligands binding to CCN proteins: the first interaction level

In a mechanistic model, each module can theoretically interact with high or low affinity ligands, either in a CCN type-specific way or similarly for all CCN proteins, even though the structural environment of the binding sites differs from one CCN to the other. The state of occupancy governs single or combined biological responses. The absence of the CT module in CCN5 may interfere with the capacity of other CCN proteins to interact with their ligands. The schematic model of Fig. 9 represents a simplified partial interactome of CCN protein domains established from data gathered in several published reviews and a large number of original works (see for example the recent reviews by Zaykov and Chaqour 2021; Lau 2016; Takigawa 2017). The probability of site occupancy is expected to be an important determinant of the signaling pattern(s) elicited by the various combinatorial events, whose frequency will depend upon flexible biological environments (see legend of Fig. 9).

Fig. 9

Schematic partial CCN interactome. The four CCN binding sites (1, 2, 3, 4) can interact independently with any ligand of the four groups listed as L1, L2, L3, L4. Each group contains a different number of potential ligands, designated l1, l2, l3, and l4. A situation in which all binding possibilities could occur independently, at the same time and in the same location, would lead to the following statistical analysis. For a combination to occur on one of the four binding sites, the choice of a potential interacting ligand is determined by the number of ligands (l items) contained in the corresponding list (L). The number of combinations must also take into account the case where no interaction occurs with any of the ligands present in the considered group (L). The total number of liaison possibilities on the four sites can be written as: (l1 + 1) × (l2 + 1) × (l3 + 1) × (l4 + 1). In the present case, l1 = 4, l2 = 6, l3 = 10, and l4 = 22, so that the four factors are (4 + 1), (6 + 1), (10 + 1), and (22 + 1), and the number of possibilities is 5 × 7 × 11 × 23 = 8855. In cases of ligands acting as potential competitors or activators (antagonism or synergy) for binding on the same site, the combinations are significantly modified, and in the case of CCN5 the theoretical number would drop to 5 × 7 × 11 = 385, assuming that the binding to the three other CCN sites (1, 2, 3) is identical for the four proteins. It is important to keep in mind that this type of estimation results from a compilation of interactions identified in various experimental conditions and many different biological conditions, including the origin and state of the CCN proteins considered and the temporal and spatial bioavailability of substrates and targets. It is unlikely to represent real situations.
To use a metaphor, the numerous ligands identified could be compared to piles of bricks and building materials deposited on a closed site, waiting to be assembled in different ways to give six houses, each with its own functional specificity
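
The arithmetic of the legend of Fig. 9 can be reproduced with the short Python sketch below. It is only an illustration of the counting rule: the ligand numbers 4, 6, 10 and 22 are those used in the legend, while the dictionary keys and the function name are assumptions introduced here. CCN5 is treated as lacking the site that carries 22 ligands, as in the legend.

```python
from math import prod

# Number of potential ligands per binding site, as used in the legend of Fig. 9.
ligands_per_site = {"site 1": 4, "site 2": 6, "site 3": 10, "site 4": 22}


def liaison_possibilities(counts):
    """Product of (l + 1) over the sites; the +1 is the unoccupied state."""
    return prod(count + 1 for count in counts)


four_sites = liaison_possibilities(ligands_per_site.values())

# CCN5 lacks the CT module, taken here to be site 4 (assumption of this sketch).
ccn5 = liaison_possibilities(
    count for site, count in ligands_per_site.items() if site != "site 4"
)

print(four_sites, ccn5)
```

The product rule holds only under the independence assumption stated in the legend. Competition or synergy between ligands at the same site would require a different count.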

Not only is the variety of ligands striking, it also indicates that the CCN proteins are at the center of a vast set of partners previously identified as key regulatory factors in signaling pathways that are essential for the balanced functioning required in biological life, from birth to death. Interestingly, some partners were shown to bind sequences belonging to distinct structural modules, raising the need, not addressed in the present article, for a three-dimensional approach based on the use of spatial biology technology platforms.

It is obvious that all these interactions are dependent upon the dimensional accessibility of the targets and are unlikely to proceed at the same time and at the same place. This conclusion reinforces our spatiotemporal combinatorial model, in which we have proposed that the biological properties of CCN proteins depend upon combinatorial events based on the spatial and temporal bioavailability of both the CCN proteins and their cognate partners (Perbal 2001, 2013, 2018, 2019).

The different graphical representations (linear or circular) that we use to represent the interactions of ligands with CCN modules may convey the misleading impression that the CT module of any CCN protein may interact at any given moment with 22 different ligands, while the others may only interact with 4, 6 and 10. Most of these proteins were reported to physically interact with the various domains of CCNs. The experimental procedures included immunochemical approaches, two-hybrid systems and plasmon resonance (Footnote 7). In their recent comprehensive structure–function review of the CCN2 interactome, Zaykov and Chaqour (2021) also point out that several ligands appear to bind the same site on CCN2 and suggest that the “effects” of CCN2 are context dependent, another conclusion in support of our model.

This model is reinforced by the collection of data that we have gathered from the scientific literature about the identification of CCN protein ligands. Indeed, many different partners have been shown to bind the four individual modules. Even though most of the published data relate to CCN1 and CCN2, other proteins in the family also bind common ligands and, to our knowledge, there are no experimental data comparing the occupancy of the binding sites for different CCN family members. This compilation is unlikely to represent the state of occupancy of the CCN sites at any given time for any individual member of the family, but rather pleads for a mixture of the many situations resulting from the sequential interactions of substrates and targets at different times and places.

Furthermore, no study has tackled the potential competitive or synergistic aspects of ligand interactions at the same sites, in either a single or different CCN models.

The binding kinetics and relative affinities of the CCN partners for their targets are another critical aspect to study in approaching the roles of CCN proteins in the realm of cell signaling.

Prior to questioning the biological significance of evidence for physical interactions and before addressing their involvement in the wide cross-talk of signaling pathways, it is appropriate to ask how these observations can be interpreted in the frame of mechanistic association probabilities discussed above. The relevance of these interactions needs to be assessed at different levels, including the pertinence of the experimental procedures, and the type of bindings should also be considered in order to determine whether they are exclusive, cooperative, or conditional.

In any case, it appears that the field has reached a point where the binary views of CCN protein interactions, involving a limited specific set of targets, which allowed the foundations of the processes to be identified, are obsolete. Future studies should be based on wider approaches, taking into consideration both the interactions of CCN proteins among themselves and their biological interactions with biological superfamilies sharing common structural determinants.

CCN proteins as ligands to cellular targets: the second interaction level

In order to gain a more comprehensive picture of the roles of CCN proteins in the regulation of cellular signaling, it is important to keep in mind that the bilateral combinations considered so far are amplified and complicated by the fact that the CCN family of proteins comprises six members acting individually or as a whole, and in conjunction with other proteins, including themselves (Fig. 10).

Fig. 10

Schematic representation of potential binding sites for CCN proteins on a target cell. This schematic representation of six potential different sites of liaison for the CCN proteins on their target cells illustrates the unlikely case where the six proteins would have a similar binding capacity and be present at the same time and in the same space, in the presence of all ligands. Considering N1, N2, N3, N4 and N6, the numbers of combinations drawn in Fig. 9 for CCN1, 2, 3, 4 and 6, the total number of possibilities would be evaluated as (N1 + 1) × (N2 + 1) × (N3 + 1) × (N4 + 1) × (N6 + 1) = 8855^5 ≈ 5 × 10^19. The addition of CCN5 to the calculation would increase the number of combinations to a total of 8855^5 × (N5 + 1 = 385) ≈ 2 × 10^22. These numbers would considerably decrease if CCN proteins and their ligands are not expressed simultaneously in all tissues at all developmental times, as considered in the case of a spatiotemporal mode of regulation, and if competitive or synergistic interactions occur
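
The orders of magnitude quoted in the legend of Fig. 10 can be verified with exact integer arithmetic. The sketch below is only a check of the numbers and follows the numerical evaluation given in the legend (8855 possibilities for each of the five tetramodular proteins and 385 for CCN5); the variable names are assumptions introduced for the illustration.

```python
# Per-protein numbers of possibilities taken from the legends of Fig. 9 and Fig. 10.
FOUR_MODULE_COUNT = 8855  # CCN1, 2, 3, 4 and 6
CCN5_COUNT = 385          # CCN5, which lacks the CT module

# Five tetramodular proteins considered independently.
five_proteins = FOUR_MODULE_COUNT ** 5

# Adding CCN5 multiplies the total by its own number of possibilities.
six_proteins = five_proteins * CCN5_COUNT

print(f"five proteins: {five_proteins:.1e}")
print(f"six proteins:  {six_proteins:.1e}")
```

Python integers have arbitrary precision, so the products are exact before being rounded for display.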

In the absence of data from experimental comparative assessment of affinities and binding kinetics, one can only consider the factual observations accumulated over the past two decades regarding the variety of protein expression levels and their subcellular localization.

The results obtained very soon after the CCN proteins identification revealed a large array of expression sites, both in normal and pathological conditions. The secreted CCN proteins are detected in the extracellular matrix and/or bound to the cell membrane (Perbal 2001). They are also detected in the cytoplasm of cells in culture and in tissues by immunofluorescence. Depending upon the proliferation/differentiation stages nuclear full length and/or truncated CCN proteins are also detected (Perbal 1999; Su et al. (2001); Vallacchi et al. 2008). The observations collected during the characterization of the founding CCN protein expression patterns, established that the amounts of RNA species and proteins that they encode were not always quantitatively matching. Many examples pointed unbalanced amounts of CCN RNAs and proteins, with some tissues expressing high levels of RNAs with low amounts of proteins and vice versa. These apparent discrepancies were later shown to result from specific post-transcriptional processing and post-translational mechanisms altering the fate of the proteins (Bleau et al. 2007). Secondly the analysis of CCNs expression patterns ex vivo and in vivo confirmed that their biosynthesis rates were affected by their environment and the nature of the various tissues in which they were detected. Thirdly, the spatiotemporal availability of their partners expected to modulate their binding capacity to recipient cells, is a critical factor to take into account.

Based on the variety of factors involved in the differential expression of CCN proteins and in their ability to interact with their cell targets, several combinations must be considered. Starting from the simplest case in which the six CCN proteins would be present at the same time in the same tissue, several other cases are made possible by the variety of substrate/target associations.

Consequently, it is important to question the relevance of approaches chosen to address the functional aspects of CCN protein signaling.

From reductionism to interactions networks: the third interaction level

One can argue that the most complex phenomena rely on chains of reactions involving physical or chemical elementary effectors interacting in coordinated ways leading to the final aimed goals. However, the reductionist claim that a precise knowledge of each system component is sufficient to understand its functioning, “is not applicable to biological systems either because of the existence of new, unpredictable (emergent) properties at each level of evolutionary integration” (Rull 2012). According to the concept of emergence, the organization processes create entities whose properties exist only as a result of the interactive arrangement of their components.

The reductionist approaches used over the past two decades, which focused on the fine structural organization of the CCN modules, brought a significant amount of useful basic knowledge. However, they are no longer appropriate to address the functions of the CCN family of proteins in the light of their possible integration into a wide biological system that needs to be studied as such.

To use a metaphor, the production of a car requires the assembly of about 30,000 parts. Among them, the frame, the engine, the steering column and the brake system are four essential elements common to all brands and types of cars. Since the very early days of the automobile, these parts have evolved but still play a similar function in all the different models. Disassembling any one of these basic components would certainly be instructive but would not reveal how each of them contributes to the overall look and performance of the vehicle. Only a comprehensive approach would lead to an understanding of its functional aspects.

On similar grounds, restricting oneself to sophisticated technologies that aim to decipher ligand binding in the 3D structure of each isolated CCN domain would leave wide open the questions regarding the physical and functional behavior of these proteins ex vivo and in vivo.

Hence the need for approaches resting on broader analytical processes.

As we have already advocated (Perbal 2023), the advent of spatial biology combined with genome-wide association studies has proved very helpful in deciphering the genetics of cancer. Similar approaches would be very helpful to decipher the interplay of multiple biological signaling pathways and to determine more precisely the place of the CCN proteins in this system.

The case of CCN5 in the interactions network: the cherry on the cake

The structural distinctiveness of CCN5 has been the matter of several reviews. None of them, to our knowledge, addressed the possibility that the lack of a terminal CT domain might confer on this protein a central mechanistic role in the coordination of signaling pathways.

Briefly, the CT module has been recognized to be a major player in the formation of dimers in several superfamilies of proteins, acting as the first dispensable step in the process of the multimerization (Fig. 11).

Fig. 11

Cystine knot as a tool for dimerization in superfamilies of proteins

Based on the evidence showing that CCN proteins can physically and functionally interact, simple models can be proposed for a potential regulatory role of CCN5.

  1. The absence of a CT module in CCN5 signals an evolution of this protein toward a situation in which the specific functions of the C-terminal sequences are being lost or combined within the three other modules. Considering that the five other members show a high conservation of the CT module, the evolutionary elimination of CT sequences seems unlikely. However, it is important to consider that what seems an improvement for some species might turn out to be a substantial disadvantage for others, as recently reported for the skeletal changes that allowed humans to walk but made them vulnerable to knee osteoarthritis (Richard et al. 2020).

  2. Alternatively, the CCN5 gene could be evolutionarily “late”, the gain of the CT exon not having occurred as yet. This situation cannot be excluded but is unlikely in the light of the conservation of the other CCN proteins. In any case, more time would be needed to confirm an evolution in this direction.

  3. The possibility that we favor is based on the pieces of evidence suggesting that the CT domain favors the dimerization of CCN proteins. Based on its unique structural feature, we propose that the CCN5 protein might be a key regulator of the CCN protein system.

The cystine knot, present in several protein superfamilies sharing partial structural identity with the CT modules of CCN proteins, governs, with the assistance of neighboring folds, the combination of two monomers to produce active dimers. These families include, but are not limited to, the insulin-like growth factor-binding proteins (IGFBP) (Hwa et al. 1999), the nerve growth factor (NGF) (Aloe et al. 2012), the transforming growth factor (TGF) beta (Herpin et al. 2004), the platelet-derived growth factor (PDGF) (Meng et al. 2016), the gonadotropin superfamily (Lunenfeld 2004) and the vascular endothelial growth factor subfamily (Holmes and Zachary 2005). New members are still expanding the size of the family (Vitt et al. 2001).

Under the hypothesis that the CT and VWC domains of CCN proteins are involved in homo- and hetero-dimerization, cross-interactions between these superfamilies and the CCN family could extend their scope of regulatory functions (Figs. 12, 13). The unique tetramodular organization of the CCN proteins would then place them at the crossroads of a wide array of signaling pathways and provide the ground for a functional coordination of major biological systems, including for example embryology, reproduction, nervous system, metabolism, hormonal and immunological responses, etc.

Fig. 12

Representation of potential heterodimer formation between CCN proteins and non-CCN proteins containing a CT domain. CCN protein heterodimers can form via the VWC and/or the CT domains. Heterodimerization of CCN proteins with CT-containing members of protein superfamilies would considerably widen the regulatory potential of CCN proteins

Fig. 13

Combinations of members of large superfamilies with CCN family members, increasing the potential binding sites on target cells. This scheme illustrates the potential increase in binding sites on target cells resulting from combinations of heterodimers. Different types of interactions between CCN proteins and other superfamilies of regulators could be made available at the same time and location and increase the number and variety of target cells

The lack of a CT domain could prevent dimerization between proteins that contain a CCN-like CT domain and other cystine knot proteins. It would also be of interest to consider the occurrence of multimerization driven by the VWC module. In both cases, the lack of CT is expected to alter the biological properties of the dimers.

This situation is reminiscent of the basic helix-loop-helix (bHLH) superfamily of DNA-binding dimeric transcription factors that regulate critical developmental processes (Roshger and Cabrele 2017). The bHLH proteins contain two highly conserved, functionally distinct domains that make up a region of 60 amino-acid residues, whose N-terminal end is the basic domain that binds to the DNA target. At the C-terminal end of that region is the HLH domain, which helps the interaction with other proteins and the formation of homo- and hetero-dimeric structures. Phylogenetic studies have also identified a leucine zipper domain helping the formation of dimers (Fig. 14-1).

Fig. 14

The model of bHLH transcription regulators transposed to the case of CCN5. Panel 1: Binding of the Id bHLH monomer to a class 1 bHLH monomer leads to an inactive heterodimer. Panel 2: The physical combination of CCNX and CCNY monomers via their CT modules leads to an active heterodimer. The binding of a CCN5 monomer, which lacks a CT domain, prevents the formation of a CCNX/CCNY heterodimer via the CT domain. The system is frozen until displacement of the dimer is induced by a CT-containing monomer

The recurrent theme of several subdomains being involved in the efficient dimerization of regulatory factors is evocative of the situation encountered in CCN proteins, where at least two domains (CT and VWC) have been implicated, for example, in the heterotypic interactions between CCN2 and CCN3.

Interestingly, the HLH proteins known as Id 1–4 lack the DNA-binding domain. They have the capacity to confine bHLH proteins in dimers devoid of DNA-binding activity, thereby acting as negative regulators of bHLH-mediated transcription. The Id proteins are subject to proteasome-mediated degradation, which allows the inhibitory action to be regulated.

The striking physical similarity between the CCN5/CCN and Id/bHLH couples suggests that the similarity might also be functional. Although a physical interaction between CCN5 and other CCN proteins remains to be established, the binding of CCN5 to other CCN proteins might sequester them in a “frozen” dimeric state that has lost part of the biological activities of the full-length CCN proteins (Fig. 14-2).

Alternatively, and in contrast to the situation of Id-driven transcription inhibition, the assembly of a CCN5:CCNx dimer might unlock a new set of activities specific to this type of dimer.

Discussion: the CCN system as a tool for coordinated multi-signaling axis

The hallmarks of the CCN proteins have been widely reviewed. Discovered in the early 1990s, the CCN proteins remain a “functional mystery”. Indeed, although they exhibit a myriad of biological functions, they have not been assigned a clear common biochemical mechanism of action.

One of the reasons for this “gap” certainly lies in the fact that a vast majority of the studies aimed at understanding the functions of the CCN proteins relied on sophisticated “reductionist” structure–function approaches seeking to identify the elementary contribution of each module and its sub-portions.

We do recognize that the most complex biological phenomena rely on chains of reactions involving physical or chemical elementary effectors interacting in coordinated ways, and that a comparative study deciphering the detailed three-dimensional structures of the constitutive CCN modules can be informative. However, as we have tried to advocate, the very unique organization of the CCN proteins should provide a strong incentive for broad spatial approaches that are comprehensive, consistent and coordinated, and that take advantage of the strong basic knowledge accumulated on the biochemistry of the CCN proteins and of related signaling systems.

By tackling the cell signaling and communication issues from a point of view inspired by basic enzymology, we have attempted to address from a mechanistic angle each of the steps involved in CCN protein signaling, from transmission to interpretation, in various contexts of cell biology. We have distinguished three successive levels of target-ligand interactions. Although each of them is intellectually productive, together they provide a picture that only partly accounts for the wide array of biological regulatory functions assigned to the CCN family of proteins.

A way forward might come from studies aimed at identifying the physical and functional interactions of the CCN proteins and their consequences on the binding of ligands to the four CCN domains. The present body of data does not allow one to distinguish between situations resulting from multiple competitive or synergistic combinations occurring at the same sites, and complex binding profiles resulting from different affinities of the CCN domains embedded in the various CCN proteins.

Simple questions, such as addressing the capacity of identified ligands to bind the same module in the various CCN proteins, have been left aside because of academic or commercial expectations. In some cases, the “publish or perish” context led to an accumulation of data that are not even cross-checked.

Early observations indicated that variant CCN proteins are detected both in normal and pathological conditions (Brigstock et al. 1997; Perbal 2001, 2004; Lazar et al. 2007). The existence of these forms has not been considered in most of the studies aiming at establishing the roles of CCN proteins, which focused only on full-length proteins, even in cases where post-translational processing was demonstrated to generate active polypeptide forms (Bleau et al. 2007) and to lead to active forms from preproteins (Kaasbøll et al. 2018).

The co-existence of biologically active full length and truncated or rearranged forms of CCN proteins is expected to be of great importance.

A major question that has not drawn the interest it deserves needs to be widely addressed in order to embrace the power and meaning of CCN protein interactions. It has been known for many years that the expression levels of CCN3 vary significantly in vivo with the timing of development in the various organs considered (Joliot et al. 1992). In embryonic chicken tissues, CCN3 expression, measured in identical quantities of sample, was highest in brain and heart, with low levels in muscle and intestine. In contrast, in adult tissues, while the expression of CCN3 in brain was still high, the highest expression levels were detected in lung, and no expression was measured in heart, muscle and liver (Fig. 15). These data, which undoubtedly indicated that the expression of CCN3 depends upon the timing of development and the organs considered, formed the basis of the concept of spatiotemporal regulation of transcription during development.

Fig. 15

Differential spatiotemporal expression of CCN3 from embryonic development to adulthood. Samples from various tissues and times of development, containing normalized amounts of RNA, were separated on denaturing agarose gels and northern blotted prior to incubation with a CCN3-specific probe

We consider that this conclusion is one of the keys to the understanding of the wide array of CCN biological functions reviewed elsewhere.

Our above considerations also led us to propose that the absence of CT domain in CCN5 might in fact be of prime importance in the regulation of signaling.

First of all, the existence of superfamilies of proteins sharing sequences homologous to the CT and VWC motifs of CCN family members strongly suggests that inter-family homo- and hetero-dimeric combinations occur and play active role(s) in the regulation of signaling. The potential heterodimerization of CCN proteins with members of other superfamilies may represent a link to various signaling pathways which need to be coordinated. The CT-deleted CCN5 could be a key regulator for the modulation of pathways in which dimeric and multimeric complexes are required. Along this line, it is important to keep in mind that the active Von Willebrand factor is a multimeric association of monomers (Furlan 1996). Only the largest multimers are hemostatically active. The absence of high molecular weight multimers is associated with a bleeding tendency, while the presence of multimers of supranormal size is associated with an increased risk of thrombosis. Hence the importance of the mechanisms underlying the formation of the proper amount and size of multimers.

Studying the interactions of individual CCN protein modules with their targets is unlikely to provide clues about the spatial interactions of CCN proteins with other superfamilies and their ligands.

The power of the multifactor signaling required for the coordination of interdependent developmental pathways comes from the high level of combinatorial events that result from the bioavailability of ligands and targets in different tissues over time. The CCN proteins fulfill all the requirements for the regulated coordination of signaling pathways involving the sequential action of biofactors on their targets. We are therefore confident that spatial biology will provide spectacular information about the way CCN proteins can functionally coordinate the spatiotemporal regulation of major signaling pathways governing normal and pathological development.

As a first step toward a global identification of the interactions of CCN proteins with themselves and their partners, functional association networks predicted by the STRING database consortium were examined for each of the six members of the family (Fig. 16). Although it is not the focus of this review to elaborate on maps drawn from such studies, it is tempting to compare the nature of the predicted partners in the first-shell associations with the ligands identified in experimental contexts that do not represent the biological realm in which these regulators are acting. Indeed, aside from a few of them, most ligands are not identified by both approaches.

Fig. 16

(Upper part) Potential functional interactions of CCN proteins drawn from the STRING database. According to the STRING website, “The interactions include direct (physical) and indirect (functional) associations; they stem from computational prediction, from knowledge transfer between organisms, and from interactions aggregated from other (primary) databases”. The figure shows the first-shell interactors, that is, the proteins directly associated with the input protein used for the search. Proteins identified at the second-shell level are reported as usually showing weaker associations. According to STRING, “2nd shell of interactors are the proteins associated with the proteins from the 1st shell or with the input protein(s). It can happen that a 2nd shell protein can be directly connected to the input protein(s), but it will usually have a weaker association and therefore it would not show up among the specified number of the 1st shell interactors”. (Lower part) Predicted STRING protein associations of CCN proteins. The identification of potential associations between CCN proteins and other proteins was established through several methods. The values 1, 2, 3 and 4 indicate the type of support used to establish the relationship between proteins. It is interesting to note that several proteins are reported to interact with more than one CCN protein. For example, the transcriptional co-regulator YAP1 was reported to interact with CCN1 and CCN2; although it was not found to interact with CCN3, this observation might be related to the nuclear localisation of CCN3, which was reported to occur in proliferative normal and tumor cells (Perbal 1999). POSTN (the periostin extracellular protein, which is required for the matricellular localization of CCN3; Takayama 2016) was found to interact with both CCN3 and CCN4, and CTNNB1 (beta-catenin, involved in the regulation of cell-cell interaction) with CCN4 and CCN5

These results, taken together with the combinations previously calculated from the results of classical interaction analyses, raise interesting considerations as to the number of combinations made possible in such a multiplayer system (Fig. 17).

Fig. 17

Potential coordinator role for CCN proteins in complex signaling pathways. The scheme summarizes the various interactions of extracellular matrix and membrane CCN proteins, which can interact through their different binding sites with proteins responsible for binding to coreceptors involved in the negative or positive regulation of signaling pathways. It is also shown that the CCN proteins can interact with themselves and can transduce signals within the cells, where they can cross-talk with internal signaling pathways. The central position of CCN proteins in the coordination of various complex signaling pathways accounts for their wide array of interconnections with many biological regulators, both in normal and pathological conditions

At this time, we have only considered the CCN concertos in which each of the six instrumentalists engages in specific sets of dialogues with the orchestra, all instrumentalists being present at the same time in the same room. We know that this is only a part of the puzzle, because of the spatiotemporal regulation to which they are subject.

Rehearsals can provide clues about what happens when some of them are not present, or able to perform. The next stage will consist in identifying a conductor responsible for the coordination of all the regulatory factors performing together at various timings in the Fantastic Life Signaling Symphony.