1 Introduction

Human collective behavior is being studied ever more intensively thanks to the unprecedented amount of data available from the digital world [1]. This has opened a research area that makes extensive use of multidisciplinary strategies, which explore the empirical evidence with a wide variety of styles and techniques. The approaches in the literature for finding dynamical patterns in data, or even for addressing fundamental research questions, are today rich and diverse. Still, one of the most intriguing aspects in need of further understanding is the non-trivial relationship between individual actions and the aggregate actions of large collectives [2].

Social networks are an evident context in which to study these phenomena. There it is possible to observe coordination effects that amplify, for instance, the impact of a street protest on a microblogging platform such as Twitter [3]. The links through which information flows can bring out macroscopic emergent patterns. Other situations differ from this perspective and allow us to focus neatly on how the macroscopic signal leads to individual actions, simply because there is no direct communication among the individuals. This can be considered the case of our dataset, which contains the activity of the clients of a trading firm whose orders have no significant impact on asset price evolution.

Within the study of collective behavior in financial markets, there are several lines of research [4, 5]: from computational agent-based models aiming to better understand phenomena such as herding behavior [6,7,8,9], to purely empirical analyses of investors’ activity [10], to data-driven models [11]. Some of these studies focus on bursty trading activity data [12, 13] and on the impact of external information flows on price, and then provide new indicators to measure the degree to which a particular news item attracts attention from investors [14]. Price shifts due to trading activity and order book imbalances are also being studied, revealing universal patterns that link macroscopic price formation to the individual market and limit orders placed in the order book [15, 16]. Tick-by-tick trading activity indeed displays multifractal behavior, explained by the highly heterogeneous nature of the executed tasks, mostly due to the large diversity of investors’ profiles [12, 13]. The marked peaks of trading activity, and the clusters of very intense activity emerging between calm periods, are also observed to be linked with the bursty evolution of market volatility [17, 18], which is a very relevant indicator in traders’ decision mechanisms. Investors’ behavior is also behind the interpretation of non-trivial market phenomena such as the leverage effect, where daily price drops increase the volatility of the following few days [19, 20].

The non-stationary nature of financial series, together with the fact that investors are heterogeneous, meaning for instance that they operate at different volume scales and time horizons, calls for a careful analysis and the application of the most adequate techniques. It is precisely in this context that non-parametric statistics deploys all its powerful methods. Thus, our analysis is mostly grounded on Mutual Information [21] and Symbolic Transfer of Entropy (STE) [22], which allow us to quantitatively study individual behavioral aspects, like synchronization and information flows, that are key elements to identify higher-level properties like structural hubs, coordinated communities, critical transitions or sudden collapses [23]. STE analysis is in fact a rather new tool in the context of financial markets, where it has mostly been used to analyze cross-market effects [24, 25] and to identify dynamic causal linkages as a way to complement other techniques such as network analysis [26, 27], which might have important consequences for optimizing portfolio composition. In this sense, Mutual Information and, above all, STE represent alternatives to, respectively, statistically validated synchronous networks [28] and their much more recent evolution in the form of statistically validated lead-lag networks [29, 34].

A subsequent study by the same author [35] also analyzed return patterns and investors’ purchases, finding that overall trading by a particular group of investors is excessive. In the 2000s, Grinblatt and Keloharju took advantage of the transparent Finnish stock market, where traders’ IDs are recorded in every transaction, and used a database of this market to analyze the performance of different types of traders, categorized as pro-momentum or contrarian in a first study [36], and as sensation seekers or overconfident traders subsequently [37].
Other efforts [11] used the client database of one of the largest Swiss online brokers and found empirical relationships among turnover (contrarian strategies), account value and the number of assets in which a trader invests.

Tumminello et al. [31] made a first attempt in 2012 to identify clusters of investors in the Finnish market with statistically validated synchronous networks [28], and this effort has also served to go deeper into the identification of trading profiles [38]. More recently, the same methods have been applied to clusters of investors with similar trading profiles to understand, in a robust and reliable way, their long-term ecology based on what Musciotto et al. call the adaptive market hypothesis [33], or even to study systemic risk [32]. Other recent works explore the possibility of finding trading similarities among Swedish investors with similar portfolios [39], while Lillo et al. [40] have also investigated how news (an exogenous signal) affects the trading behavior of different categories of investors or even how different. Pairwise synchronization between traders’ activity has been used to detect communities and define groups of traders. Recently, Challet et al. [29] inferred lead-lag networks to predict the sign of the order flow and the volume-weighted average price of broker clients over the next hour. And even more recently Cordi et al. [

\(A_{ij}\). (Top-center) Considering only values within the overlapping period, we codify the position time series into symbols, in this particular case with embedding dimension \(m=2\), to generate the symbolic time series X and Y. (Top-right) We use these symbolic time series to compute the values of Mutual Information \(I_{ij}\) and Transfer of Entropy \(T_{ij}\). In parallel, we apply a bootstrapping process to X and Y to extract a distribution of null values \(I_{ij}^{\ast }\) and \(T_{ij}^{\ast }\), which we use to apply the FDR procedure explained in “Methods”. Then, we keep all values within the 95% Confidence Interval, manually setting the non-significant ones to 0, to create the adjacency lists for the Synchronization Network (Center-right) and the Anticipation Network (Bottom-right) for the REP market.
The networks are generated considering investors as nodes and taking the corresponding values of \(I_{ij}\) and \(T_{ij}\), respectively, as edge weights. The size of each node is proportional to the node degree for the Synchronization Network and to the out-degree for the Anticipation Network. The number inside each node is the ID of the investor.
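
The symbolization step and the two information-theoretic quantities named in the caption can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names `symbolize`, `mutual_information` and `transfer_entropy` are hypothetical, and the sketch assumes plug-in (frequency-count) estimators, base-2 logarithms, a lag of one step, and ties between equal values broken by position. The bootstrapping and FDR steps are not included.

```python
from collections import Counter
from itertools import permutations
from math import log2
import random


def symbolize(series, m=2):
    """Replace each window of m consecutive values by the index of its ordinal pattern."""
    patterns = {p: k for k, p in enumerate(permutations(range(m)))}
    symbols = []
    for t in range(len(series) - m + 1):
        window = series[t:t + m]
        # Positions of the window sorted by value; equal values keep
        # their original order (assumption: ties broken by position).
        order = tuple(sorted(range(m), key=lambda k: window[k]))
        symbols.append(patterns[order])
    return symbols


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two equal-length symbol sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def transfer_entropy(source, target):
    """Plug-in estimate of the lag-one transfer entropy source -> target, in bits."""
    n = len(target) - 1
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    past_pairs = Counter(zip(target[:-1], source[:-1]))
    self_pairs = Counter(zip(target[1:], target[:-1]))
    past = Counter(target[:-1])
    te = 0.0
    for (x1, x0, y0), c in triples.items():
        p_given_both = c / past_pairs[(x0, y0)]        # p(x1 | x0, y0)
        p_given_self = self_pairs[(x1, x0)] / past[x0]  # p(x1 | x0)
        te += (c / n) * log2(p_given_both / p_given_self)
    return te


# Synthetic illustration (not data from the study): b repeats a one step later,
# so information should flow from a to b much more than from b to a.
rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(3000)]
b = [0.0] + a[:-1]
sa, sb = symbolize(a, m=2), symbolize(b, m=2)

mi_ab = mutual_information(sa, sb)
te_ab = transfer_entropy(sa, sb)  # a -> b
te_ba = transfer_entropy(sb, sa)  # b -> a
print(f"I = {mi_ab:.3f} bits, T(a->b) = {te_ab:.3f} bits, T(b->a) = {te_ba:.3f} bits")
```

In this toy setting the asymmetry between the two transfer entropy values is what would orient an edge in an anticipation-type network, while the symmetric Mutual Information would weight an edge in a synchronization-type network. With real position series, which can stay flat for long stretches, the treatment of ties in the symbolization deserves more care than the positional rule assumed here.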