Introduction

Human-Building Interaction (HBI) is a field dedicated to understanding how humans and buildings co-exist and influence each other1. HBI researchers often conduct behavioral experiments in existing buildings to gain insight into this dynamic relationship. However, these experiments face significant challenges, including privacy concerns, implementation difficulties, and lack of compliance2.

To overcome these challenges and increase the degree of internal validity, HBI researchers are increasingly turning to Virtual Reality (VR) to conduct behavioral experiments, allowing cost-effective pre-occupancy evaluations3,4,5,6,7. Behavioral experiments in VR are composed of more elements than just the content of the experiment, including the development of tasks, procedures, information, and documentation. Moreover, VR experiments should also include the experimentation process8, which describes how to conduct an experiment rather than purely describing the outcome. Experimentation includes all steps from preregistration to the acquisition of participants and the monitoring of the infrastructure status. Based on best practices, we classify these elements as components in groups and subgroups under a newly proposed six-pillar taxonomy (see Fig. 2). These pillars cover the essential aspects of an experiment by subdividing it into fundamental groups: documentation, infrastructure & environment, data collection, data assembly, data analysis, and management.

Typically, VR behavioral experiments in HBI are inspired by an industry-standard procedure from human-computer interaction (HCI) for testing user interfaces with small user studies (4–6 participants)9. Whereas user studies in user experience research test a user interface for a computer program with a specific task, HBI requires a more complex setup, entailing importing virtual building models into a game engine and exploring them. Ad hoc explorations of buildings similar to a user study may not suffice to investigate the building, as even for user studies the 5-participant rule has long been discredited9. Furthermore, since there are no mechanisms to easily reproduce or audit these ad hoc experiments, it is difficult to assess their validity. At the same time, critical and often expensive design decisions are made based on anecdotal findings from these experiments.

This difficulty in validating VR experiments in HBI is closely related to the crisis of scientific reproducibility in experimental behavioral research in general10,11,12,13. Many replication attempts have failed to reproduce the original findings, raising concerns about the reliability of research results10,11,12,13. This reproducibility crisis is exacerbated by the informal sharing of experiment details, often conveyed verbally, which can lead to missing information and hinder replication efforts14.

Current efforts to achieve reproducibility have improved the situation such that enough information is provided to “understand” the experiment. However, understanding alone may not be sufficient for re-implementing the experiment from scratch: subtle details may be implemented differently, leading to different outcomes15. In contrast, Preproducibility16, or scientific recipes, requires researchers to provide the complete set of elements and procedures necessary to replicate an experiment that “cannot with advantage be omitted”17. Numerous initiatives across various scientific disciplines aim to improve experimental research by enabling reusability and composability18,19,20,21,22,33; similar support is still needed for the HBI domain.

The proof of concept provides a code representation of the components of the six-pillar taxonomy for this experiment (see Fig. 2), enabling the automated instantiation of everything necessary to run the experiment and to document it (see Fig. 5). Several standard technologies are used, e.g., Docker, Terraform, Python, Golang, Jupyter notebooks, MTurk, and S3 object storage; details on the implementation of every component can be found in the “Methods” section. The experiment consists of participants finding their way to different targets within a building. It evaluates the effect of two architectural variations on wayfinding performance, measured in terms of the distance walked to find the targets. ‘Treatment A’ adds atria to the base building and ‘Treatment B’ makes the staircase shafts transparent glass (see Fig. 4). The advantages of an ‘as code’ representation of the experiment and its experimentation process are evaluated by instantiating the experiment several times, auditing, debugging, and reusing the experiment components, and by analyzing the user pool that accessed the experiment.

The main contributions of this paper are threefold. First, we contribute towards establishing an ExaC paradigm by providing a definition and demonstrating it with the full experimental pipeline of a desktop VR behavioral HBI experiment. Second, we provide an open source proof of concept implementation of this “Experiment as Code” paradigm, to scaffold data collection in our online desktop VR experiment. This enables third parties to audit and reuse our experimental design and demonstrates the benefits of the ExaC representation. Third, we apply the lens of ExaC to notable VR frameworks and provide a detailed comparison and review of these, emphasizing their capacity and shortcomings to support experiments, experimentation, and Preproducibility.

Concept

Behavioral experiments in VR can be composed of several elements (e.g., tasks, procedures, information, and documentation) beyond the experiment content. They can span from preregistration over the acquisition of participants to the monitoring of the infrastructure status. We collect the experimental elements we have observed in our interdisciplinary experience with behavioral experiments in HBI and identify gaps in how current VR frameworks support them (see Table 3). We classify these elements as components in groups and subgroups under the proposed six-pillar taxonomy (see Fig. 2). The pillars cover the essential aspects of an experiment by subdividing it into groups: documentation, infrastructure & environment, data collection, data assembly, data analysis, and management. We collect and exemplify these common components required to enable reproducibility. The exemplified components may vary in their classification and extent depending on the details of their implementation and the experiment requirements; e.g., sensitive data anonymization could be desirable at collection time. Although some components of each pillar are strictly required to produce an experiment, others are optional but desirable to improve reproducibility.

We base our definition of ExaC on the idea of preproducibility16, but instead of providing scientific recipes we aim at providing automated and digitally documented scientific recipes inspired by the advances in cloud computing. Thus, we define the ExaC paradigm as providing imperative or declarative automation code and digital documentation for the components of the respective essential pillars.

We borrow the imaging concept from Environments as Code. An image or snapshot of how a run-time environment should look serves as a specification of how actual containers are deployed. The static ExaC codebase serves as the image (see Fig. 1, Base). The whole live experimental setup can be automatically instantiated from the image, including the supporting components and the components building the experiment itself. This entails, but is not limited to, provisioning the right hardware, deploying the software stack, instantiating abstraction interface services, acquiring participants, running hardware monitoring services, serving the content to participants, and ultimately collecting the experiment data.

Figure 1

ExaC paradigm application scheme: The paradigm serves as the Base image to instantiate the Support (infrastructure and services) and then conduct the Experiment itself. The ExaC (gray box) covers the six-pillar taxonomy of behavioral experimentation (see Fig. 2).

The ExaC paradigm is rooted in three larger developments of the last decade: first, the rise of the field of meta-science34; second, initiatives to overcome the reproducibility crisis; and third, developments in cloud computing such as the “___ as Code” and “___ as a Service” revolutions.

There are several initiatives that tackle challenges directly impacting reproducibility. For instance, there is an indexing service that specializes in indexing papers accompanied by code18. Another initiative provides multidisciplinary, collaborative, and reproducible data pipelines19. There are also modular ecosystems that improve machine learning reusability by managing and automating the entire machine learning application development life-cycle with reusable models and components20. Some disciplines have pushed even further, creating domain-specific standards21 and ontological databases22 that enable more efficient and systematic knowledge extraction. Moreover, publishers are now encouraging the publication of protocols in specialized services such as protocols.io, where protocols are converted to modifiable and reusable checklists.

Tightly interrelated with reproducibility are the notions of auditability, debuggability, reusability, & scalability. We define auditability as providing a transparent and comprehensive view into the experiment’s mechanics, tools, and components. Thereby, any third party is able to evaluate the software35 and, by extension, the experiment’s reproducibility. We define debuggability as the means to identify and modify aspects of the experiment that affect its reproducibility36. We define reusability as the ability to reuse parts or variations of an existing software to create a new one37; in our context, software extends to experiments. Here we can exploit the benefits of open source software development in science38. These three are ubiquitous tasks in open-source software development and as such are enabled through the ExaC representation by inheriting the same good practices. Finally, in systems, scalability refers to the ability to either expand a system in one dimension (structural) or increase the amount of activity (load) the system receives39. In the context of experiments, this refers to the ability to increase the sample size either through repetition (horizontal) or through the use of larger participant pools (vertical), e.g., crowd-sourced and remote experiments.

In cloud computing, the complexities and requirements of current web services have pushed advances and paradigm shifts such as the “___ as Code” and “___ as a Service” revolutions. Paradigms such as “Infrastructure as Code” (IaC; e.g., Terraform, AWS Cloud formation), “Environments as Code” (e.g., Vagrant, Docker), “Configuration as Code” (e.g., Ansible, Puppet Chef), “Data Pipelines as Code” (e.g., Airflow) are now ubiquitous in web service development. There, a single service requires the deployment of hundreds to thousands of micro-services, their infrastructure to be dynamically provisioned, and their data to be continuously analyzed.

Figure 2

Six-pillar taxonomy of behavioral experimentation: (1) documentation, (2) infrastructure & environment, (3) data collection, (4) data assembly, (5) data analysis, (6) management. Each pillar consists of multiple essential tasks that need to be completed. Each pillar is distinctively represented in the ExaC paradigm as a block for the automation of an experiment.

For example, IaC allows storing and versioning the infrastructure and configuration requirements of complex web-services. These code representations have enabled simpler deployment and provisioning, rolling back to previous configurations, mirroring services, planning, scaling, and even debugging problems in the infrastructure provisioning. We believe that many of the features of the “___ as Code” movement and their tools could enhance experiments in general and VR experiments in particular.

Despite this substantial drive for improving the reproducibility of behavioral experiments, Preproducibility is still a challenge. However, the boom of “___ as Code” provides a foundation to implement Preproducibility and achieve reproducibility in behavioral experiments. To the best of our knowledge, the term “Experiment(s) as Code” has only been used in two research articles, one of which is EnosStack40, a software stack composed of Python, Ansible, Docker, and the EnosLib library that enables reproducible experimental workflows.

Results

Reproducibility is demonstrated by comparing the two complete instantiations of the experiment33 and the statistics supporting them. In this case, the claim is a significant increase in walking performance, substantiated by a mixed-effects analysis and the p-value of the factor for treatment B, which holds true in both instantiations.

Table 1 Analysis results of the two instantiations of the experiment using a linear mixed effect model.
Figure 3

The same pattern of results is shown by the two complete instantiations of the experiment, i.e., improvement in walking performance for treatment B, the glass condition. Session 1 (n = 275), Session 2 (n = 495). For an in-depth data analysis of the experimental results, see33.

Figure 4

Base case and the two treatment conditions: Treatment A, the addition of atria to the floor; Treatment B, making the stair shafts transparent with glass.

Auditability, Debuggability, & Reusability are shown through a simulated third-party auditing of the experiment and describing bugs found while using the code. This is documented in a GitHub issue ‘Example 1 - Bug fix’, see Fig. S1 in the Supplementary Information. In response to this GitHub issue, the maintainer identifies the cause of the bug and creates a new version reusing most of the code. Additionally, the GitHub issue ‘Example 2 - Reuse’, shows a simulated interaction with a researcher reusing the components of the original experiment for an online version of another desktop VR wayfinding experiment, i.e.42, see Fig. S2 in the Supplementary Information. Here, the researcher adds the experiment assets, i.e., the building and building variations, and implements the experiment mechanics, e.g., door frames changing in color. These modifications are limited to the data collection module, while all the other modules are reused without changes. All GitHub issues presented here as examples are linked with their respective code. Interested third parties are encouraged to check the repository Issues section in addition to auditing and forking the repository.

Scalability is evaluated by the ease of recruiting participants in the different instantiations of the experiment and by investigating access to the experiment beyond the 149 participants who successfully took part, i.e., by reviewing the raw accesses to the experiment. These include all interactions with the experimental data collection (i.e., MTurkers who clicked on the experiment link; see Fig. 5, E). In total, 462 participants accessed the experiment; 316 (68%) had computers that were deemed capable of displaying the experiment, while the computers of the remaining 146 (32%) were not. Of the participants starting the experiment, 149 (47%) also completed it. The participants’ inferred operating systems and browsers are reported in Table 2.

Table 2 Excerpt from the participant monitoring.

Discussion

Researchers need to address the validity of their claims or explain deviations from previous results, such as behavioral drifts or changes in implicit experimental conditions15.

Behavioral experiments in VR often require time-consuming setup and diverse expertise in specialized hardware (e.g., physiological sensors43,44) or infrastructure well beyond the capacity of smaller research groups and researchers45,46. This difficulty, coupled with the implicit information required to reproduce an experiment, has contributed to the reproducibility crisis across many disciplines. We believe that this crisis can be mitigated through the use of emergent technologies such as the ones behind the “___ as Code” and “___ as a Service” movements. The “___ as Code” technologies provide a tool-set for automation and reproducible software environments. Together with VR frameworks, this enables the implementation of the so far conceptual notion of ‘Preproducibility’. The “___ as a Service” technologies provide outsourcing and abstraction layers that can be used to interface with real world needs such as the recruiting and rewarding of participants.

To explore the applicability of “___ as Code” and “___ as a Service” to behavioral experiments in VR, we first define a taxonomy for behavioral studies in VR (see Fig. 2) that delimits the scope in which automation could and should occur. We argue that there is a need for such a taxonomy to create a common framework for shareable work and modularity that can enable the reuse of individual components. Second, we propose a definition of the ExaC paradigm as providing declarative or imperative automation code and digital documentation that corresponds to each of the experiment taxonomy pillars (see Fig. 1). By publishing their code base, researchers offer concrete means of reproducing their research, enabling their experiments to be reproduced, audited, debugged, reused, and scaled by other researchers. Beyond replication, their research can then be systematically varied to uncover robustness and generalisability. Additionally, different experimental approaches could be adapted to the same experiment, thus enabling triangulation29, in which the same concept is tested with different approaches to ensure generalisability.

To elucidate the applicability of the ExaC paradigm to support Preproducibility in VR experiments, we present a proof-of-concept implementation of ExaC and apply it in a VR case study in HBI. The proof-of-concept implementation of ExaC consists of the digital documentation and the infrastructure & environment definitions.

Our proof-of-concept implementation of ExaC requires a plethora of technologies. This might seem counterproductive at first sight, as it could require strong expertise to master. The technologies and software stacks range from Docker, Terraform, Jupyter-Lab, and Unity to WebAssembly and WebGL. On top of that, they require using several programming languages: Golang, Python, JavaScript, and C#. To add to this complexity, the implementation has to integrate services that serve as an abstract interface to the physical world, such as DeSciL and MTurk. However, these are some of the best technologies available for each specific task, and there is currently no one-size-fits-all solution. The additional effort will be amortized by reusing ExaC code bases. Even though monolithic, highly coupled software can provide a more homogeneous user experience, we believe it would be difficult for it to keep up with the rapid advancement of the technology. Furthermore, these loosely coupled modular components can be exchanged as technologies progress or as requirements change.

Beyond the fast-paced development of technology, the loose coupling actually reduces end-user complexity. End users are only required to change the components they work on, enabling them to consider all other components black boxes. In the context of VR, this means that using ExaC allows experimenters to focus on experiment content rather than any aspect of deployment and provisioning.

There remain many technical challenges to be tackled for more general ExaC implementations beyond the proof of concept for the case study. For example, interfaces for specialized local or remote hardware, and specific protocols for the composition, description and management of services in the context of experiments need to be developed.

We showcase the reproducibility of our proof-of-concept implementation by analyzing two complete instantiations of the HBI experiment. We show that the pattern of results remains the same, i.e., ‘Treatment A’ (Atria) displays a null effect while ‘Treatment B’ (Glass) displays a statistically significant improvement in comparison to the ‘Control’ unmodified building. We showcase the auditability, debuggability, and reusability of the experiment by performing tasks common in open source projects, i.e., a simulated auditor opened a GitHub issue describing a bug and requesting a fix, while the repository maintainer identified the required changes in the code and developed that fix. Additionally, we show the implementation of a second experiment by changing the data collection module and reusing the rest of the components. We demonstrate the experiment scaling in two ways: by enabling several sessions of data collection (horizontal scaling) and by recruiting participants from a large remote participant pool in comparison to the usually smaller local participant pools (vertical scaling). Even though our VR experiment was graphically demanding and data-heavy, we reached 462 participants, of whom the majority (68%) had computers that were deemed capable of running the desktop VR experiment. From this pool, 149 participants successfully completed our experiment. This showcases the scaling potential of this experiment.

Many experiments pose more challenging reproducibility requirements, e.g., interfacing with specialized local hardware, or behavioral response changes due to nuanced variations in the setup, sensors, or equipment. Researchers can document as ExaC the solutions required to successfully run the experiment, e.g., local server setup and hardware interfaces, sensor configuration, or specialized hardware setup. This would enable themselves and third parties to reuse the solutions and would make the nuances transparent. With an ExaC ecosystem, researchers could focus on the specifics of their experiments while reusing parts of existing and tested experimental setups.

Communicating scientific results requires enumerating, recording, and reporting those things that “cannot with advantage be omitted”17. Depending on the discipline, Preproducibility16 might require information about materials (including organisms and their care), instruments and procedures; experimental design; raw data at the instrument level; algorithms used to process the raw data; computational tools used in analyses, including any parameter settings or ad-hoc choices; code, processed data, and software build environments; or even analyses that were tried and abandoned.

Infrastructure

The Infrastructure pillar addresses the technological foundation that supports experiments, influencing their scalability and efficiency. It encompasses a spectrum of components such as local or remote data processing equipment, content delivery systems, participant equipment, analysis tools, and virtual machines. Several abstraction concepts have emerged from cloud computing: an image is a snapshot of an application and its dependencies; a container is a lightweight, portable, and executable instance created from an image, with the aim of isolating and running the application consistently across environments; and a virtual machine is a virtualized emulation of a physical computer, running an entire operating system and applications independently, in contrast to containers, which share the host OS kernel for efficiency and portability. The inclusion of images and containers ensures the reproducibility of software environments, while the research equipment forms the bridge between experimental design and execution. This pillar recognizes the importance of a versatile and reliable technological ecosystem in the experimentation process.

Even though the infrastructure is crucial to conducting an experiment, it is usually informally provisioned and deployed on a per-case basis. Reports usually only state the operating system, the game engine, and sometimes the hardware specifications for data collection. The issue is even starker because a large part of the infrastructure remains unspecified, including the data processing, content delivery system, analysis equipment, and researcher equipment. Only unusual configurations produce more detailed documentation, such as multi-user VR63,67, but even these do not provide automation in the form of provisioning or deployment instructions. The rise of IaC offers an opportunity to remove a major hurdle to equitably running VR experiments, an opportunity that to our knowledge so far remains untapped for VR (but see40,68,69).

Existing frameworks range from generalizable foundations to simple plug-and-play approaches, e.g.,62,70,71. Generalization usually provides researchers with a lot of control and, at the same time, the responsibility to write accurate experimental flows represented as finite-state machines. Plug-and-play approaches usually provide some prefabricated components that can be placed in the experiment and partially configured with configuration files. A middle ground can be found where experts develop plug-and-play components based on a generalizable foundation that new users can easily use and, with more experience, start to adapt to their use case, e.g.,62,69,72.

Another important aspect is the Experiment Integration into background services. Here, it needs to be defined what data is to be collected, how the experiment can be managed while it is running, and how external content can be accessed. Data collection requires a concept of the kind of data collected. We can differentiate between virtual sensor data (collected in VR) and physical sensor data (collected by hardware, e.g., HMDs, physiological sensors)62. Most frameworks focus on providing a data collection interface for virtual sensors in the virtual environment, disregarding signals from physical sensors, or only looking at virtualized information such as the position and orientation of HMDs and HIDs.

Especially in immersive VR, tracking user actions such as inputs can be difficult and requires special attention to ensure an accurate reconstruction73. Beyond measuring participants' performance in VR, it is often useful to measure their attitudes with surveys62,74 and other subjective measures, and if possible their physiological reactions43, to underpin the mechanism investigated in a study beyond self-report and task completion. Keeping only pre-analyzed results such as orientation error is argued to simplify research72, but this does not account for calculation errors, which have occurred in some research projects and required a redo. Keeping all possible raw data is imperative to produce reproducible and auditable experiments.

Monitoring and managing experiment participants in real time requires a definition and an API of how the researcher can interact with the experiment as it happens. Only two frameworks offer out-of-the-box solutions for monitoring63,75. Especially when studying crowd behavior, it may not be enough to conduct single-user experiments; it becomes necessary to immerse multiple users at once, requiring additional infrastructure and experiment integration63,67.

Data assembly

The Data Assembly pillar revolves around the acquisition and preparation of data for analysis. The acquisition phase involves gathering experiment data, infrastructure data, sensor data, and associated metadata. Data-serving APIs facilitate efficient data retrieval. In the data pre- or post-processing phase, anonymization, cleansing, and reduction techniques are applied. This pillar underscores the importance of comprehensive data handling from its origin to its refined form, ensuring the reliability and relevance of the datasets.

In previous research, data assembly has mostly been treated as secondary. Many frameworks opt for simple text files, e.g.,63,72,76, to reduce the setup requirements for experiments. Only a few frameworks opt for databases, i.e.,62,69,73,72, and present this as an advantage because it exposes the researcher to fewer tedious post-processing tasks. However, in terms of auditability, having insight into the raw data is an advantage, and if data analysis is integrated into the framework, post-processing can also be largely automated and formally documented.

Effective data processing is crucial to scientific endeavors, but usually even basic checks are not reported. Ensuring data quality is often not performed or is only informally addressed by using robust statistical methods. In spatial cognition experiments, data is often only visualized based on summary metrics such as walked distance and time spent. This contrasts with the richness of data that a VR experiment allows to be collected. Providing visualization tools, diagnostic tools for statistical properties, and similar utilities would help ensure that the collected data can be used for statistical analysis.

Obtaining insights from data is often performed independently of the experiment and the VR framework. The advantage of keeping data insights within the framework is to tightly connect the analysis with the data collection mechanism and thereby greatly enhance reproducibility and reusability. It also allows simulations and machine learning analyses to rely on any collected data.

Management

The Management pillar takes a holistic view, focusing on overseeing the entire experimental process. Participant acquisition involves sampling, recruiting, payment, and ensuring participants are immersed in the experiment. Monitoring mechanisms track the progress and status of both participants and infrastructure in real-time, sometimes providing live views into ongoing experiments. This pillar stresses the importance of effective oversight to guarantee the integrity, ethical conduct, and success of experiments, from inception to completion.

The management of an experiment is usually not codified in published research and research tools but implicitly performed (but see63,75). Some parts of the management of an experiment may be reported, such as how participants were selected, but other duties are usually left out. It is important to give special care to management, as this neglected component of an experiment is often decisive for the quality of the outcome and the ability to reproduce the work.

VR Frameworks in the context of ExaC

The pillars used in ExaC provide defined boundaries for modularisation. VR frameworks and specialized software can be used to provide automation code and digital documentation for them. Many existing VR frameworks already provide support or potential for the automation of many of the common components described within the pillars. We provide a summary of the features of these VR frameworks through the lens of ExaC and the six pillars in Table 3. Additionally, we provide an in-depth discussion of the VR frameworks through the lens of ExaC in the Supplementary Information Section S1 and a VR checklist in the Supplementary Information Section S2.

Table 3 Comparison of current VR frameworks since 2015.

In the early 2000s, a first set of VR frameworks82,83,84,85,86 focused on simplifying the technical setup. However, these frameworks were rapidly eclipsed by game engines (i.e., Unity and Unreal) and became quickly obsolete. Over the last five years, a second generation of VR frameworks62,63,68,69,71,72,75,76,78,79,80 has put the experiment design at the core of the framework. Technical issues such as rendering, physics, and specialized hardware have been delegated to the underlying game engines such as Unity or Unreal and third-party libraries to integrate Human Interface Devices (HIDs; e.g. Oculus Touch, 3D mice).

The second generation of VR frameworks provides better support to the different components required for experimentation. However, there is still a considerable gap in terms of hardware provisioning and deployment, support for data analysis, multi-user setups, and participant management and monitoring. To bridge these gaps, we envision a third generation of VR frameworks that either natively support all six pillars of ExaC or that are modularly designed to better integrate with specialized tools that cover them. This third generation would extend their focus from experimental design towards reproducibility, auditability, debuggability, reusability, and scalability. General frameworks for behavioral experimentation have recently started focusing on reproducibility40.

Documentation—A1

Apart from the traditional software requirement documentation (e.g., package.json, requirements.txt, and Pipfile) and the traditional experiment documentation (e.g., the protocol), we include a service documentation file, services.json, denoting the requirements in terms of external services (i.e., abstraction layer services and infrastructure services) required for the experiment (e.g., MTurk, AWS).
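The kind of information such a service documentation file can capture is sketched below in Python; the entries and field names are illustrative assumptions, and the actual schema of services.json in the repository may differ:

    import json

    # Illustrative service documentation; entries are hypothetical examples.
    services = {
        "abstraction_services": [
            {"name": "MTurk", "purpose": "participant recruitment and payment"},
            {"name": "DeSciL", "purpose": "participant procurement interface"},
        ],
        "infrastructure_services": [
            {"name": "AWS S3", "purpose": "static content and data storage"},
        ],
    }
    with open("services.json", "w") as f:
        json.dump(services, f, indent=2)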

Infrastructure—A2

SimpleExaCT defines the required run-time environments with Docker (i.e. EaC; see Fig. 5, A2) and automates the provisioning and deployment of the required experimental infrastructure through Terraform scripts (i.e. IaC; see Fig. 5, A2). The script creates an S3 bucket and uploads the static content, reserves the computing infrastructure for the data assembly service, and deploys it. All this is initiated directly from the researchers’ computer. Once the data collection has finished, the Terraform scripts automate the stop and release of all cloud resources used in the experiment.
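As a rough illustration of what the Terraform scripts automate for the static content, the equivalent manual steps could be performed with Python and boto3; the bucket name and file paths below are hypothetical placeholders, not those used in the case study:

    import boto3

    # Create a bucket for the static experiment content and upload the WebGL build.
    # Bucket name and file paths are placeholders.
    s3 = boto3.client("s3")
    s3.create_bucket(Bucket="exac-demo-experiment-content")
    for artifact in ("index.html", "Build/experiment.wasm", "Build/experiment.data"):
        s3.upload_file(f"webgl_build/{artifact}", "exac-demo-experiment-content", artifact)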

Data collection—A3

The participants join the experiment with their web browser. In practice, a link is provided and the experimental data collection client is loaded like any other web page. As participants visit this link, the experiment goes through three phases: the Onboarding of participants, the participants’ engagement in the Experiment trial, and finally their Offboarding (see Fig. 5, E). SimpleVR contains scripts and templates that instrument this data collection procedure (see Fig. 5, A3).

In the onboarding, we perform two checks to ensure both that Unity can be started and that it runs at a reasonable frame rate. We use Unity’s support of WebAssembly with WebGL as the build target to offload the VR computations to the web browser. To determine whether the WebAssembly module can be loaded, the browser compatibility and rendering capabilities are detected through Unity. The prospective quality of rendering is tested using the test suite of mapboxgl (a JavaScript library). If the requirements are fulfilled, the experiment content is loaded. The Unity-based data collection client receives the parameters to instantiate the experiment’s content along with authentication for the participants’ payment.

In the experiment phase, the specific treatment tested in this trial is instantiated for data collection, and the content is rendered in the participants’ browser (see Fig. 5, E). The whole experimental trial is distributed as static content, in the presented case study through an AWS S3 bucket. We used Unity to render, control, record, and transfer movement data from the virtual environment on the participants’ computers. The collected experimental data is sent back to the data assembly service continuously throughout the trial. The client can send different types of messages to the data assembly service, which assembles and stores them and makes them available to the researcher. For simplicity of the demonstration, we implemented two main types: event messages and trajectory messages. Event messages provide metadata about a specific event (e.g., agreeing to the consent form or completing a specific task). Trajectories are recorded by logging participants’ positions and orientations at varying time resolutions. Trajectories represent heavier data transfers and thus are segmented into packages. Data is transferred to the data assembly service in the background (i.e., in 4.3 Kb chunks) while the participants complete the tasks and is stored in secure cloud storage.
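The two message types can be pictured as simple JSON payloads posted to the data assembly service; the endpoint and field names below are hypothetical and only sketch the structure described above:

    import time
    import requests

    ASSEMBLY_URL = "https://assembly.example.org/ingest"  # placeholder endpoint

    # Event message: metadata about a single occurrence in the trial.
    event = {"type": "event", "participant": "p-042",
             "name": "consent_accepted", "timestamp": time.time()}

    # Trajectory message: one chunk of logged positions and orientations.
    trajectory_chunk = {"type": "trajectory", "participant": "p-042", "chunk": 3,
                        "samples": [{"t": 12.02, "pos": [1.5, 0.0, 7.2], "yaw": 90.0}]}

    for message in (event, trajectory_chunk):
        requests.post(ASSEMBLY_URL, json=message)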

In the offboarding, once the participants have completed the trial, a response code is calculated by the data collection client to signal the conclusion of the experiment. The code is based on an implementation of the Salted Challenge Response Authentication Mechanism (SCRAM)90. Here, a challenge code is generated and combined with a researcher-provided authentication called “salt”. This delivers the signal to external services like MTurk that participants have completed their tasks and that these services can transfer the payment.
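The idea behind the response code can be sketched as a salted hash of the challenge; this is a simplified illustration rather than the full SCRAM (RFC 5802) exchange, and the names below are hypothetical:

    import hashlib, hmac, secrets

    def response_code(challenge: str, salt: str) -> str:
        # Combine the client-generated challenge with the researcher-provided salt.
        return hmac.new(salt.encode(), challenge.encode(), hashlib.sha256).hexdigest()

    challenge = secrets.token_hex(8)                   # generated at offboarding
    code = response_code(challenge, "researcher-provided-salt")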

Data assembly—A4

The Data Assembly service provides the backend for data collection and securely stores the data and makes it available to the researcher. This service (see Fig. 5, A4) has been built as a lightweight Go program due to the language’s embedded concurrency features. To deploy the Data Assembly service, we containerize it using Docker. Heavy-traffic experiments may require scaling the data collection service and running it as part of a Kubernetes cluster, which enables detailed control over the load balancer and reverse proxies for load sharing.

For the data transmission, the service receives the POST messages of the clients and stores them in the service’s local persistent storage. Transmissions are delimited by header and tail identifier messages. The tail provides a checksum over the transferred data to ensure its correct reconstruction once all packages/chunks have been received. Once the tail packet is received, the data for a specific trial is reconstructed and uploaded to secure cloud storage (in the case study we use an AWS S3 bucket). The data is transferred as JSON for its reconstruction; the reconstructed data is assembled into CSV files and uploaded to a cloud storage solution (e.g., an AWS S3 bucket).
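The reconstruction logic can be illustrated in a few lines of Python; the chunk ordering, field names, and use of SHA-256 are assumptions made for the sketch, and the Go service in the repository may differ:

    import hashlib, json

    def reassemble(chunks, tail_checksum):
        # Order the received packages, concatenate their payloads, and verify
        # them against the checksum provided by the tail message.
        ordered = sorted(chunks, key=lambda m: m["chunk"])
        raw = "".join(m["payload"] for m in ordered).encode()
        if hashlib.sha256(raw).hexdigest() != tail_checksum:
            raise ValueError("checksum mismatch: trial data incomplete or corrupted")
        return json.loads(raw)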

Data transmission between the data collection service and the data assembly service is secured to prevent mishandling. It takes advantage of HTTPS (HTTP over TLS), enabling the encrypted transport of the data. It has to be noted that modern web browsers will refuse to establish a connection over HTTPS with self-signed SSL certificates. If no SSL certificates are available, researchers will be forced to revert to insecure transmission and protect the data transfers themselves, i.e., encrypt the data instead of the channel.

Data analysis—A5

To exemplify the data analysis, a Python Jupyter notebook with a minimal working example of our case study’s analysis is provided. During the deployment of SimpleExaCT, a container with Jupyter-Lab is run on the researcher’s local computer to provide an interface to the bundled notebooks (see Fig. 5, A5 & B). This component is well suited to serve as the basis for the pre-registration of an analysis plan.
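A minimal version of such an analysis could look like the following, assuming hypothetical column names for the assembled CSV; the notebook in the repository contains the actual analysis:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical column names: distance walked per trial, treatment condition,
    # and a participant identifier used as the random-effects grouping factor.
    df = pd.read_csv("assembled_trials.csv")
    model = smf.mixedlm("distance_walked ~ treatment", df, groups=df["participant_id"])
    print(model.fit().summary())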

Management—A6

The experiment is scaffolded with cloud infrastructure. It can be served to anyone with a capable web browser through a link to the static content and the corresponding Experiment (see Fig. 5, E). To address the difficulty of recruiting and securely rewarding the participants, we deployed the experiment through Amazon Mechanical Turk (MTurk).

To monitor and manage both the hardware and the participants, we include two further Python Jupyter notebooks (see Fig. 5, A6). It has to be noted that monitoring and management should not be interpreted as surveillance; it is a privacy-preserving, consensual, and non-intrusive check of the state of the experiment and the health of the hardware. We reuse the same dockerized local Jupyter-Lab instance mentioned in the data analysis to provide access to these notebooks (see Fig. 5, B). The participant monitoring and management notebook interacts with Amazon Mechanical Turk’s API and serves to link our experiment with specific challenge codes and the correct treatment assignment once the experiment is completed. Additionally, it verifies the correctness of the responses to the challenges and rewards the participants. Through the hardware management notebook, we monitor the state of the cloud infrastructure. The current proof-of-concept implementation triggers an alarm sound when the Data Assembly service is unreachable.
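The interaction of the participant management notebook with MTurk can be sketched with boto3; the HIT identifier and the verification helper below are hypothetical placeholders:

    import boto3

    mturk = boto3.client("mturk", region_name="us-east-1")

    # Retrieve submitted assignments for a HIT and approve those whose response
    # code passes the challenge verification (verify_response_code is a
    # hypothetical helper implementing the check described above).
    assignments = mturk.list_assignments_for_hit(
        HITId="HYPOTHETICAL_HIT_ID", AssignmentStatuses=["Submitted"])
    for assignment in assignments["Assignments"]:
        if verify_response_code(assignment["Answer"]):
            mturk.approve_assignment(AssignmentId=assignment["AssignmentId"])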

Demonstrative case study design

We selected a desktop VR experiment in HBI that does not require specialized equipment beyond a capable computer and a browser and that is robust under a wide range of uncontrolled variations, e.g., variations in screen size or distance to the screen. This enabled us to easily showcase the use of remote data collection and demonstrate the scaling of participants. This experiment was carried out in April 2020, in three sessions, during the COVID-19 pandemic and lockdowns33.

The data collection was spread out across days to obtain enough participants according to the power analysis. The three sessions were instantiated from scratch and can be understood as three independent experiments. The third session did not collect data on all treatments because only some treatments required more data to fulfill a preliminary power analysis. Hence, the results presented in this paper include only a comparison between the two full instantiations, sessions 1 and 2. For demonstration purposes and to avoid conflict with the detailed publications of the experiment, here we refer in general terms to: (1) the control and applied treatments which were architectural variations on a building, and (2) the dependent variable used to measure the participant’s performance. Details on the experiment and further behavioral analysis of the MTurk participant data can be found here33. Details on the validation of an Agent-Based Model (ABM) using this data can be found here58.

Participants were recruited online through MTurk Human Intelligence Tasks (HITs). Participants were filtered according to the external requirements of informed consent, technical ability to display the experiment, and completion of the experiment. We only used data from participants whose web browser was deemed capable of displaying the experiment, who agreed on the consent screen, and who completed the task with the correct response to the challenge. 149 participants fulfilled these criteria. Participants’ ages ranged between 18 and 59 years. The mean age was 33.7 years with a standard deviation of 6.8 years. The methods and experiment described in this section were performed in accordance with the relevant guidelines and regulations following the approval of the study by ETH Zurich’s Research Ethics Committee (2020-N-24). Two inclusion criteria for the study were English proficiency and corrected-to-normal or normal vision. Color-blind individuals were excluded from participating in the study. All participants signed an informed consent form. The average time taken to complete the experiment was 20 minutes and the base monetary compensation was 4.5 USD. If participants completed the experiment in less than 20 minutes, they received a 1 USD bonus. Including bonuses, the average compensation was 5.4 USD.

Experimental protocol: The protocol of the experiment can be found in the SimpleExaCT repository87, and a complete description of the demonstrative experiment content, the experimental protocol, materials, and data analysis can be found in58. Overall, the experiment tested how changes in the configuration of buildings may affect human navigation performance.

Figure 5 shows how ExaC was used to deploy the experiment. An abstraction layer (Fig. 5, C) was in charge of interfacing with the participant procurement service. For logistical reasons, we opted for DeSciL and MTurk as the specific service providers in charge of communication with both the researcher and the study participants. Nonetheless, we provide scripts to interface with MTurk directly within the SimpleExaCT implementation. Communication with participants mainly consisted of deploying the HITs for them to perform the experiments.

Participants had a first person perspective of the virtual scene in each building. Participants’ origin was set to the same location across the three building groups (see Fig. 5, E). Navigation was performed using a keyboard and mouse. Participants had to digitally sign a consent form and complete the onboarding before the experimental trials and the offboarding afterward. The onboarding ensured the correct assignment to one of the three building conditions and the offboarding the appropriate reward. The experiment itself mainly consisted of two parts, a training phase and 6 navigation trials. The mean duration of the training phase was 5 min. During training, participants learned how to move in the virtual environment using a mouse and keyboard. This is a common procedure to ensure that participants are familiar with the controls to reduce adverse effects91,92. Upon successful completion of the practice scene, participants had to complete 6 trials in which they were asked to find a specific destination inside the building.

The participants’ positions and orientations were recorded every frame (roughly every 0.02 s) using a cloud infrastructure to deploy and retrieve participants’ data (see Fig. 5, C and D). The research team (see Fig. 5, B) was able to manage and monitor participants, hardware, and software status from their computers. The abstraction services provided interactions with the management of participants (see Fig. 5, B and C). The cloud infrastructure provided interactions with both the deployment of the experiment content and access to the collected data (see Fig. 5, B and D). Using the analysis scripts, the research team was able to visualize and analyze the data collected immediately after the experiment was concluded (see Fig. 5, B).

Informed consent

The research with human participants was approved by the Research Ethics Committee of ETH Zürich (2020-N-24). The participants were informed about the study goal, gave informed consent, and accepted the publication of appropriately anonymised data.