Search Results

Resource allocation and aging priority-based scheduling of linear workflow applications with transient failures and selective imprecise computations
A wide range of applications in distributed environments have a linear structure, varying priorities, and may experience transient software failures....

A Model of Actors and Grey Failures
Existing models for the analysis of concurrent processes tend to focus on fail-stop failures, where processes are either working or permanently...

Characterizing Memory Failures Using Benford’s Law
Fault tolerance is a key challenge as high performance computing systems continue to increase component counts, individual component reliability...

Interpreting the vulnerability of power systems in cascading failures using multi-graph convolutional networks
Analyzing the vulnerability of power systems in cascading failures is generally regarded as a challenging problem. Although existing studies can...

Disconnected Agreement in Networks Prone to Link Failures
We consider deterministic distributed algorithms for reaching agreement in synchronous networks of arbitrary topologies. Links are bi-directional and...

Exploring the Impact of Node Failures on the Resource Allocation for Parallel Jobs
Increasing the size and complexity of modern HPC systems also increases the probability of various types of failures. Failures may disrupt...

Bayesian network model to distinguish between intentional attacks and accidental technical failures: a case study of floodgates
Water management infrastructures such as floodgates are critical and increasingly operated by Industrial Control Systems (ICS). These systems are...

Modeling and adaptive control for a spatial flexible spacecraft with unknown actuator failures
In this paper, we address simultaneous control of a flexible spacecraft’s attitude and vibrations in a three-dimensional space under input...

Power System Transient Stability Prediction in the Face of Cyber Attacks: Employing LSTM-AE to Combat Falsified PMU Data
Phasor measurement units (PMUs) are essential instruments in delivering real-time data crucial for monitoring the dynamics of power systems. They are...

The Pathology of Failures in IoT Systems
The presence of faults is inevitable in the Internet of Things (IoT) systems. Dependability in these systems is challenging due to the increasing...

Integrating request replication into FaaS platforms: an experimental evaluation
Function-as-a-Service (FaaS) is a popular programming model for building serverless applications, supported by all major cloud providers and many...

Consensus in anonymous asynchronous systems with crash-recovery and omission failures
In anonymous distributed systems, processes are indistinguishable because they have no identity and execute the same algorithm. Currently, anonymous...

Asynchronous Consensus in Synchronous Systems Using send_to_all Primitive
Consensus is a fundamental agreement problem that arises when a set of distributed processes has to decide on a common value among their respective...

Failures Forecast in Monitoring Datacenter Infrastructure Through Machine Learning Techniques: A Systematic Review
With the trend of accelerating digital transformation processes, datacenters (DC) are gaining prominence as increasingly critical components for...

Transient Analysis of Hierarchical Semi-Markov Process Models with Tool Support in Stateflow
Semi-Markov process (SMP) models cannot always accurately model real-world systems. To help the situation the paper proposes a hierarchical...

Modelling of Software Failures
Software is crucial in the provision of communication services. Most functions related to control, management and operation are realized in software....

μChaos: Moving Chaos Engineering to IoT Devices
The concept of the Internet of Things (IoT) has been widely used in many applications. IoT devices can be exposed to various external factors, such...

Simulation Experiments of a Distributed Fault Containment Algorithm Using Randomized Scheduler
Fault containment is a critical component of stabilizing distributed systems. A distributed system is termed stabilizing (or self-stabilizing) if it...

A fault-tolerant scheduling algorithm that minimizes the number of replicas in heterogeneous service-oriented cloud computing systems
The service-oriented heterogeneous cloud computing system offers paid computing services through its powerful processors. However, task execution...

Failure and fault classification for smart grids
Smart grid (SG) has been designed as a response to the limitations of traditional power grids caused by growing power supply demands. SG is...