Chaos Experiment on EC2 using AWS Fault Injection Simulator (FIS) - Latency Injection

In the fast-paced world of cloud computing, ensuring the resilience of your applications is crucial. Latency can degrade user experience, especially in systems built on distributed architectures. AWS Fault Injection Simulator (FIS) lets you test how your applications behave under such conditions. In this post, we’ll explore how to inject latency on EC2 instances using AWS FIS to uncover and address potential weaknesses.

What is AWS Fault Injection Simulator?

AWS Fault Injection Simulator is a fully managed chaos engineering service that allows you to stress test applications by injecting faults like latency, CPU load, network disruption, or instance termination. These experiments help developers identify and address weaknesses in a controlled environment, ultimately increasing system reliability.

Use Case: Injecting Latency on EC2 Instances

Latency can originate from network issues, resource contention, or software inefficiencies. Testing how your application reacts to latency on EC2 instances helps ensure that your system gracefully handles such scenarios.

Prerequisites

  1. AWS Account: Ensure you have administrative access.

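  2. EC2 Instances: Ensure you have two or more running EC2 instances that host an nginx-based web application. My previous FIS posts walk through creating this setup from scratch. Because this experiment runs an SSM document, each instance must also be managed by AWS Systems Manager (SSM Agent running, with an instance profile that permits it, such as one carrying the AmazonSSMManagedInstanceCore policy).

  3. Application Load Balancer: An ALB with a target group containing these EC2 instances.

  4. IAM Role: FIS requires an IAM role that the FIS service can assume, with permission to send SSM commands to the target instances.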

  5. AWS CLI or Console Access: For creating and managing the experiment.
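For the IAM role in prerequisite 4, a minimal sketch in Python may help. It assumes boto3 is installed and credentials are configured; the role name, policy name, and resource scoping are illustrative choices of mine, not values from this setup. The trust policy lets the FIS service (fis.amazonaws.com) assume the role, and the permissions cover the SSM Run Command calls that the latency action relies on.

```python
import json


def fis_trust_policy() -> dict:
    """Trust policy that lets the FIS service assume the experiment role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "fis.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def fis_ssm_permissions() -> dict:
    """Permissions for sending, listing, and cancelling SSM commands.

    The wildcard resources are an assumption made for brevity; narrow them
    to your own instances and documents in a real account.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ssm:SendCommand",
                "Resource": [
                    "arn:aws:ec2:*:*:instance/*",
                    "arn:aws:ssm:*:*:document/*",
                ],
            },
            {
                "Effect": "Allow",
                "Action": ["ssm:ListCommands", "ssm:CancelCommand"],
                "Resource": "*",
            },
        ],
    }


def create_fis_role(role_name: str = "fis-latency-experiment-role") -> str:
    """Create the role and attach the inline policy; returns the role ARN.

    The role name and policy name are hypothetical placeholders.
    """
    import boto3  # imported lazily so the policy builders work without boto3

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(fis_trust_policy()),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="fis-ssm-send-command",
        PolicyDocument=json.dumps(fis_ssm_permissions()),
    )
    return role["Role"]["Arn"]
```

Calling create_fis_role() once should return an ARN that you can select when configuring the experiment template.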

Create an FIS template

Go to the AWS FIS console and choose Create experiment template.

Add a description and name to your template.

Click Next.

For Actions, do the following:

  1. Choose Add action.

  2. Enter a name for the action. For example, enter EC2-Network-Latency-Test.

  3. For Action type, select EC2/SSM and aws:ssm:send-command/AWSFIS-Run-Network-Latency.

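  4. For Target, keep the target that AWS FIS creates for you.

  5. For Action parameters:

    •   Document parameters: enter the following string:

        {"DelayMilliseconds":"200", "Interface":"eth0", "DurationSeconds":"60", "InstallDependencies":"True"}

        This adds 200 ms of delay on the eth0 interface for 60 seconds. Check the interface name on your instance first (for example, with ip link), because some AMIs and instance types use a different name such as ens5.

    •   Duration: 10 Minutes. This is the duration of the FIS action as a whole. The latency itself is applied only for the DurationSeconds value above, so keep the action duration at least that long.

  6. Choose Save.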
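If you prefer to define the template outside the console, the action configured above can be sketched as the dictionary that the FIS API expects under "actions". This is a sketch under assumptions: the region is whatever you pass in, and the function name is my own. The document parameters are passed as a JSON string, and the 10-minute duration is written in ISO 8601 form (PT10M).

```python
import json


def latency_action(region: str, target_name: str = "Instances-Target-1") -> dict:
    """Build the FIS action that runs AWSFIS-Run-Network-Latency through SSM."""
    document_parameters = {
        "DelayMilliseconds": "200",   # delay added to traffic on the interface
        "Interface": "eth0",          # adjust if your instance uses another name
        "DurationSeconds": "60",      # how long the latency is applied
        "InstallDependencies": "True",
    }
    return {
        "actionId": "aws:ssm:send-command",
        "parameters": {
            # AWS-owned SSM documents have an empty account field in the ARN.
            "documentArn": (
                f"arn:aws:ssm:{region}::document/AWSFIS-Run-Network-Latency"
            ),
            # FIS expects the document parameters as a JSON-encoded string.
            "documentParameters": json.dumps(document_parameters),
            "duration": "PT10M",  # 10 minutes, ISO 8601
        },
        # "Instances" is the target key this action type uses.
        "targets": {"Instances": target_name},
    }


if __name__ == "__main__":
    # Region chosen only for illustration.
    print(json.dumps({"EC2-Network-Latency-Test": latency_action("us-east-1")}, indent=2))
```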

In the Targets section, choose Edit for Instances-Target-1 (aws:ec2:instance):

  • For Name, choose Instances-Target-1

  • For Resource Type, choose aws:ec2:instance.

  • For Target Method, select Resource IDs.

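  • For Resource IDs, choose the ID of one of your EC2 instances, so that only that instance receives the latency.

  • For Selection mode, choose All, which runs the action on every resolved target (here, the single instance you selected).

  • Choose Save.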
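The target settings above map to a small dictionary in the FIS API. The sketch below assumes you know your region, account ID, and instance ID; the values in the usage example are placeholders, not real resources.

```python
def instance_arn(region: str, account_id: str, instance_id: str) -> str:
    """Build the ARN that FIS uses to identify an EC2 instance."""
    return f"arn:aws:ec2:{region}:{account_id}:instance/{instance_id}"


def instance_target(arn: str) -> dict:
    """Target definition equivalent to choosing Resource IDs with mode All."""
    return {
        "resourceType": "aws:ec2:instance",
        "resourceArns": [arn],   # a single instance, as in the console steps
        "selectionMode": "ALL",  # act on every resolved target
    }


if __name__ == "__main__":
    # Placeholder account and instance IDs for illustration only.
    arn = instance_arn("us-east-1", "111122223333", "i-0123456789abcdef0")
    print({"Instances-Target-1": instance_target(arn)})
```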

You also need to assign an IAM role to the template so that FIS can run the experiment on your behalf. Select the role described in the prerequisites.

Your experiment template is ready for you to test.
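For reference, the same template can be created programmatically. The following is a sketch, assuming boto3 is installed and that you already have the action and target dictionaries in the shape the FIS API expects. The description, tag value, and the choice of no stop condition are my assumptions; for anything beyond a demo, a CloudWatch alarm as the stop condition is the safer choice.

```python
import uuid


def template_request(role_arn: str, actions: dict, targets: dict) -> dict:
    """Assemble keyword arguments for the FIS create_experiment_template call."""
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "description": "Inject 200 ms of network latency on one EC2 instance",
        "roleArn": role_arn,
        # Assumption: no stop condition, which suits a demo but not production.
        "stopConditions": [{"source": "none"}],
        "actions": actions,    # e.g. {"EC2-Network-Latency-Test": {...}}
        "targets": targets,    # e.g. {"Instances-Target-1": {...}}
        "tags": {"Name": "EC2-Network-Latency-Test"},
    }


def create_template(request: dict) -> str:
    """Create the experiment template and return its ID."""
    import boto3  # imported lazily so the request builder works without boto3

    fis = boto3.client("fis")
    response = fis.create_experiment_template(**request)
    return response["experimentTemplate"]["id"]
```

The returned template ID is what you pass when starting an experiment from code.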

Choose Start experiment.

Add a name tag and click on Start experiment.
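Starting the experiment can also be scripted. The sketch below assumes boto3 and a known template ID; the experiment name, polling interval, and the set of statuses treated as final are my own choices. It starts the experiment and polls its status until it reaches a final state.

```python
import time
import uuid
from typing import Callable

# Statuses after which the experiment will not change any further.
FINAL_STATUSES = {"completed", "stopped", "failed", "cancelled"}


def wait_for_experiment(
    get_status: Callable[[], str],
    poll_seconds: float = 15.0,
    max_polls: int = 60,
) -> str:
    """Poll get_status until a final status is seen or max_polls is reached."""
    status = get_status()
    polls = 1
    while status not in FINAL_STATUSES and polls < max_polls:
        time.sleep(poll_seconds)
        status = get_status()
        polls += 1
    return status


def start_and_wait(template_id: str, name: str = "latency-test") -> str:
    """Start an experiment from the template and return its last seen status."""
    import boto3  # imported lazily so the polling helper works without boto3

    fis = boto3.client("fis")
    experiment = fis.start_experiment(
        clientToken=str(uuid.uuid4()),
        experimentTemplateId=template_id,
        tags={"Name": name},  # hypothetical name tag
    )["experiment"]
    experiment_id = experiment["id"]
    return wait_for_experiment(
        lambda: fis.get_experiment(id=experiment_id)["experiment"]["state"]["status"]
    )
```

Keeping the polling loop separate from the AWS calls makes it easy to exercise with a fake status source before pointing it at a real account.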

Now that the experiment has started, you can check the target response time of the load balancer.

Navigate to the EC2 Console in your AWS Management Console. From the left-hand menu, select Load Balancers under the Load Balancing section. Locate and select the specific load balancer associated with your application. Once selected, click on the Monitoring tab to access detailed performance metrics.

In the Monitoring section, pay close attention to the Target Response Time metric. This metric measures the time from when a request leaves the load balancer until the target starts to send response headers. While latency is being injected, you should see a noticeable spike in this graph, lasting roughly as long as the DurationSeconds value you configured. Because only one instance is targeted, requests served by the other instances are unaffected, so the average rises less than the response times of the affected instance alone. The size of the spike shows how strongly the induced latency affects your system and helps you judge how well your application handles such conditions.
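The same metric can be pulled from CloudWatch if you want numbers rather than a graph. This is a sketch, assuming boto3 and that you pass the load balancer's CloudWatch dimension value (the part of the ALB ARN that starts with app/); the 30-minute window and 60-second period are arbitrary choices of mine.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def response_time_query(load_balancer: str, end: datetime, minutes: int = 30) -> dict:
    """Build get_metric_statistics arguments for the ALB TargetResponseTime metric."""
    return {
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "TargetResponseTime",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Average", "Maximum"],
    }


def peak_datapoint(datapoints: list) -> Optional[dict]:
    """Return the datapoint with the highest Maximum, or None if there are none."""
    return max(datapoints, key=lambda point: point["Maximum"], default=None)


def fetch_peak(load_balancer: str) -> Optional[dict]:
    """Query CloudWatch for the last 30 minutes and return the peak datapoint."""
    import boto3  # imported lazily so the helpers above work without boto3

    cloudwatch = boto3.client("cloudwatch")
    query = response_time_query(load_balancer, datetime.now(timezone.utc))
    return peak_datapoint(cloudwatch.get_metric_statistics(**query)["Datapoints"])
```

Comparing the peak during the experiment with a baseline window taken before it gives a rough measure of how much the injected delay reaches end users.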

Conclusion

AWS Fault Injection Simulator empowers teams to build resilient systems by exposing hidden vulnerabilities. Injecting latency on EC2 instances is just one example of how chaos engineering can improve system reliability and user experience. By simulating real-world scenarios, you prepare your applications to perform under unexpected conditions.