I'm a novice Rails engineer.
We plan to deploy and operate a prototype web application on AWS and are currently building the production environment.

What I want to achieve

To keep the web application on EC2 available (always one running instance), I want to configure AWS EC2 Auto Scaling so that the server is recovered automatically.

What I did
  1. Created an AMI from an instance already set up on EC2 (instance type: t2.micro, storage: 8 GB)
  2. Created a launch configuration using the AMI from step 1 (instance type, storage, security group, and so on are the same as in step 1; because I want to SSH into the automatically launched instances with the same key, I selected the existing key pair)
  3. Created an Auto Scaling group using the launch configuration from step 2

Auto Scaling group settings

- Group size: desired capacity 1, minimum capacity 1, maximum capacity 1
- Subnet: same as the existing EC2 instance
- Load balancing: enabled, with the target group selected
- Health check type: EC2 & ELB
- Health check grace period: 300 seconds
- Instance scale-in protection: not protected from scale-in
- Termination policy: Default
- Default cooldown: 300 seconds
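
For reference, the settings above map roughly onto the parameters of the `CreateAutoScalingGroup` API (for example boto3's `create_auto_scaling_group`). The following is only a sketch: the group name, launch configuration name, subnet ID and target group ARN are placeholders that do not appear in the post, the two helper functions are hypothetical, and nothing is sent to AWS. As far as I know, the console choice "EC2 & ELB" corresponds to `HealthCheckType="ELB"`, because the EC2 status checks always apply.

```python
# Placeholder names and IDs throughout; replace them with your own values.
asg_settings = {
    "AutoScalingGroupName": "web-asg",                # placeholder
    "LaunchConfigurationName": "web-launch-config",   # placeholder
    "MinSize": 1,
    "MaxSize": 1,
    "DesiredCapacity": 1,
    "VPCZoneIdentifier": "<subnet-id>",               # placeholder
    "TargetGroupARNs": ["<target-group-arn>"],        # placeholder
    "HealthCheckType": "ELB",                         # console: "EC2 & ELB"
    "HealthCheckGracePeriod": 300,                    # seconds
    "NewInstancesProtectedFromScaleIn": False,
    "TerminationPolicies": ["Default"],
    "DefaultCooldown": 300,                           # seconds
}


def is_fixed_single_instance(settings):
    """True when min, desired and max capacity are all 1."""
    return settings["MinSize"] == settings["DesiredCapacity"] == settings["MaxSize"] == 1


def replaces_on_elb_failure(settings):
    """True when a failed target group health check can trigger a replacement."""
    return settings["HealthCheckType"] == "ELB" and bool(settings["TargetGroupARNs"])
```

With these values both helpers return `True`: the group always keeps exactly one instance, and that instance is replaced whenever the target group reports it as unhealthy.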

What happens

Even though there is nothing wrong with the EC2 instance (see the screenshot start-aws-instance), Auto Scaling launches and terminates instances repeatedly, about every 8 minutes.

Instance list

ELB target list (the Auto Scaling health check never becomes healthy)

Activity history of the Auto Scaling group (instance launch and termination repeat about every 8 minutes)

Port status check results

Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
udp UNCONN 0 0 127.0.0.1:323 0.0.0.0:*
udp UNCONN 0 0 0.0.0.0:68 0.0.0.0:*
udp UNCONN 0 0 0.0.0.0:111 0.0.0.0:*
udp UNCONN 0 0 0.0.0.0:728 0.0.0.0:*
udp UNCONN 0 0 [::1]:323 [::]:*
udp UNCONN 0 0 [fe80::89d:beff:fed1:a24]%eth0:546 [::]:*
udp UNCONN 0 0 [::]:111 [::]:*
udp UNCONN 0 0 [::]:728 [::]:*
tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
tcp LISTEN 0 100 127.0.0.1:25 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:111 0.0.0.0:*
tcp ESTAB 0 36 172.31.11.230:22 60.60.230.194:51511
tcp LISTEN 0 128 [::]:22 [::]:*
tcp LISTEN 0 128 [::]:111 [::]:*

Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
udp UNCONN 0 0 127.0.0.1:323 0.0.0.0:*
udp UNCONN 0 0 0.0.0.0:68 0.0.0.0:*
udp UNCONN 0 0 0.0.0.0:111 0.0.0.0:*
udp UNCONN 0 0 0.0.0.0:727 0.0.0.0:*
udp UNCONN 0 0 [::1]:323 [::]:*
udp UNCONN 0 0 [fe80::884:abff:fe94:f3e4]%eth0:546 [::]:*
udp UNCONN 0 0 [::]:111 [::]:*
udp UNCONN 0 0 [::]:727 [::]:*
tcp LISTEN 0 128 0.0.0.0:111 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:80 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
tcp LISTEN 0 100 127.0.0.1:25 0.0.0.0:*
tcp TIME-WAIT 0 0 172.31.14.50:80 172.31.28.107:60516
tcp ESTAB 0 36 172.31.14.50:22 60.60.230.194:51517
tcp TIME-WAIT 0 0 172.31.14.50:80 172.31.6.186:34352
tcp TIME-WAIT 0 0 172.31.14.50:80 172.31.28.107:60502
tcp TIME-WAIT 0 0 172.31.14.50:80 172.31.6.186:34362
tcp TIME-WAIT 0 0 172.31.14.50:80 172.31.6.186:34376
tcp LISTEN 0 128 [::]:111 [::]:*
tcp LISTEN 0 128 [::]:80 [::]:*
tcp LISTEN 0 128 [::]:22 [::]:*
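
The two listings can be compared mechanically. The sketch below assumes they are output from a tool such as `ss` (the post does not name the command) and that the columns are separated by whitespace. The helper `listening_tcp_ports` is a hypothetical name, and only the `tcp LISTEN` rows are copied here, with the stray spaces removed.

```python
def listening_tcp_ports(ss_output):
    """Return the set of TCP ports in LISTEN state from ss-style output."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected columns: Netid State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 6 or fields[0] != "tcp" or fields[1] != "LISTEN":
            continue
        # The port is whatever follows the last colon of the local address.
        port = fields[4].rsplit(":", 1)[-1]
        if port.isdigit():
            ports.add(int(port))
    return ports


first_listing = """\
tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
tcp LISTEN 0 100 127.0.0.1:25 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:111 0.0.0.0:*
tcp LISTEN 0 128 [::]:22 [::]:*
tcp LISTEN 0 128 [::]:111 [::]:*
"""

second_listing = """\
tcp LISTEN 0 128 0.0.0.0:111 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:80 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
tcp LISTEN 0 100 127.0.0.1:25 0.0.0.0:*
tcp LISTEN 0 128 [::]:111 [::]:*
tcp LISTEN 0 128 [::]:80 [::]:*
tcp LISTEN 0 128 [::]:22 [::]:*
"""

print(80 in listening_tcp_ports(first_listing))   # False
print(80 in listening_tcp_ports(second_listing))  # True
```

In other words, nothing is listening on port 80 in the first listing, while the second one listens on port 80 and shows recent connections to it from other 172.31.x.x addresses.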
Question

We assume that the Auto Scaling health check is not working properly and that this is why instances keep being launched, but we are stuck because we do not know the details.
I want Auto Scaling to replace the server automatically only when there is actually a problem with the EC2 instance (see the screenshot start-aws-instance), so I would appreciate advice on the cause of this behavior and how to fix it. Thank you.

  • Answer # 1

    tanat's answer already covers almost everything you should check, so I will write about the health check itself.

    As for what the health check does: the ELB sends a request to the specified path on the private IP of the EC2 instance and confirms that the response code is the expected one. (By default, it sends a request to port 80 and checks whether 200 OK comes back.)
    This can be customized in the target group settings.

    So when the ELB sends a request to an EC2 instance, the health check fails unless the instance is set up to return the expected response code.
    Possible causes of failure include: the EC2 side is not ready to accept requests, the security group does not allow traffic from the ELB, or the EC2 side redirects requests sent to port 80 (redirects are 3xx status codes, so they do not count as 200). There are various possibilities, so please check them.
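
To try the same kind of request by hand, something like the following could be run from another machine in the same VPC. It is only a rough imitation of the check described above, not how the ELB is actually implemented: `probe` and its parameters are names I made up, and the failure labels are guesses at the most likely cause. `http.client` is used because it does not follow redirects, so a 3xx response stays visible.

```python
import http.client
import socket


def probe(host, port=80, path="/", expected=(200,), timeout=5.0):
    """Send one GET request and classify the outcome like a health check."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except socket.timeout:
        # No answer at all: traffic is often dropped by a security group.
        return "timeout (possibly blocked by a security group)"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on this port.
        return "connection refused (nothing listening on the port)"
    except (OSError, http.client.HTTPException) as exc:
        return f"connection error: {exc}"
    finally:
        conn.close()

    if status in expected:
        return "healthy"
    if 300 <= status < 400:
        return f"unhealthy: redirect {status}"
    return f"unhealthy: status {status}"
```

For example, `probe("10.0.0.10")` (a made-up private IP) returns `"healthy"` only when the instance answers `GET /` on port 80 with 200.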

    A less recommended approach is to temporarily change the response code that the health check treats as OK, or to widen the accepted range. Do this only while debugging: if you leave it that way in production, the health check becomes meaningless.
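
To illustrate what widening the range does: as far as I know, the success codes of a target group can be written as a single value, a comma-separated list, or a range. The function below is a hypothetical re-implementation of that matching for illustration only, not the ELB's own code.

```python
def matches(status, matcher):
    """Return True if status satisfies a matcher like "200", "200,302" or "200-399"."""
    for part in matcher.split(","):
        part = part.strip()
        if "-" in part:
            # Range form, inclusive on both ends.
            low, high = part.split("-", 1)
            if int(low) <= status <= int(high):
                return True
        elif part and int(part) == status:
            return True
    return False


print(matches(301, "200"))      # False: a redirect fails the default check
print(matches(301, "200-399"))  # True: the widened range accepts it
```

With the widened matcher a redirecting instance passes, but so would a server that redirects every request to an error page, which is why this should stay a debugging aid.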

  • Answer # 2

    We assume that the Auto Scaling health check is not working properly and that this is why instances keep being launched, but we are stuck because we do not know the details.

    First of all, please use the following article to isolate the problem:
    How do I troubleshoot an Application Load Balancer and fix a health check failure?

    Personally, rather than creating an Auto Scaling group right away with the web application running, I would proceed as follows:

    1. First, create an ELB and EC2 pair that serves only static HTML and passes the health check (install Apache or nginx and prepare a web server instance that serves only static HTML; create this ELB separately from the one you are working with now)
    2. Create an Auto Scaling group using the AMI and ELB from step 1
    3. Once step 2 works, create an Auto Scaling group whose health check targets static HTML, as in step 2, and get it working with the AMI used in the question
    4. Once step 3 works, build an environment in which the health check hits a dynamic page (which is probably what you were aiming for in the question)

    I think the quickest route is to work through these steps in order, isolating the problem at each stage.
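
If installing Apache or nginx is more than you need for the very first check, a static page can also be served with Python's standard library. This is a stand-in I am suggesting for a quick experiment, not the setup described above. The page content, the `make_server` helper and port 8080 are all assumptions (port 80 would need root privileges), and the health check port of the target group would have to match.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>ok</body></html>"


class StaticPage(BaseHTTPRequestHandler):
    def do_GET(self):
        # Always answer 200 with the same static page, whatever the path.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port=8080):
    """Build a server that listens on all interfaces on the given port."""
    return HTTPServer(("0.0.0.0", port), StaticPage)
```

Calling `make_server().serve_forever()` starts it. Once an instance like this passes the health check, any failure that appears later in the steps can be attributed to the application or the AMI rather than to the ELB and security group setup.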