Incidents
This section documents any downtime incidents that occurred during the lifetime of the project.
Load Balancer Incident (21/09/2023)
Timeline
- The development team was made aware of the issue on September 21st through an email. The LHIN team reported that the server was returning a "503 Service Temporarily Unavailable" message.
- The team started working on a fix on Friday but could not identify the cause.
- Resources were not available for the team to work on the issue on Monday.
- The team was able to fix the issue on Tuesday and deployed the fix the same day.
- The LHIN team confirmed the fix on September 28th.
Issue Identification and Fix
The issue was with the load balancer, whose target group had been changed for an unknown reason. The Elastic Beanstalk server that hosts this project's production environment was no longer in the target group, so the load balancer had no server to forward requests to; a load balancer with no registered targets responds with a 503, which matches the reported error. The issue was identified because the server could be reached at its AWS URL but not at its GoDaddy URL. To fix the issue, the target group was updated to include the server.
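A quick way to confirm this kind of failure is to check whether the production instance is still registered in the target group. The following is a minimal sketch, not the tooling the team actually used: it inspects a response shaped like the one returned by boto3's `elbv2` `describe_target_health` call, and it assumes an Elastic Load Balancing v2 load balancer with instance targets. The helper names (`target_state`, `diagnose`) and the instance IDs are hypothetical placeholders.

```python
from typing import Any, Dict, Optional


def target_state(response: Dict[str, Any], instance_id: str) -> Optional[str]:
    """Return the health state of instance_id in a describe_target_health
    response, or None if the instance is not registered in the target group."""
    for desc in response.get("TargetHealthDescriptions", []):
        if desc["Target"]["Id"] == instance_id:
            return desc["TargetHealth"]["State"]
    return None


def diagnose(response: Dict[str, Any], instance_id: str) -> str:
    """Turn the instance's target state into a short diagnosis string."""
    state = target_state(response, instance_id)
    if state is None:
        # The server is absent from the target group, as in this incident.
        return "missing: register the instance in the target group"
    if state != "healthy":
        # Registered, but the load balancer is not routing traffic to it.
        return f"registered but {state}: check the health check and the application"
    return "ok"


if __name__ == "__main__":
    # Hypothetical response: the group contains some other instance, not ours.
    sample = {
        "TargetHealthDescriptions": [
            {"Target": {"Id": "i-0aaaaaaaaaaaaaaaa", "Port": 80},
             "TargetHealth": {"State": "healthy"}},
        ]
    }
    print(diagnose(sample, "i-0bbbbbbbbbbbbbbbb"))
    # → missing: register the instance in the target group
```

Against a live account, the response would come from `boto3.client("elbv2").describe_target_health(TargetGroupArn=...)`, and adding the server back to the group roughly corresponds to `register_targets(TargetGroupArn=..., Targets=[{"Id": instance_id}])`. Both calls require AWS credentials with Elastic Load Balancing permissions.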
Lessons Learned
Issues that made it hard to identify a fix:
- The lack of any documentation on the AWS resources that the project uses. To address this, a new repository was created to document all of the team's AWS resources.
- The lack of AWS expertise within the team. This is still unresolved.