AWS explains outages and makes it easy to track future problems

news7g, 12/11/2021

Amazon Web Services CEO Adam Selipsky delivers a keynote during the AWS re:Invent conference in Las Vegas on November 30, 2021.

Noah Berger | Getty Images

Amazon Web Services on Friday released an explanation for an hours-long outage earlier this week that disrupted its retail business and third-party online services. The company also said it plans to improve its status page.

Problems in Amazon’s large US-East-1 region of data centers in Virginia began at 10:30 a.m. ET on Tuesday, the company said.

In a post on its website, the company wrote that devices connecting Amazon’s internal network to AWS’s main network became overloaded.

Several AWS services were affected, including the widely used EC2 service that provides virtual server capacity. AWS engineers worked to resolve the issues and restore service over the next several hours. The EventBridge service, which can help software developers build apps that take action in response to certain activities, didn’t fully recover until 9:40 p.m. ET.

Downtime can hurt the perception that cloud infrastructure is reliable and ready to handle the migration of applications from physical data centers. It can also have major impacts on businesses. AWS has millions of customers and is the leading provider in the market.

AWS apologized for the impact the outage had on its customers.

Popular websites and heavily used services were taken offline, including Disney+, Netflix, and Ticketmaster. Roomba vacuums, Amazon’s Ring security cameras, and other internet-connected devices like smart cat litter boxes and app-connected ceiling fans were also knocked out by the outage.

Amazon’s own retail operations came to a standstill in parts of the U.S. Internal applications used by Amazon’s warehouse and delivery workforce depend on AWS, so for most of Tuesday employees were unable to scan packages or access delivery routes. Third-party sellers also could not access the website used to manage customer orders.

During the outage, AWS tried to keep customers aware of what was going on, but the cloud provider had trouble updating its status page, called the Service Health Dashboard.

“Since the impact on services during this event all stemmed from a single underlying cause, we’ve chosen to provide updates via a global banner on the Service Health Dashboard,” AWS said.

Additionally, customers were unable to create a support request for seven hours during the outage.

AWS says it is now taking action to address both of those issues.

“We expect to release a new version of our Service Health Dashboard early next year that will make it easier to understand service impact, and a new support system architecture that actively runs across multiple AWS regions to ensure we do not have delays in communicating with our customers,” AWS said.

This isn’t the first time AWS has changed the way issues are reported.

In 2017, an outage of the popular AWS S3 storage service prevented engineers from displaying the right color to indicate uptime on the Service Health Dashboard. Amazon posted banners and took to Twitter to share updates.

“We changed the SHD admin console to run across multiple AWS regions,” Amazon said in a message about that episode.
