March 29, 2024

GHBellaVista

Imagination at work

Do Staging Procedures Need a Rethink?

FavoriteLoadingAdd to favorites

“Has any one began owning discussions with their CIO/CEO about relocating back again to an in-property mail server? I advocate for it”

Provided the scale of its user base and with a deal truly worth up to $ten billion in the bag to run the back again-conclusion of a superpower’s armed service, Microsoft could want to start out wondering about how it can create a staging technique for its Azure cloud that will allow it to deploy changes and reliably roll back again individuals changes when items split.

(We know, it is uncomplicated to say from a harmless distance…)

Redmond was at it all over again late Monday, knocking an (seemingly substantial) “subset of prospects in the Azure Community and Azure Government clouds” offline for 3 several hours with swathes of people globally encountering mistakes executing authentication operations multiple solutions have been afflicted, including Microsoft 365.

The enterprise blamed the problem on a “recent configuration alter [that] impacted a backend storage layer, which brought on latency to authentication requests.” (Study, people couldn’t login to their solutions for several hours simply because of the snafu).

A full root trigger investigation is pending. (We will update this piece when we see it). The blockage was felt for people from 22:25 BST on Sep 28 2020 to 01:23 BST.

The problem comes a fortnight immediately after a protracted outage in Microsoft’s Uk South location activated by a cooling program failure in a facts centre. With temperatures mounting, automatic units shut down all network, compute, and storage resources “to safeguard facts durability” as engineers rushed to consider manual regulate.

Previously this month in the meantime Gartner reported it “continues to have fears associated to the in general architecture and implementation of Azure, despite resilience-focused engineering efforts and improved assistance availability metrics during the earlier year”.

Microsoft Azure CTO Mark Russinovich in July 2019 reported that Azure experienced fashioned a new Excellent Engineering group within just his CTO workplace, doing work along with Microsoft’s Website Trustworthiness Engineering (SRE) group to “pioneer new techniques to produce an even extra trusted platform” adhering to client problem at a string of outages.

He wrote at the time: “Outages and other assistance incidents are a problem for all community cloud companies, and we go on to boost our being familiar with of the complex strategies in which aspects this sort of as operational processes, architectural styles, components challenges, software flaws, and human aspects can align to trigger assistance incidents.

“Has any one began owning discussions with their CIO/CEO about relocating back again to an in-property mail server? I advocate for it” a person annoyed user noted on a global Outages mailing checklist meanwhile… If cloud is your compressed audio stream that you are not absolutely sure you possess, it might not be extended right before in-property mail servers develop into the vintage quality vinyl of the IT planet outdated, but pretty significantly back again in desire.

Stranger items have took place.