Service Oriented Enterprise

Tuesday, June 02, 2015

Dominating your career trajectory

Here's a copy of a note I sent out to my team @ VMware about continuous learning and being responsible for your career trajectory.

--------

Team,

I was having a conversation with one of the practice leads today about a potential new hire. I caught myself saying, "The candidate looks like a Chef/Ruby engineer. Are you sure that the person is willing to grow with us?" It led us to a discussion about how we need to equip our team for the next era of computing. With that in mind, I've assembled my list of twenty things that I think our team should know at various levels. Principals should know all of them; consulting architects should know the vast majority, and so on. With that, my list is as follows:

1. Be able to explain map reduce, the implementation choices and use cases.

2. Be able to explain the importance of CAP theorem, and examples of databases that have taken an opinionated approach.

3. Be able to compare and contrast the differences between virtualization and containers.

4. Be able to describe the core elements of Microservices and give examples of implementation technology.

5. Be able to articulate the characteristics and importance of an idempotent service.

6. Be able to explain how SDN and NFV complement and compete.

7. Be able to explain block chains and their importance and usage in distributed systems.

8. Be able to compare and contrast the two major consensus algorithms used in distributed computing and give examples of available implementation libraries.

9. Be able to discuss the features & design tradeoffs of Google Bigtable and Amazon's Dynamo.

10. Be able to describe open compute / open hardware initiatives, their goals and examples of available offerings.

11. Be able to describe the purpose of a fault-injection framework and examples of where and how they have been successfully used.

12. Be able to describe the elastic scale in/out pattern used pervasively in the Amazon cloud.

13. Be able to describe the effects of container technologies as they relate to MTTR and overall high availability.

14. Be able to describe the primary technical components found in a cloud native architecture.

15. Be able to compare and contrast the PaaS offerings from Pivotal, Microsoft and Red Hat.

16. Be able to describe all of the major steps in Continuous Delivery, and most common implementation choices in each step.

17. Be able to discuss the most popular open source licenses and their advantages/disadvantages.

18. Be able to discuss UI design concepts such as Material UI and Responsive Design.

19. Be able to discuss modern engineering methods and architectural approaches in detail (e.g., Scrum, BDD, 12 Factor App, etc.)

20. Be able to discuss the role of distributed computing cluster managers and schedulers, and examples of their usage.

I'll admit that this is a big list for even the most accomplished consultant. The observant reader will notice that the primary themes are distributed computing and modern approaches to software engineering and architecture. As the industry continues its aggressive move toward commodity gear, cheap middleware and fast-fail-fast-recover software, we need to be in a position to advise our clients on new tools, platforms, approaches and the business case. A sea change awaits our clients and we will be called upon to lead them to the next phase of computing. In some cases our employer will have products that solve the problem, and in many cases we'll need to knit together solutions from the greater hardware & software ecosystem.

Again – this is my list, not the practice leads'; I'm confident that they have their own ideas. I wanted to get my view in front of all of you to consider areas for capability development in 2015 and beyond. Of course, many of you have assignments right in front of you that require learning. My recommendation is to be careful not to get lost in the specifics of some product or technology and lose track of the greater change that is occurring. Each of you must dominate your career trajectory. Own it; be its master, not its victim. Learn deep; learn wide; it's not an either/or choice. Rely on your practice leads to help, or me if you feel I can, but ultimately lean on yourself.

Thanks,

Jeff Schneider, Sr. Director Professional Services, DevOps & Open Cloud

Sunday, December 28, 2014

Looking back on my 2014 Cloud Predictions

Here's my self-review of my predictions made one year ago:

=-=-=-=-=-=-=-=-=-=-=-

1. Amazon Web Services will continue to dominate as the leading provider of infrastructure-on-demand services. Like 2013, the number of new offerings will be smaller than previous years as much of the low-hanging fruit has been picked. More emphasis will go into stabilization and features to enable highly reliable computing (across regions).
>> Correct, but this one was too easy.

2. In 2014, we will see OpenStack adoption in the enterprise in a more significant way. This will be both a blessing and a curse. Corporate Infrastructure & Operations teams will be challenged to create stable solutions around OpenStack. The premature release of poor products/code like Ceilometer, Heat and even Neutron will cause unnecessary pain. I&O teams will retreat to the most basic functionality. Those organizations who chose to use the open source bits and not license a commercial product based on OpenStack will rethink their decision. Issues related to a lack of quality and product management will continue to plague OpenStack throughout 2014, leaving the door wide open to VMware.
>> Partially correct. IMHO, OpenStack saw more adoption by service providers while traditional enterprise dipped their toes in the water. And yes, the door was left wide open to VMware.

3. Organizations that were unhappy with the VMware software licensing fees remain unhappy but begin to see it as the 'stable solution'.
>> Correct.

4. The excitement around containers continues to grow in 2013. The Docker model moves beyond early adopters into early majority for dev/test workloads. The way in which Chef/Puppet are used shifts from run-time stack creation to design-time creation, followed by image snapshots. This extends the emphasis on 'idempotent infrastructure'.
>> Correct.

5. As 'idempotent infrastructure' for individual machines begins to feel like a solved problem, the focus shifts to multi-tiered, complex application architectures. A new emphasis is placed on orchestrating full stack solutions regardless of cloud API, hypervisor, operating system and configuration management tool. On-demand, full stack provisioning across dev-to-prod environments becomes doable by your average Joe. That said, TOSCA fails to get any traction due to unnecessary complexity and a lack of practical applications, and is beaten to market by light-weight open source solutions.
>> Correct. We saw a bunch of Docker orchestrators emerge; perhaps too many.

6. More aggressive organizations implement resilience features into their cloud architectures. Systems move beyond the traditional auto-scale and adopt patterns to auto-heal tiers via closed loop monitors that re-provision failed systems; tiers implement run-time service discovery (to reconnect the topology).
>> #FAIL. This was wishful thinking on my part. I need to quit doing that.

7. As resilient architectures become more common, so does the need to test them. Variations of the Simian Army enter the Global 2000; instrumented systems capture key metrics (MTTR, data loss, etc.) enabling architects to improve the integrity and availability of their solutions.
>> Again, #FAIL.

8. The Hybrid Grid is born. Unlike prior grid emphasis, the focus is on long-running services (not batch jobs). Unlike Hybrid Clouds, the focus is on uniform containers (think LXC/Docker) and resource schedulers that are "service first" not "machine first" enabling the grid controller to offer reputable Service Level Agreements. In 2014, Hybrid Grids gain traction in the high-tech web-scale shops but remain out of reach for most large enterprises. A few small service providers will begin to offer Hybrid Grid as-a-Service.
>> Partially correct. The rise of containers on Amazon and Google Kubernetes or managed containers is pushing this forward.

9. As we closed out 2013, the Target credit card breach was front-page news. Concerns around security and compliance are increasing in virtually every industry. Removing manual processes is seen as an easy way to implement tighter compliance and security. In 2014, companies will implement sandboxes within their cloud that focus on specific problems like PCI compliance. The DevOps processes will mandate that software 'run the gauntlet' (network segmentation, anti-virus, code inspections, encryption, etc.) to provide a safer environment. More security professionals will be singing the praises of the cloud as a means to automate inspections, lock-down environments and provide audit trails.
>> #FAIL. I was thinking we'd see much more SecOps or RuggedOps - but this movement is still early.

10. I anticipate that 2014 will be the year of "cloud in a box", where converged infrastructure solutions and cloud stacks are pre-packaged into turn-key rack/row clouds. Stripped down versions of OpenStack will be the preferred controller. We should expect the usual hardware vendors (Dell, HP, Cisco, EMC, etc.) to offer their own brand, as well as to offer hybrid solutions that leverage hardened cloud software from Red Hat, Canonical and others. Companies that once looked at vBlock or FlexPod open up their wallet. Organizations that avoided converged infrastructure continue to avoid it - but create their own reference architectures for their home-grown kits.
>> #FAIL. Damn. This should have happened by now ... maybe one of my friends at HP, Dell or IBM can comment on why it's taking so long.

=-=-=-
Alright - I didn't do so hot in my 2014 predictions. Most of the stuff I still agree with - but we'll likely have to wait for 2015 or 2016. Here's to being patient!
Jeff

Sunday, March 30, 2014

An FAQ on Cloud Consulting Companies

As CEO of MomentumSI, I get asked lots of questions about cloud consulting. Here's my attempt to answer some of the most Frequently Asked Questions!

How do you classify cloud consulting companies?

The first dimension is 'public cloud' or 'private cloud' (behind the firewall). The second dimension is the technology layer at which they work (IaaS, PaaS or SaaS). The third dimension is the speciality within the layer. For example, a services company might only work in 'public cloud at the SaaS layer' - but specialize in 'sales and marketing' solutions. The fourth dimension is the type of services that they provide. Some companies only do 'strategy and planning', while others will do 'training and mentoring', 'implementation' or 'managed cloud services'.

From a consulting perspective, which clouds are getting the most traction?

There are lots of clouds across the aforementioned dimensions. For IaaS + public cloud, Amazon is clearly in the lead. For IaaS + private cloud - the company that is selling the most deals is VMware - but the media darling / thought leader is OpenStack with an early lead going to the Red Hat distribution. On the PaaS side, I can't really give anyone credit for dominating the market. From a public PaaS perspective, you'd have to acknowledge Force.com + Heroku (Salesforce.com). On the private side, Cloud Foundry seems like an early thought leader - but it's way too early to tell. At the SaaS layer, we see lots of Salesforce.com, Workday, Microsoft (apps) and Google (apps). There are a ton of others getting traction - too many to mention.

Who are the boutique cloud consulting companies?

I have to mention my company first, MomentumSI. Our strategy was simple: initially, focus on public cloud (AWS / Google) with an emphasis on automation (think DevOps and Cloud Management activities: ServiceMesh, vCAC, Puppet, Chef, RunDeck, Docker, Ansible, etc.). Recently, we're seeing an increased demand for OpenStack and VMware private cloud implementations.

Other notable boutiques include: Mirantis (Russian firm with core engineering expertise on OpenStack), Appirio / Bluewolf (focused on Salesforce.com), Cloud Sherpas (focused on Google Apps), 2nd Watch and Datapipe (focused on AWS Managed Services).

Are the offshore I.T. companies involved in cloud?

Yes - but many of them are very early. Companies like Infosys, Cognizant, TCS, Wipro, HCL and EPAM were hired to develop enterprise software, maintain and operate it. They are being asked to help with lift and shift migration efforts - and in some cases, refactoring the applications to work better in the cloud. They focus on migrating tens or hundreds of applications for a single customer. This market is still early.

Are the Very Large consulting companies doing cloud work?

Yes and no. Many of them are doing true cloud consulting while others are classifying last-generation work (e.g., virtualization, hosting) as 'cloud'. Traditionally, these companies are most commonly used for strategy and roadmap services. CSC, Dell Services and others have made some aggressive moves to build their offering but have mostly focused on building out their cloud, not on the consulting side. Capgemini made an early push but it's not clear to me if it has turned into a large piece of their business. IBM's acquisition of SoftLayer and subsequent push on BlueMix implies that they're playing to win. I'm confident that Accenture has done something - but I lack visibility into their efforts.

Are the hardware vendors doing cloud consulting?

I've witnessed Dell getting some interest in their OpenStack offering. I'd speculate that HP public sector will have some consulting drag-along related to data center modernization efforts - and eventually will see opportunities generated from Moonshot and USP. Both EMC and Cisco have access to a wonderful roster of infrastructure buyers. I've seen EMC in some interesting consulting deals; Cisco not so much. Hitachi, Fujitsu and other specialty shops are interested in breaking out of their molds but will likely require a more full-fledged transformation to make the jump.

Who is leading in public sector cloud consulting?

Despite the fact that I live outside of Washington D.C., I don't follow SLED/Fed. From what little I've actually witnessed, Booz-Allen has had some early wins. The larger defense contractors (Boeing, LMCO, Northrop, etc.) are playing their usual risk averse role. I don't have enough visibility to comment on SAIC, CACI, CGI, etc. ... well, maybe I could comment on CGI ;-)

What are the data center service and hosting companies doing?

Sungard has made a focused attempt to build out their cloud offerings including vendor neutral consulting. Unisys and Xerox/ACS both made an initial push - but it's unclear to me where the current offerings stand. Many of the traditional hosting companies like Rackspace, Verizon, AT&T and CenturyLink have built out their managed service offerings and will provide 'workload migration implementation'. Generally speaking, most companies in this space are treating cloud like the fundamental threat that it is. They are creating inventive offerings that embrace new partners and approaches. In my opinion, not all of them will make the transition. Those vendors who don't adapt to the customer buying process will be most challenged. Failing to provide unbiased consulting, lacking a hybrid cloud offering, etc. will push customers away.

What are the VARs (Value-Added Resellers) doing in the cloud?

The VARs are often the first point of contact for purchasing hardware and software. They realize that if those purchases move to a cloud based model, the potential for disintermediation exists. Several VARs have begun reselling cloud offerings like it was just another SKU. They leverage their large call centers, shopping portals and existing procurement vehicles/contracts to execute high volume, low-margin transactions. Generally speaking, the VARs have partnered with focused consulting companies to perform the implementation work.

Warning: This is just one opinion. If you need more data, I'd recommend the analysis of Lydia Leong (Gartner), Carl Brooks (451 Group) and James Staten (Forrester); they've all been following the cloud space since the inception. The CloudCast and the GigaOM Cloud sites are also very insightful. Visit MomentumSI for more information on our services.

Wednesday, January 01, 2014

2014 Cloud Predictions

Going into 2014, cloud computing (IaaS/PaaS) is the de facto model for start-ups and SaaS providers. It continues to gain acceptance in large organizations but remains nascent. Looking forward, I anticipate the following:

1. Amazon Web Services will continue to dominate as the leading provider of infrastructure-on-demand services. Like 2013, the number of new offerings will be smaller than previous years as much of the low-hanging fruit has been picked. More emphasis will go into stabilization and features to enable highly reliable computing (across regions).

2. In 2014, we will see OpenStack adoption in the enterprise in a more significant way. This will be both a blessing and a curse. Corporate Infrastructure & Operations teams will be challenged to create stable solutions around OpenStack. The premature release of poor products/code like Ceilometer, Heat and even Neutron will cause unnecessary pain. I&O teams will retreat to the most basic functionality. Those organizations who chose to use the open source bits and not license a commercial product based on OpenStack will rethink their decision. Issues related to a lack of quality and product management will continue to plague OpenStack throughout 2014, leaving the door wide open to VMware.

3. Organizations that were unhappy with the VMware software licensing fees remain unhappy but begin to see it as the 'stable solution'.

4. The excitement around containers continues to grow in 2013. The Docker model moves beyond early adopters into early majority for dev/test workloads. The way in which Chef/Puppet are used shifts from run-time stack creation to design-time creation, followed by image snapshots. This extends the emphasis on 'idempotent infrastructure'.

5. As 'idempotent infrastructure' for individual machines begins to feel like a solved problem, the focus shifts to multi-tiered, complex application architectures. A new emphasis is placed on orchestrating full stack solutions regardless of cloud API, hypervisor, operating system and configuration management tool. On-demand, full stack provisioning across dev-to-prod environments becomes doable by your average Joe. That said, TOSCA fails to get any traction due to unnecessary complexity and a lack of practical applications, and is beaten to market by light-weight open source solutions.

6. More aggressive organizations implement resilience features into their cloud architectures. Systems move beyond the traditional auto-scale and adopt patterns to auto-heal tiers via closed loop monitors that re-provision failed systems; tiers implement run-time service discovery (to reconnect the topology).

7. As resilient architectures become more common, so does the need to test them. Variations of the Simian Army enter the Global 2000; instrumented systems capture key metrics (MTTR, data loss, etc.) enabling architects to improve the integrity and availability of their solutions.

8. The Hybrid Grid is born. Unlike prior grid emphasis, the focus is on long-running services (not batch jobs). Unlike Hybrid Clouds, the focus is on uniform containers (think LXC/Docker) and resource schedulers that are "service first" not "machine first" enabling the grid controller to offer reputable Service Level Agreements. In 2014, Hybrid Grids gain traction in the high-tech web-scale shops but remain out of reach for most large enterprises. A few small service providers will begin to offer Hybrid Grid as-a-Service.

9. As we closed out 2013, the Target credit card breach was front-page news. Concerns around security and compliance are increasing in virtually every industry. Removing manual processes is seen as an easy way to implement tighter compliance and security. In 2014, companies will implement sandboxes within their cloud that focus on specific problems like PCI compliance. The DevOps processes will mandate that software 'run the gauntlet' (network segmentation, anti-virus, code inspections, encryption, etc.) to provide a safer environment. More security professionals will be singing the praises of the cloud as a means to automate inspections, lock-down environments and provide audit trails.

10. I anticipate that 2014 will be the year of "cloud in a box", where converged infrastructure solutions and cloud stacks are pre-packaged into turn-key rack/row clouds. Stripped down versions of OpenStack will be the preferred controller. We should expect the usual hardware vendors (Dell, HP, Cisco, EMC, etc.) to offer their own brand, as well as to offer hybrid solutions that leverage hardened cloud software from Red Hat, Canonical and others. Companies that once looked at vBlock or FlexPod open up their wallet. Organizations that avoided converged infrastructure continue to avoid it - but create their own reference architectures for their home-grown kits.

In summary, we should expect the next set of adopters to hop on the wagon. They'll be frustrated by the amount of change and immaturity of solutions. Ultimately, they'll go back to the vendors they know and trust to help them through the pain. The patterns and practices are maturing - but the types of problems that we're throwing at the cloud are becoming more complex.

The cloud is here; there's no going back.
Happy 2014.

Tuesday, December 31, 2013

Looking back on my 2013 predictions

Last year, I made some predictions on cloud computing. Here's my self-analysis:

===================
1. OpenStack continues to gain traction but many early adopters bypass Folsom in anticipation of Grizzly.
>> Correct. This was a gimme.

2. Amazon's push to the enterprise means we will see more hosted, packaged apps from Microsoft, SAP and other large ISV's. Their IaaS/PaaS introductions will be lackluster compared to previous years.
>> Correct. It's interesting that the press failed to notice the lack of interesting stuff coming out of AWS. Has the law of diminishing returns already hit Amazon?

3. BMC and CA will acquire their way into the cloud.
>> Incorrect. CA picked up Nolio (and Layer 7), BMC acquired Partnerpedia. These acquisitions are pieces to the puzzle - but are not large enough to serve as anchors for a cloud portfolio.

4. SAP Hana will quickly determine that Teradata isn't their primary competitor as the rise of OSS solutions matures.
>> Incorrect. SAP Hana continued to kick butt in 2013 and the buyers of it have probably never heard of the large open source databases. What was I thinking?

5. Data service layers (think Netflix/Cassandra) become common in large cloud deployments.
>> Partially Correct. We're seeing the cloud-savvy companies implement cross-region data replication strategies - but the average enterprise is nowhere near this.

6. Rackspace, the "Open Cloud Company" continues to gain traction but users find more and more of their services 'not open'.
>> Correct. Rackspace continues to push a 'partially open' agenda - but users seem to be more than happy with their strategy.

7. IBM goes another year without a cohesive cloud strategy.
>> Correct. The acquisition of SoftLayer was a huge step forward in having a strategy - but from the outside looking in, they still look like a mess.

8. Puppet and Chef continue to grow presence but Cfengine gets a resurgence in mindshare.
>> Partially Correct. Puppet and Chef did grow their presence, especially in the large enterprise. I could be wrong, but I personally didn't see Cfengine get traction. That said, Ansible and Salt came out strong.

9. Cloud Bees, Rightscale, Canonical, Inktank, Enstratus, Piston Cloud, PagerDuty, Nebula and Gigaspaces are all acquired.
>> Incorrect. I was right about Enstratus but some of these predictions were stupid (like Canonical). The others remain strong candidates for acquisition.

10. Eucalyptus sunsets native storage solutions and adopts OpenStack solutions.
>> Unsure; I don't keep track of Eucalyptus.

11. VMware solution dominates over other CloudFoundry vendors.
>> Correct. I was referring to what is now called Pivotal.

12. Cloud 'cost control' vendors (Newvem, Cloudyn, Cloud Cruiser, Amysta, Cloudability, Raveld, CloudCheckR, Teevity, etc.) find the space too crowded and begin shifting focus.
>> Correct. Some of them have moved into adjacent spaces like governance, billing, etc.

13. PaaS solutions begin to look more and more like orchestration solutions with capabilities to leverage SDN, provisioned IOPS, IAM and autonomic features. Middleware vendors that don't offer open source solutions lose significant market share in cloud.
>> Incorrect. I believe this is still coming but for the most part the vendors aren't there.

14. Microsoft's server-side OS refresh opens the door to more HyperV and private cloud.
>> Unsure. This should have happened but I have no data.

15. Microsoft, Amazon and Google pull away from the pack in the public cloud while Dell, HP, AT&T and others grow their footprint but suffer growing pains (aka, outages).
>> Correct. Well - at least the part where AWS, Azure and Google pull away from the pack. Dell continues to frustrate me; I need to have a sit-down with Michael Dell.

16. Netflix funds and spins out a cloud automation company.
>> Incorrect. Perhaps this was wishful thinking. I'm a Netflix OSS fanboy - but think that they're starting to fall into the same trap as OpenStack (aka, open sourcing the kitchen sink without strong product/portfolio management).

17. Red Hat focuses on the basics, mainly integrating/extending existing product lines with a continued emphasis on OpenStack.
>> Correct. Red Hat appears to be taking a risk averse strategy... slow but methodical movement.

18. Accenture remains largely absent from the cloud, leaving Capgemini and major off-shore companies to take the revenue lead.
>> Unsure. I'm unaware of any large movements that Accenture made in the cloud. The big move in the SI space was CSC acquiring ServiceMesh.

19. EMC will continue to thrive: it's even easier to be sloppy with storage usage in the cloud and users realize it isn't 'all commodity hardware'.
>> Correct. That said, we're starting to see companies implement multi-petabyte storage archival projects with cloud companies.

20. In 2013, we'll see another talent war. It won't be as bad as dot-com, but talent will be tight.
>> Correct. And it will get worse in 2014.

Thursday, April 25, 2013

New Presentations: SOA, DevOps and Technical Debt

MomentumSI recently published a series of presentations on hot topics in I.T.

DevOps in 2013 covers the current state of I.T. operations automation and the issues in the SDLC that need to be addressed in order to achieve continuous delivery:

By now, most I.T. professionals are familiar with "technical debt". This presentation encourages practitioners to think about the structural issues that slow us down:

A lot has changed in the SOA world over the last few years. However, we continue to see many organizations adopting techniques that don't promote agility:

Thursday, January 03, 2013

ITIL and DevOps: Inbreeding?

The 2012 Christmas Eve outage at Amazon has people talking. The fuss isn't about what broke; it's about what Amazon said they're going to do to fix it. If you aren't familiar with their report, it's worth a quick read. If it's tl;dr, I'll sum it up: a developer whacked some data in a production database that made the load balancing service go haywire, and it took longer than it should have to identify the problem and restore it. (Did you see how I avoided the technical jargon??)

If you're Amazon, you have to start thinking about how to make sure it never happens again. Restore confidence... and fast. Here's what they said:

We have made a number of changes to protect the ELB service from this sort of disruption in the future. First, we have modified the access controls on our production ELB state data to prevent inadvertent modification without specific Change Management (CM) approval. Normally, we protect our production service data with non-permissive access control policies that prevent all access to production data. The ELB service had authorized additional access for a small number of developers to allow them to execute operational processes that are currently being automated. This access was incorrectly set to be persistent rather than requiring a per access approval. We have reverted this incorrect configuration and all access to production ELB data will require a per-incident CM approval. This would have prevented the ELB state data from being deleted in this event. This is a protection that we use across all of our services that has prevented this sort of problem in the past, but was not appropriately enabled for this ELB state data. We have also modified our data recovery process to reflect the learning we went through in this event. We are confident that we could recover ELB state data in a similar event significantly faster (if necessary) for any future operational event. We will also incorporate our learning from this event into our service architecture. We believe that we can reprogram our ELB control plane workflows to more thoughtfully reconcile the central service data with the current load balancer state. This would allow the service to recover automatically from logical data loss or corruption without needing manual data restoration.

Here's my question: If ITIL Service Transition (thoughtful change management) and DevOps (agile processes with infrastructure-as-code) were to mate, what would the outcome be?
A) A child that wanted to run fast but couldn't because of too many manual/approval steps
B) A child that ran fast but only after the change board approved it
C) Mate multiple times; some children will run fast (with scissors) others will move carefully
D) No mating required; just fix the architecture (service recovery)

This is the discussion that I'm having with my colleagues. And to be clear, we aren't talking about what Amazon could/should do, we're talking about what WE should do with our own projects.

Although there's no unanimous agreement, there have been some common beliefs:
1. Fix the architecture. I like to say that "cloud providers make their architecture highly available so we don't have to." This is an exaggeration, but if the cloud provider does their job right, we will have to focus less on making our application components HA and more on correctly using the provider's HA components. There's little disagreement on this topic. AWS screwed up the MTTR on the ELB. We've all screwed up things before... just fix it.

2. Rescind dev-team access. So this is where it gets interesting. Remember all that Kumbaya between developers and operators? Gone. Oh shit - maybe we should have called the movement "DevTestOps"! One simple mistake and you pulled my access to production?? LOL - hell, yea. The fact is all services aren't created equal. I have no visibility into Amazon's internal target SLA's - but I'm going to guess that there are a few services that are five-9's (or 5.26 minutes of down-time per year). Certain BUSINESS CRITICAL services shouldn't be working in DevOps time. They should be thoughtfully planned out with Change Advisory Boards with Change Records and Release Windows by pre-approved Change Roles. Yes - if it's BUSINESS CRITICAL - pull out your ITIL manuals and follow the !*@$ing steps!

Again - there's little disagreement here. People who run highly available architectures know that to re-release something critical requires a special attention to detail. Run the playbook like you're launching a nuclear missile: focus on the details.

To be clear, I love infrastructure-as-code. I think everything can be automated and it kills me to think about putting manual steps into tasks that we all know should run human-free. If your application is two-9's (3.65 days of down-time per year), automate it! Hell, give the developers access to production data - - you can fix it later! What about 99.9% uptime (8.76 hours)? Hmm... not so sure. What about 99.99% up-time (52.56 minutes)? Well, that's not a lot of time to fix things if they go wrong. But wait - if I did DevOps automation correctly, shouldn't I be able to back out quickly? The answer is Yes - you SHOULD be able to run your SaveMyAss.py script and it MIGHT work.
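
If you want to check the down-time math yourself, here's a quick sketch. It assumes a 365-day year and treats the availability target as a simple percentage of wall-clock time; the function name is just something I made up for illustration. It should roughly reproduce the numbers quoted in this post.

```python
# Yearly down-time budget implied by an availability target.
# Assumption: a 365-day year (8,760 hours), no leap-year adjustment.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_pct: float) -> float:
    """Return the hours per year a service may be down at this availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% -> {hours:.2f} hours ({hours * 60:.2f} minutes)")
```

Every extra nine cuts the budget by a factor of ten, which is why the conversation changes so quickly between 99% and 99.99%.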

Ponder this:
Dev-to-Test = Use traditional DevOps & IaC (Infrastructure as Code)
Test-to-Stage = (same as above)
Stage-to-Prod (version 1) = (same as above)
Patch-Prod (99% up-time or less) = (same as above)
Patch-Prod (99.9% or greater up-time) = Run your ITIL checklist. Use your IaC scripts if you got'em.
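
For those who think in code, the rule of thumb above might be sketched like this. The transition names and return labels are hypothetical, and the list says nothing about targets between 99% and 99.9%, so treating everything below 99.9% as regular DevOps is my assumption here, not a hard rule.

```python
def release_process(transition: str, uptime_target_pct: float = 0.0) -> str:
    """Pick a release process for a promotion step.

    `transition` is a hypothetical label such as "dev-to-test",
    "test-to-stage", "stage-to-prod" or "patch-prod".
    """
    # Only patches to production services with a 99.9%+ target get the
    # heavyweight process; everything else stays on DevOps & IaC.
    # Assumption: targets between 99% and 99.9% fall on the DevOps side.
    if transition == "patch-prod" and uptime_target_pct >= 99.9:
        return "itil-checklist"
    return "devops-iac"


if __name__ == "__main__":
    print(release_process("dev-to-test"))
    print(release_process("patch-prod", 99.0))
    print(release_process("patch-prod", 99.99))
```

The point of keeping it this small is that the ITIL path is the exception, triggered by one condition, rather than something woven through every step.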

For me, it's not an either/or choice between ITIL Transition Management and DevOps. IMHO, both have a time and a place. That said, I don't think that the answer is to inbreed the two - DevOps will get fat and be the loser in that battle. Keep agile agile. Use structure when you need it.