DevOps学习文档

DevOps can mean different things to different people and this can make it a difficult topic to explore. Here we will give a quick overview of the origins of DevOps, discuss some predecessors and examine what DevOps practitioners can hope to achieve.

DevOps is widely regarded as a natural progression of agile development practices and lean principles. As the values from both were applied to software development and project management, it eventually became apparent that the same values could be applied to IT operations.

Toyota’s influence on automotive manufacturing can’t be overlooked. Their idea of “improvement kata” gave way to the idea of continual improvement of daily work, among other things. As the ideas of Lean were further refined, the Lean Enterprise Institute was founded and they arrived at what they refer to as the 5 key principles of Lean: value, value stream, flow, pull, and perfection.

While the values behind the agile movement aren’t new, they weren’t really codified until a group of developers gathered at The Lodge at Snowbird in Utah to put their thoughts together and create what is The Agile Manifesto. The Manifesto is a lightweight set of values & principles credited with dramatically increasing productivity within software development organizations.

This gathering helped spawn the dozens of agile tools, methodologies and frameworks that exist today. Some of the more popular ones include Scrum, Kanban, Scaled Agile Framework® (SAFe®), and Extreme Programming (XP). Some are geared more toward project management and others are focused on developers and engineers, but these methodologies collectively aim to provide a framework for highly effective and efficient software development. The idea of incorporating iteration and continuous feedback into software development provides a vehicle to successively refine and deliver software. At some level, the methodologies involve continuous planning, testing, and integration as a means of continually evolve software projects and practices.

This talk at O’Reilly’s Velocity Conference is considered to be the initial genesis of concepts behind DevOps. Later in 2009, the first devopsdays event was held in Ghent, Belgium which further helped establish the term. As discussions on Twitter progressed, the #DevOpsDays hashtag was eventually shortened to #DevOps. A movement was born!

In order to be competitive in the marketplace, organizations are looking for ways to improve deployment frequency, achieve faster time to market, minimize issues with new releases and shorten lead time between releases. Investing in and implementing a DevOps strategy across your organization can help you reach those goals.

It's not all about the stakeholders, though. Developers value fast feedback, working iteratively and improving continuously. They want frameworks that enable seamless communication with stakeholders throughout all phases of a software development project, from planning to development to delivery. A successful adoption of DevOps practices can help achieve these goals.

Like the Agile Manifesto - which expresses values in terms of “this over that” - the practice of DevOps has roots firmly planted in a specific set of cultural ideals. Many of those ideals are very similar to those in the agile community. When you take agile practices and apply them post-development, you begin to build cultures that value and enable increased collaboration, shared knowledge and responsibility, autonomous teams, improved quality, frequent feedback and increased automation. Tools that support this kind of culture generally include configuration management systems, automated test systems, build/compilation systems, deployment tools, virtual infrastructure and monitoring systems.

Part of continuous improvement is introspection. How can we apply agile & lean principles to deployment and maintenance of the software?

How do your project teams get their code off of their laptops and onto servers, or onto the desktops of testers? Think about how many steps it takes. Are all of those steps necessary? Are any steps missing? Introspection of your processes goes a long way here.

We can’t overstate the need to automate repeatable processes. There is no real business value in having your engineers manually deploy code, configure servers or perform other housekeeping operations every time there is a new release for testing or production. Your developers should be concentrating on the hard problems! What about getting a server for your web application? How many weeks - or months - does that take?

Humans will make mistakes. It happens. So why do organizations insist on having large teams of people that do nothing but manually provision environments for software projects? How many times has a server been configured only to realize that the associated software doesn’t "work quite right?" Computers are very capable of handling repetitive tasks the same way every time. This is where Continuous Integration and Continuous Delivery come in. Also, Virtual Machines, Containers and Infrastructure as Code provide better automated ways to manage your computing infrastructure without needing constant human intervention.

DevOps Family Values & The Three Ways

How do we answer how long it will take to add a new feature, fix a bug or change a workflow in an application? We often provide answers in terms of how long it takes to complete the development work alone, all while ignoring the total time required to complete the request before and after the actual "work" takes place.

In the Lean Value Stream Map, Process Time (PT) is the time it takes to do the work. Lead Time (LT) includes the Process Time and everything else outside of that for that step. In the case of software development this can include time for ideation, QA testing, deployment time, etc.

Deployment lead time at a typical corporation can be weeks or even months. This is because deployment is often more complex than just dropping some code onto a server or running an upgrade overnight. We are now forced to consider the time “required” to provision a server, or cutover into a new environment, or meet some other requirement that is necessary for deployment in the infrastructure's current state.

Have you ever found yourself asking “Why does it take so long to get a server?” I have been asking that question for over a decade. Every time I go to a customer that builds applications for large teams deployment is a large undertaking. It seems like the wheel keeps getting reinvented each time. (We'll cover some tools that help teams solve these problems in a future article.)

A Common Scenario: Multi-Month Deployment Lead Times

Many large organizations support monolithic apps that rely heavily on manual testing. These apps often have minimal test environments that require a lot of time to set up & manage. If you were to create a Value Stream Map of the development and deployment process it would be incredibly complex.

What happens in these scenarios? Completing work can - and usually does - require heroics from your teams at all stages of the development and deployment processes. Fixing problems in the application can require days or weeks of effort. That effort usually results in poor outcomes for your customers. It's not a recipe for success.

An Ideal Scenario: Multi-Minute Deployment Lead Times

What if we strove for fast, constant feedback on our work? What if every dev team did small atomic code changes? Wouldn’t it be nice to have automated testing in the application so the QA team could concentrate on the hard problems? How about regular code deployments into Production on a weekly, daily or hourly basis? What would the Value Stream Map look like in this scenario? Certainly it should be much simpler.

Let's consider the outcomes in this scenario. For starters, shorter lead times enable a high degree of confidence in changes introduced into the application. No longer do your development and ops teams live in fear of breaking existing features or losing customer confidence. Agile development practices and automated testing regimes passively drive development teams to build modular, loosely coupled architectures so small teams can work with a high degree of autonomy. With smaller changes that are tested and pushed regularly, failures that do get through tend to be smaller and more contained.

Initial Metrics To Begin With

What are some metrics to start measuring to determine where pain points are prior to starting or monitoring your DevOps journey? A few suggestions...

Lead Times - As previously discussed, the total time from when a request is made to when it is deployed.
Process Times - Limited to the time it takes to complete the development of the feature or fix.
Percent Complete and Accurate - For a given Value Stream Map step (ie. coding/unit testing, UAT testing, etc.) this is the percent of Complete/Accurate work items received from the previous step in the Value Stream Map.

The 3 Ways: The Principles Underpinning DevOps

In the book The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win, the authors mention “The 3 Ways” as principles that they have found involved in all successful DevOps practicing organizations. I will briefly cover them and we can go into more detail in future articles.

1st Way: Flow of Work

We should first strive to have fast flow of work, per our Value Stream Map, from development to operations to your customers.

2nd Way: Feedback

What good is running really fast in a direction if you don’t provide fast and constant feedback from the right to the left at all stages of our Value Stream Map?

3rd Way: Continuous Learning & Experimentation

Flow and feedback at all stages of the Value Stream Map can then allow you to create a culture of continuous learning and experimentation. This allows your teams to innovate, share their learnings, experiment on new ideas without fear of retribution, and contribute to moving the organization forward on all levels.

The First Way of DevOps: Flow

In the previous article, we started talking about DevOps principles and stopped after a brief introduction to “The 3 Ways” as mentioned in the book The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win. We’ll be continuing on by talking about the first way, Flow.

What Do We Mean By Flow?

When we look at the Value Stream Map, we should consider the performance of the entire system we’re in, not just a specific department or team within our organization. Focusing on a little silo won’t give us enough perspective on where bottlenecks are or where improvements can be made across the system as a whole.

We should understand the flow of work in our value stream. Work can originate from features, defects, requirements from other teams (InfoSec, Legal, etc), support tickets, etc. Your value stream should always be moving forward. If you identify areas where flow is going backwards in your value stream, that should be treated as a problem that needs addressed.

As The Ford Motor Company says, “Quality if Job Number 1.” All members of the organization that participate in the value stream should focus on improving quality. Most of us would like to only do something once. Sometimes that is not always the case. What is important is making rework visible and accountable. If the individuals causing rework don’t “feel the pain” then they are likely to continue impeding flow and any fixes they contribute are unlikely to correct the cause of the rework.

The key lies in having a deep understanding of your system. In a Systems Thinking approach, this doesn't mean that the smartest person in the room knows everything. High-performing organizations will lean on collective intelligence. You need to have a foundation of knowledge in order to make informed decisions. Deming referred to this as “Appreciation of the System.”

What Are Some Things That Interfere With Flow?

Visibility

Visibility around problems and processes can be an issue. Software development work is often "invisible" because there is no tangible product you can hold in your hands. Where is flow being impeded in your current processes? Do you know if there are any work constraints?

You should strive to make work as visible as possible. I’m not talking about all the developers using giant screens with large fonts. Large visual work boards are one way. An Obeya, or visual room, is another way.

A good place to start is a simple idea board. Have individuals put up ideas to make their own work easier or faster. It may not immediately “move the needle” but the idea here is to get everyone comfortable with making their ideas visible to both peers and leadership in the organization.

Involving individuals in establishing qualitative and quantitative measurements for each process in the Value Stream will help organize a visual board. For example, departments within an organization may decide that they want to practice the 5Ss and want to track how well the department is doing.

Controlling the signal to noise ratio of your visual communication will go a long way in being useful to the organization.

Beware, though... an organization can get carried away with metrics and measurements for their visual boards & rooms. Controlling the signal to noise ratio of your visual communication will go a long way in being useful to the organization. Identifying critical success factors that help the organization reach yearly goals allows the collective to select the process measurements that can be visualized to radiate progress towards those goals.

Dynamic WIP

WIP, or Work In Progress, varies from team to team and organization to organization. In the case of software development organizations, interrupting developers is easy. The consequences of the interruptions are generally invisible. Aso, something as simple as being on multiple projects at a time actually decreases efficiency and raises risk.

One way to help with this problem is to limit the WIP for project teams. A simple visual way of doing this is with a Kanban Board. The project team can define, agree on, and enforce WIP limits on each column of the board. This makes it easier for everyone on the project team, and others that may just be passing through the project team space, to see problems that prevent completion of work.

Long Development Cycles

Many organizations tend towards long development cycles. This can be anywhere from 6-24 months of development before user feedback is solicited. Some would argue that 2 months is too long for this.

One of the causes of long development cycles, especially within very large organizations, is BRUF and Analysis Paralysis. Then, once you get past that, the developers work first, then testing comes later. The testers test everything - regression, integration, edge cases. Then, some day, the code sees the light of day and gets into a Production environment that the customer is allowed to access. This is where the organization figures out if they even built the right thing or not.

Clearly, this is not a sustainable approach, but it is a pretty common one. How do we change this approach in order to increase our flow? One way is to look at implementing a GRITapproach. Rather than documenting All The Things at the beginning, we can (and should) treat requirements gathering as an iterative process slightly ahead of the development iterations.

Instead of relying on a QA team or department to do all of the testing, the developers can adopt the practices of TDD and ATDD as part of their workflow. This would then allow for automated unit & integration tests which would free up dedicated QA to focus solely on edge cases and the “hard” problems that can only be found via manual testing as the “easy” stuff is now found prior to anything reaching QA.

Restructuring how you handle requirements gathering and managing code testing are just two ways to cut down on the development cycle time. This will allow the organization to deploy to Production sooner.

Getting Code Into Higher Environments

How long does it typically take for a development team to push their code to Production? Are multiple departments involved? There could be hundreds or thousands of operations between the code entering into source control and the final deployment into Production. Depending on the number of individuals involved, there is N(N-1) / 2 possible communication paths in the project. If you think about it, if multiple teams are involved in the deployment of software to Production, it can be kind of like the game of Telephone.

How can we mitigate these kinds of problems? If we can reduce the number of handoffs, that would increase our flow. Automating portions of the work (code build, code deployment, server configuration, etc) can certainly help. Also helpful: reorganizing project teams so that they can be cross-functional to directly deliver value themselves versus having many external teams that are relied upon.

Other Project Barriers

Partially done work, extra unintended features, task switching due to changing priorities, defects found later in the development cycle, and team heroics can all contribute to impeding flow.

So, what can we do? Being introspective and looking to eliminate waste in our value stream can help. Partially done work loses value over time. The idea of “gold plating” generally adds complexity & testing effort and thus should be actively thwarted. Using TDD/BDD/ATDD techniques to shorten the time between defect creation & detection can diminish the difficulty of resolution. Lastly, don’t put your teams into a position where unreasonable acts become the norm.

发表评论