Notes from DEVOPS 2020 Online conference

DevOps 2020 Online was held 21.4. and 22.4.2020 and the first day talked about Cloud & Transformation and the second was 5G DevOps Seminar. Here are some quick notes from the talks I found the most interesting. The talk recordings are available from the conference site.

DevOps 2020

How to improve your DevOps capability in 2020

Marko Klemetti from Eficode presented three actions you can take to improve your DevOps capabilities. It looked at current DevOps trends against organizations on different maturity levels and gave ideas how you can improve tooling, culture and processes.

  1. Build the production pipeline around your business targets.
    • Automation build bridges until you have self-organized teams.
    • Adopt a DevOps platform. Aim for self-service.
  2. Invest in a Design System and testing in natural language:
    • brings people in organization together.
    • Testing is the common language between stakeholders.
    • You can have discussion over the test cases: automated quality assurance from stakeholders.
  3. Validate business hypothesis in production:
    • Enable canary releasing to lower the deployment barrier.
    • You cannot improve what you don't see. Make your pipeline data-driven.

The best practices from elite performers are available for all maturity levels: DevOps for executives.

Practical DevSecOps Using Security Instrumentation

Jeff Williams from Contrast Security talked about how we need a new approach to security that doesn't slow development or hamper innovation. He shows how you can ensure software security from the "inside out" by leveraging the power of software instrumentation. It establishes a safe and powerful way for development, security, and operations teams to collaborate.

DevSecOps is about changing security, not DevOps
What is security instrumentation?
  1. Security testing with instrumentation:
    • Add matchers to catch potentially vulnerable code and report rule violations when it happens, like using unparameterized SQL. Similar what static code analysis does.
  2. Making security observable with instrumentation:
    • Check for e.g. access control for methods
  3. Preventing exploits with instrumentation:
    • Check that command isn't run outside of scope

The examples were written with Java but the security checks should be implementable also on other platforms.

Modern security (inside - out)

Their AppSec platform's Community Edition is free to try out but only for Java and .Net.

Open Culture: The key to unlocking DevOps success

Chris Baynham-Hughes from RedHat talked how blockers for DevOps in most organisations are people and process based rather than a lack of tooling. Addressing issues relating to culture and practice are key to breaking down organisational silos, shortening feedback loops and reducing the time to market.

Start with why
DevOps culture & Practice Enablement: openpracticelibrary.com

Three layers required for effective transformation:

  1. Technology
  2. Process
  3. People and culture
Open source culture powers innovation.

Scaling DevSecOps to integrate security tooling for 100+ deployments per day

Rasmus Selsmark from Unity talked how Unity integrates security tooling better into the deployment process. Best practices for securing your deployments involve running security scanning tools as early as possible during your CI/CD pipeline, not as an isolated step after service has been deployed to production. The session covered best security practices for securing build and deployment pipeline with examples and tooling.

  • Standardized CI/CD pipeline, used to deploy 200+ microservices to Kubernetes.
Shared CI/CD pipeline enables DevSecOps
Kubernetes security best practices
DevSecOps workflow: Early feedback to devs <-----> Collect metrics for security team
  • Dev:
    • Keep dependencies updated: Renovate.
    • No secrets in code: unity-secretfinder.
  • Static analysis
    • Sonarqube: Identify quality issues in code.
    • SourceClear: Information about vulnerable libraries and license issues.
    • trivy: Vulnerability Scanner for Containers.
    • Make CI feedback actionable for teams, like generating notifications directly in PRs.
  • When to trigger deployment
    • PR with at least one approver.
    • No direct pushes to master branch.
    • Only CI/CD pipeline has staging and production deployment access.
  • Deployment
    • Secrets management using Vault. Secrets separate from codebase, write-only for devs, only vault-fetcher can read. Values replaced during container startup, no environment variables passed outside to container.
  • Production
    • Container runtime security with Falco: identify security issues in containers running in production.
Standarized CI/CD pipeline allows to introduce security features across teams and microservices

Data-driven DevOps: The Key to Improving Speed & Scale

Kohsuke Kawaguchi, Creator of Jenkins, from Launchable talked how some organizations are more successful with DevOps than others and where those differences seem to be made. One is around data (insight) and another is around how they leverage "economy of scale".

Cost/time trade-off:

  • CFO: why do we spend so much on AWS?
    • Visibility into cost at project level
    • Make developers aware of the trade-off they are making: Build time vs. Annual cost
      • Small: 15 mins / $1000; medium: 10 mins / $2000; large: 8 mins / $3000
  • Whose problem is it?
    • A build failed: Who should be notified first?
      • Regular expression pattern matching
      • Bayesian filter

Improving software delivery process isn't get prioritized:

  • Data (& story) helps your boss see the problem you see
  • Data helps you apply effort to the right place
  • Data helps you show the impact of your work

Cut the cost & time of the software delivery process

  1. Dependency analysis
  2. Predictive test selection
    • You wait 1 hour for CI to clear your pull request?
    • Your integration tests only run nightly?
Predictive test selection
  • Reordering tests: Reducing time to first failure (TTFF)
  • Creating an adaptive run: Run a subset of your tests?

Deployment risk prediction: Can we flag risky deployments beforehand?

  • Learn from previous deployments to train the model

Conclusions

  • Automation is table stake
  • Using data from automation to drive progress isn't
    • Lots of low hanging fruits there
  • Unicorns are using "big data" effectively
    • How can the rest of us get there?

Moving 100,000 engineers to DevOps on the public cloud

Sam Guckenheimer from Microsoft talked how Microsoft transformed to using Azure DevOps and GitHub with a globally distributed 24x7x365 service on the public cloud. The session covered organizational and engineering practices in five areas.

Customer Obsession

  • Connect our customers directly and measure:
    • Direct feedback in product, visible on public site, and captured in backlog
  • Develop personal Connection and cadence
    • For top customers, have a "Champ" which maintain: Regular personal contact, long-term relationship, understanding customer desires
  • Definition of done: live in production, collecting telemetry that examines the hypothesis which motivated the deployment
Ship to learn

You Build It, You Love It

  • Live site incidents
    • Communicate externally and internally
    • Gather data for repair items & mitigate for customers
    • Record every action
    • Use repair items to prevent recurrence
  • Be transparent

Align outcomes, not outputs

  • You get what you measure (don't measure what you don't want)
    • Customer usage: acquisition, retention, engagement, etc.
    • Pipeline throughput: time to build, test, deploy, improve, failed and flaky automation, etc.
    • Service reliability: time to detect, communicate, mitigate; which customers affected, SLA per customer, etc.
    • "Don't" measure: original estimate, completed hours, lines of code, burndown, velocity, code coverage, bugs found, etc.
  • Good metrics are leading indicators
    • Trailing indicators: revenue, work accomplished, bugs found
    • Leading indicators: change in monthly growth rate of adoption, change in performance, change in time to learn, change in frequency of incidents
  • Measure outcomes not outputs

Get clean, stay clean

  • Progress follows a J-curve
    • Getting clean is highly manual
    • Staying clean requires dependable automation
  • Stay clean
    • Make technical debt visible on every team's dashboard

Your aim won't be perfect: Control the impact radius

  • Progressive exposure
    • Deploy one ring at a time: canary, data centers with small user counts, highest latency, th rest.
    • Feature flags control the access to new work: setting is per user within organization

Shift quality left and right

  • Pull requests control code merge to master
  • Pre-production test check every CI build

Automate versioning and changelog with release-it on GitLab CI/CD

It’s said that you should automate all the things and one of the things could be versioning your software. Incrementing the version number in your e.g. package.json is easy but it’s easier when you bundle it to your continuous integration and continuous deployment process. There are different tools you can use to achieve your needs and in this article we are using release-it. Other options are for example standard-version and semantic-release.

🚀 Automate versioning and package publishing

Using release-it with CI/CD pipeline

Release It is a generic CLI tool to automate versioning and package publishing related tasks. It’s installation requires npm but package.json is not needed. With it you can i.a. bump version (in e.g. package.json), create git commit, tag and push, create release at GitHub or GitLab, generate changelog and make a release from any CI/CD environment.

Here is an example setup how to use release-it on Node.js project with Gitlab CI/CD.

Install and configure release-it

Install release-it with npm init release-it which ask you questions or manually with npm install --save-dev release-it .

For example the package.json can look the following where commit message has been customized to have "v" before version number and npm publish is disabled (although private: true should be enough for that). You could add [skip ci] to "commitMessage" for i.a. GitLab CI/CD to skip running pipeline on release commit or use Git Push option ci.skip.

package.json
{
  "name": “example-frontend",
  "version": "0.1.2",
  "private": true,
  "scripts": {
    ...
    "release": "release-it"
  },
  "dependencies": {
    …
  },
  "devDependencies": {
    ...
    "release-it": "^12.4.3”
},
"release-it": {
    "git": {
      "tagName": "v${version}",
      "requireCleanWorkingDir": false,
      "requireUpstream": false,
      "commitMessage": "Release v%s"
    },
    "npm": {
      "publish": false
    }
  }
}

Now you can run npm run release from the command line:

npm run release
npm run release -- patch --ci

In the latter command things are run without prompts (--ci) and patch increases the 0.0.x number.

Using release-it with GitLab CI/CD

Now it’s time to combine release-it with GitLab CI/CD. Adding release-it stage is quite straigthforward but you need to do couple of things. First in order to push the release commit and tag back to the remote, we need the CI/CD environment to be authenticated with the original host and we use SSH and public key for that. You could also use private token with HTTPS.

  1. Create SSH keys as we are using the Docker executorssh-keygen -t ed25519.
  2. Create a new SSH_PRIVATE_KEY variable in "project > repository > CI / CD Settings" where and paste the content of your private key that you created to the Value field.
  3. In your "project > repository > Repository" add new deploy key where Title is something describing and Key is the content of your public key that you created.
  4. Tap "Write access allowed".

Now you’re ready for git activity for your repository in CI/CD pipeline. Your .gitlab-ci.yaml release stage could look following.

image: docker:19.03.1

stages:
  - release

Release:
  stage: release
  image: node:12-alpine
  only:
    - master
  before_script:
    - apk add --update openssh-client git
    # Using Deploy keys and ssh for pushing to git
    # Run ssh-agent (inside the build environment)
    - eval $(ssh-agent -s)
    # Add the SSH key stored in SSH_PRIVATE_KEY variable to the agent store
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    # Create the SSH directory and give it the right permissions
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    # Don't verify Host key
    - '[[ -f /.dockerenv ]] && echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config'
    - git config user.email "gitlab-runner@your-domain.com"
    - git config user.name "Gitlab Runner"
  script:
    # See https://gist.github.com/serdroid/7bd7e171681aa17109e3f350abe97817
    # Set remote push URL
    # We need to extract the ssh/git URL as the runner uses a tokenized URL
    # Replace start of the string up to '@'  with git@' and append a ':' before first '/'
    - export CI_PUSH_REPO=$(echo "$CI_REPOSITORY_URL" | sed -e "s|.*@\(.*\)|git@\1|" -e "s|/|:/|" )
    - git remote set-url --push origin "ssh://${CI_PUSH_REPO}"
    # runner runs on a detached HEAD, checkout current branch for editing
    - git reset --hard
    - git clean -fd
    - git checkout $CI_COMMIT_REF_NAME
    - git pull origin $CI_COMMIT_REF_NAME
    # Run release-it to bump version and tag
    - npm ci
    - npm run release -- patch --ci --verbose

We are running release-it here with patch increment. If you want to skip CI pipeline on release-it commit you can either use the ci.skip Git Push option package.json git.pushArgs which tells GitLab CI/CD to not create a CI pipeline for the latest push. This way we don't need to add [skip ci] to commit message.

And now you're ready to run the pipeline with release stage and enjoy of automated patch updates to your application's version number. And you also get GitLab Releases if you want.

Setting up the script step was not so clear but fortunately people in the Internet had done it earlier and Google found a working gist and comment on GitLab issue. Interacting with git in the GitLab CI/CD could be easier and there are some feature requests for that like allowing runners to push via their CI token.

Customizing when pipelines are run

There are some more options for GitLab CI/CD pipelines if you want to run pipelines after you've tagged your version. Here's snippet of running "release" stage on commits to master branch and skipping it if commit message is for release.

Release:
  stage: release
  image: node:12-alpine
  only:
    refs:
      - master
    variables:
      # Run only on master and commit message doesn't start with "Release v"
      - $CI_COMMIT_MESSAGE !~ /^Release v.*/
  before_script:
    ...
  script:
    ...

Now we can build a new container for the deployment of our application after it has been tagged and version bumped. Also we are reading the package.json version for tagging the image.

variables:
  PACKAGE_VERSION: $(cat package.json | grep version | head -1 | awk -F= "{ print $2 }" | sed 's/[version:,\",]//g' | tr -d '[[:space:]]')

Build dev:
  before_script:
    - export VERSION=`eval $PACKAGE_VERSION`
  stage: build
  script:
    - >
      docker build
      --pull
      --tag your-docker-image:latest
      --tag your-docker-image:$VERSION.dev
      .
    - docker push your-docker-image:latest
    - docker push your-docker-image:$VERSION.dev
  only:
    refs:
      - master
    variables:
      # Run only on master and commit message starts with "Release v"
      - $CI_COMMIT_MESSAGE =~ /^Release v.*/

Using release-it on detached HEAD

In the previous example we made a checkout to current branch for editing as the runner runs on detached HEAD. You can use the detached HEAD as shown below but the downside is that you can't create GitLab Releases from the pipeline as it fails to "ERROR Response code 422 (Unprocessable Entity)". This is because (I suppose) it doesn't make git push as it's done in manually with git.

Then the .gitlab-ci.yml is following:

...
script:
    - export CI_PUSH_REPO=$(echo "$CI_REPOSITORY_URL" | sed -e "s|.*@\(.*\)|git@\1|" -e "s|/|:/|" )
    - git remote set-url --push origin "ssh://${CI_PUSH_REPO}"
    # gitlab-runner runs on a detached HEAD, create a temporary local branch for editing
    - git checkout -b ci
    # Run release-it to bump version and tag
    - npm ci
    - DEBUG=release-it:* npm run release -- patch --ci --verbose --no-git.push
    # Push changes to originating branch
    # Always return true so that the build does not fail if there are no changes
    - git push --follow-tags origin ci_processing:${CI_COMMIT_REF_NAME} || true

Reset Hasura migrations and squash files

Using GraphQL for creating REST APIs is nowadays popular and there are different tools you can use. One of them is Hasura which is an open-source engine that gives you realtime GraphQL APIs on new or existing Postgres databases. Hasura is quite easy to work with but if your GraphQL schemas change a lot it creates plentiful of migration files. This has some unwanted consequences (for example slowing down the hasura migrate apply or even blocking it). Here’s some notes how to reset the state and create new migrations from the state that is on the server.

Note: From Hasura 1.0.0 onwards squashing is easier with hasura migrate squash command. It's still in preview. But before Hasura 1.0.0 version you have to squash migrations manually and this blog post explains how. The results are the same: squashing multiple migrations into a single one.

Hasura documentation provides a good guide how to squash migrations but in practice there are couple of other things you may need to address. So let’s combine the steps Hasura gives and some extra steps.

Reset Hasura migrations

First make a backup branch:

  1. $ git checkout master
  2. Create a backup branch:
    $ git checkout -b backup/migrations-before-resetting-20XX-XX-XX
  3. Update the backup branch to origin:
    $ git push origin backup/migrations-before-resetting-20XX-XX-XX

We are assuming you've local Hasura running on Docker with something like the following docker-compose.yml

version: "3.6"
services:
  postgres:
    image: postgres:11-alpine
    restart: always
    ports:
      - "5432:5432"
    volumes:
      - db_data:/var/lib/postgresql/data
    command: postgres -c max_locks_per_transaction=2000
  graphql-engine:
    image: hasura/graphql-engine:v1.0.0-beta.6
    ports:
      - "8080:8080"
    depends_on:
      - "postgres"
    restart: always
    environment:
      HASURA_GRAPHQL_DATABASE_URL: postgres://postgres:@postgres:5432/postgres
      HASURA_GRAPHQL_ENABLE_CONSOLE: "true" # set to "false" to disable console
      HASURA_GRAPHQL_ADMIN_SECRET: changeme
      HASURA_GRAPHQL_ENABLED_LOG_TYPES: startup, http-log, webhook-log, websocket-log, query-log
volumes:
  db_data:

Create local instance of Hasura with up to date migrations:

  1. $ docker-compose down -v
  2. $ docker-compose up
  3. $ hasura migrate apply --endpoint=http://localhost:8080 --admin-secret=changeme

Reset migrations to master:

  1. git checkout master
  2. git checkout -b reset-hasura-migrations
  3. rm -rf migrations/*

Reset the migration history on server. On hasura SQL console, http://localhost:8080/console:

TRUNCATE hdb_catalog.schema_migrations;

Setup fresh migrations by taking the schema and metadata from the server. By default init only takes public schema if others not mentioned with the --schema "your schema" parameter. Note down the version for later use.

  1. Create migration file:
    $ hasura migrate create "init" --from-server
  2. Mark the migration as applied on this server:
    $ hasura migrate apply --version "" --skip-execution
  3. Verify status of migrations, should show only one migration with Present status:
    $ hasura migrate status
  4. You have brand new migrations now!

Resetting migrations on other environments

  1. Checkout the reset branch on local machine:
    $ git checkout -b reset-hasura-migrations
  2. Reset the migration history on remote server. On Hasura SQL console:
    TRUNCATE hdb_catalog.schema_migrations;
  3. Apply migration status to remote server:
    $ hasura migrate apply --version "<version>" --skip-execution

Local environment Hasura status

For other developers please refer these instructions in order to get the backend into same state.

Option 1: Keep old data

  1. Checkout the backup branch on local machine:
    $ git checkout backup/migrations-before-resetting-20XX-XX-XX
  2. Reset the migration history on local server. On Hasura SQL console:
    TRUNCATE hdb_catalog.schema_migrations;
  3. Apply migration status to local server:
    $ hasura migrate apply --version "<version>" --skip-execution

Option 2: Remove all and start from beginning

  1. Clean up the old docker volumes:
    $ docker-compose down -v
  2. Start up services:
    $ docker-compose up
  3. Checkout master:
    $ git checkout master
  4. Apply migrations:
    $ hasura migrate apply --endpoint=http://localhost:8080 --admin-secret=changeme

Possible extra steps

Now your Hasura migrations and database tables are in one migration init file but sometimes things don’t work out when applying it to empty database. We are using Hasura audit-trigger and had to reorder the SQL clauses done by the migrate init and add some missing parts.

  1. Move schema creations after audit clauses
  2. Move audit.audit_table(target_table regclass) to last audit clause and copy it from audit.sql
  3. Add pg_trgm extension as done previously (fixes "operator does not exist: text <%!t(MISSING)ext" in public.search_customers_by_name)
  4. Drop session constraints / index before creating new
  5. Create session table only if not exists

Git pre-commit and pre-receive hooks: validating YAML

Software development has many steps which you can automate and one useful thing to automate is to add Git commit hooks to validate your commits to version control. Firing off custom client-side and server-side scripts when certain important actions occur. Validating commited files' contents is important for syntax validity and even more when providing Spring Cloud Config configurations in YAML for microservices as otherwise things fail.

Validating YAML can be done by using a yamllint and hooking it to pre-commit or pre-receive. It does not only check for syntax validity, but for weirdnesses like key repetition and cosmetic problems such as lines length, trailing spaces and indentation. Here's a short overview to get started with yamllint on Git commit hooks.

Quickstart for yamllint

Installing yamllint

On Fedora / CentOS:
$ sudo dnf install yamllint
 
using pip, the Python package manager:
$ sudo pip install yamllint
 
or as in macOS
$ sudo -H python -m pip install yamllint

You can also install yamllint from sources when e.g. network connectivity is limited. The linter depends on pathspec >=0.5.3 and pyyaml >= 3.12.

Custom config

Yamllint is quite strict with validation and you might want to make it a bit more relax with custom configuration. For example I need to allow long lines. You can also disable checks for a specific line with a comment.

$ cat yamllint-config.yml
 
extends: default
 
rules:
  line-length: disable
  comments:
    require-starting-space: false

Usage

$ yamllint file.yml other-file.yaml

Usage with custom config:

$ yamllint -c yamllint-config.yml .

Or with custom config without config file:

$ yamllint -d "{extends: relaxed, rules: {line-length: {max: 120}}}" file.yaml

Or more specific case like running yamllint in Jenkins job's workspace and validating files with specific suffix:

$ find . -type f -iname '*.j2' -exec yamllint -s -c yamllint-config.yaml {} \;

Pre-commit hook and yamllint

Better way to use yamllint is to integrate it with e.g. git and pre-commit-hook or pre-receive-hook. Adding yamllint to pre-commit-hook is easy with pre-commit which is a framework for managing and maintaining multi-language pre-commit hooks.

Installing pre-commit:

Using pip:
$ pip install pre-commit
 
Or on macOS:
$ brew install pre-commit

To enable yamllint pre-commit plugin you just add a file called .pre-commit-config.yaml to the root of your project and add following snippet to it

$ cat .pre-commit-config.yaml
---
- repo: https://github.com/adrienverge/yamllint.git
  sha: v1.10.0
  hooks:
    - id: yamllint

With custom config and strict mode:

$ cat .pre-commit-config.yaml
---
repos:
 - repo: https://github.com/adrienverge/yamllint.git
   sha: v1.10.0
   hooks:
     - id: yamllint
       args: ['-d {extends: relaxed, rules: {line-length: disable}}', '-s']

You can also use repository-local hooks when e.g. it makes sense to distribute the hook scripts with the repository. Install yamllint locally and configure yamllint to your project's root directory's .pre-commit-config.yaml as repository local hook. As you can see, I'm using custom config for yamllint.

$ cat .pre-commit-config.yaml
---
- repo: local
  hooks: 
  - id: yamllint
    name: yamllint
    entry: yamllint -c yamllint-config.yml .
    language: python
    types: [file, yaml]

Note: If you're linting files with other suffix than yaml/yml like ansible template files with .j2 suffix then use types: [file]

Pre-receive hook and yamllint

Using pre-commit-hooks to process commits is easy but often doing checks in server side with pre-receive hooks is better. Pre-receive hooks are useful for satisfying business rules, enforce regulatory compliance, and prevent certain common mistakes. Common use cases are to require commit messages to follow a specific pattern or format, lock a branch or repository by rejecting all pushes, prevent sensitive data from being added to the repository by blocking keywords, patterns or filetypes and prevent a PR author from merging their own changes.

One example of pre-receive hooks is to run a linter like yamllint to ensure that business critical file is valid. In practice the hook works similarly as pre-commit hook but files you check in to repository are not kept there "just like that". Some of them are stored as deltas to others, or their the contents are compressed. There is no place where these files are guaranteed to exist in their "ready-to-consume" state. So you must take some extra hoops to get your files available for opening them and running checks.

There are different approaches to make files available for pre-receive hook's script as StackOverflow describes. One way is to check out the files in a temporary location or if you're on linux you can just point /dev/stdin as input file and put the files through pipe. Both ways have the same principle: checking modified files between new and the old revision and if files are present in new revision, runs the validation script with custom config.

Using /dev/stdin trick in Linux:

#!/usr/bin/env bash
 
set -e
 
ENV_PYTHON='/usr/bin/python'
 
if ((
       (ENV_PYTHON_RETV != 0) && (YAMLLINT != 0)
)); then
    echo '`python` or `yamllint` not found.'
    exit 1
fi
 
oldrev=$1
newrev=$2
refname=$3
 
while read oldrev newrev refname; do
    # Get a list of all objects in the new revision
    objects=`git ls-tree --full-name -r ${newrev}`
 
    # Get the file names, without directory, of the files that have been modified
    # between the new revision and the old revision
    git diff --name-only $oldrev $newrev | while read file; do
        # Search for the file name in the list of all objects
        object=`echo -e "${objects}" | egrep "(\s)${file}\$" | egrep '\.yml$' | awk '{ print $3 }'`
        # If it's not present, then continue to the the next itteration
        if [ -z ${object} ]; 
        then 
            continue; 
        fi
 
        # Get file in commit and point /dev/stdin as input file 
        # and put the files through pipe for syntax validation
        echo $file
        git show $newrev:$file | /usr/bin/yamllint -d "{extends: relaxed, rules: {line-length: disable, comments: disable, trailing-spaces: disable, empty-lines: disable}}" /dev/stdin || exit 1
    done
done

Alternative way: copy changed files to temporary location

#!/usr/bin/env bash
 
set -e
 
EXIT_CODE=0
ENV_PYTHON='/usr/bin/python'
COMMAND='/usr/bin/yamllint'
TEMPDIR=`mktemp -d`
 
if ((
        (ENV_PYTHON_RETV != 0) &&
        (YAMLLINT != 0)
)); then
    echo '`python` or `yamllint` not found.'
    exit 1
fi
 
oldrev=$1
newrev=$2
refname=$3
 
while read oldrev newrev refname; do
 
    # Get the file names, without directory, of the files that have been modified
    # between the new revision and the old revision
    files=`git diff --name-only ${oldrev} ${newrev}`
 
    # Get a list of all objects in the new revision
    objects=`git ls-tree --full-name -r ${newrev}`
 
    # Iterate over each of these files
    for file in ${files}; do
 
        # Search for the file name in the list of all objects
        object=`echo -e "${objects}" | egrep "(\s)${file}\$" | awk '{ print $3 }'`
 
        # If it's not present, then continue to the the next itteration
        if [ -z ${object} ]; 
        then 
            continue; 
        fi
 
        # Otherwise, create all the necessary sub directories in the new temp directory
        mkdir -p "${TEMPDIR}/`dirname ${file}`" &>/dev/null
        # and output the object content into it's original file name
        git cat-file blob ${object} > ${TEMPDIR}/${file}
 
    done;
done
 
# Now loop over each file in the temp dir to parse them for valid syntax
files_found=`find ${TEMPDIR} -name '*.yml'`
for fname in ${files_found}; do
    ${COMMAND} ${fname}
    if [[ $? -ne 0 ]];
    then
      echo "ERROR: parser failed on ${fname}"
      BAD_FILE=1
    fi
done;
 
rm -rf ${TEMPDIR} &> /dev/null
 
if [[ $BAD_FILE -eq 1 ]]
then
  exit 1
fi
 
exit 0

Testing pre-receive hook locally is a bit more difficult than pre-commit-hook as you need to get the environment where you have to remote repository. Fortunately you can use the process which is described for GitHub Enterprise pre-receive hooks. You create a local Docker environment to act as a remote repository that can execute the pre-receive hook.

Dockerizing all the things: Running Ansible inside Docker container

Automating things in software development is more than useful and using Ansible is one way to automate software provisioning, configuration management, and application deployment. Normally you would install Ansible to your control node just like any other application but an alternate strategy is to deploy Ansible inside a standalone Docker image. But why would you do that? This approach has benefits to i.a. operational processes.

Although Ansible does not require installation of any agents within managed nodes, the environment where Ansible is installed is not so simple to setup. In control node it requires specific Python libraries and their system dependencies. So instead of using package manager to install Ansible and it's dependencies we just pull a Docker image.

By creating an Ansible Docker image you get the Ansible version you want and isolate all of the required dependencies from the host machine which potentially might break things in other areas. And to keep things small and clean your image uses Alpine Linux.

The Dockerfile is:

FROM alpine:3.7
 
ENV ANSIBLE_VERSION 2.5.0
 
ENV BUILD_PACKAGES \
  bash \
  curl \
  tar \
  openssh-client \
  sshpass \
  git \
  python \
  py-boto \
  py-dateutil \
  py-httplib2 \
  py-jinja2 \
  py-paramiko \
  py-pip \
  py-yaml \
  ca-certificates
 
# If installing ansible@testing
#RUN \
#	echo "@testing http://nl.alpinelinux.org/alpine/edge/testing" >> #/etc/apk/repositories
 
RUN set -x && \
    \
    echo "==> Adding build-dependencies..."  && \
    apk --update add --virtual build-dependencies \
      gcc \
      musl-dev \
      libffi-dev \
      openssl-dev \
      python-dev && \
    \
    echo "==> Upgrading apk and system..."  && \
    apk update && apk upgrade && \
    \
    echo "==> Adding Python runtime..."  && \
    apk add --no-cache ${BUILD_PACKAGES} && \
    pip install --upgrade pip && \
    pip install python-keyczar docker-py && \
    \
    echo "==> Installing Ansible..."  && \
    pip install ansible==${ANSIBLE_VERSION} && \
    \
    echo "==> Cleaning up..."  && \
    apk del build-dependencies && \
    rm -rf /var/cache/apk/* && \
    \
    echo "==> Adding hosts for convenience..."  && \
    mkdir -p /etc/ansible /ansible && \
    echo "[local]" >> /etc/ansible/hosts && \
    echo "localhost" >> /etc/ansible/hosts
 
ENV ANSIBLE_GATHERING smart
ENV ANSIBLE_HOST_KEY_CHECKING false
ENV ANSIBLE_RETRY_FILES_ENABLED false
ENV ANSIBLE_ROLES_PATH /ansible/playbooks/roles
ENV ANSIBLE_SSH_PIPELINING True
ENV PYTHONPATH /ansible/lib
ENV PATH /ansible/bin:$PATH
ENV ANSIBLE_LIBRARY /ansible/library
 
WORKDIR /ansible/playbooks
 
ENTRYPOINT ["ansible-playbook"]

The Dockerfile declares an entrypoint enabling the running container to function as a self-contained executable, working as a proxy to the ansible-playbook command.

Build the image as:

docker build -t walokra/ansible-playbook .

You can test the ansible-playbook running inside the container, e.g.:

docker run --rm -it -v $(pwd):/ansible/playbooks \
    walokra/ansible-playbook --version

The command for running e.g. site.yml playbook with ansible-playbook from inside the container:

docker run --rm -it -v $(pwd):/ansible/playbooks \
    walokra/ansible-playbook site.yml

If Ansible is interacting with external machines, you'll need to mount an SSH key pair for the duration of the play:

docker run --rm -it \
    -v ~/.ssh/id_rsa:/root/.ssh/id_rsa \
    -v ~/.ssh/id_rsa.pub:/root/.ssh/id_rsa.pub \
    -v $(pwd):/ansible/playbooks \
    walokra/ansible-playbook site.yml

To make things easier you can use shell script named ansible_helper that wraps a Docker image containing Ansible:

#!/usr/bin/env bash
docker run --rm -it \
  -v ~/.ssh/id_rsa:/root/.ssh/id_rsa \
  -v ~/.ssh/id_rsa.pub:/root/.ssh/id_rsa.pub \
  -v $(pwd):/ansible_playbooks \
  -v /var/log/ansible/ansible.log \
  walokra/ansible-playbook "$@"

Point the above script to any inventory file so that you can execute any Ansible command on any host, e.g.

./ansible_helper play playbooks/deploy.yml -i inventory/dev -e 'some_var=some_value'

Now we have dockerized Ansible, isolated it's dependencies and are not restricted to some old version which we get from Linux distribution's package manager. Crafty, isn't it? Check the docker-ansible-playbook repository for more information and examples with Ansible Vault.

This blog post and Dockerfile borrows from Misiowiec's post Running Ansible Inside Docker and his earlier work. If you want to test playbooks it's work checking out his ansible_playbook repository. Since then Alpine Linux has evolved and things could be cleaned a bit more like getting Ansible directly from testing repository.

Nebula Tech Thursday - Beer & DevOps 2.3.2017

Agile software development to the cloud can be nowadays seen more as a rule than exception and that's also what this year's first Nebula Tech Thursday's topics were about. The event was held 2.3.2017 at Woolshed Bar & Kitchen alongside good food and beer.

The event consisted of talks about "Building a Full Devops Pipeline with Open Source Tools" by Oleg Mironov from Eficode and "Cloud Analytics - Providing Insight on Application Health and Performance" by Markus Vuorinen & Jarkko Stråhle from Nebula. The presentations were a bit high level and directed more to the business level people than developers but there was some new information how different tools were used in practice.

Overall it was nice event to hear how things can be done and to talk with people. Here's my short notes from the event.

Nebula Tech Day

Cloud Analytics - Providing Insight on Application Health and Performance

Markus Vuorinen & Jarkko Stråhle from Nebula talked about how to gather data to Elasticsearch, make it accessible and visualize it with Kibana and make actions based on that. The ELK-stack (Elasticsearch - Logstash - Kibana) is commonly used and the presentation showed nicely how to utilize it with cloud.

Technical setup
Technical setup

Data flow to Elastic
Data flow to Elastic

To visualization and alerts
To visualization and alerts

Kibana main view
Kibana main view

Kibana and response times
Kibana and response times

Building a Full Devops Pipeline with Open Source Tools

Oleg Mironov from Eficode showed the building blocks of how to build a Devops pipeline with Open Source Tools and demoed it. Nothing really special if you don't count Rancher and Cattle. Just put your code to Git, use Ansible, run Jenkins jobs, build docker images, use RobotFramework for testing, push artifacts to Artifactory and deploy with Rancher.

Rancher overview
Rancher overview

DevOps Pipeline
DevOps Pipeline

DevOps Finland Meetup goes Mobile at Zalando

Development and operations, DevOps, is in my opinion essential for getting things done with timely manner and it's always good to hear how others are doing it by attending meetups. This time DevOps Finland went Mobile and we heard nice presentations about continuous delivery for mobile applications, mobile testing with Appium and the Robot Framework and efficient mobile development cycle. Compared to developing Web applications mobile brings some extra hurdles to jump but nothing that's not solvable. Here are my short notes about the meetup.

The meetup was hosted by Zalando Technology at their new office here in Helsinki. Zalando is known to many as that online store that sells shoes, clothing and other fashion items but things don't sell themselves and behind the scenes they have lots of technologies to keep things running. For the record I think they said that the meetup had 65 attendees of the 100.

Also if mobile is your thing there's a new Meetup group for mobile developers in Finland which was announced at the meetup. They're also on Twitter and Facebook.

Continuous Delivery for Mobile Applications

Rami Rantala from Zalando talked about "Continuous Delivery for Mobile Applications" and how they're managing releases of their Fleek app which is available for Android and iOS in German markets.

They didn't arrive to the final setup straightforward and it was iterative approach with how Git is used, code merged and releases done. Using Fastlane for all tedious tasks, like generating screenshots, dealing with code signing, and releasing your application made automating things easier. Interesting note was that their build server slaves are ansible managed Mac Minis on Rami's desk. They had solved the problems nicely but testing is still difficult.

DevOps and rollbacks don't work together, you roll forward.

Mobile testing with Appium and the Robot Framework

Mobile testing can be done with different tools and one option is to use Robot Framework just like for Web applications. Elmeri Poikolainen from Eficode demoed how to use Appium and run Robot Framework tests on real device. It has some limitations and I think with native applications it could be better to use native test tools like what Xcode has to offer.

Efficient Mobile Development Cycle

The last and most fast-paced talk was by Jerry Jalava from Qvik about "Efficient Mobile Development Cycle". He talked about different practices and tools in the development cycle and it was nice overview to the complexity of the process from design to done. You can for example run remote preview with 27 devices.

Docker containers and using Alpine Linux for minimal base images

After using Docker for a while, you quickly realize that you spend a lot of time downloading or distributing images. This is not necessarily a bad thing for some but for others that scale their infrastructure are required to store a copy of every image that’s running on each Docker host. One solution to make your images lean is to use Alpine Linux which is a security-oriented, lightweight Linux distribution.

Lately I’ve been working with our Docker images for Java and Node.js microservices and when our stack consist of over twenty services, one thing to consider is how we build our docker images and what distributions to use. Building images upon Debian based distributions like Ubuntu works nicely but it gives packages and services which we don’t need. And that’s why developers are aiming to create the thinnest most usable image possible either by stripping conventional distributions, or using minimal distributions like Alpine Linux.

Choosing your Linux distribution

What’s a good choice of Linux distribution to be used with Docker containers? There was a good discussion in Hacker News about small Docker images, which had good points in the comment section to consider when choosing container operating system.

For some, size is a tiny concern, and far more important concerns are, for example:

  • All the packages in the base system are well maintained and updated with security fixes.
  • It's still maintained a few years from now.
  • It handles all the special corner cases with Docker.

In the end the choice depends on your needs and how you want to run your services. Some like to use the quite large Phusion Ubuntu base image which is modified for Docker-friendliness, whereas others like to keep things simple and minimal with Alpine Linux.

Divide and conquer?

One question to ask yourself is: do you need full operating system? If you dump an OS in a container you are treating it like a lightweight virtual machine and that might be fine in some cases. If you however restrict it to exactly what you need and its runtime dependencies plus absolutely nothing more then suddenly it’s something else entirely – it’s process isolation, or better yet, it’s portable process isolation.

Other thing to think about is if you should combine multiple processes in single container. For example if you care about logging you shouldn’t use a logger daemon or logrotate in a container, but you probably want to store them externally – in a volume or mounted host directory. SSH server in container could be useful for diagnosing problems in production, but if you have to log in to a container running in production – you’re doing something wrong (and there’s docker exec anyways). And for cron, run it in a separate container and give access to the exact things your cronjob needs.

There are a couple of different schools of thought about how to use docker containers: as a way to distribute and run a single process, or as a lighter form of a virtual machine. It depends on what you’re doing with docker and how you manage your containers/applications. It makes sense to combine some services, but on the other hand you should still separate everything. It’s preferred to isolate every single process and explicitly telling it how to communicate with other processes. It’s sane from many perspectives: security, maintainability, flexibility and speed. But again, where you draw the line is almost always a personal, aesthetic choice. In my opinion it could make sense to combine nginx and php-fpm in a single container.

Minimal approach

Lately, there has been some movement towards minimal distributions like Alpine Linux, and it has got a lot of positive attention from the Docker community. Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox using a grsecurity/PaX patched Linux kernel and OpenRC as its init system. In its x86_64 ISO flavor, it weighs in at an 82MB and a container requires no more than 8 MB. Alpine provides a wealth of possible packages via its apk package manager. As it uses musl, you may run into some issues with environments expecting glibc-like behaviour (for example Kubernetes or with compiling some npm modules), but for most use cases it should work just fine. And with minimal base images it’s more convenient to divide your processes to many small containers.

Some advantages for using Alpine Linux are:

  • Speed in which the image is downloaded, installed and running on your Docker host
  • Security is improved as the image has a smaller footprint thus making the attack surface also smaller
  • Faster migration between hosts which is especially helpful in high availability and disaster recovery configurations.
  • Your system admin won't complain as much as you will use less disk space

For my purposes, I need to run Spring Boot and Node.js applications on Docker containers, and they were easily switched from Debian based images to Alpine Linux without any changes. There are official Docker images for OpenJDK/OpenJRE on Alpine and Dockerfiles for running Oracle Java on Alpine. Although there isn't an official Node.js image built on Alpine, you can easily make your own Dockerfile or use community provided files. When official Java Docker image is 642 MB, Alpine Linux with OpenJDK 8 is 150 MB and with Oracle JDK 382 MB (can be stripped down to 172 MB). With official Node.js image it's 651 MB (or if using slim 211 MB) and with Alpine Linux that's 36 MB. That’s a quite a reduction in size.

Examples of using minimal container based on Alpine Linux:

For Node.js:

FROM alpine:edge
 
ENV NODE_ALPINE_VERSION=6.2.0-r0
 
RUN apk update && apk upgrade \
    && apk add nodejs="$NODE_ALPINE_VERSION"

For Java applications with OpenJDK:

FROM alpine:edge
ENV LANG C.UTF-8
 
RUN { \
      echo '#!/bin/sh'; \
      echo 'set -e'; \
      echo; \
      echo 'dirname "$(dirname "$(readlink -f "$(which javac || which java)")")"'; \
   } > /usr/local/bin/docker-java-home \
   && chmod +x /usr/local/bin/docker-java-home
 
ENV JAVA_HOME /usr/lib/jvm/java-1.8-openjdk
ENV PATH $PATH:$JAVA_HOME/bin
ENV JAVA_VERSION 8u92
ENV JAVA_ALPINE_VERSION 8.92.14-r0
 
RUN set -x \
    && apk update && apk upgrade \
    && apk add --no-cache bash \
    && apk add --no-cache \
      openjdk8="$JAVA_ALPINE_VERSION" \
    && [ "$JAVA_HOME" = "$(docker-java-home)" ]

If you want to read more about running services on Alpine Linux, check Atlassian’s Nicola Paolucci's nice article about experiences of running Java apps on Alpine.

Go small or go home?

So, should you use Alpine Linux for running your application on Docker? As also Docker official images are moving to Alpine Linux then it seems to make perfect sense from both a performance and security perspectives to switch to Alpine. And if you don't want to take the leap from Debian or Ubuntu or want support from the downstream vendor you should consider stripping it from unneeded files to make it smaller.

Container orchestration with CoreOS at Devops Finland meetup

Development and Operations, DevOps, is one of the important things when going beyond agile. It's boosting the agile way of working and can be seen as an incremental way to improve our development practices. And what couldn't be a good place to improve than learning at meetups how others are doing things. This time DevOps Finland meetup was about container orchestration with CoreOS and it was held at Oppex's lounge in central Helsinki. The talks gave a nice dive into CoreOS, covering both beginner and seasoned expert points of view. Here's my short notes about the presentations.

CoreOS intro for beginners, by beginners

The first talk was practically an interactive Core OS tutorial by Antti Vähäkotamäki and Frans Ojala. Their 99 slides showed how to get started with CoreOS on Vagrant step by step and what difficulties they experienced. Nothing special.

CoreOS in production, lessons learned

The more interesting talk about CoreOS was "CoreOS in production, lessons learned" by Vlad Bondarenko from Oppex where he told about their software stack and how they're running it. In short, they're running on baremetal with CoreOS Nginx for reverse proxy, Node.js for UI and API and RethinkDB and SolrCloud clusters. Deployment is made with Ansible and makefiles and Ship.it is used for Node.js. Service discovery is DNS based with docker-etcd-registrator component and they've also written their own DNS server. For Node.js config management with etcd they've made etcd-simple-config component. With Docker they use standard images with volumes and inject own data to the container.

CoreOS seemed to work quite well for them with easy cluster management, running multiple versions of 3rd party and own software and having zero downtime updates or rollbacks. But there were some cons also like maturity (bugs) and scripting systemd.

Kontena, CoreOS war stories

The last talk was about CoreOS war stories in Kontena by Jari Kolehmainen. The slides tell the story of how they use CoreOS on Kontena and what are the pain points. In story short it comes to configuration management and issues related to etcd.

For bootstrapping they use CloudInit which is de-facto way to initialize cloud instances and Integrated to CoreOS. The hard parts with etcd are discovery, security (tls certificates), using central services vs. workers and maintenance (you don't do it). Now they run etcd inside a container, bind it only to localhost and overlay network (Weave Net) and master coordinates etcd discovery. With automatic updates they use the best-effort strategy: If etcd is running, locksmith coordinates the reboots; Otherwise just reboot when update is available.

Presentation's summary was that the "OS" part is currently best option for containers and etcd is a must, but a little hard to handle. For the orchestrator they suggest that pick one which hides all the complexities. And automate all the things.

Problems with installing Oracle DB 12c EE, ORA-12547: TNS: lost contact

For development purposes I wanted to install Oracle Database 12c Enterprise Edition to Vagrant box so that I could play with it. It should've gone quite straight forwardly but in my case things got complicated although I had Oracle Linux and the pre-requirements fulfilled. Everything went fine until it was time to run the DBCA and create the database.

The DBCA gave "ORA-12547: TNS: lost contact" error which is quite common. Google gave me couple of resources to debug the issue. Oracle DBA Blog explained common issues which cause ORA-12547 and solutions to fix it.

One of the suggested solutions was to check to ensure that the following two files are not 0 bytes:

ls -lt $ORACLE_HOME/bin/oracle
ls -lt $ORACLE_HOME/rdbms/lib/config.o

And true, my oracle binary was 0 bytes

-rwsr-s--x 1 oracle oinstall 0 Jul  7  2014 /u01/app/oracle/product/12.1.0/dbhome_1/bin/oracle

To fix the binary you need to relink it and to do that rename the following file:

$ cd $ORACLE_HOME/rdbms/lib
$ mv config.o config.o.bad

Then, shutdown the database and listener and then “relink all”

$ relink all

If just things were that easy. Unfortunately relinking ended on error:

[oracle@oradb12c lib]$ relink all
/u01/app/oracle/product/12.1.0/dbhome_1/bin/relink: line 168: 13794 Segmentation fault      $ORACLE_HOME/perl/bin/perl $ORACLE_HOME/install/modmakedeps.pl $ORACLE_HOME $ORACLE_HOME/inventory/make/makeorder.xml > $CURR_MAKEORDER
writing relink log to: /u01/app/oracle/product/12.1.0/dbhome_1/install/relink.log

After googling some more I found similar problem and solution: Relink the executables by running make install.

cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk install
 
cd $ORACLE_HOME/network/lib
make -f ins_net_server.mk install
</re>
 
If needed you can also relink other executables:
<pre lang="shell">
make -kf ins_sqlplus.mk install (in $ORACLE_HOME/sqlplus/lib)
make -kf ins_reports60w.mk install (on CCMgr server)
make -kf ins_forms60w.install (on Forms/Web server)

But of course it didn't work out of the box and failed to error:

/bin/ld: cannot find -ljavavm12
collect2: error: ld returned 1 exit status
make: *** [/u01/app/oracle/product/12.1.0/dbhome_1/rdbms/lib/oracle] Error 1

The solution is to copy the libjavavm12.a to under $ORACLE_HOME lib as explained:

cp $ORACLE_HOME/javavm/jdk/jdk6/lib/libjavavm12.a $ORACLE_HOME/lib/

Run the make install commands from above again and you should've working oracle binary:

-rwsr-s--x 1 oracle oinstall 323649826 Feb 17 16:27 /u01/app/oracle/product/12.1.0/dbhome_1/bin/oracle

After this I ran the relink again which worked and also the install of the database worked fine.

cd $ORACLE_HOME/bin
relink all

Start the listener:

lsnrctl start LISTENER

Create the database:

dbca -silent -responseFile $ORACLE_BASE/installation/dbca.rsp

The problems I encountered while installing Oracle Database 12c Enterprise Edition to Oracle Linux 7 although in Vagrant and with Ansible were surprising as you would think that on certified platform it should just work. If I would've been using CentOS or Ubuntu it would've been totally different issue.

You can see the Ansible tasks I did to get Oracle DB 12c EE installed on Oracle Linux 7 in my vagrant-experiments GitHub repo.

Oracle DB 12c EE Ansible Tasks
Oracle DB 12c EE Ansible Tasks