Git pre-commit and pre-receive hooks: validating YAML

Software development has many steps which you can automate and one useful thing to automate is to add Git commit hooks to validate your commits to version control. Firing off custom client-side and server-side scripts when certain important actions occur. Validating commited files’ contents is important for syntax validity and even more when providing Spring Cloud Config configurations in YAML for microservices as otherwise things fail.

Validating YAML can be done by using a yamllint and hooking it to pre-commit or pre-receive. It does not only check for syntax validity, but for weirdnesses like key repetition and cosmetic problems such as lines length, trailing spaces and indentation. Here’s a short overview to get started with yamllint on Git commit hooks.

Quickstart for yamllint

Installing yamllint

On Fedora / CentOS:
$ sudo dnf install yamllint
 
using pip, the Python package manager:
$ sudo pip install yamllint
 
or as in macOS
$ sudo -H python -m pip install yamllint

You can also install yamllint from sources when e.g. network connectivity is limited. The linter depends on pathspec >=0.5.3 and pyyaml >= 3.12.

Custom config

Yamllint is quite strict with validation and you might want to make it a bit more relax with custom configuration. For example I need to allow long lines. You can also disable checks for a specific line with a comment.

$ cat yamllint-config.yml
 
extends: default
 
rules:
  line-length: disable
  comments:
    require-starting-space: false

Usage

$ yamllint file.yml other-file.yaml

Usage with custom config:

$ yamllint -c yamllint-config.yml .

Or with custom config without config file:

$ yamllint -d "{extends: relaxed, rules: {line-length: {max: 120}}}" file.yaml

Or more specific case like running yamllint in Jenkins job’s workspace and validating files with specific suffix:

$ find . -type f -iname '*.j2' -exec yamllint -s -c yamllint-config.yaml {} \;

Pre-commit hook and yamllint

Better way to use yamllint is to integrate it with e.g. git and pre-commit-hook or pre-receive-hook. Adding yamllint to pre-commit-hook is easy with pre-commit which is a framework for managing and maintaining multi-language pre-commit hooks.

Installing pre-commit:

Using pip:
$ pip install pre-commit
 
Or on macOS:
$ brew install pre-commit

To enable yamllint pre-commit plugin you just add a file called .pre-commit-config.yaml to the root of your project and add following snippet to it

$ cat .pre-commit-config.yaml
---
- repo: https://github.com/adrienverge/yamllint.git
  sha: v1.10.0
  hooks:
    - id: yamllint

With custom config and strict mode:

$ cat .pre-commit-config.yaml
---
repos:
 - repo: https://github.com/adrienverge/yamllint.git
   sha: v1.10.0
   hooks:
     - id: yamllint
       args: ['-d {extends: relaxed, rules: {line-length: disable}}', '-s']

You can also use repository-local hooks when e.g. it makes sense to distribute the hook scripts with the repository. Install yamllint locally and configure yamllint to your project’s root directory’s .pre-commit-config.yaml as repository local hook. As you can see, I’m using custom config for yamllint.

$ cat .pre-commit-config.yaml
---
- repo: local
  hooks: 
  - id: yamllint
    name: yamllint
    entry: yamllint -c yamllint-config.yml .
    language: python
    types: [file, yaml]

Note: If you’re linting files with other suffix than yaml/yml like ansible template files with .j2 suffix then use types: [file]

Pre-receive hook and yamllint

Using pre-commit-hooks to process commits is easy but often doing checks in server side with pre-receive hooks is better. Pre-receive hooks are useful for satisfying business rules, enforce regulatory compliance, and prevent certain common mistakes. Common use cases are to require commit messages to follow a specific pattern or format, lock a branch or repository by rejecting all pushes, prevent sensitive data from being added to the repository by blocking keywords, patterns or filetypes and prevent a PR author from merging their own changes.

One example of pre-receive hooks is to run a linter like yamllint to ensure that business critical file is valid. In practice the hook works similarly as pre-commit hook but files you check in to repository are not kept there “just like that”. Some of them are stored as deltas to others, or their the contents are compressed. There is no place where these files are guaranteed to exist in their “ready-to-consume” state. So you must take some extra hoops to get your files available for opening them and running checks.

There are different approaches to make files available for pre-receive hook’s script as StackOverflow describes. One way is to check out the files in a temporary location or if you’re on linux you can just point /dev/stdin as input file and put the files through pipe. Both ways have the same principle: checking modified files between new and the old revision and if files are present in new revision, runs the validation script with custom config.

Using /dev/stdin trick in Linux:

#!/usr/bin/env bash
 
set -e
 
ENV_PYTHON='/usr/bin/python'
 
if ((
       (ENV_PYTHON_RETV != 0) && (YAMLLINT != 0)
)); then
    echo '`python` or `yamllint` not found.'
    exit 1
fi
 
oldrev=$1
newrev=$2
refname=$3
 
while read oldrev newrev refname; do
    # Get a list of all objects in the new revision
    objects=`git ls-tree --full-name -r ${newrev}`
 
    # Get the file names, without directory, of the files that have been modified
    # between the new revision and the old revision
    git diff --name-only $oldrev $newrev | while read file; do
        # Search for the file name in the list of all objects
        object=`echo -e "${objects}" | egrep "(\s)${file}\$" | egrep '\.yml$' | awk '{ print $3 }'`
        # If it's not present, then continue to the the next itteration
        if [ -z ${object} ]; 
        then 
            continue; 
        fi
 
        # Get file in commit and point /dev/stdin as input file 
        # and put the files through pipe for syntax validation
        echo $file
        git show $newrev:$file | /usr/bin/yamllint -d "{extends: relaxed, rules: {line-length: disable, comments: disable, trailing-spaces: disable, empty-lines: disable}}" /dev/stdin || exit 1
    done
done

Alternative way: copy changed files to temporary location

#!/usr/bin/env bash
 
set -e
 
EXIT_CODE=0
ENV_PYTHON='/usr/bin/python'
COMMAND='/usr/bin/yamllint'
TEMPDIR=`mktemp -d`
 
if ((
        (ENV_PYTHON_RETV != 0) &&
        (YAMLLINT != 0)
)); then
    echo '`python` or `yamllint` not found.'
    exit 1
fi
 
oldrev=$1
newrev=$2
refname=$3
 
while read oldrev newrev refname; do
 
    # Get the file names, without directory, of the files that have been modified
    # between the new revision and the old revision
    files=`git diff --name-only ${oldrev} ${newrev}`
 
    # Get a list of all objects in the new revision
    objects=`git ls-tree --full-name -r ${newrev}`
 
    # Iterate over each of these files
    for file in ${files}; do
 
        # Search for the file name in the list of all objects
        object=`echo -e "${objects}" | egrep "(\s)${file}\$" | awk '{ print $3 }'`
 
        # If it's not present, then continue to the the next itteration
        if [ -z ${object} ]; 
        then 
            continue; 
        fi
 
        # Otherwise, create all the necessary sub directories in the new temp directory
        mkdir -p "${TEMPDIR}/`dirname ${file}`" &>/dev/null
        # and output the object content into it's original file name
        git cat-file blob ${object} > ${TEMPDIR}/${file}
 
    done;
done
 
# Now loop over each file in the temp dir to parse them for valid syntax
files_found=`find ${TEMPDIR} -name '*.yml'`
for fname in ${files_found}; do
    ${COMMAND} ${fname}
    if [[ $? -ne 0 ]];
    then
      echo "ERROR: parser failed on ${fname}"
      BAD_FILE=1
    fi
done;
 
rm -rf ${TEMPDIR} &> /dev/null
 
if [[ $BAD_FILE -eq 1 ]]
then
  exit 1
fi
 
exit 0

Testing pre-receive hook locally is a bit more difficult than pre-commit-hook as you need to get the environment where you have to remote repository. Fortunately you can use the process which is described for GitHub Enterprise pre-receive hooks. You create a local Docker environment to act as a remote repository that can execute the pre-receive hook.

Getting Git Right in Helsinki

Software development is fun if you have tools which work great and support what you’re doing. So it was finally great to get hear Sven Peters talking about better software development in teams as Atlassian’s Getting Git Right landed to Helsinki (24.11.2014). Event about Git and of course about Atlassian’s tools.

Getting Git right by svenpet and durdn

Getting Git Right’s main theme was about happy developers, productive teams and how Git and Atlassian’s tools help to achieve that. Sven Peters and Nicola Paolucci presented how to be a happier developer with Git, and how to ship software faster and smarter. It’s good to remember that developing software is after all a social challenge, not a technical one. And Git helps you with it. The presentation slides are available at SlideShare and you can also watch it on Youtube (different event).

Git: You can rewrite history. Timemachine without paradoxes

Nicola Paolucci gave a nice and 5 minute talk about Git and it’s internals. Lot’s of technical details. The main points were “Fast and compact”, “Freedom and safety”, “Explore and understand”, “Control and assemble”. With Git you can rewrite the history safely to e.g. clean commits. Paolucci showed also some tools to help working with Git on the command line like hooks they use and using “better” prompt like liquid-prompt. For GUI you can use Atlassian’s SourceTree.

Git datamodel

Merging

Interesting part of the event was talk about what is efficient and the best Git workflow? The answer is “we don’t know”. It depends as there are different cultures, different products and different teams. There’s no right way but there are some good workflows which might work for you.

One is to use branch per issue, e.g. hot-fix/jira-30-user-avatars, feature/jira-27-user-sign. The simplest workflow is to use feature branches with develop branch. Then the master is very stable. If you have multiple product versions then release branches are good and bug fixes are done to separate branch and merged to other branches.

They also presented how Atlassian’s Stash can help you to work with Git and branches. Like merging changes to branches can be done automatically with hooks or by using Stash. Stash looked nice for controlling and managing your repository with visual interface.

Code reviews: do they feel like this?

Git also helps you to improve code quality with e.g. code reviews. Code reviews shouldn’t be painful as it’s about team ownership, shared knowledge and aim for better code but often there’s developer guilt. It can be made easier by making code reviews part of your daily work by doing it in small patches like pull requests.

Development is also about communication and for that Atlassian presented HipChat. It looked quite nice tool for following what’s happening in a project with aggregating team chat and information from different tools. Following commits and continuous information brings you clear view what’s happening. There are also alternatives to HipChat like Slack or just basic IRC.

But why should you use Git? Benefits like more time to code, better collaboration, dev productivity and it’s the future doesn’t convince everyone. Like pointy haired bosses. So it’s good to remember that Git is also about economics. Delivering software faster, having less bugs and thus having happy customers. Shipping software faster and smarter.

Why Git? from Peters’ slides:

Why Git?

It’s also about economics

In Questions and Answers session there was talk about Atlassian’s strategy with Bitbucket and Stash. They said that both are going strong as they have different use cases. Stash has more enterprise features and you can have the repository on your own premises. Bitbucket is about hosting the repository in Atlassian’s platform and for small and medium team. What I have used Bitbucket it’s nice service but not as user friendly as GitHub. Another interesting question was about storing binary files in Git. There’s no optimal solution yet but just some workarounds like git-annex and git-media which allows managing files with git, without checking the file contents into git. In practice you shouldn’t store binary files in Git and you should separate product to code and assets.

Summary

Atlassians Getting Git Right was nice event and gave good overview about Git and how to use it in software development team. It would have been nice to hear something about the alternatives to Atlassian’s tools (BitBucket, Stash, SourceTree and HipChat) which helps you to do better software development. I can’t deny that Atlassian’s tools work nicely together but sometimes the price is just too high.

Now it’s time to start using Git also on work projects and as all participants got “Just do Git” T-shirt it’s easier :) Thanks to Atlassian and Ambientia for arranging this event.