Git pre-commit and pre-receive hooks: validating YAML

Software development has many steps which you can automate and one useful thing to automate is to add Git commit hooks to validate your commits to version control. Firing off custom client-side and server-side scripts when certain important actions occur. Validating commited files’ contents is important for syntax validity and even more when providing Spring Cloud Config configurations in YAML for microservices as otherwise things fail.

Validating YAML can be done by using a yamllint and hooking it to pre-commit or pre-receive. It does not only check for syntax validity, but for weirdnesses like key repetition and cosmetic problems such as lines length, trailing spaces and indentation. Here’s a short overview to get started with yamllint on Git commit hooks.

Quickstart for yamllint

Installing yamllint

On Fedora / CentOS:
$ sudo dnf install yamllint

using pip, the Python package manager:
$ sudo pip install yamllint

or as in macOS
$ sudo -H python -m pip install yamllint

You can also install yamllint from sources when e.g. network connectivity is limited. The linter depends on pathspec >=0.5.3 and pyyaml >= 3.12.

Custom config

Yamllint is quite strict with validation and you might want to make it a bit more relax with custom configuration. For example I need to allow long lines. You can also disable checks for a specific line with a comment.

$ cat yamllint-config.yml

extends: default

rules:
  line-length: disable
  comments:
    require-starting-space: false

Usage

$ yamllint file.yml other-file.yaml

Usage with custom config:

$ yamllint -c yamllint-config.yml .

Or with custom config without config file:

$ yamllint -d "{extends: relaxed, rules: {line-length: {max: 120}}}" file.yaml

Or more specific case like running yamllint in Jenkins job’s workspace and validating files with specific suffix:

$ find . -type f -iname '*.j2' -exec yamllint -s -c yamllint-config.yaml {} \;

Pre-commit hook and yamllint

Better way to use yamllint is to integrate it with e.g. git and pre-commit-hook or pre-receive-hook. Adding yamllint to pre-commit-hook is easy with pre-commit which is a framework for managing and maintaining multi-language pre-commit hooks.

Installing pre-commit:

Using pip:
$ pip install pre-commit

Or on macOS:
$ brew install pre-commit

To enable yamllint pre-commit plugin you just add a file called .pre-commit-config.yaml to the root of your project and add following snippet to it

$ cat .pre-commit-config.yaml
---
- repo: https://github.com/adrienverge/yamllint.git
  sha: v1.10.0
  hooks:
    - id: yamllint

With custom config and strict mode:

$ cat .pre-commit-config.yaml
---
repos:
 - repo: https://github.com/adrienverge/yamllint.git
   sha: v1.10.0
   hooks:
     - id: yamllint
       args: ['-d {extends: relaxed, rules: {line-length: disable}}', '-s']

You can also use repository-local hooks when e.g. it makes sense to distribute the hook scripts with the repository. Install yamllint locally and configure yamllint to your project’s root directory’s .pre-commit-config.yaml as repository local hook. As you can see, I’m using custom config for yamllint.

$ cat .pre-commit-config.yaml
---
- repo: local
  hooks: 
  - id: yamllint
    name: yamllint
    entry: yamllint -c yamllint-config.yml .
    language: python
    types: [file, yaml]

Note: If you’re linting files with other suffix than yaml/yml like ansible template files with .j2 suffix then use types: [file]

Pre-receive hook and yamllint

Using pre-commit-hooks to process commits is easy but often doing checks in server side with pre-receive hooks is better. Pre-receive hooks are useful for satisfying business rules, enforce regulatory compliance, and prevent certain common mistakes. Common use cases are to require commit messages to follow a specific pattern or format, lock a branch or repository by rejecting all pushes, prevent sensitive data from being added to the repository by blocking keywords, patterns or filetypes and prevent a PR author from merging their own changes.

One example of pre-receive hooks is to run a linter like yamllint to ensure that business critical file is valid. In practice the hook works similarly as pre-commit hook but files you check in to repository are not kept there “just like that”. Some of them are stored as deltas to others, or their the contents are compressed. There is no place where these files are guaranteed to exist in their “ready-to-consume” state. So you must take some extra hoops to get your files available for opening them and running checks.

There are different approaches to make files available for pre-receive hook’s script as StackOverflow describes. One way is to check out the files in a temporary location or if you’re on linux you can just point /dev/stdin as input file and put the files through pipe. Both ways have the same principle: checking modified files between new and the old revision and if files are present in new revision, runs the validation script with custom config.

Using /dev/stdin trick in Linux:

#!/usr/bin/env bash

set -e

ENV_PYTHON='/usr/bin/python'

if ((
       (ENV_PYTHON_RETV != 0) && (YAMLLINT != 0)
)); then
    echo '`python` or `yamllint` not found.'
    exit 1
fi

oldrev=$1
newrev=$2
refname=$3

while read oldrev newrev refname; do
    # Get a list of all objects in the new revision
    objects=`git ls-tree --full-name -r ${newrev}`
    
    # Get the file names, without directory, of the files that have been modified
    # between the new revision and the old revision
    git diff --name-only $oldrev $newrev | while read file; do
        # Search for the file name in the list of all objects
        object=`echo -e "${objects}" | egrep "(\s)${file}\$" | egrep '\.yml$' | awk '{ print $3 }'`
        # If it's not present, then continue to the the next itteration
        if [ -z ${object} ]; 
        then 
            continue; 
        fi
        
        # Get file in commit and point /dev/stdin as input file 
        # and put the files through pipe for syntax validation
        echo $file
        git show $newrev:$file | /usr/bin/yamllint -d "{extends: relaxed, rules: {line-length: disable, comments: disable, trailing-spaces: disable, empty-lines: disable}}" /dev/stdin || exit 1
    done
done

Alternative way: copy changed files to temporary location

#!/usr/bin/env bash

set -e

EXIT_CODE=0
ENV_PYTHON='/usr/bin/python'
COMMAND='/usr/bin/yamllint'
TEMPDIR=`mktemp -d`

if ((
        (ENV_PYTHON_RETV != 0) &&
        (YAMLLINT != 0)
)); then
    echo '`python` or `yamllint` not found.'
    exit 1
fi

oldrev=$1
newrev=$2
refname=$3

while read oldrev newrev refname; do
    
    # Get the file names, without directory, of the files that have been modified
    # between the new revision and the old revision
    files=`git diff --name-only ${oldrev} ${newrev}`

    # Get a list of all objects in the new revision
    objects=`git ls-tree --full-name -r ${newrev}`

    # Iterate over each of these files
    for file in ${files}; do

        # Search for the file name in the list of all objects
        object=`echo -e "${objects}" | egrep "(\s)${file}\$" | awk '{ print $3 }'`
        
        # If it's not present, then continue to the the next itteration
        if [ -z ${object} ]; 
        then 
            continue; 
        fi

        # Otherwise, create all the necessary sub directories in the new temp directory
        mkdir -p "${TEMPDIR}/`dirname ${file}`" &>/dev/null
        # and output the object content into it's original file name
        git cat-file blob ${object} > ${TEMPDIR}/${file}
            
    done;
done

# Now loop over each file in the temp dir to parse them for valid syntax
files_found=`find ${TEMPDIR} -name '*.yml'`
for fname in ${files_found}; do
    ${COMMAND} ${fname}
    if [[ $? -ne 0 ]];
    then
      echo "ERROR: parser failed on ${fname}"
      BAD_FILE=1
    fi
done;

rm -rf ${TEMPDIR} &> /dev/null

if [[ $BAD_FILE -eq 1 ]]
then
  exit 1
fi

exit 0

Testing pre-receive hook locally is a bit more difficult than pre-commit-hook as you need to get the environment where you have to remote repository. Fortunately you can use the process which is described for GitHub Enterprise pre-receive hooks. You create a local Docker environment to act as a remote repository that can execute the pre-receive hook.

Posted

December 15, 2017

Development

Marko

Tags:

clean code, development, devops, git

Comments

2 responses to “Git pre-commit and pre-receive hooks: validating YAML”

Sebastian

March 24, 2021

Actually testing server-side hooks isn’t to difficult, you just create a “bare” git-repo and check it out locally:

$ cd my_source
$ git init –bare my-test-repo.bare
$ git clone ./my-test-repo.bare my-test-repo.git

now you can implement the hooks within “my-test-repo.bare/hooks” (there’ll be example hooks). You can use the repo “my-test-repo.git” just like normal, add, commit files and then “push” them to “remote” like normal.

Reply
Ashok G

December 19, 2018

Since it is a automatic trigger in the bitbucket server , how we can pass this parameters each time ?
oldrev=$1
newrev=$2
refname=$3

Or Is it bitbucket pre-receive hook can read them by default ?

Reply