Docker containers and using Alpine Linux for minimal base images

After using Docker for a while, you quickly realize how much time you spend downloading or distributing images. That is not necessarily a problem for everyone, but those who scale their infrastructure have to store a copy of every running image on each Docker host. One way to keep your images lean is to use Alpine Linux, a security-oriented, lightweight Linux distribution.

Lately I’ve been working on our Docker images for Java and Node.js microservices. When a stack consists of over twenty services, how the Docker images are built and which distribution they are based on starts to matter. Building images on Debian-based distributions like Ubuntu works nicely, but it brings along packages and services we don’t need. That’s why developers aim for the thinnest usable image possible, either by stripping down conventional distributions or by using minimal distributions like Alpine Linux.

Choosing your Linux distribution

What’s a good choice of Linux distribution for Docker containers? There was a good discussion on Hacker News about small Docker images, and the comments raised points worth considering when choosing a container operating system.

For some, size is a tiny concern, and far more important concerns are, for example:

  • All the packages in the base system are well maintained and updated with security fixes.
  • It will still be maintained a few years from now.
  • It handles all the special corner cases with Docker.

In the end the choice depends on your needs and how you want to run your services. Some like to use the quite large Phusion Ubuntu base image which is modified for Docker-friendliness, whereas others like to keep things simple and minimal with Alpine Linux.

Divide and conquer?

One question to ask yourself is: do you need a full operating system? If you dump an OS in a container, you are treating it like a lightweight virtual machine, and that might be fine in some cases. If, however, you restrict it to exactly what you need, its runtime dependencies and absolutely nothing more, then it becomes something else entirely: process isolation or, better yet, portable process isolation.

Another thing to think about is whether you should combine multiple processes in a single container. For example, if you care about logging, you shouldn’t run a logger daemon or logrotate in the container; you probably want to store the logs externally, in a volume or a mounted host directory. An SSH server in a container could be useful for diagnosing problems in production, but if you have to log in to a container running in production, you’re doing something wrong (and there’s docker exec anyway). As for cron, run it in a separate container and give it access to exactly the things your cron job needs.
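As a rough sketch of the single-process approach, a Dockerfile might declare a volume for logs instead of running a log daemon inside the container. The myapp binary, its --log-dir flag and the /var/log/myapp path are hypothetical names chosen for illustration only.

```dockerfile
# Hypothetical single-process image: no logger daemon, sshd or cron inside.
FROM alpine:edge

# Assumed application binary, copied from the build context.
COPY myapp /usr/local/bin/myapp

# Logs are written to a volume so they live outside the container's
# writable layer; the path is an assumption for this example.
VOLUME /var/log/myapp

# Run exactly one foreground process as PID 1.
CMD ["myapp", "--log-dir", "/var/log/myapp"]
```

When something needs diagnosing, docker exec -it followed by the container name and sh opens a shell in the running container without an SSH server in the image.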

There are a couple of different schools of thought about how to use Docker containers: as a way to distribute and run a single process, or as a lighter form of virtual machine. Which one fits depends on what you’re doing with Docker and how you manage your containers and applications. It can make sense to combine some services, but the default should still be to keep them separate. The preferred approach is to isolate every single process and explicitly tell it how to communicate with other processes. That is sane from many perspectives: security, maintainability, flexibility and speed. But again, where you draw the line is almost always a personal, aesthetic choice. In my opinion it could make sense to combine nginx and php-fpm in a single container.

Minimal approach

Lately there has been some movement towards minimal distributions like Alpine Linux, and it has received a lot of positive attention from the Docker community. Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and BusyBox, using a grsecurity/PaX patched Linux kernel and OpenRC as its init system. In its x86_64 ISO flavor it weighs in at 82 MB, and a container requires no more than 8 MB. Alpine provides a wealth of packages via its apk package manager. As it uses musl, you may run into issues with environments expecting glibc-like behaviour (for example Kubernetes, or when compiling some npm modules), but for most use cases it should work just fine. And with minimal base images it’s more convenient to divide your processes into many small containers.
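For npm modules that have to be compiled, one common pattern is to install a compiler toolchain only for the duration of the build. The following is a minimal sketch, not a definitive recipe: the /app directory and the .build-deps name are arbitrary choices, and it assumes npm is available once the nodejs package is installed (on releases where npm is packaged separately, add that package too).

```dockerfile
FROM alpine:edge

WORKDIR /app
COPY package.json /app/

# Install Node.js, then add a compiler toolchain under a named virtual
# package (.build-deps) so native modules can be compiled against musl.
# The toolchain is removed in the same layer to keep the image small.
RUN apk add --no-cache nodejs \
    && apk add --no-cache --virtual .build-deps build-base \
    && npm install --production \
    && apk del .build-deps
```

Modules built with node-gyp also need a Python package during the build; its name depends on the Alpine release, so it is left out here.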

Some advantages of using Alpine Linux are:

  • The image is faster to download, install and start on your Docker host.
  • Security is improved, as a smaller image footprint also means a smaller attack surface.
  • Migration between hosts is faster, which is especially helpful in high availability and disaster recovery configurations.
  • Your system admin won’t complain as much, as you will use less disk space.

For my purposes, I need to run Spring Boot and Node.js applications in Docker containers, and they were easily switched from Debian-based images to Alpine Linux without any changes. There are official Docker images for OpenJDK/OpenJRE on Alpine, and Dockerfiles for running Oracle Java on Alpine. Although there isn’t an official Node.js image built on Alpine, you can easily write your own Dockerfile or use community-provided ones. While the official Java Docker image is 642 MB, Alpine Linux with OpenJDK 8 is 150 MB, and with Oracle JDK 382 MB (which can be stripped down to 172 MB). The official Node.js image is 651 MB (or 211 MB for the slim variant), whereas Node.js on Alpine Linux is 36 MB. That’s quite a reduction in size.

Examples of minimal containers based on Alpine Linux:

For Node.js:

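FROM alpine:edge

# Pin the Node.js package version so that builds are repeatable.
ENV NODE_ALPINE_VERSION=6.2.0-r0

# Upgrade the base packages and install Node.js without leaving
# the apk package index cache in the image.
RUN apk upgrade --no-cache \
    && apk add --no-cache nodejs="$NODE_ALPINE_VERSION"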

For Java applications with OpenJDK:

FROM alpine:edge

# Use a UTF-8 locale by default.
ENV LANG C.UTF-8

# Add a helper script that prints the Java home directory, derived from
# the location of javac (JDK) or, failing that, java (JRE).
RUN { \
      echo '#!/bin/sh'; \
      echo 'set -e'; \
      echo; \
      echo 'dirname "$(dirname "$(readlink -f "$(which javac || which java)")")"'; \
    } > /usr/local/bin/docker-java-home \
    && chmod +x /usr/local/bin/docker-java-home

ENV JAVA_HOME /usr/lib/jvm/java-1.8-openjdk
ENV PATH $PATH:$JAVA_HOME/bin
ENV JAVA_VERSION 8u92
ENV JAVA_ALPINE_VERSION 8.92.14-r0

# Install bash and the pinned OpenJDK package, then check that JAVA_HOME
# matches the directory the helper script detects.
RUN set -x \
    && apk upgrade --no-cache \
    && apk add --no-cache \
      bash \
      openjdk8="$JAVA_ALPINE_VERSION" \
    && [ "$JAVA_HOME" = "$(docker-java-home)" ]
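If you do not need to control the JDK installation yourself, one of the official OpenJDK images built on Alpine can serve as the base for a Spring Boot service. This is only a sketch: the openjdk:8-jre-alpine tag and the target/app.jar path are assumptions and should be adjusted to your registry and build output.

```dockerfile
# Assumed tag of an official OpenJDK JRE image built on Alpine.
FROM openjdk:8-jre-alpine

# Hypothetical path of the executable jar produced by the build.
COPY target/app.jar /app/app.jar

# Spring Boot's embedded server listens on port 8080 by default.
EXPOSE 8080

# The exec form keeps java as PID 1, so it receives stop signals directly.
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
```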

If you want to read more about running services on Alpine Linux, see the nice article by Atlassian’s Nicola Paolucci about experiences of running Java apps on Alpine.

Go small or go home?

So, should you use Alpine Linux for running your application on Docker? As the official Docker images are also moving to Alpine Linux, switching to Alpine seems to make perfect sense from both a performance and a security perspective. And if you don’t want to take the leap from Debian or Ubuntu, or you want support from the downstream vendor, you should consider stripping the image of unneeded files to make it smaller.
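For those staying on Debian, a minimal sketch of the stripping approach could look like the following. The debian:stable tag and the two installed packages are placeholders; the point is to skip recommended packages and to remove the package index in the same layer that created it.

```dockerfile
FROM debian:stable

# Install only what is needed, skip recommended extras, and delete the
# apt package lists in the same layer so they do not end up in the image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*
```

Doing the cleanup in a later RUN instruction would not shrink the image, because the files would already be stored in the earlier layer.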

Weekly notes 8

Spring has been quite busy at work, but summer is just around the corner, and that means either holidays or having some time to learn new things and see how things could be made better. My weekly notes have turned into monthly notes, but that’s how things sometimes work out. But back to the issue, which covers continuous learning, best practices in development, the building blocks of Netflix’s stack and how to get started with the ELK stack. And for a summer project there’s Stanford’s Swift and iOS 9 course. Having written my own iOS app in Swift, I find it a nice language.

Weekly notes, issue 8, 19.5.2016

Learning new things

Developing iOS 9 Apps With Swift from Stanford
Stanford’s iOS course has been updated for Swift and iOS 9, and it is a good resource for learning iOS and Swift, or just for refreshing your knowledge of best practices when developing for the platform. (Indie iOS focus weekly, issue 66)

Keep on learning and keep it simple

The single biggest mistake programmers make every day
Nice writeup of basic principles in programming. In short: Keep It Stupid Simple. Make it work, make it right, make it fast. Do One Thing.

Being A Developer After 40
Software development is always changing, which this article describes nicely, and it gives good advice for the young at heart on how to reach the glorious age of 40 as a happy software developer. tl;dr: Forget the hype, Choose your galaxy wisely, Learn about software history, Keep on learning, Teach, Workplaces suck, Know your worth, Send the elevator down, LLVM, Follow your gut, APIs are king, Fight complexity.

5 Tips To Improve Your JS with ES6
A well-recorded, hour-long remote talk covering not only some handy ES6 tips but also how to work with ES6 in general and some of the tools available. (from JavaScript Weekly, issue 274)

Microservices, best practices and Java

Microservices are about applying a group of Best Practices
Moving an existing codebase to a microservice architecture is no small feat. And that’s not even taking into account the non-technical challenges. We definitely need more nuanced strategies based on actual production experience with microservices to help drive these architectural decisions. (from Java Web Weekly 123)

jDays 2016: Java EE Microservices Platforms
A lot of people preach that you can’t build microservices with Java EE, but Steve Millidge’s talk about Java EE microservices platforms shows that Payara Micro and WildFly Swarm are fast, have a small memory footprint, and require no code changes to port an application from one to the other. (from Java Web Weekly 18/16)

The Netflix Stack: Part 1, Part 2 and Part 3
A microservices architecture is what you are supposed to build nowadays, but the question is how. The Netflix Stack article series covers some open source libraries you can use to build your architecture. Part 1 covers Eureka for service discovery, and Part 2 is about Hystrix, a latency and fault tolerance library. Part 3 is about creating REST clients for all of your services. The blog posts give an overview of what you can find in the accompanying repository.

Java app monitoring with ELK: part 1: Logstash and Logback and part 2: ElasticSearch
These blog posts tell you about the ELK stack (Elasticsearch, Logstash, Kibana), which is a useful tool for log visualization and analysis. (from Java Web Weekly 116)

SQL

10 SQL tricks that you didn’t think were possible
Lukas Eder tells you 10 SQL tricks that many of you might not have thought were possible. The article is a summary of his extremely fast-paced, ridiculously childish-humoured talk. “SQL is the original microservice”.

Tools of the trade

mrzool/bash-sensible
“A simple starting point for a better Bash user experience out of the box.” These settings do make Bash easier and more useful. (from Weekend Reading)

Stranger Danger: Addressing the Security Risk in NPM Dependencies
A presentation from the O’Reilly Fluent Conference by Snyk’s co-founders, which covers a recently found exploit and shows you how to use Snyk in your development workflow.

Something different

Dlexsiya
An interesting JavaScript simulation of what the web looks like to people with dyslexia. In the comments, a person with dyslexia says that the text is easier to read when it shifts. So, would a dyslexia mode be good for website UX? :) (from Weekend Reading)