Web analytics with Piwik: keeping control over your own data

Web analytics is one of the essential tools for a website. Besides measuring web traffic and the number of visitors, it can be used to assess and improve the effectiveness of a website. The most common way to collect data is on-site web analytics, which measures a visitor’s behavior once they are on your website, using page tagging technology as in Google Analytics, a widely used web analytics service. But what would you use if you want to keep control over your own data?

You don’t have to look far: Piwik, an open source web analytics application, aims to be the ultimate open alternative to Google Analytics. Here’s a short overview of Piwik Analytics and how to get started with it.

“Web analytics is the measurement, collection, analysis and reporting of web data for purposes of understanding and optimizing web usage.” – Wikipedia

Piwik Open Analytics Platform

Piwik is a web analytics application that tracks online visits to one or more websites and displays reports on these visits for analysis. In short, it aims to be the ultimate open source alternative to Google Analytics. The code is licensed under GPL v3 and available on GitHub. On the technical side, Piwik is written in PHP, uses a MySQL database and can be hosted by yourself. If you don’t want to set up or host Piwik yourself, you can also use commercial hosting services.

Piwik provides the usual features you would expect from a web analytics application. You get reports on the geographic location of visits, the sources of visits, the technical capabilities of visitors, what the visitors did and when they visited. Piwik also provides features for analyzing the data it accumulates, such as annotating the data with notes, defining goals for actions, transitions for seeing how visitors navigate, overlaying analytics data on top of a website and displaying how metrics change over time. The easiest way to see what it has to offer is to try the Piwik online demo.

Feature highlights

You might ask how Piwik differs from other web analytics applications such as Google Analytics. One principal advantage of using Piwik is that you are in control. You can host Piwik on your own server and the data is tracked inside your own MySQL database, so you have full control over your data. Software-as-a-service analytics applications, on the other hand, have full access to the data their users collect. Data privacy is essential for the public sector and for enterprises that can’t, or don’t want to, share their data with a third party such as Google. You can also ensure that your visitors’ behavior on your website is not shared with advertising companies.

Another interesting feature is its advanced privacy options: the ability to anonymize IP addresses, regularly purge tracking data (but not report data), opt-out support and Do Not Track support. Your website visitors can decide whether they want to be tracked.
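To make two of these options concrete, here is a minimal Python sketch of the general idea, not Piwik’s own PHP code: skip tracking when the browser sends the standard `DNT: 1` header, and zero out the trailing bytes of an IPv4 address before storing it. The function names are hypothetical, the sketch handles IPv4 only, and limiting the mask to one to three bytes is an assumption modelled on the choice Piwik’s privacy settings offer.

```python
import ipaddress
from typing import Mapping


def should_track(headers: Mapping[str, str]) -> bool:
    """Hypothetical helper: return False when the browser sends "DNT: 1"."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


def anonymize_ipv4(ip: str, masked_bytes: int = 2) -> str:
    """Hypothetical helper: zero out the last `masked_bytes` octets of an IPv4 address."""
    if masked_bytes not in (1, 2, 3):
        raise ValueError("masked_bytes must be 1, 2 or 3")
    address = int(ipaddress.IPv4Address(ip))
    # masked_bytes=2 gives the mask 0xFFFF0000, keeping only the first two octets.
    mask = (0xFFFFFFFF >> (8 * masked_bytes)) << (8 * masked_bytes)
    return str(ipaddress.IPv4Address(address & mask))


if __name__ == "__main__":
    request_headers = {"DNT": "1", "User-Agent": "example"}
    if should_track(request_headers):
        print("tracking", anonymize_ipv4("192.168.23.45"))
    else:
        print("visitor opted out via Do Not Track")
```

Masking more bytes trades location accuracy for privacy: with two bytes masked, `192.168.23.45` is stored as `192.168.0.0`.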

You can also schedule reports to be sent by e-mail, import data from web server logs, and use the API to access reports and administrative functions. Piwik also has a mobile app for accessing the analytics data, and it can be customized with plugins and integrated with WordPress and other applications.
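As an illustration of the API, the sketch below builds a request to Piwik’s HTTP Reporting API, where `module=API` together with a `method` such as `VisitsSummary.get` selects the report, and `idSite`, `period`, `date` and `format` narrow it down. The host `analytics.example.com`, the site ID and the token are placeholders, the helper names are hypothetical, and the parameter details are worth checking against the API reference of your Piwik version. The token authenticates the request; without one, only data visible to anonymous users is returned.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def build_report_url(
    base_url: str,
    site_id: int,
    method: str = "VisitsSummary.get",
    period: str = "day",
    date: str = "yesterday",
    token_auth: Optional[str] = None,
) -> str:
    """Hypothetical helper: build a Reporting API URL that returns one report as JSON."""
    params: Dict[str, Any] = {
        "module": "API",      # route the request to the Reporting API
        "method": method,     # which report to fetch
        "idSite": site_id,    # which tracked website
        "period": period,     # day, week, month, year or range
        "date": date,         # e.g. "yesterday", "today" or "2016-06-17"
        "format": "JSON",
    }
    if token_auth:
        params["token_auth"] = token_auth
    return base_url.rstrip("/") + "/index.php?" + urlencode(params)


def fetch_report(url: str) -> Any:
    """Fetch the report and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = build_report_url("https://analytics.example.com/", 1, token_auth="YOUR_TOKEN")
    print(url)
    # print(fetch_report(url))  # run against a real Piwik instance
```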

Piwik’s User Interface

Piwik has a clean and simple user interface, as seen in the following screenshots (taken from the online demo).

Piwik main view

Piwik visitors overview

Setting up Piwik

Setting up Piwik is easy and there’s good documentation available for running Piwik web analytics. All you need is a web server such as Nginx, PHP 5.5 and MySQL or MariaDB. You can set it up manually, but the easiest way to get started is to use the provided Docker image and docker-compose. The docker-compose file sets up four containers (MySQL, Piwik, Nginx and Cron), which you can then start with compose. The Piwik image is available from the official docker-library.

The alternative is to build your own Docker image for Piwik and the related services. In my opinion it makes sense to have just two containers: one for the Piwik-related web services and the other for MySQL. The Piwik container runs Piwik, Nginx and the cron script, managed for example by supervisor. The official image is based on Debian (via the PHP base image), but Piwik also runs nicely on Alpine Linux. One thing to tinker with when using Docker is giving MySQL access to Piwik’s assets for LOAD DATA INFILE, which greatly speeds up Piwik’s archiving process.

If you’re setting up Piwik manually, you can watch a video of the installation and then a video on configuring the settings. After the five-minute installation you get a JavaScript tag, which you add to the bottom of each page of your website. If you’re using React, there’s a Piwik analytics component for React Router. Piwik then records the activity across your website in your database.
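Under the hood, the JavaScript tag reports each page view as an HTTP request to Piwik’s tracking endpoint, `piwik.php`, following the Tracking HTTP API. The Python sketch below builds such a request by hand, which is roughly what server-side tracking does. As we understand the API, `idsite`, `rec=1` and `url` are required, while `action_name`, `apiv` and `rand` are optional. The host, page values and helper names are placeholders, and the real JavaScript tracker sends many more parameters (referrer, screen resolution, visitor ID and so on).

```python
import random
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tracking_url(piwik_base: str, site_id: int, page_url: str, page_title: str) -> str:
    """Hypothetical helper: build a Tracking HTTP API request for one page view."""
    params = {
        "idsite": site_id,                      # website ID configured in Piwik
        "rec": 1,                               # must be 1 for the hit to be recorded
        "url": page_url,                        # full URL of the tracked page
        "action_name": page_title,              # page title shown in the reports
        "apiv": 1,                              # tracking API version
        "rand": random.randint(0, 2**31 - 1),   # random value that prevents caching
    }
    return piwik_base.rstrip("/") + "/piwik.php?" + urlencode(params)


def send_page_view(tracking_url: str) -> int:
    """Send the hit and return the HTTP status code (performs a network request)."""
    with urlopen(tracking_url, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    print(build_tracking_url("https://analytics.example.com", 1,
                             "https://www.example.com/about", "About us"))
```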

That’s about all there is to getting started with Piwik: set it up with Docker or manually, add the JavaScript tag, configure some options if needed, and then wait for data from your visitors.

Summary

Piwik is a good, feature-rich alternative for web analytics. Setting it up isn’t as straightforward as using a hosted service such as Google Analytics, but that’s how self-hosted services always are. If you need web analytics, want to keep control of your own data and don’t mind hosting it yourself and paying for the server, Piwik is a good choice.

Weekly notes 9

Summer is here and the mountain biking trails are calling, but keeping up with what happens in the field never stops. This week Apple held its Worldwide Developers Conference, which filled up social media although nothing remarkable was presented. In other news, there was a good collection of slides for Java developers, an ebook on DevOps, and HyperDev looks interesting for quickly banging out JavaScript.

Weekly notes, issue 9, 17.6.2016

Java: stay updated, reactive and in the cloud

13 Decks Java developers must see to stay updated
A selection of nice slideshows for Java developers: best practices, microservices, debugging, Elasticsearch and SQL.

Java SE 8 best practices
Java 8 best practices by Stephen Colebourne is a good read. The slides cover all the basics, such as lambdas, exceptions, streams and interfaces. (from the “13 Decks Java developers” post)

Microservices + Oracle: A Bright Future
Good slides on what microservices are: considerations, prerequisites, patterns, technologies and Oracle’s plans. (from the “13 Decks Java developers” post)

Notes on Reactive Programming, Part I: The Reactive Landscape and Part II: Writing Some Code
A solid intro to reactive programming. And no, it’s no coincidence that this is first. A reactive system is an entirely different beast, and such a good fit for a small set of scenarios. (from Java Web Weekly, Issue 128)

Netflix OSS, Spring Cloud, or Kubernetes? How About All of Them!
The Netflix ecosystem of tools is based on practical usage at scale, so it’s always super useful to go deep into understanding their tools. (from Java Web Weekly, Issue 128)

Takeaways from WWDC 2016


Digging into the dev documentation for APFS, Apple’s new file system

Interesting low-level stuff in macOS Sierra. APFS replaces HFS+, has native encryption and snapshots (Time Machine done right), and is case-sensitive. The Hacker News comments are worth reading.

The 13 biggest announcements from Apple WWDC 2016
WWDC 2016 was about software and incremental changes. Siri is opening up to app developers, iOS is growing up, iOS gets an Apple TV Remote app and Apple introduces a single sign-on system.

Continuous learning

DevOpsSec: Securing Software through Continuous Delivery
The free DevOpsSec ebook is worth reading if you’re interested in securing software through continuous delivery. It uses case studies from Etsy, Netflix and the London Multi-Asset Exchange to illustrate the steps leading organizations have taken to secure their DevOps processes.

Microservice Pitfalls & AntiPatterns, Part 1
An anti-pattern is just like a pattern, except that instead of a solution it gives something that looks superficially like a solution but isn’t one. A pitfall is something that was never a good idea, even from the start. (from The Microservice Weekly #31)

Tools of the trade

Introducing HyperDev
HyperDev looks to be an interesting new product from Fog Creek Software (known for, e.g., Trello). It’s a developer playground for building full-stack web apps fast: “The fastest way to bang out JavaScript code on Node.js and get it running on the internet,” as Joel Spolsky describes it.

V8, modern JavaScript, and beyond – Google I/O 2016
Upcoming v8_inspector support will soon enable debugging Node.js apps with Chrome Developer Tools.

Something different

Why do we have allergies?
Allergies such as peanut allergy and hay fever make millions of us miserable, but scientists aren’t even sure why they exist.