Building in isolation with Docker

by Bruce Szalwinski

Background

Over at Device Detection, I wrote about creating an Apache handler that could be used to do real-time device detection.  The handler depends on a number of other Perl modules: 51Degrees, JSON, Apache2::Filter, Apache2::RequestRec, Apache2::RequestUtil, etc.  And those modules have dependencies as well.  I wanted our build server, Bamboo, to build my module without the nasty side effect of installing third-party libs onto all of the build agents.  In the Maven world, I would just add all of these dependencies to my pom.xml and Maven would download them from the repository into my local environment.  At build time, Bamboo would take care of establishing a clean environment and downloading my dependencies, and most importantly, when the build was complete, the plates would be wiped clean, ready to serve another guest, leaving no traces of my build party.  The challenge, then: how to do this in the Perl world.  Spoiler alert: full source code is available at DeviceDetection.

Enter Docker


The fancy new way for developers to deliver applications to production is via Docker.  Developers define the application dependencies via a plain old text file, conventionally named Dockerfile.  Using the Docker toolkit, the Dockerfile is used to build a portable image that can be deployed to any environment.  A running image is known as a container, which behaves like an operating system running inside of a host operating system.  This is a lot like VMs but lighter weight.  Developers are now empowered to deliver an immutable container to production.  Let’s see if this can also be used to provide an isolated build environment.


For the big picture folks, here is what we are trying to do.  We’ll start by defining all of the application dependencies in our Dockerfile.  We’ll use the Docker toolkit to build an image from this Dockerfile.  We’ll run the image to produce a container.  The container’s job will be to build and test the Perl modules and when all tests are successful, produce an RPM.
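For the impatient, the whole pipeline condenses to two commands, both detailed below (build-docker is the wrapper script we'll write shortly):

$ docker build -t device-detection .
$ ./build-docker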

Building the image was an iterative process and I had fun dusting off my sysadmin hat.  Here is where I finally ended up.

FROM google/debian:wheezy
RUN apt-get -y install make gcc build-essential sudo
RUN apt-get -y install apache2-threaded-dev
RUN apt-get -y install libapache2-mod-perl2
RUN apt-get -y install libtest-harness-perl libtap-formatter-junit-perl libjson-perl
RUN apt-get -y install rpm

Let’s break this down.   The first non-comment line in the Dockerfile must be the “FROM” command.  This defines the image upon which our image will be based.  I’m using the “google/debian” image tagged as “wheezy”.   Think of images as layers.  Each image may have dependencies on images below it.  Eventually, you get to a base image, which is defined as an image without a parent.

FROM google/debian:wheezy

The RUN command is used to add layers to the image, creating a new image with each successful command.  The 51Degrees Perl module is built using the traditional Makefile.PL process, so we start by installing make, gcc, and build-essential.  Containers generally run as root, so we wouldn't normally need to install sudo, but our handler uses Apache::Test for its unit tests, and Apache::Test doesn't allow root to start the required httpd process.  So we will end up running our install as a non-root user and giving that user sudo capabilities.  More about that in a bit.

RUN apt-get -y install make gcc build-essential sudo

Next, we install our apache environment.  With Apache2, there is a prefork version and a threaded version, which has to do with how apache handles multi-processing.  For my purposes, I didn't really care which one I picked.  It was important, however, to pick up the -dev version, as it includes the development headers and tooling needed for building and testing modules against apache.

RUN apt-get -y install apache2-threaded-dev

Next, we install mod_perl since the device detector is a mod_perl handler.

RUN apt-get -y install libapache2-mod-perl2

Next, add our Perl dependencies.  Each Linux distro has its own way of naming Perl modules.  Why?  Because they can.  Debian uses a "lib" prefix and a "-perl" suffix, converts "::" to "-", and lowercases everything.  To install the Perl module known as Test::Harness, you would request libtest-harness-perl.

RUN apt-get -y install libtest-harness-perl libtap-formatter-junit-perl libjson-perl
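The naming rule is mechanical enough to script.  Here is a little shell sketch of the translation; it is purely illustrative and not part of the build:

#!/bin/bash
# Map a CPAN module name to its Debian package name,
# e.g. Test::Harness -> libtest-harness-perl
module="Test::Harness"
pkg="lib$(echo "$module" | tr '[:upper:]' '[:lower:]' | sed 's/::/-/g')-perl"
echo "$pkg"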

And since we’ll be delivering a couple of RPMs at the end of this, we install the rpm package.

RUN apt-get -y install rpm

With the Dockerfile in place, it is time to build our image.  We tell docker to build the image and tag it as "device-detection"; the trailing dot tells docker to look in the current directory for a file named Dockerfile.

$ docker build -t device-detection .

Time for some coffee as docker downloads the internet and builds our image.  Here is the prettied-up log produced after the initial construction of the image.  If there are no changes to the Dockerfile, the image is simply assembled from the cached results.  The 12-character hex strings are the ids (full ids are really 64 characters long) of the images saved after each step.

Sending build context to Docker daemon 2.048 kB
Sending build context to Docker daemon
Step 0 : FROM google/debian:wheezy
 ---> 11971b6377ef
Step 1 : RUN apt-get -y install make gcc build-essential sudo
 ---> Using cache
 ---> 2438117da917
Step 2 : RUN apt-get -y install apache2-threaded-dev
 ---> Using cache
 ---> 41f878809025
Step 3 : RUN apt-get -y install libapache2-mod-perl2
 ---> Using cache
 ---> 43eadc4ec9eb
Step 4 : RUN apt-get -y install libtest-harness-perl libtap-formatter-junit-perl libjson-perl
 ---> Using cache
 ---> 106d5f017b5c
Step 5 : RUN apt-get -y install rpm
 ---> Using cache
 ---> fd0dc5f192d6
Successfully built fd0dc5f192d6

Use the docker images command to see the images that have been built.  The ubuntu 14.04 image is from before I got religion and started using google/debian.

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
device-detection    latest              2f2b3e38e8c1        29 minutes ago      358.6 MB
ubuntu              14.04               d0955f21bf24        2 weeks ago         188.3 MB
google/debian       wheezy              11971b6377ef        9 weeks ago         88.2 MB
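Since each successful RUN step saved an intermediate image, the layers can also be inspected individually with docker history (output omitted here):

$ docker history device-detection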

Running the Image

At this point, we have an image that contains our isolated build environment.  Now we are ready to do some building by running the image.  In Docker terms, a running image is known as a container.  The build-docker script will be used to produce a container.   When we create our Bamboo build plan, this is the script that we will execute.

#!/bin/bash
docker run --rm -v $PWD:/opt/51d device-detection:latest /opt/51d/entry.sh

The --rm flag removes the container when it finishes.  The -v flag mounts the current directory as /opt/51d inside the container.  device-detection:latest refers to the image we just built.  And finally, /opt/51d/entry.sh is the command to execute inside the container.

#!/bin/bash
adduser --disabled-password --gecos '' r
adduser r sudo
echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
su -m r -c /opt/51d/build

The entry.sh script is executed inside the container.  To test the handler, we'll need the vendor's module installed, and to install modules, we need root privileges.  We are going to use Apache::Test to test the handler, but Apache::Test won't let us start the httpd process as root.  The solution is to create a new user, r, and give him sudo capabilities.  With that in place, we hand off execution to the next process, /opt/51d/build.

That all worked well in my local environment, but something interesting happened when I went to deploy this from Bamboo.  Files in a mounted volume keep the UID:GID they have on the host, so the owner of the files inside the container turned out to be the user that checked out the source.  That was me in my local environment, but inside Bamboo it was the build agent's user.  When the user 'r' attempted to create a file, it got a permission denied error because the directories are not owned by him.  I discovered this by having Bamboo list the files from inside the running container.  They are owned by the mysterious user with UID:GID of 3366:777.
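The listing itself is a one-liner, reusing the flags from build-docker (a sketch of what the Bamboo task ran):

$ docker run --rm -v $PWD:/opt/51d device-detection:latest ls -lR /opt/51d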

build   08-Apr-2015 09:28:35    /opt/51d:
build   08-Apr-2015 09:28:35    total 28
build   08-Apr-2015 09:28:35    drwxr-xr-x 7 3366 777 4096 Apr  8 16:28 51Degrees-PatternWrapper-Perl
build   08-Apr-2015 09:28:35    drwxr-xr-x 5 3366 777 4096 Apr  8 16:28 CDK-51DegreesFilter
build   08-Apr-2015 09:28:35    -rwxr-xr-x 1 3366 777  530 Apr  8 16:28 build
build   08-Apr-2015 09:28:35    -rwxr-xr-x 1 3366 777  120 Apr  8 16:28 build-docker
build   08-Apr-2015 09:28:35    drwxr-xr-x 2 3366 777 4096 Apr  8 16:28 docker
build   08-Apr-2015 09:28:35    -rwxr-xr-x 1 3366 777  146 Apr  8 16:28 entry.sh
build   08-Apr-2015 09:28:35    -rwxr-xr-x 1 3366 777  497 Apr  8 16:28 rpm.sh

We can use this UID:GID information when creating our user.  The stat command returns the UID and GID of a file.  We'll create a group matching the group that owns the /opt/51d directory, and then create our user with the UID and GID of the directory's owner.  Our modified entry.sh script is then:

#!/bin/bash
addgroup --gid=$(stat -c %g /opt/51d) r
adduser --disabled-password --gecos '' --uid=$(stat -c %u /opt/51d) --gid=$(stat -c %g /opt/51d) r
adduser r sudo
echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
su -m r -c /opt/51d/build

And we can see that user ‘r’ is now the “owner” of the files.

build   08-Apr-2015 09:28:35    /opt/51d:
build   08-Apr-2015 09:28:35    total 28
build   08-Apr-2015 09:28:35    drwxr-xr-x 7 r r 4096 Apr  8 16:28 51Degrees-PatternWrapper-Perl
build   08-Apr-2015 09:28:35    drwxr-xr-x 5 r r 4096 Apr  8 16:28 CDK-51DegreesFilter
build   08-Apr-2015 09:28:35    -rwxr-xr-x 1 r r  530 Apr  8 16:28 build
build   08-Apr-2015 09:28:35    -rwxr-xr-x 1 r r  120 Apr  8 16:28 build-docker
build   08-Apr-2015 09:28:35    drwxr-xr-x 2 r r 4096 Apr  8 16:28 docker
build   08-Apr-2015 09:28:35    -rwxr-xr-x 1 r r  146 Apr  8 16:28 entry.sh
build   08-Apr-2015 09:28:35    -rwxr-xr-x 1 r r  497 Apr  8 16:28 rpm.sh

With the user set up, entry.sh hands off control to the build script to do the heavy lifting.  Here we set up our apache environment and build the two Perl modules.  The () is a convenient bash-ism that runs its commands in a subshell, leaving us in the original directory when it completes.  And PERL_TEST_HARNESS_DUMP_TAP is an environment variable recognized by the TAP harness machinery (used here with TAP::Formatter::JUnit); test output will be written to the location it specifies.

#!/bin/bash

source /etc/apache2/envvars
export APACHE_TEST_HTTPD=/usr/sbin/apache2
export PERL_TEST_HARNESS_DUMP_TAP=/opt/51d/CDK-51DegreesFilter/dist/results

(cd /opt/51d/51Degrees-PatternWrapper-Perl && \
        perl Makefile.PL && \
        make && \
        make dist && \
        sudo make install && \
        ../rpm.sh FiftyOneDegrees-PatternV3-0.01.tar.gz)

(cd /opt/51d/CDK-51DegreesFilter && \
        perl Build.PL && \
        ./Build && \
        ./Build test && \
        ./Build dist && \
        ../rpm.sh CDK-51DegreesFilter-0.01.tar.gz)

When the build script completes, we are done, and the container is stopped and removed.  Because we mounted the current directory inside the container, artifacts produced by the container are available after the build completes.  This is exactly the side effect we need.  We can publish the test results produced by the build process as well as the RPMs.  And we have accomplished the goal of having an isolated build environment, D'oh!

Fun things learned along the way

Inside the container, Apache::Test starts an httpd server on port 8529.  It then tries to set up the mod_cgid module by binding a Unix domain socket to a path in the /opt/51d/CDK-51DegreesFilter/t/logs directory via this directive:


<IfModule mod_cgid.c>
    ScriptSock /opt/51d/CDK-51DegreesFilter/t/logs/cgisock
</IfModule>

The httpd server had issues with this; I'm not sure why, though perhaps it is because the socket would live on the file system that Docker has bind-mounted.  I resolved it by moving the ScriptSock location to /tmp/cgisock.  More details on this conundrum are available on Stack Overflow, where I asked and answered my own question: http://stackoverflow.com/questions/29424132/error-accessing-cgi-script-inside-docker-container-operation-not-permitted-cou.
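For reference, the working configuration is the same directive pointed at the container's own file system:

<IfModule mod_cgid.c>
    ScriptSock /tmp/cgisock
</IfModule>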

Device Detection

by Bruce Szalwinski

Background

The good folks that power Apache Mobile Filter use version 2 of the Device Repository from 51Degrees and currently have no plans to update their software to use version 3.  Since we currently use the AMF handler to do device detection, and since 51Degrees has announced the end of life for version 2, this provides an opportunity for us to write our own handler.  We attempted to do this in the version 2 days, but there was no Perl API offered by 51Degrees and the C code was pretty shaky.  With version 3, the 51Degrees folks now offer a Perl API that wraps a much more robust C API.  With that, the stage is set to tackle writing our own Apache handler.  I'll use the Apache::Test module to help drive the development.  This article from last decade was very helpful in learning how to use this powerful module.  Full source code is available at DeviceDetection.

Requirements

Analyze web traffic by a user specified set of device properties.

Implementation

An Apache Handler allows for the customization of the default behavior of the web server.  We will write a handler that reads the user agent from the request, detects the device associated with the user agent, creates environment variables for each requested device property and writes the values to a log file.  Let’s get started.

Write tests first

To test our handler, we’ll send requests to an apache server, passing in various user agent strings and validating that we receive known device id values.

use strict;
use warnings FATAL => 'all';

use Apache::TestTrace;
use Apache::Test qw(plan ok have_lwp);
use Apache::TestRequest qw(GET);
use Apache::TestUtil qw(t_cmp);
use Apache2::Const qw(HTTP_OK);

use JSON;

plan tests => 6, have_lwp;

detect_device('', '15364-5690-17190-18092');
detect_device('unknown', '15364-5690-17190-18092');
detect_device(
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36",
  "15364-18110-25377-18092");

sub detect_device {
  my ($user_agent, $device_id) = @_;

  Apache::TestRequest::user_agent(
    reset => 1,
    agent => $user_agent
  );

  my $response = GET '/cgi-bin/index.cgi';
  my $json = decode_json $response->content;

  debug "response", $response;

  ok defined($json->{_51D_ID});
  ok $json->{_51D_ID} eq $device_id;
}

Great, we have a unit test, and it fails miserably because we don't have an apache server.  We need an apache server that we can start, stop, and configure for every test run.  Conveniently, the Apache::Test module provides a "whole, pristine and isolated" apache server at our disposal.  Cool, we have a server.  Next, the handler will push device properties into the environment via the subprocess environment table, so we need a way to capture those values.  The GET '/cgi-bin/index.cgi' call above shows how the unit tests invoke a CGI script.  The CGI script simply grabs the variables pushed into the environment by the handler and returns them to the test.


#!/usr/bin/perl

use CGI qw(:standard -no_xhtml -debug);
use JSON;

print header('application/json');

my %properties;

while ( my ($key, $value) = each(%ENV)) {
  if ( $key =~ /^_51D/) {
    $properties{$key} = $value;
  }
}

print encode_json \%properties;
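For the Chrome user agent in the tests above, the response body would look something like this (any other requested _51D properties would appear alongside the id):

{"_51D_ID":"15364-18110-25377-18092"}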

Ok, so we have a failing test, a web server, and a way of communicating between the two.  We have a little bit of wiring to do to let the server know about our CGI script as well as our handler.  By convention, Apache::Test looks for a file called t/conf/extra.conf.in.  This file contains configuration directives that will be added to httpd.conf before starting the server.  We'll take this opportunity to configure the execution of our index.cgi test harness, configure our log format, and set up our handler.


PerlSwitches -w

ScriptAlias /cgi-bin @ServerRoot@/cgi-bin
<Location /cgi-bin>
  SetHandler cgi-script
  Options +ExecCGI +Includes
</Location>

LogFormat "%{_51D_ID}e|%{User-Agent}i" combined

PerlTransHandler +CDK::51DegreesFilter
PerlSetEnv DeviceRepository @ServerRoot@/data/51Degrees-Lite.dat
PerlSetEnv DevicePropertyList ScreenPixelsHeight,BatteryCapacity
PerlSetEnv DevicePrefix _51D
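With that LogFormat in place, a request from the Chrome user agent in the tests would be logged roughly as:

15364-18110-25377-18092|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36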

Man, when is this guy ever going to get around to writing some code?  Almost there.  The Apache::TestRunPerl and Apache::TestMM modules combine to provide everything necessary to start, configure, and stop apache, as well as run all of the individual unit tests.  These get added into our Build.PL script.  The test action normally just executes tests; we need to subclass it so that we can start the server before the tests execute and stop it when they complete.  It would also be nice to produce JUnit-style output of the test results so that they can be published by the build server.


use Module::Build;
use ModPerl::MM ();
use Apache::TestMM qw(test clean);
use Apache::TestRunPerl ();
use IO::File;

my $class = Module::Build->subclass(
    class => 'CDK::Builder',
    code => q{
	sub ACTION_test {
	    my $self = shift;
	    $self->do_system('t/TEST -start-httpd');
	    $self->SUPER::ACTION_test();
	    $self->do_system('t/TEST -stop-httpd');
	}
    }
);

my $build = $class->new (
  module_name => 'CDK::51DegreesFilter',
  license => 'perl',
  test_file_exts => [qw(.t)],
  use_tap_harness => 1,
  tap_harness_args => {
    sources => {
      File => {
        extensions => ['.tap', '.txt'],
      },
    },
    formatter_class => 'TAP::Formatter::JUnit',
  },
  build_requires => {
      'Module::Build' => '0.30',
      'TAP::Harness'  => '3.18',
  },
  test_requires => {
      'Apache::Test' => 0,
  },
  requires => {
      'mod_perl2' => 0,
      'FiftyOneDegrees::PatternV3' => 0,
      'JSON' => 0,
      'Apache2::Filter' => 0,
      'Apache2::RequestRec' => 0,
      'Apache2::RequestUtil' => 0,
      'Apache2::Log' => 0,
      'Apache2::Const' => 0,
      'APR::Table' => 0
  }
);

Apache::TestMM::filter_args();
Apache::TestRunPerl->generate_script();

$build->create_build_script;


Handler

Finally.  At this point, writing the handler is pretty anticlimactic.  It reads the user agent from the request headers and passes it to the getMatch method from 51Degrees.  A set of device properties is returned as a JSON object.  Each requested property, defined by DevicePropertyList, is added to the environment via subprocess_env().  The AMF handler used a caching mechanism to avoid detection costs for previously seen user agents; the 51Degrees folks said the new version was fast enough that I wouldn't need it.  Performance testing will prove this out.


sub handler {
  my $f = shift;

  # $dataset and $prefix are module-level variables, presumably initialized at
  # load time from the PerlSetEnv values (DeviceRepository and DevicePrefix).
  my $user_agent = $f->headers_in->{'User-Agent'} || '';
  my $json = FiftyOneDegrees::PatternV3::getMatch($dataset, $user_agent);
  my %properties = %{ decode_json($json) };

  # Push each returned property into the subprocess environment table,
  # e.g. the id property becomes _51D_ID.
  while ( my ($key, $value) = each(%properties) ) {
    my $dkey = uc("${prefix}_${key}");
    $f->subprocess_env($dkey => $value);
  }

  # DECLINED lets the request continue through the normal translation phase;
  # the handler only decorates the environment.
  return Apache2::Const::DECLINED;
}


Performance

To test performance, I set up JMeter with 5 threads on a sandbox machine and looped over a set of 350K unique user agents.  The JMeter instance made requests to apache running on a second sandbox machine with the new handler installed.  With 2,428,264 requests under its belt, the average response time is 10ms.  For v2, with caching, the average response time was 16ms.

Escaping Technical Debt

By Osman Shoukry (@oshoukry) & Kris Young (@thehybridform)

The Visit

On October 6th we had Michael Feathers (@mfeathers), author of Working Effectively With Legacy Code, visit our facility.  The visit had two objectives.  The first was to give tech talks to our engineers about legacy code.  The second was to train selected key individuals in the organization in techniques and skills for dealing with legacy code.

Mr. Feathers graciously agreed to give a recorded community talk about Escaping Technical Debt.


Key Takeaways

  • Tech debt

Technical debt is a metaphor for the amount of resistance in a system to change.  The larger the debt the higher the resistance.  When change is introduced, the time spent looking for where to make a change is one example of tech debt.  Another is when the system breaks in unexpected ways.

  • It takes a community to mitigate technical debt

Tech debt affects everybody, from engineers to product owners to the CEO.  Mitigating tech debt requires everyone's support and involvement.  As engineers, we are responsible for how to mitigate technical debt.  The product owners should have input on the tech debt mitigation effort.  The benefits of tech debt cleanup should be visible to everyone.  Don't surprise your peers or management with a sudden change in productivity.  They will bring ideas to you to help pay down the debt in more enabling ways for the future… Involve them!

  • Don’t pay the dead

Code that doesn't change, even if it is in production, doesn't collect debt.  Dead debt is any part of the system that doesn't change, including its bugs and poor design.  Many times engineers get wrapped up cleaning code that isn't changing.  Resist the urge to refactor unchanging parts of the system.  If it isn't changing, it isn't technical debt; it is dead debt.  Walk away.

  • Size your debt

Target the most complex changing code ordered by frequency of change.  Build a dashboard to show the amount of changing code by frequency.  These are the most expensive tech debt hot spots.  This should be the focus of the tech debt cleanup effort.
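Version control history alone gets you a rough version of that dashboard.  This one-liner (assuming a git repository) ranks files by how often they change; cross-reference the top of the list with a complexity metric to find the hot spots:

# Count how many commits touched each file; the busiest files rise to the top.
git log --name-only --pretty=format: | sed '/^$/d' | sort | uniq -c | sort -rn | head -20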

  • Seal the leak

Technical debt should be paid down immediately.  Simple code is easy to make more complex, and just as easy to keep simple.  However, as the code becomes more complex, the balance tips in favor of adding complexity rather than removing it.  No matter how complex the code is, it is always going to be easier to add complexity than to remove it.  Inexperienced engineers fail to see the initial complexity being added to the system.  In turn, they follow the path of least resistance, making the code more complex.

To seal the leak, first identify the most complex and frequently changing code.  Second, give explicit ownership for that code to the most seasoned engineers.  Third, let the seasoned engineers publish the eventual design objectives.  And finally, owners should review all changes to the code they own.

  • Slow down to go fast

Clean code takes time and disciplined effort.  Adequate time is needed to name classes, methods and variables.  Poorly named methods and classes will create confusion.  Confusion will lead to mixing roles and responsibilities.


Finally, low tech debt will yield high dividends with compound interest…

“Don’t do half assed, just do half”
