From a Docker engine to Docker Swarm to create Tuleap clusters

When it comes to clustering, the two giants, Kubernetes and Docker Swarm, are going head to head. It is all over the news, and here at Tuleap we have made our choice! But let’s back up a bit first!

Docker Swarm

Two years ago, before Docker 1.0 was released, we created our first “Tuleap cluster” from the Docker engine and duct tape. In short, it was a Docker daemon event stream watcher with an inotify-based nginx reload. Although it was functional, it was a nightmare to manage, extend, and scale. Last year, when the early versions of Docker Swarm were released, we created Docker Swarm prototypes to replace all of the duct tape.

Today, the balance has tipped in favor of Swarm for us, and we will tell you why!

Docker Swarm offers three clear advantages

What does Docker Swarm actually do for us?
  1. First, it delivers native clustering capabilities, turning a group of Docker engines into a single, virtual Docker engine.
  2. Second, it is simple.
  3. And, third, we have a history with it.

We started working on our clusters in early May 2016, back when Swarm mode was not yet “a thing.” At the time our position was somewhere between standalone Swarm and OpenShift (Kubernetes + Red Hat). Post-DockerCon, it was fairly obvious that, even if we would have to support OpenShift and K8S (because Tuleap Enterprise customers and leads were asking us to), Swarm mode would become the de facto standard.

Swarm mode is buggy, but the UI is fantastic. It is hard to make something so complicated simple, but Docker has done it. Their developers were able to achieve a high level of abstraction and mask the complexity of distributing load across nodes. As stated above, we had already been running tests with standalone Docker Swarm (not the version directly and controversially integrated into the Engine). We were already familiar with Swarm and happy with our tests, so we saw an opportunity to leverage our knowledge and experience by staying with Docker Swarm.

Docker Swarm from a technical point of view

To understand the problem a cluster of containers solves, it is important to go into a few of the technical details behind Docker Swarm. At Tuleap, we have two main needs that Docker Swarm addresses:

  1. Demo platforms for users to try Tuleap’s agile software development features.
  2. A growing trend in the software engineering industry: Instead of having one centrally-managed instance of a service, it is easier and more secure to manage several smaller instances. You get the advantages of a central tool (it always works the same way, there is no need to re-learn, the interface is consistent, etc.) but with the flexibility of smaller instances for things like stopping services and monitoring loads.

We used our demo site as a Tuleap cluster test case. The demo site is a full-featured Tuleap site, pre-loaded so that users can test it with real-world data. Because the demo site serves as a playground, the data is “cleaned up” every night to keep it operating properly.

How our Tuleap cluster works

Our infrastructure consists of four physical servers hosted at Online.net. We decided to use three servers for our Swarm cluster, plus one server for data storage, because we wanted to pool persistent data between the different nodes and containers of the cluster. All three nodes access the data storage over NFS.
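The NFS pooling could be wired up with a classic fstab entry on each node. This is an illustrative sketch only: the storage host name and export path below are hypothetical, not our actual values.

```
# Hypothetical /etc/fstab entry on each Swarm node;
# "storage-01" and "/export/services" are placeholder names.
storage-01:/export/services  /srv/services  nfs  defaults,_netdev  0  0
```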

The demo.tuleap.org stack consists of four containers dispatched across the three nodes, plus a reverse-proxy shared with other stacks:
  • A proxy cache to serve static media
  • A Tuleap front-end
  • A Tuleap real-time event stream server
  • A database

[Figure: Tuleap Docker Cluster]

Docker Swarm mode setup

To build our cluster we assigned three nodes. For reliability and cost-efficiency reasons we wanted all nodes in manager mode: at least three managers are needed for Raft leader election and to keep the cluster up even if one server goes down. Dedicating three servers purely to management would not be an efficient use of resources, but for a first iteration, running workloads on the managers is a good enough trade-off. We will assign worker mode to any future additional nodes.
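The choice of three managers comes down to Raft's quorum rule: a swarm with N managers needs a majority (floor(N/2) + 1) reachable to stay operational, so it tolerates floor((N - 1) / 2) manager failures. A quick sketch of the arithmetic:

```shell
#!/bin/sh
# Raft fault tolerance: N managers survive floor((N - 1) / 2) manager failures.
for n in 1 3 5 7; do
    echo "${n} managers -> tolerates $(( (n - 1) / 2 )) failure(s)"
done
```

With one manager, any failure stops the cluster; three managers is the smallest count that survives losing a server, which is why it is the usual starting point.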

Each node has two network interfaces: one public (Internet-facing) and one on a private network. The private interface is the one used for our Swarm traffic, keeping the two networks separate.

We started by initializing our Swarm cluster:
[tty@node-01]$ sudo docker swarm init \
    --advertise-addr ${PRIVATE_IP_NODE_1}:2377 \
    --listen-addr ${PRIVATE_IP_NODE_1}:2377

Swarm initialized: current node (lanodex79l3nzbj1waspjojhi) is now a manager.
To add a worker to this swarm, run the following command:

    docker swarm join \
        --token SWMTKN-1-${TOKEN_WORKER} \
        ${PRIVATE_IP_NODE_1}:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
By default Swarm provides a token to add one or more nodes in worker mode. The first node (the one created in the first step) is assigned manager status. We checked this with the following command:
[tty@node-01]$ sudo docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
lanodex79l3nzbj1waspjojhi *  node-01   Ready   Active        Leader
To have only manager nodes, we had to request another manager-specific token.
[tty@node-01]$ sudo docker swarm join-token manager
To add a manager to this swarm, run the following command:

    docker swarm join \
        --token SWMTKN-1-${TOKEN_MANAGER} \
        ${PRIVATE_IP_NODE_1}:2377
Once we had this token, we could add the other two nodes, remembering to specify the private-network IP address.
[tty@node-02]$ sudo docker swarm join --token SWMTKN-1-${TOKEN_MANAGER} \
	--advertise-addr ${PRIVATE_IP_NODE_2}:2377 \
	--listen-addr ${PRIVATE_IP_NODE_2}:2377 \
	${PRIVATE_IP_NODE_1}:2377

[tty@node-03]$ sudo docker swarm join --token SWMTKN-1-${TOKEN_MANAGER} \
	--advertise-addr ${PRIVATE_IP_NODE_3}:2377 \
	--listen-addr ${PRIVATE_IP_NODE_3}:2377 \
	${PRIVATE_IP_NODE_1}:2377
The nodes in our Swarm cluster then looked like this:
[tty@node-01]$ docker node ls
 ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
 rcfa99pzw7ndu2da2dh2rypep    node-03   Ready   Active        Reachable
 scmf14xd13qq99nzh3x12tu85    node-02   Ready   Active        Reachable
 lanodex79l3nzbj1waspjojhi *  node-01   Ready   Active        Leader

The Swarm cluster was set up. The next step was to deploy the Tuleap stacks across the nodes.

Tuleap stack deployment

In this example, we deployed two Tuleap stacks and planned to run the commands directly on one of the nodes. In actual production, of course, everything is automated using Ansible. That will be covered in a separate post!

As a reminder, our stack is composed of four containers and one reverse-proxy that is shared. Data storage is mounted over NFS under /srv/services.

As the containers were going to be spread across nodes, they had to be attached to a shared network that would work on multiple servers. Docker handles this with Overlay networks.

We created our Overlay network:
[tty@node-01]$ sudo docker network create --driver overlay tuleap
We then created our first service tuleap_bearded_db:
[tty@node-01]$ sudo docker service create \
	--mount type=bind,src=/srv/services/tuleap_bearded_db/data,dst=/var/lib/mysql \
	--network tuleap \
	--name tuleap_bearded_db \
	${MY_PRIVATE_HUB}/tuleap-db:1.1-r2
In this example, the following flags indicate:
  • --mount sets our mount point
  • --network is the name of the network to attach the service to
  • --name gives the service a name
We then checked that our service had been created correctly:
[tty@node-01]$ sudo docker service ls
ID            NAME                MODE        REPLICAS  IMAGE
6cslbc5s4xet  tuleap_bearded_db   replicated  1/1       ${MY_PRIVATE_HUB}/tuleap-db:1.1-r2
We repeated this step with tuleap-web, tuleap-cache, and tuleap-rt:
[tty@node-01]$ sudo docker service ls
ID            NAME                 MODE        REPLICAS  IMAGE
6cslbc5s4xet  tuleap_bearded_db    replicated  1/1       ${MY_PRIVATE_HUB}/tuleap-db:1.1-r2
0rh1lgtvw9d7  tuleap_bearded_web   replicated  1/1       ${MY_PRIVATE_HUB}/tuleap-web:9.4-r0
v09ssuwyleky  tuleap_bearded_rt    replicated  1/1       ${MY_PRIVATE_HUB}/tuleap-rt:1.0-r1
uqs9glq2xyh4  tuleap_bearded_cache replicated  1/1       ${MY_PRIVATE_HUB}/tuleap-cache:1.1-r0

Our Tuleap stack was operational at this stage, but not accessible from the Internet. This was to be expected, as we had not exposed any ports; the reverse-proxy handles this. We decided to use HAProxy rather than Traefik or Nginx. Traefik is a good product, but HAProxy performs better in terms of raw speed and the ability to handle heavy load. Traefik is also still young, so we will continue to keep a close eye on it. Ruling out Nginx was easy, as we had never used it as a load balancer.

For the reverse-proxy we used other options to publish the ports:
[tty@node-01]$ sudo docker service create \
	--network tuleap \
	--name reverseproxy \
	--mode global \
	--publish mode=host,target=80,published=80 \
	--publish mode=host,target=443,published=443 \
	${MY_PRIVATE_HUB}/reverseproxy:1.3-r1
The following flags indicate:
  • --mode global deploys one instance of the service on every node
  • --publish mode=host publishes the service's port directly on the node, bypassing the routing mesh. This was a benefit in our case because the reverse-proxy receives the actual client IP, rather than the Docker gateway IP, in the X-Forwarded-For header.
[tty@node-01]$ sudo docker service ls
ID            NAME                 MODE        REPLICAS  IMAGE
6cslbc5s4xet  tuleap_bearded_db    replicated  1/1       ${MY_PRIVATE_HUB}/tuleap-db:1.1-r2
0rh1lgtvw9d7  tuleap_bearded_web   replicated  1/1       ${MY_PRIVATE_HUB}/tuleap-web:9.4-r0
v09ssuwyleky  tuleap_bearded_rt    replicated  1/1       ${MY_PRIVATE_HUB}/tuleap-rt:1.0-r1
uqs9glq2xyh4  tuleap_bearded_cache replicated  1/1       ${MY_PRIVATE_HUB}/tuleap-cache:1.1-r0
krx27j2ohp36  reverseproxy         global      4/4       ${MY_PRIVATE_HUB}/reverseproxy:1.3-r1
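For reference, the same services can also be described declaratively in a Compose v3 file and deployed with docker stack deploy. The following is a hypothetical sketch, not our production file: the web, cache, and rt services are elided, and "myhub.example.com" stands in for our private registry.

```yaml
# Hypothetical Compose v3 sketch of the services created above.
version: "3.2"

services:
  db:
    image: myhub.example.com/tuleap-db:1.1-r2
    volumes:
      # Bind-mount of the NFS-backed data directory, as with --mount above
      - /srv/services/tuleap_bearded_db/data:/var/lib/mysql
    networks:
      - tuleap

  reverseproxy:
    image: myhub.example.com/reverseproxy:1.3-r1
    deploy:
      mode: global            # one instance per node, as with --mode global
    ports:
      # mode: host bypasses the routing mesh so the real client IP is kept
      - target: 80
        published: 80
        mode: host
      - target: 443
        published: 443
        mode: host
    networks:
      - tuleap

networks:
  tuleap:
    driver: overlay
```

Deployed with docker stack deploy -c stack.yml tuleap_bearded, this would create services named tuleap_bearded_db and tuleap_bearded_reverseproxy, prefixed with the stack name.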
Again, the stack was operational at this stage, but we still had to validate it by testing the service on all nodes:
[tty@alpine]$ NODES_IPS=('${PUB_IP_NODE_1}' '${PUB_IP_NODE_2}' '${PUB_IP_NODE_3}')
[tty@alpine]$ while true; do
> for ip in ${NODES_IPS[*]}; do
> printf "[${ip}] http code: "
> curl -s \
>      -o /dev/null \
>      -w '%{http_code}' \
>      -m 5 \
>      -I \
>      -H "Host: bearded.example.com" \
>      https://${ip}
> printf "\n"
> done
> done
[${PUB_IP_NODE_1}] http code: 200
[${PUB_IP_NODE_2}] http code: 200
[${PUB_IP_NODE_3}] http code: 200
[${PUB_IP_NODE_1}] http code: 200
[${PUB_IP_NODE_2}] http code: 200
[${PUB_IP_NODE_3}] http code: 200
[${PUB_IP_NODE_1}] http code: 200
^C
It ran on all nodes.

To add a new stack, we simply spawned it and restarted the reverse-proxy services one by one:
docker service update --force --update-parallelism 1 --update-delay 5s reverseproxy

At this stage, we had a scalable cluster: we could add nodes and new Tuleap stacks without stopping the service. Using Docker with Swarm therefore gave us a high-uptime architecture.

Docker Swarm made our initial foray into distributed services easy. We encountered a lot of bugs with the early (1.12) version, but this improved with version 1.13. It is hard to say whether it would have been “better” with Kubernetes. Given the maturity of K8S, there may have been fewer bugs, but given our past experience with Docker, the learning curve would have been steeper in our case. Much of our work involved making our application and release management aware of “clusters of containers.” We had to replicate and test several cluster configurations to do this. This is where Swarm, with its simple setup, really delivered major benefits.

Install Tuleap on a Docker container
