
Creating an Elastic Cloud for Running Containers Using Apache Mesos, Marathon, Chronos and Consul


Docker containers have become very popular over the past few years, and a number of companies have built products for container orchestration – Kubernetes, Docker Swarm and Apache Mesos are worth mentioning, as they have the biggest share of customers running containers in their production stacks. In this post I'm going to cover the installation and configuration of Apache Mesos, Marathon, Chronos, Consul, mesos-consul and consul-template.

Environment Description

The goal is to have a cluster of servers that will run containers for us. The following nodes are the bare minimum. I do realize that most of this could fit on a single box if we ran the components in the so-called development mode, but I want to stay as close as possible to a real-life setup:

  • 2 Nodes for running Apache Mesos master, Zookeeper, Marathon, Chronos, Consul Server and DNS Forwarder
  • 2 Nodes acting as an Apache Mesos Slave, Consul Client and DNS Forwarder. Docker will be installed as well.
  • 1 Node acting as a load-balancer to “expose” our applications to the world

I’m running 5 VMs that I happen to manage with Vagrant. I plan to release my Vagrant work at some point in the future, but for the time being let’s get familiar with the basics: the manual installation and configuration of every single component.

Installation

Servers

Install CentOS 7 on all servers. For my Vagrant boxes I’m using 768MB RAM and 1 CPU for the masters and the load-balancer, and 1280MB RAM and 2 CPUs for the slaves. From now on I’m going to refer to them as master1, master2, slave1, slave2 and lb1. I’m using a host-only network for the nodes to simulate isolated VLAN traffic, and only the lb1 node is bridged to the public network. Add the following entries to /etc/hosts on every node.

hosts file definition
192.168.50.11   mesos-master1 master1
192.168.50.12   mesos-master2 master2
192.168.50.21   mesos-slave1 slave1
192.168.50.22   mesos-slave2 slave2
192.168.50.31   mesos-lb1 lb1

Install the Mesosphere repository on every Mesos master and slave node.

Install Mesosphere repo
yum -y localinstall http://repos.mesosphere.io/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm

Zookeeper

Apache Mesos, Marathon and Chronos depend on Zookeeper to share state, elect cluster leader etc. So we start with it first. Zookeeper can be installed on a separate set of servers but I’m reusing the Mesos Masters for that purpose.

Install Zookeeper
yum -y install mesosphere-zookeeper

Then, using your favourite editor, edit /etc/zookeeper/conf/zoo.cfg and add the following lines at the bottom.

Zookeeper server replication settings
server.1=192.168.50.11:2888:3888
server.2=192.168.50.12:2888:3888

Change /var/lib/zookeeper/myid and set a unique positive number between 1 and 255. In my case 1 and 2 are sufficient for the two nodes. Once the IDs are set you can start the Zookeeper service.
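
On the two masters this boils down to something like the following, reusing the echo/tee pattern from the rest of this post:

Set Zookeeper myid
# On master1
echo 1 | tee /var/lib/zookeeper/myid

# On master2
echo 2 | tee /var/lib/zookeeper/myid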

Start Zookeeper
systemctl enable zookeeper
service zookeeper start

Apache Mesos

On every Mesos master and slave node install Apache Mesos.

Install Apache Mesos
yum -y install mesos

Edit /etc/mesos/zk and set the URL of your Zookeeper server cluster. This is a common setting used by masters and slaves.

Set Zookeeper URL for Mesos
echo 'zk://192.168.50.11:2181,192.168.50.12:2181/mesos' | tee /etc/mesos/zk

A few more configuration files have to be created on the master servers. Create /etc/mesos-master/quorum and set the number of Mesos masters required to form a quorum. With only two masters I settle for a quorum of 1, which is fine for this lab but not something you would run in production.

Set Quorum for Mesos
echo 1 | tee /etc/mesos-master/quorum

To isolate cluster traffic, set the IP address each daemon should listen on. Don’t forget to replace the IP for every node.

Set listen address for Mesos
# On Mesos master
echo "192.168.50.11" | tee /etc/mesos-master/ip

# On Mesos slave
echo "192.168.50.21" | tee /etc/mesos-slave/ip

Optionally you can set the name of your Mesos cluster

Set Mesos cluster name
echo "MyLocalMesosCluster" | tee /etc/mesos-master/cluster

You can now start your Mesos cluster.

Start Mesos masters
systemctl disable mesos-slave
systemctl enable mesos-master
service mesos-master start
Start Mesos Slaves
systemctl disable mesos-master
systemctl enable mesos-slave
service mesos-slave start

You have to repeat the enable/disable step after every Mesos package update – an annoying side-effect of the package’s post-install scripts, which enable both services on startup. With this step complete you can open the Mesos UI on a master (port 5050) and check that your cluster is running: a leader should have been elected and both slaves should be registered. A quick command-line check is sketched below as well.
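
The master also exposes its state over HTTP, so the same check can be scripted. A minimal sketch, assuming curl and Python are available and the master listens on the default port 5050 (state.json was the state endpoint on Mesos releases of this era):

Check Mesos master state (optional)
# Look for the "leader" field and the list of registered slaves in the output
curl -s http://192.168.50.11:5050/master/state.json | python -m json.tool | less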

So far so good, but you want some frameworks running on top of Mesos to handle the execution of applications. Marathon and Chronos are very good candidates to start with.

Marathon and Chronos

The masters will run Marathon and Chronos too so install them.

Install Marathon and Chronos
yum -y install marathon chronos

All you need to do is enable them on startup and start the services. No additional configuration is required at the moment because both are reading /etc/mesos/zk.

Start Marathon and Chronos
systemctl enable marathon
systemctl enable chronos
service marathon start
service chronos start
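
Before opening the UIs you can also poke both HTTP APIs for a quick sanity check. A sketch assuming the default ports (Marathon on 8080, Chronos on 4400) and that you run it on a master:

Quick check of Marathon and Chronos (optional)
# Marathon reports its version and current leader on /v2/info
curl -s http://192.168.50.11:8080/v2/info | python -m json.tool

# Chronos returns its (currently empty) job list on /scheduler/jobs
curl -s http://192.168.50.11:4400/scheduler/jobs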

You can then log in to the Marathon UI (port 8080) and the Chronos UI (port 4400). Good news – your basic Mesos cluster is now running and ready to run applications. Still, something is missing: what if we have thousands of applications in our cluster? How do we find out where to connect to each of them? Luckily, there is service discovery to help us.

HashiCorp Consul

HashiCorp Consul is one of the leading service discovery tools. We install it on both the Mesos masters and the slaves. I wasn’t able to find an RPM package for it, so I’m installing it manually.

Download HashiCorp Consul and Consul UI
wget --quiet https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_linux_amd64.zip
wget --quiet https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_web_ui.zip

Then create a local user consul and create the folders the service is going to use.

Add consul user
useradd consul
mkdir -p /etc/consul.d/{server,client} /var/opt/consul
chown consul. /var/opt/consul

Extract both archives

Extract Consul and Consul UI
unzip consul_0.6.4_linux_amd64.zip -d /usr/local/bin
unzip consul_0.6.4_web_ui.zip -d /home/consul
chown consul. /home/consul -R

Try running Consul. You should see the following output.

Try running consul
[vagrant@mesos-master1 ~]$ consul
usage: consul [--version] [--help] <command> [<args>]

Available commands are:
    agent          Runs a Consul agent
    configtest     Validate config file
    event          Fire a new event
    exec           Executes a command on Consul nodes
    force-leave    Forces a member of the cluster to enter the "left" state
    info           Provides debugging information for operators
    join           Tell Consul agent to join cluster
    keygen         Generates a new encryption key
    keyring        Manages gossip layer encryption keys
    leave          Gracefully leaves the Consul cluster and shuts down
    lock           Execute a command holding a lock
    maint          Controls node or service maintenance mode
    members        Lists the members of a Consul cluster
    monitor        Stream logs from a Consul agent
    reload         Triggers the agent to reload configuration files
    rtt            Estimates network round trip time between nodes
    version        Prints the Consul version
    watch          Watch for changes in Consul

At this point we’ve installed both Consul and the Consul UI. It’s time to bundle them together into a proper server and client cluster. It is recommended to encrypt the traffic between your nodes, so first generate an encryption key and make a note of it – you’re going to need it later.

Generate secret key for Consul
[vagrant@mesos-master1 ~]$ consul keygen
q1yPytAQznIuwLyuL8w+tg==

In the steps above I created two folders under /etc/consul.d – server and client. I’m using them to keep the configuration for the two roles of Consul separate and to make it easy to change a node’s purpose by editing only the Consul start-up script. It might be a better idea to add an option and select the agent type from a configuration file under /etc/sysconfig/consul, but I won’t cover that here.

Create a config.json file – in /etc/consul.d/server on the Mesos masters and in /etc/consul.d/client on the Mesos slaves.

Create config.json
{
    "bootstrap": false,
    "server": true,
    "bootstrap_expect": 2,
    "datacenter": "localdev",
    "data_dir": "/var/opt/consul",
    "ui_dir": "/home/consul",
    "encrypt": "q1yPytAQznIuwLyuL8w+tg==",
    "log_level": "INFO",
    "enable_syslog": true
}
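
On the Mesos slaves the same file goes into /etc/consul.d/client with "server" set to false; "bootstrap_expect" is ignored on clients and can be dropped. A minimal sketch of the client variant:

Client config.json (Mesos slaves)
{
    "bootstrap": false,
    "server": false,
    "datacenter": "localdev",
    "data_dir": "/var/opt/consul",
    "ui_dir": "/home/consul",
    "encrypt": "q1yPytAQznIuwLyuL8w+tg==",
    "log_level": "INFO",
    "enable_syslog": true
}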

With the configuration files in place, add the systemd unit file – /etc/systemd/system/consul.service. Remember to change the configuration directory to /etc/consul.d/client for the Mesos slaves.

Consul start-up script
[Unit]
Description=consul agent
Requires=network-online.target
After=network-online.target

[Service]
EnvironmentFile=-/etc/sysconfig/consul
Environment=GOMAXPROCS=2
Restart=on-failure
ExecStart=/usr/local/bin/consul agent $OPTIONS -config-dir /etc/consul.d/server
ExecReload=/bin/kill -HUP $MAINPID
KillSignal=SIGTERM
User=consul
Group=consul

[Install]
WantedBy=multi-user.target

One final bit is missing – /etc/sysconfig/consul. Create one on every node.

Consul start-up script configuration
# For master1 it looks like
#OPTIONS="-client 192.168.50.11 -bind 192.168.50.11 -join 192.168.50.12"
OPTIONS="-client <ip_of_the_node> -bind <ip_of_the_node> -join <ip_of_the_other_master>"

# For slave1 it looks like
#OPTIONS="-client 192.168.50.21 -bind 192.168.50.21 -join 192.168.50.11 -join 192.168.50.12"
OPTIONS="-client <ip_of_the_node> -bind <ip_of_the_node> -join <master1_ip> -join <master2_ip>"

You can start Consul now

Start consul
systemctl enable consul
service consul start

If everything is fine you should be able to log in to the Consul UI on every node (it is served on the HTTP port, 8500). Remember that two Consul server nodes won’t help you in a split-brain situation – think about increasing the number of servers to 3 or 5. A quick membership check from the shell is sketched below.
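
Consul’s HTTP catalog API can confirm that all nodes joined the cluster. A minimal sketch, querying the client address configured above:

List cluster members via the HTTP API
curl -s http://192.168.50.11:8500/v1/catalog/nodes | python -m json.tool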

We’re nearly done. Two more pieces are required for our automated load-balancer – mesos-consul and consul-template. On the masters, create three service definition files in /etc/consul.d/server – one each for zookeeper, marathon and chronos.

marathon.json service configuration
{"service": {"name": "marathon", "tags": ["marathon"], "port": 8080, "check": {"script": "curl localhost:8080 >/dev/null 2>&1", "interval": "10s"}}}
chronos.json service configuration
{"service": {"name": "chronos", "tags": ["chronos"], "port": 4400, "check": {"script": "curl localhost:4400 >/dev/null 2>&1", "interval": "10s"}}}
zookeeper.json service configuration
{"service": {"name": "zookeeper", "tags": ["zookeeper"], "port": 2181}}

Restart your Consul servers (or use consul reload), then have a look at the services in the UI. You should see them listed on each master node.

Check if Consul is listing services properly
[vagrant@mesos-master1 ~]$ dig zookeeper.service.localdev.consul

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> zookeeper.service.localdev.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20585
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;zookeeper.service.localdev.consul. IN A

;; ANSWER SECTION:
zookeeper.service.localdev.consul. 0 IN   A   192.168.50.11
zookeeper.service.localdev.consul. 0 IN   A   192.168.50.12

;; Query time: 3 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Jul 08 14:29:20 UTC 2016
;; MSG SIZE  rcvd: 149

As you can see from the output above, my server is able to resolve services in the cluster. Consul offers a DNS interface next to its RESTful HTTP API by default, listening on the unprivileged port 8600. To make life easier I’m configuring dnsmasq to forward requests for the consul domain to the local Consul DNS instance. Create the file /etc/NetworkManager/dnsmasq.d/consul and restart NetworkManager.

Configure DNS Forwarding with dnsmasq
server=/consul/192.168.50.11#8600
listen-address=127.0.0.1,192.168.50.11

Again, don’t forget to change the IPs for each node.

Restart NetworkManager
service NetworkManager restart

On some systems you might need to enable dnsmasq support in NetworkManager first. Add the following line to /etc/NetworkManager/NetworkManager.conf:

Enable dnsmasq in NetworkManager
[main]
...
dns=dnsmasq
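
Once NetworkManager has been restarted, a plain lookup against the system resolver confirms the forwarding works – marathon is one of the services registered earlier:

Verify DNS forwarding
dig +short marathon.service.localdev.consul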

With Consul running and DNS forwarding enabled on all nodes, we just need to configure consul-template on the load-balancer and our job is done.

consul-template

Again, finding an RPM for consul-template turned out to be an impossible task, so I installed it manually.

Download and install consul-template
wget --quiet https://releases.hashicorp.com/consul-template/0.15.0/consul-template_0.15.0_linux_amd64.zip
unzip consul-template_0.15.0_linux_amd64.zip -d /usr/local/bin

Create the start-up script for it – /etc/systemd/system/consul-template.service

consul-template start-up script
[Unit]
Description=Consul template service
Requires=network-online.target
After=network-online.target

[Service]
EnvironmentFile=-/etc/sysconfig/consul-template
Environment=GOMAXPROCS=2
Restart=on-failure
ExecStart=/usr/local/bin/consul-template $OPTIONS -config /etc/consul-template/tmpl.json -syslog -log-level=INFO
ExecReload=/bin/kill -HUP $MAINPID
KillSignal=SIGTERM
User=root
Group=root

[Install]
WantedBy=multi-user.target 

and the sysconfig one in /etc/sysconfig

consul-template sysconfig file
echo 'OPTIONS=""' | tee /etc/sysconfig/consul-template
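
As with Consul, reload systemd so it picks up the new unit:

Reload systemd
systemctl daemon-reload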

Finally we need to tell consul-template what to do with the data it collects from Consul.

tmpl.json configuration file
consul = "consul.service.localdev.consul:8500"

template {
  source = "/etc/haproxy/haproxy.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command = "service haproxy restart"
}

The configuration file above has to be saved as /etc/consul-template/tmpl.json (create the /etc/consul-template directory first). As you can see, I’m pointing consul-template at my Consul cluster through the DNS forwarder and instructing it to use /etc/haproxy/haproxy.ctmpl as the source template for generating the HAProxy configuration file. The server lines inside the listen blocks below are filled in from Consul’s service catalog using consul-template’s range/service syntax.

haproxy.ctmpl
# File managed by consul-template. 
# Don't even think about changing the .cfg file
# Have a look at .ctmpl instead

defaults LOCALDEV
    mode tcp
    timeout connect  4s
    timeout server  30s
    timeout client  30s

listen marathon-framework-in
    balance roundrobin
    bind *:8080
    {{range service "marathon"}}
        server {{.Node}} {{.Address}}:{{.Port}}
    {{end}}

listen zookeeper-in
    balance roundrobin
    bind *:2181
    {{range service "zookeeper"}}
        server {{.Node}} {{.Address}}:{{.Port}}
    {{end}}

listen admin
    bind *:9090
    mode http
    stats enable
    stats uri /
    stats hide-version
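
One prerequisite I haven’t spelled out: HAProxy itself has to be installed on the load-balancer node. It is available from the standard CentOS 7 repositories.

Install HAProxy
yum -y install haproxy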

Enable consul-template and HAProxy on startup and start both services.

Enable and start consul-template and HAProxy
systemctl enable consul-template
systemctl enable haproxy
service consul-template start
service haproxy start

You’re now able to access Marathon from external IPs using the load-balancer. Feel free to add other services that you might wish to expose to the real world.

It is time to run our first container but we almost forgot to install and configure Docker.

Docker

Install Docker on the slave servers. On CentOS 7 the package is simply called docker and comes from the extras repository.

Install docker
yum -y install docker

Edit /etc/sysconfig/docker and add the following settings to OPTIONS.

Docker DNS settings
OPTIONS='--selinux-enabled --log-driver=journald --dns 192.168.50.11 --dns-search localdev.consul'

Once again, remember to change the DNS IP for every server, and then restart Docker. Next, edit /etc/mesos-slave/containerizers on every slave node so Mesos can launch Docker containers.

Set containerizers for Mesos Slaves
echo 'docker,mesos' | tee /etc/mesos-slave/containerizers

Restart Docker and Mesos on every slave – the commands are summarized below. With that final bit set we’re good to go and can test whether our cluster is working as expected.
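
For completeness, applying the Docker and containerizer changes on each slave boils down to:

Apply the changes on the slaves
service docker restart
service mesos-slave restart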

Running your first Docker container

An important piece in this environment is mesos-consul, the bridge application that registers Mesos tasks as services in Consul. Log in to your Marathon UI and create a new application using the JSON below.

mesos-consul.json configuration file
{
  "id": "/mesos-consul",
  "cmd": null,
  "cpus": 0.1,
  "mem": 128,
  "disk": 0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "ciscocloud/mesos-consul",
      "network": "BRIDGE",
      "portMappings": [
        {
          "containerPort": 0,
          "hostPort": 0,
          "servicePort": 10002,
          "protocol": "tcp",
          "labels": {}
        }
      ],
      "privileged": false,
      "parameters": [],
      "forcePullImage": false
    }
  },
  "portDefinitions": [
    {
      "port": 10002,
      "protocol": "tcp",
      "labels": {}
    }
  ],
  "args": [
    "--zk=zk://zookeeper.service.localdev.consul:2181/mesos",
    "--mesos-ip-order=mesos,host",
    "--consul",
    "--refresh=5s"
  ]
}
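
If you prefer the command line over the UI, the same JSON (saved locally, for example as mesos-consul.json – the filename is up to you) can be posted to Marathon’s REST API. The hostname below resolves through the Consul DNS forwarder configured earlier:

Deploy mesos-consul through the Marathon API
curl -s -X POST -H 'Content-Type: application/json' \
     -d @mesos-consul.json \
     http://marathon.service.localdev.consul:8080/v2/apps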

If it runs stably then congratulations – you’ve just created your first elastic cloud environment. Check the Consul UI: you should see two new services, mesos and mesos-consul. The more apps you add to Marathon, the more services you’ll see in Consul.
