WebSocket — 1 Million Connections using Appwrite

Torsten Dittmann
Published in ITNEXT · 8 min read · Nov 15, 2021

In our recent blog post we talked about the lessons we learned while building a WebSocket server. Now we want to reap the fruits of our efforts and see how our server performs under tough conditions.

What’s Appwrite?

Appwrite is an open-source, self-hosted Backend-as-a-Service that aims to make app development easier with SDKs available in a variety of programming languages.

The WebSocket server used in Appwrite powers a very flexible Realtime API that supercharges your apps with events from all of Appwrite's services. That way, you get notified when a document is updated, a user creates a new session, or a function execution completes. All subscriptions are secured by Appwrite's authentication and permissions system, meaning a user will only receive updates for resources they have read permission on.
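
Under the hood, a subscription is simply a WebSocket connection to the /v1/realtime endpoint with the desired channels passed as query parameters, which is exactly the path our benchmark uses later. As a rough illustration, assuming an Appwrite instance reachable on localhost and the generic websocat CLI (not part of Appwrite), you could watch the raw event stream like this:

# Connect to the Realtime endpoint and subscribe to the "files" channel.
# "console" is the project ID also used in the benchmark below; replace it
# with your own project ID when trying this against a real instance.
websocat "ws://localhost/v1/realtime?project=console&channels[]=files"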

Our goal was to hit 1 Million concurrent connections.

Setup

For the WebSocket server, we used a General Purpose Droplet from DigitalOcean with 32 cores and 128 gigabytes of memory. For the clients, we used 16 smaller Droplets, each with 4 cores and 16 gigabytes of memory.

To avoid any outside interference, all Droplets were created in the same data center and the benchmark was run inside an internal isolated network (VPC) — so no traffic went outside.

For the actual benchmark we used Tsung, an open-source, multi-protocol, distributed load-testing tool that simulates users in order to test the scalability and performance of IP-based client/server applications. The fact that the load can easily be distributed across a cluster of client machines makes this tool extremely powerful.

In the following sections we will refer to the client machines as Clients and to the server running Appwrite as the Manager.

Configuration

In order for Tsung to communicate with all of its Clients over SSH, we needed to perform the following steps on each machine acting as a Client for the benchmark (a rough sketch of these steps follows the list):

Add an SSH key.

Set a specific hostname:

  • realtime-worker-1
  • realtime-worker-2
  • …
  • realtime-worker-16

Install Tsung.
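
A rough sketch of this per-Client preparation; the exact commands here are assumptions rather than the literal script we ran:

# Allow the Manager to log in later: append its public key to authorized_keys
mkdir -p ~/.ssh && cat id_rsa.pub >> ~/.ssh/authorized_keys
# Give this Client its unique hostname (worker 1 in this example)
hostnamectl set-hostname realtime-worker-1
# Tsung itself is installed with the setup script shown further below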

The added SSH key will later be used on the machine running Appwrite, which will also use Tsung to perform the benchmark and orchestrate all the Clients. Note that this machine will not establish any connections to the Realtime server itself.

Since Tsung needs a hostname for each Client in the configuration instead of an IP, we added all the machines to /etc/hosts with their hostnames and their associated IP addresses from the internal network.

10.114.0.x realtime-worker-1
10.114.0.x realtime-worker-2
….
10.114.0.x realtime-worker-15
10.114.0.x realtime-worker-16

After all Clients had been set up, we manually connected to each Client from the Manager to add them to the known_hosts file. This was required, since Tsung would otherwise fail to connect.
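
Conceptually this is just one SSH round trip per Client; a hedged sketch of automating it from the Manager (the accept-new option requires a reasonably recent OpenSSH):

# Connect once to every Client so its host key lands in ~/.ssh/known_hosts
for i in $(seq 1 16); do
  ssh -o StrictHostKeyChecking=accept-new root@realtime-worker-$i exit
done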

Now we adapted our configuration file for Tsung and added all of the Clients:

<?xml version="1.0"?>
<tsung loglevel="debug" version="1.0">
  <clients>
    <client host="realtime-worker-1" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-2" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-3" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-4" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-5" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-6" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-7" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-8" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-9" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-10" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-11" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-12" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-13" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-14" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-15" cpu="4" use_controller_vm="false" maxusers="64000"/>
    <client host="realtime-worker-16" cpu="4" use_controller_vm="false" maxusers="64000"/>
  </clients>
  <servers>
    <server host="10.114.0.x" port="9505" type="tcp"/>
  </servers>
  <load>
    <arrivalphase phase="1" duration="10000" unit="second">
      <users maxnumber="2000000" arrivalrate="2000" unit="second"/>
    </arrivalphase>
  </load>
  <options>
    <option name="ports_range" min="1025" max="65535"/>
  </options>
  <sessions>
    <session name="websocket" probability="100" type="ts_websocket">
      <request>
        <websocket type="connect" path="/v1/realtime?project=console&amp;channels[]=files"/>
      </request>
      <for var="i" from="1" to="1000" incr="1">
        <thinktime value="30"/>
      </for>
    </session>
  </sessions>
</tsung>

As you can see inside the <load> tag, the maxnumber attribute is set to 2 million. The reason is that we had set a limit of 1 million connections on the Realtime server, and we did not want Tsung's user limit to become the bottleneck. The session itself simply connects and then idles through 1,000 think-time steps of 30 seconds each, keeping every connection open for the duration of the test.

At that time we were under the impression, based on a GitHub issue, that Swoole, the PHP framework we build on, only allows a maximum of 1 million connections per instance. We were later informed in the Swoole community Slack that this is in fact not true and that the real limit is 2,147,483,584 on a 64-bit machine.

The only thing left to configure was the operating system itself. Linux's networking stack ships with sane defaults for many workloads, but it isn't tuned for 1+ million concurrent connections.

We expected to face some form of the C10k problem, so we prepared our systems in advance:

  • Increased the default TCP buffer sizes for the system
  • Increased the default IPv4 port range
  • Increased the limit for open files and file handles
  • and more

These configurations allowed us to open roughly 64,000 connections per Client and brought us all the way to 1 million:

# increasing maximum number of open files
ulimit -n 4000000
sysctl -w fs.file-max=12000500
sysctl -w fs.nr_open=20000500
echo "root soft nofile 4000000" >> /etc/security/limits.conf
echo "root hard nofile 4000000" >> /etc/security/limits.conf
echo 20000500 > /proc/sys/fs/nr_open
# increasing size of the TCP socket buffer
sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"
sysctl -w net.ipv4.tcp_rmem="1024 4096 16384"
sysctl -w net.ipv4.tcp_wmem="1024 4096 16384"
sysctl -w net.core.rmem_max=16384
sysctl -w net.core.wmem_max=16384
# disable TCP receive buffer auto-tuning
sysctl -w net.ipv4.tcp_moderate_rcvbuf="0"
# increasing local port range that is used by TCP and UDP
sysctl -w net.ipv4.ip_local_port_range="500 65535"
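
Keep in mind that ulimit and sysctl -w only affect the running system. As a sketch, the same keys could be persisted in /etc/sysctl.conf so they survive a reboot:

# Persist a subset of the settings above (values taken from the script)
cat >> /etc/sysctl.conf <<'EOF'
fs.file-max = 12000500
fs.nr_open = 20000500
net.ipv4.ip_local_port_range = 500 65535
EOF
# Reload the persisted settings
sysctl -p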

We drew on several external resources to tackle these limitations.

Preparation

Since all of the mentioned steps need to be performed on every Droplet we use, doing them by hand would take far too long. To automate this, we used Ansible and created a playbook that takes care of installing all the necessary tools and applying the OS configuration.

- name: apt update and upgrade
  hosts: all
  tasks:
    - name: Update and upgrade apt packages
      apt:
        upgrade: yes
        update_cache: yes
        cache_valid_time: 86400
    - name: Run tsung setup
      ansible.builtin.script: ./setup.sh

- name: Setup manager server
  hosts: manager
  tasks:
    - name: Install Docker
      ansible.builtin.script: ./get-docker.sh
    - name: Install Docker Compose
      get_url:
        url: "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-Linux-x86_64"
        dest: /usr/local/bin/docker-compose
        mode: 'a+x'
        force: yes
    - name: Copy private key
      ansible.builtin.copy:
        src: ./id_rsa
        dest: /root/.ssh/id_rsa
        owner: root
        group: root
        mode: '0600'
    - name: Copy tsung config
      ansible.builtin.copy:
        src: ./bench.xml
        dest: /root/bench.xml
        owner: root
        group: root
        mode: '0644'
    - name: Copy etchosts
      ansible.builtin.copy:
        src: ./etchosts
        dest: /root/etchosts
        owner: root
        group: root
        mode: '0600'
    - name: Append etchosts to /etc/hosts
      ansible.builtin.shell: cat /root/etchosts >> /etc/hosts
    - name: Clone Appwrite repo
      ansible.builtin.git:
        repo: https://github.com/appwrite/appwrite.git
        dest: /root/appwrite

- name: Copy sshkeys to workers
  hosts: workers
  tasks:
    - name: add manager public key to workers
      authorized_key:
        user: root
        state: present
        key: "{{ lookup('file', './id_rsa.pub') }}"

In the first play, we upgraded all packages to their latest versions, installed Tsung, and tweaked the OS with the setup.sh script on all Droplets.

Installing Tsung required compiling it and installing its dependencies; for this we used the following script:

# install dependencies
apt-get install elixir build-essential gnuplot libtemplate-perl make erlang erlang-base libyaml-dev python python3-pip libssl-dev autoconf
# download and compile tsung
wget http://tsung.erlang-projects.org/dist/tsung-1.7.0.tar.gz
tar -xvf tsung-1.7.0.tar.gz
cd tsung-1.7.0/
./configure
make
make install
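
A quick sanity check that the build and installation succeeded, assuming the default install prefix ends up on the PATH:

# Should print the installed version, i.e. Tsung 1.7.0
tsung -v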

The second play prepares the Manager and performs the following actions:

  • Install Docker
  • Install Docker Compose
  • Copy private SSH key
  • Copy configuration file for Tsung
  • Add all Clients to the /etc/hosts file
  • Install Appwrite

The third and last play copies the public SSH key to all Clients, so that the Manager is able to connect to them.
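
For completeness, running the playbook could look roughly like this; the inventory and file names are assumptions, the important part is that the inventory defines the manager and workers groups targeted by the plays above:

# Hypothetical invocation; the inventory must define the "manager" and
# "workers" groups, for example:
#
#   [manager]
#   10.114.0.x
#
#   [workers]
#   realtime-worker-[1:16]
#
ansible-playbook -i inventory.ini playbook.yml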

After Ansible had processed all the tasks, we could start the actual benchmark.

The Benchmark

htop on the Manager

Looking at htop on the Manager, with 32 cores and 128 GB of memory ready to be loaded with work, is already a very nice sight. We then started Tsung with the following command:

tsung -f bench.xml -k start

Once the benchmark was running, we were offered a much more exciting htop compilation across all 16 Clients.

Tsung itself serves a web page on port 8091 that gave us easily accessible monitoring of the benchmark. Sadly, the page does not update in real time, so we found ourselves refreshing it every other second, excited to watch the number of concurrently connected users rise.
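
Since the dashboard is served by the Tsung controller, it is reachable directly on the Manager. A quick check from the shell, assuming you are logged into the Manager:

# Poll the Tsung monitoring page (served on port 8091 by the controller)
curl -s http://localhost:8091/ | head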

Since we configured Tsung to establish 2,000 connections every second, it would take exactly 500 seconds to reach the magic million.

After roughly 500 seconds we had done it: 1 million concurrent connections to a single Realtime server!

We were more than satisfied with this number, especially since in production no single Realtime instance should hold that many connections anyway. The load on the Manager made us particularly happy: just 16 GB of memory and 7% CPU were used to maintain the established connections.

However, this benchmark only tests the number of concurrently connected users, not a real-world scenario with many different channels and messages being sent. We already have ideas for future benchmarks that reflect typical use cases, and for pushing towards 2 million connections.

Results

Since Appwrite is open source and we want to provide as much transparency as possible, we offer all results, configurations, and scripts for download. We have also prepared a web page where all results are easily accessible.

You can also find all files in this GitHub repository.
