Jose Truyol
Jose Truyol Electronics Engineer working as Software Engineer

Automating Azure SQL Database backups using docker and GitLab

Automating Azure SQL Database backups using docker and GitLab
Photo by Markus Winkler on Unsplash

Everybody wants repeatable tests once in a while, and these tests should use external dependencies with predictable behaviors and values. In this case, our team at Indimin wanted to test our product’s microservices APIs with up-to-date database schema and test data. This data is the same that we use in our QA environment, hosted on Azure as an SQL Database Instance, so we use this database as our inner test database.

I work on ubuntu, so the code snippets refer to a bash terminal.

Requirements

  • Backup database schema and data (only one database)
  • These backups should live inside a container (only use Microsoft’s SQL Server 2019 base image).
  • We need to continuously make backups, hopefully without human intervention.
  • Using our self-managed GitLab instance, there should be a simple-to-use place where we can schedule different database backups on low traffic hours without modifying the automation process.

Let’s address these requirements one by one.

Backup and restore

We need a way to backup schema and data from an Azure SQL Database to our local file system. Luckily, Microsoft offers a tool for this. The sqlpackage allows us to export and import database schema and data using the bacpac file format.

To backup using a file as a target, we run:

1
2
3
4
5
6
sqlpackage /a:Export \
  /ssn:tcp:server_name.database.windows.net,1433 \
  /su:sa \
  /sp:str()ng!password \
  /sdn:database_name \
  /tf:backup/backup.bacpac

This generates the following directory structure:

1
2
3
4
5
6
7
8
9
10
11
12
$ ls -lh
drwxr-xr-x   3 root root 4.0K May 12 16:44 IsolatedStorage
-rw-r--r--   1 root root 2.5M May 12 16:44 backup.bacpac

$ tree . | head
|-- IsolatedStorage
|   `-- ulyru5ls.gn3
|       `-- i0y5vzn2.ptb
|           `-- StrongName.pdsauadni2w1hwlsyjmz1pizhzdilmes
|               `-- StrongName.pdsauadni2w1hwlsyjmz1pizhzdilmes
|                   `-- Files
|-- backup.bacpac

So, my recommendation is to always use an empty directory as the backup output.

This restores the backup to a new database:

1
2
3
4
5
6
7
sqlpackage \
  /a:Import \
  /tsn:tcp:server_name.database.windows.net,1433 \
  /tu:sa \
  /tp:str()ng!password \
  /tdn:restored_database \
  /sf:out/backup.bacpac

Docker container

Now that we have the backup file, it’s time to create a container where it will be restored at startup. This container should be based on the official MSSQL 2019 image

Let’s find out if this image has anything we need for now:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$ docker run -it --rm -u root mcr.microsoft.com/mssql/server:2019-CU10-ubuntu-16.04 bash
SQL Server 2019 will run as non-root by default.
This container is running as user root.
To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.
root@96a8b1736184:/# sqlpackage
bash: sqlpackage: command not found
root@96a8b1736184:/# apt-get -qq update && apt-get -qq install tree -y
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package tree.
(Reading database ... 9990 files and directories currently installed.)
Preparing to unpack .../tree_1.7.0-3_amd64.deb ...
Unpacking tree (1.7.0-3) ...
Setting up tree (1.7.0-3) ...
root@96a8b1736184:/# cd /opt/
root@96a8b1736184:/opt# tree . | grep sqlpackage

Uhm, no sqlpackage binary in the official image, and according to this issue there is no plan to add it soon. The only way is to create a base image on top of it. Let’s do it!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
FROM mcr.microsoft.com/mssql/server:2019-CU10-ubuntu-16.04

LABEL mantainer="Jose Truyol"

USER root

RUN apt-get update \
  && apt-get install -y unzip \
  && wget -q -O sqlpackage.zip https://go.microsoft.com/fwlink/?linkid=2134311 \
  && unzip -qq sqlpackage.zip -d /opt/sqlpackage \
  && chmod +x /opt/sqlpackage/sqlpackage \
  && apt-get remove --purge unzip -y \
  && rm sqlpackage.zip

ENV PATH=$PATH:/opt/mssql-tools/bin:/opt/sqlpackage
ENV ACCEPT_EULA=Y
ENV MSSQL_PID=Express

USER mssql

Using this image, let’s test sqlpackage

We need to use the root user because the mssql user has no permission to create files or directories.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
$ docker run -it -u root --rm sql.backup bash
SQL Server 2019 will run as non-root by default.
This container is running as user mssql.
To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.
root@7d27d4b34121:/$ sqlpackage
SqlPackage: Command-line tool for creating and deploying SQL Server databases and DACPAC packages.
Copyright (c) 2016 Microsoft Corporation.  All rights reserved.

Help for dynamic property usage.
/@<file>:<string>
     Read response file for more options.

/help:[True|False]
     (short form /?)
.....

root@7d27d4b34121:/# mkdir backup
root@7d27d4b34121:/# sqlpackage /a:Export \
>   /ssn:tcp:server_name.database.windows.net,1433 \
>   /su:sa \
>   /sp:"str()ng!password" \
>   /sdn:database_name \
>   /tf:backup/backup.bacpac
Connecting to database 'database_name' on server 'tcp:server_name.database.windows.net,1433'.
Extracting schema
Extracting schema from database
Resolving references in schema model
Validating schema model
Validating schema model for data package
Validating schema
Exporting data from database
Exporting data
Processing Export.
Processing Tables
Successfully exported database and saved it to file '/backup/backup.bacpac'.
Time elapsed 00:00:47.86
root@7d27d4b34121:/#

Now that we have a way to do backups using a docker container it’s time to restore it to a new server. In this case, the new server will be the same container where we do our backup.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
FROM mcr.microsoft.com/mssql/server:2019-CU10-ubuntu-16.04

LABEL mantainer="Jose Truyol <[email protected]>"

USER root

RUN apt-get update \
  && apt-get install -y unzip \
  && wget -q -O sqlpackage.zip https://go.microsoft.com/fwlink/?linkid=2134311 \
  && unzip -qq sqlpackage.zip -d /opt/sqlpackage \
  && chmod +x /opt/sqlpackage/sqlpackage \
  && apt-get remove --purge unzip -y \
  && rm sqlpackage.zip


ENV PATH=$PATH:/opt/mssql-tools/bin:/opt/sqlpackage
ENV ACCEPT_EULA=Y
ENV MSSQL_PID=Express
ENV SA_PASSWORD="!test.Passw0rd"

COPY ./scripts/ /scripts

RUN bash /scripts/backup.sh

USER mssql

CMD [ "/scripts/entrypoint.sh" ]

The backup.sh file content is:

1
2
3
4
5
6
7
8
#!/bin/bash

 sqlpackage /a:Export \
  /ssn:tcp:server_name.database.windows.net,1433 \
  /su:sa \
  /sp:"str()ng!password" \
  /sdn:database_name \
  /tf:backup/backup.bacpac

The default CMD is replaced with the entrypoint.sh file:

1
2
3
#!/bin/bash

/scripts/restore.sh  & /opt/mssql/bin/sqlservr

There should be a non-terminating process to keep the container alive. Based on this issue

The /opt/mssql/bin/sqlservr part was found using:

1
2
3
4
$ docker inspect mcr.microsoft.com/mssql/server:2019-CU10-ubuntu-16.04 | jq -r '.[0].Config.Cmd'
[
  "/opt/mssql/bin/sqlservr"
]`

The restore.sh file contains:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#!/bin/bash

# WAITING LOGIC BASED ON https://github.com/microsoft/mssql-docker/blob/master/linux/preview/examples/mssql-customize/configure-db.sh

DBSTATUS=1
ERRCODE=1
i=0
MAX_ITERATIONS=60

while [[ $DBSTATUS -ne 0 ]] && [[ $i -lt $MAX_ITERATIONS ]] || [[ $ERRCODE -ne 0 ]]; do
	i=$i+1
	DBSTATUS=$(/opt/mssql-tools/bin/sqlcmd -h -1 -t 1 -U sa -P $SA_PASSWORD -Q "SET NOCOUNT ON; SELECT  SUM(state) FROM sys.databases")
	ERRCODE=$?
	sleep 1s
done

if [[ $DBSTATUS -ne 0 ]] || [[ $ERRCODE -ne 0 ]]; then
	echo "SQL Server took more than $MAX_ITERATIONS seconds to start up or one or more databases are not in an ONLINE state"
	exit 1
fi

sqlpackage \
  /a:Import \
  /tsn:localhost \
  /tdn:restored_db \
  /tu:sa \
  /tp:$SA_PASSWORD \
  /sf:backup/backup.bacpac

Before any restore attempts, the script will execute a query against the database. While this command returns any error we can assume that the database isn’t ready. After 60 iterations, if there is still any error the script exit with an error. This allows us to restore the database as soon as it’s ready and not at an arbitrary waiting time using any sleep command.

Let’s build and run this container:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
$ docker build -t sql.backup .
Sending build context to Docker daemon  8.192kB
Step 1/13 : FROM mcr.microsoft.com/mssql/server:2019-CU10-ubuntu-16.04

......

Successfully tagged sql.backup:latest
$ docker run -it --rm sql.backup
SQL Server 2019 will run as non-root by default.
This container is running as user mssql.
To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.
2021-05-12 21:01:18.15 Server      The licensing PID was successfully processed. The new edition is [Express Edition].

......

*** Error importing database:Could not import package.
Error SQL72014: Core Microsoft SqlClient Data Provider: Msg 12824, Level 16, State 1, Line 5 The sp_configure value 'contained database authentication' must be set to 1 in order to alter a contained database.  You may need to use RECONFIGURE to set the value_in_use.
Error SQL72045: Script execution error.  The executed script:
IF EXISTS (SELECT 1
           FROM   [master].[dbo].[sysdatabases]
           WHERE  [name] = N'$(DatabaseName)')
    BEGIN
        ALTER DATABASE [$(DatabaseName)]
            SET FILESTREAM(NON_TRANSACTED_ACCESS = OFF),
                CONTAINMENT = PARTIAL
            WITH ROLLBACK IMMEDIATE;
    END


Error SQL72014: Core Microsoft SqlClient Data Provider: Msg 5069, Level 16, State 1, Line 5 ALTER DATABASE statement failed.
Error SQL72045: Script execution error.  The executed script:
IF EXISTS (SELECT 1
           FROM   [master].[dbo].[sysdatabases]
           WHERE  [name] = N'$(DatabaseName)')
    BEGIN
        ALTER DATABASE [$(DatabaseName)]
            SET FILESTREAM(NON_TRANSACTED_ACCESS = OFF),
                CONTAINMENT = PARTIAL
            WITH ROLLBACK IMMEDIATE;
    END

Based on the documentation we can fix this with a simple SQL query before any database restore. The modified restore.sh is:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#!/bin/bash

# WAITING LOGIC BASED ON https://github.com/microsoft/mssql-docker/blob/master/linux/preview/examples/mssql-customize/configure-db.sh

DBSTATUS=1
ERRCODE=1
i=0
MAX_ITERATIONS=60

while [[ $DBSTATUS -ne 0 ]] && [[ $i -lt $MAX_ITERATIONS ]] || [[ $ERRCODE -ne 0 ]]; do
	i=$i+1
	DBSTATUS=$(/opt/mssql-tools/bin/sqlcmd -h -1 -t 1 -U sa -P $SA_PASSWORD -Q "SET NOCOUNT ON; SELECT SUM(state) FROM sys.databases")
	ERRCODE=$?
	sleep 1s
done

if [[ $DBSTATUS -ne 0 ]] || [[ $ERRCODE -ne 0 ]]; then
	echo "SQL Server took more than $MAX_ITERATIONS seconds to start up or one or more databases are not in an ONLINE state"
	exit 1
fi

sqlcmd -S localhost -P $SA_PASSWORD -U sa -Q "$(cat <<-EOF
sp_configure 'contained database authentication', 1;
GO
RECONFIGURE;
GO
EOF
)"

sqlpackage \
  /a:Import \
  /tsn:localhost \
  /tdn:restored_db \
  /tu:sa \
  /tp:$SA_PASSWORD \
  /sf:backup/backup.bacpac

Let’s try it again:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
$ docker build -t sql.backup .
Sending build context to Docker daemon  8.192kB
Step 1/13 : FROM mcr.microsoft.com/mssql/server:2019-CU10-ubuntu-16.04

......

Successfully tagged sql.backup:latest
$ docker run -it --rm -p 1434:1433 sql.backup
SQL Server 2019 will run as non-root by default.
This container is running as user mssql.
To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.

......

Successfully imported database.
Time elapsed 00:00:26.18

Excellent! At this point, we have a docker container with our Azure Database backup restored in it.

db_restored

Backup automation

Using our GitLab instance, the go-to way to do this automation is using the CI/CD pipeline feature

I’m used to dividing my pipeline definition between multiple files inside the .gitlab/ci directory. In this case, I need a few files:

1
2
3
4
5
6
7
8
9
10
11
12
$ tree . -a
.
├── Dockerfile
├── .gitlab
│   ├── ci
│   │   ├── backup.gitlab-ci.yml
│   │   └── global.gitlab-ci.yml
│   └── scripts
│       ├── backup.sh
│       ├── entrypoint.sh
│       └── restore.sh
├── .gitlab-ci.yml

Let’s go into each one of them:

.gitlab-ci.yml

1
2
3
4
5
6
7
8
9
10
variables:
  DOCKER_VERSION: 19.03.15
  GIT_SUBMODULE_STRATEGY: none

stages:
  - backup

include:
  - local: .gitlab/ci/global.gitlab-ci.yml
  - local: .gitlab/ci/backup.gitlab-ci.yml

Here are defined global variables, in this case, the dind (Docker in Docker) version to use as well as the submodule strategy.

Next are the stages and the include statements with the other parts of our pipeline definition.

global.gitlab-ci.yml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
.default-retry:
  retry:
    max: 2 # This is confusing but this means "3 runs at max".
    when:
      - unknown_failure
      - api_failure
      - runner_system_failure
      - job_execution_timeout
      - stuck_or_timeout_failure

.use-docker-in-docker:
  image: docker:${DOCKER_VERSION}-dind
  services:
    - docker:${DOCKER_VERSION}-dind
  variables:
    DOCKER_DRIVER: overlay2
    DOCKER_TLS_CERTDIR: ""

This file contains some little templates that we can use in any job. default-retry allows us to run the job up to 3 times if there any failure unrelated to our files. use-docker-in-docker will allow us to use docker in docker in any job that extends it.

backup.gitlab-ci.yml

1
2
3
4
5
6
7
8
9
10
db_backup_image:
  extends:
    - .default-retry
    - .use-docker-in-docker
    - .backup:rules:db_backup_image
  stage: backup
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE:latest .
    - docker push $CI_REGISTRY_IMAGE:${DB_NAME}

This is the main job of the pipeline. It will log in against our docker registry (also GitLab), build our image, and push it.

Scheduling

Gitlab offers Pipeline Scheduling. This will allow us to schedule multiple pipeline execution, with different variable values. So the first step is to prepare our code to be aware of these variables.

The variables are

Name Description
DB_HOST Source database host with the format tcp:<server>,<port>
DB_USER Source database user
DB_PASSWORD Source database password
DB_NAME Source database name
TARGET_DB_NAME Target database name

First, the backup.sh script:

1
2
3
4
5
6
7
8
9
10
11
#!/bin/bash

rm -r -f backup/
mkdir -p backup/

sqlpackage /a:Export \
  /ssn:$DB_HOST \
  /su:$DB_USER \
  /sp:$DB_PASSWORD \
  /sdn:$DB_NAME \
  /tf:backup/backup.bacpac

Next, the restore.sh script:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#!/bin/bash

# WAITING LOGIC BASED ON https://github.com/microsoft/mssql-docker/blob/master/linux/preview/examples/mssql-customize/configure-db.sh

DBSTATUS=1
ERRCODE=1
i=0
MAX_ITERATIONS=60

while [[ $DBSTATUS -ne 0 ]] && [[ $i -lt $MAX_ITERATIONS ]] || [[ $ERRCODE -ne 0 ]]; do
i=$i+1
DBSTATUS=$(/opt/mssql-tools/bin/sqlcmd -h -1 -t 1 -U sa -P $SA_PASSWORD -Q "SET NOCOUNT ON; SELECT SUM(state) FROM sys.databases")
ERRCODE=$?
sleep 1s
done

if [[ $DBSTATUS -ne 0 ]] || [[ $ERRCODE -ne 0 ]]; then
echo "SQL Server took more than $MAX_ITERATIONS seconds to start up or one or more databases are not in an ONLINE state"
exit 1
fi

sqlcmd -S localhost -P $SA_PASSWORD -U sa -Q "$(cat <<-EOF
sp_configure 'contained database authentication', 1;
GO
RECONFIGURE;
GO
EOF
)"

sqlpackage \
/a:Import \
/tsn:localhost \
/tdn:$TARGET_DB_NAME \
/tu:sa \
/tp:$SA_PASSWORD \
/sf:backup/backup.bacpac

To update the Dockerfile we just need to add:

1
2
3
4
5
6
ARG DB_HOST
ARG DB_USER
ARG DB_PASSWORD
ARG DB_NAME
ARG TARGET_DB_NAME
ENV TARGET_DB_NAME=${TARGET_DB_NAME}

These variables need to be passed to the build process, so we modify the backup.gitlab-ci.yml:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
db_backup_image:
  extends:
    - .default-retry
    - .use-docker-in-docker
    - .backup:rules:db_backup_image
  stage: backup
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin $CI_REGISTRY
    - |
      docker build \
        --build-arg DB_HOST=${DB_HOST} \
        --build-arg DB_USER=${DB_USER} \
        --build-arg DB_PASSWORD=${DB_PASSWORD} \
        --build-arg DB_NAME=${DB_NAME} \
        --build-arg TARGET_DB_NAME=${TARGET_DB_NAME} \
        -t $CI_REGISTRY_IMAGE:${DB_NAME} .
    - docker push $CI_REGISTRY_IMAGE:${DB_NAME}
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"

The rule if: $CI_PIPELINE_SOURCE == "schedule" will allow GitLab to run this job only when a pipeline is triggered using the scheduling feature.

Now we can program as many database backups as we need it.

schedule

comments powered by Disqus