How Snapshots Saved My Time Machine Backups

Until last year, I kept my Time Machine backups on a USB drive next to my computer. Although everything worked fine, I didn’t feel comfortable with so much data stored on a single disk. So, in the summer of 2020, I bought a DiskStation DS1520+ to move my Time Machine backups to a much more secure and reliable solution. The DS1520+ supports RAID, so my data would not be lost because of a single disk failure. Synology has excellent documentation on how to enable Time Machine backups to a NAS over SMB.

And everything ran very smoothly until I upgraded to macOS Monterey. Afterward, the Time Machine backups gave me some headaches. Quite often, macOS told me it couldn’t perform any new backups. The error message was:

Time Machine detected that your backups on "mynas.local" cannot be reliably restored. Time Machine must erase your existing backup history and start a new backup to correct this. (The dialog only offers the buttons "Remind me Tomorrow" and "Erase Backup History".)

This wasn’t an issue with the hard disks; the disks were fine. But for some reason, the Time Machine backup had become corrupted. Fortunately, I remembered that I had enabled Btrfs snapshots for most folders on my NAS. This allowed me to go back to a point in time when the Time Machine backup was still intact. In the end, I only lost the incremental backups of a single day.

Btrfs snapshots are a lifesaver. I have snapshots enabled for all my shared folders that contain important files, e.g.:

  • Photos
  • Videos
  • Personal Files
  • Time Machine Backups

Snapshots are quick to take, easy to restore, and pretty lightweight. For my 1,800 GB of Time Machine backups, the snapshots take up less than 40 GB (~2.2 %). That is why you should enable them as soon as possible for your valuable files.

Enabling Btrfs Snapshots

  1. Open the Synology DiskStation Manager and check in the Package Center under Installed that the package Snapshot Replication is available. If it is not, install it before proceeding.
  2. Open Snapshot Replication, select Snapshots, and choose the shared folder you want to configure snapshots for.
    The Snapshot Replication window with Snapshots selected in the sidebar and the shared folder for which snapshots should be activated.
  3. Select Settings and configure the Schedule, Retention, and Advanced settings.
    The snapshot schedule depends on how often your data changes and how much data loss you can accept. For most of my shared folders, a snapshot every 24 hours is more than enough.

    I set the number of latest snapshots to keep to the maximum. If you don’t set up a retention policy, no new snapshots will be taken once you reach the limit of 1,024 snapshots.

    I use the GMT time zone to name the snapshots.

  4. Click OK to complete the configuration.

Restoring from a Btrfs Snapshot

  1. Open Snapshot Replication, select Recovery, and choose the shared folder you want to restore.
    The Snapshot Replication window with Recovery selected in the sidebar and the shared folder that should be rolled back to a snapshot.
  2. Click Recover and select a snapshot taken when everything was still ok. The more recent the snapshot, the smaller the data loss.
    The list of available snapshots of the selected shared folder, displayed after clicking Recover.
  3. Open the Action menu and select Restore to this Snapshot.
  4. In the dialog, select Take a snapshot before restoring to create a snapshot of the current state so that you can always go back to the point in time just before the recovery took place.
    Dialog box with the option "Take a snapshot before restoring" enabled.

Encrypt Service Traffic with OpenShift CA

Did you know that you can use the OpenShift CA to encrypt traffic between services or between a route and a service? I know you are thinking of a service mesh, but there is also a small-scale solution available if you don’t have one.

Encrypt Ingress Traffic

So let’s say you want to deploy a web app and need to encrypt the traffic between HAProxy, as the route entry point, and the web app. Usually, the OpenShift operations team has you covered: they have configured HAProxy with TLS, so you don’t have to worry about public TLS. But once you are past the route as the entry point to OpenShift, traffic continues as plain HTTP without TLS.

Default setup with unencrypted cluster-internal communication

So let’s see what we can do with standard OpenShift to encrypt the traffic between the route and webapp.

Every OpenShift Container Platform cluster has a service-ca operator. The service-ca operator hosts a self-signed certificate authority, from which you can request private keys and certificates. You can use those to encrypt traffic internal to OpenShift.

To request a certificate and private key for our web application, you need to add the annotation service.beta.openshift.io/serving-cert-secret-name: <secret name> to the service resource:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.openshift.io/serving-cert-secret-name: webapp-tls
  name: webapp
spec:
  selector:
    app: webapp
    # ..

The operator picks up the annotation and stores the service’s TLS private key and certificate in the secret webapp-tls. To use the certificate and private key in our pod, we need to mount the secret and import them into the web application. The required steps depend on your web application’s tech stack, so please check what applies to yours.
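For illustration, here is a minimal sketch of how the secret could be mounted into the pod. The mount path /etc/tls and the container name webapp are assumptions for this example; adjust them to your application:

# Excerpt of the Deployment pod template (illustrative sketch).
# The service-ca operator stores tls.crt and tls.key in the secret webapp-tls.
spec:
  containers:
    - name: webapp
      volumeMounts:
        - name: tls
          mountPath: /etc/tls   # assumed path; tls.crt and tls.key appear here
          readOnly: true
  volumes:
    - name: tls
      secret:
        secretName: webapp-tls  # created by the service-ca operator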

After we have configured the pod, we need to change the route settings. The default configuration of the route is insecure. To encrypt the communication between the route and the pod, we need to set TLS termination to reencrypt.

kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: webapp
  labels:
    app: webapp
spec:
  to:
    kind: Service
    name: webapp
    weight: 100
  port:
    targetPort: https
  tls:
    termination: reencrypt
  wildcardPolicy: None

With that, we now have two spheres of trust.

  1. The external traffic is encrypted with a certificate issued by a public CA, which the browser trusts.
  2. The internal traffic is encrypted with a certificate issued by the OpenShift internal CA, which the route trusts.

Certificate chains of the external and the internal traffic

You can find a working sample in my GitHub repository blog-coding-samples.

Certificate Validity

The internal OpenShift CA certificate is valid for 26 months and is automatically rotated after 13 months. To smooth the CA rollover, the new CA maintains trust in the previous one. In my experience, this transition period is more than enough for all pods to be restarted and thereby pick up their updated certificates.

Conclusion

The best practice for larger deployments is, in most cases, a service mesh. In this scenario, the sidecar proxy is responsible for encrypting and decrypting the communication. As a result, your application does not have to import a certificate and private keys, making a developer’s life a little easier.

However, the internal OpenShift CA provides an entry point to encrypting and decrypting cluster-internal communication for small-scale solutions, and OpenShift takes as much of the burden of managing your own CA off your shoulders as possible. Just keep in mind that this approach is best suited to small-scale setups.

Friends don't let friends run containers as root

Introduction

In my daily work, I encounter different container engines during the development process. For developing the container image, I use a local container engine like Docker for Desktop or Podman. After development on my local machine, the image is staged from the development environment over test stages to production. In those environments, Kubernetes or Red Hat OpenShift is the container platform of choice.

Unfortunately, the platforms take different approaches to which user ID the container process is started with. This can be incredibly frustrating if you start with an official Docker image that just won’t run on OpenShift due to its stricter security policies.

Let’s look at what rules you should follow for each platform and what I do to accommodate the quirks of all container engines.

Docker for Desktop / Podman

The desktop tools Docker for Desktop and Podman have pretty loose rules when you run a containerized application. For example, if you don’t do anything, your application is executed as root.

And of course, running an application as root is not recommended, and consequently, many projects use the following approach:

FROM alpine:3.14

RUN addgroup -S -g 3000 groupFoo
# Setup application user
RUN adduser -S -u 3000 -G groupFoo johndoe

# Copy your app to the image and change file owner to johndoe
COPY --chown=3000:3000 app/foo /app/foo
# Switch to executing user.
USER 3000
ENTRYPOINT [ "/app/foo/docker-entrypoint.sh" ] 

The advantage of this approach is that you cannot forget to switch to a non-root user when you start the container.

The obvious drawback is that the user ID and group ID are hard-coded into the image.

For example, Grafana, Prometheus, and Cassandra use this approach.

Kubernetes

Kubernetes extends a container runtime like Docker with additional orchestration capabilities. But it does not add any security policies. Consequently, you could build an image with the previous approach and start a pod using that image.

But Kubernetes introduces the concept of PodSecurityContext. With PodSecurityContext, you can configure the user who runs the container process.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-context
  labels:
    app: user-context
spec:
  replicas: 1
  selector:
    matchLabels:
      app: user-context
  template:
    metadata:
      labels:
        app: user-context
    spec:
      securityContext:
        runAsUser: 3000
        runAsGroup: 3000
      containers:
      #...

This Deployment tells Kubernetes to run all containers in this pod as user ID 3000 and group ID 3000. (A container-level securityContext lets you do the same thing but applies to a single container only.)
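For comparison, a container-level securityContext sits on the individual container entry instead of the pod spec. The following excerpt is only an illustrative sketch; the image reference is an assumption:

containers:
  - name: user-context
    image: registry.example.com/user-context:latest  # assumed image
    securityContext:            # container-level: applies to this container only
      runAsUser: 3000
      runAsGroup: 3000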

If we combine building the user ID and group ID into the container image with a PodSecurityContext, we need to ensure that both configure the same user ID and group ID. Otherwise, the file permissions will not allow the process to access the application files.

OpenShift

OpenShift adds two additional security layers to what we have seen so far.

Firstly, OpenShift runs all pods with an arbitrarily assigned user ID. The reason is additional protection against a process escaping the container through a container engine vulnerability and thereby gaining escalated permissions on the host node. Please refer to the section Support arbitrary user IDs of the OpenShift documentation. This randomization of user IDs renders the first approach, building the user ID and group ID into the image, useless.

You might think that you can use a PodSecurityContext to tell OpenShift which user ID to use. And yes, this works to some degree. But now comes the second security layer: with Security Context Constraints, OpenShift restricts the range of allowed user IDs, and this range can be different in every environment. Hence, we have to assume the user ID is random.
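To illustrate, the relevant part of a Security Context Constraint looks roughly like this. The sketch below only shows the fields that matter here, the name is made up, and the actual user ID range is assigned per project by OpenShift:

# Excerpt of a SecurityContextConstraints resource (sketch, not a complete SCC)
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: restricted-example       # illustrative name
runAsUser:
  type: MustRunAsRange           # user ID must fall into the project's assigned range
fsGroup:
  type: MustRunAs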

Luckily, the randomization is limited to the user ID: whatever OpenShift selects as the user ID, the user is always a member of the root group (group ID 0). So if we make sure that all files are accessible by the root group, OpenShift can run the container process without any problems.
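A common way to do that, sketched below on top of the earlier Dockerfile (paths reused from that example), is to assign the application files to the root group and give the group the same permissions as the owner:

FROM alpine:3.14

# Copy the app owned by an unprivileged user ID and the root group (GID 0)
COPY --chown=3000:0 app/foo /app/foo

# Give the root group the same permissions as the owner, so whatever
# arbitrary user ID OpenShift assigns can still read and execute the files
RUN chmod -R g=u /app/foo

# Run as a non-root user that is a member of the root group
USER 3000:0
ENTRYPOINT [ "/app/foo/docker-entrypoint.sh" ]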

Conclusion

I found the following points very helpful to ensure that a workload never runs as root:

  • Docker / Podman
    • Set the user ID and group ID of the user running the container process with the USER instruction, e.g., USER 3000:0. The user ID you select can be any number equal to or greater than 1000; this ensures the container does not accidentally run as root or with a well-known user ID.
    • Do not use user ID 65535, which is the nobody user. This user is reserved for NFS.
    • Do not use adduser to create the user in the container OS. adduser might not be available in the image, and an explicit USER instruction is easier to spot than reading the content of all RUN instructions.
    • Make sure that all relevant files are accessible by group ID 0.
  • Kubernetes
    • For running a plain container in Kubernetes, you don’t have to do anything. Kubernetes starts the container with the user ID set by the USER instruction in the Dockerfile.
  • Helm Charts
    • If you need to set the PodSecurityContext, make this value configurable (see the sketch after this list).
  • Openshift
    • After following the previous steps, the container is up and running without any issues.
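
For the Helm chart point above, a minimal sketch of making the PodSecurityContext configurable could look like this; the value name podSecurityContext follows the helm create scaffold and is an assumption, not a fixed convention:

# values.yaml (sketch)
podSecurityContext:
  runAsUser: 3000
  runAsGroup: 3000

# templates/deployment.yaml (excerpt of the pod template)
    spec:
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}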