Pulling and Verifying DockerHub Images

During a recent debugging session with a program that uses the config.digest field from a Docker image's JSON manifest as the image SHA when pulling from DockerHub (or other registries), I encountered an intriguing error:

Error response from daemon: unknown: manifest schema unsupported

This error highlighted a key aspect of how Docker images are managed, specifically related to their manifests and digests. Upon further investigation, I discovered that the mismatch between the image SHA and config.digest was the root cause. In this blog post, you'll gain an understanding of the manifest file, its role in Docker image management, and how it is associated with the image itself.

Terminologies

Docker Image

A Docker image is a read-only template that contains all the instructions, configuration, and dependencies required to create a Docker container. It includes the application code, runtime environment, libraries, and any other files needed to run an application. When you run a Docker image, it becomes a running instance known as a container. An application's image is built from a build context and a Dockerfile.

Build Context

In Docker, a build context is the set of files and directories that are available to the build. It includes source code files, configuration files, and any other files needed to create the final Docker image.

Dockerfile

A Dockerfile is a text file that contains instructions for creating a container image. When the docker build command is executed, the Dockerfile's instructions are processed to create the image layers.

Image Layers

In Docker, an image is constructed from a series of layers, each representing a file-system change. These layers are created by executing instructions in a Dockerfile. Once a layer is created, it cannot be altered; any modification results in a new layer. Layers are stored using a union file-system, which lets multiple images share common layers, conserving disk space and reducing redundancy. Docker also caches layers to speed up the build process.

Manifest file

Manifests are expressed in JSON and contain the schema version along with references to the image's layers and config. The Docker client uses a manifest to work out whether an image is compatible with the device, and then to determine how to start new containers.
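To make the shape of a manifest concrete, here is a minimal sketch that parses one and pulls out the three pieces mentioned above. The field names follow the Docker v2 / OCI manifest layout; the digests and sizes are placeholders invented for illustration, and summarize_manifest is a hypothetical helper, not part of any Docker tooling.

```python
import json


def placeholder_digest(ch: str) -> str:
    """Return an obviously fake digest; real digests are hashes of blob content."""
    return "sha256:" + ch * 64


# A hand-written sample manifest. Sizes and digests are made up.
SAMPLE_MANIFEST = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": 1469,
        "digest": placeholder_digest("c"),
    },
    "layers": [
        {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 2811478, "digest": placeholder_digest("1")},
        {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 1024, "digest": placeholder_digest("2")},
    ],
})


def summarize_manifest(raw: str) -> dict:
    """Extract the schema version, config digest, and ordered layer digests."""
    manifest = json.loads(raw)
    return {
        "schema_version": manifest["schemaVersion"],
        "config_digest": manifest["config"]["digest"],
        "layer_digests": [layer["digest"] for layer in manifest["layers"]],
    }


summary = summarize_manifest(SAMPLE_MANIFEST)
print(summary["schema_version"], len(summary["layer_digests"]))
```

Note that the manifest only references the config and layers by digest; the actual content lives in separate blobs.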

Image SHA

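The image SHA (also called the image digest) uniquely identifies a Docker image. It is the SHA-256 hash of the image's manifest, and because the manifest lists the digests of the config and of every layer, the image SHA covers all layers and metadata. It is not the same value as config.digest, which hashes only the configuration blob. Pulling by the image SHA retrieves exactly the same image every time, ensuring consistency.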
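The difference between the image SHA and config.digest is easy to see with a toy image. The sketch below builds a tiny manifest from made-up config and layer bytes and hashes each piece; the blob contents are stand-ins, and a real registry computes the digest over the exact manifest bytes it serves.

```python
import hashlib
import json


def sha256_digest(data: bytes) -> str:
    """Format a SHA-256 hash the way registries do: 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Stand-in blobs; real ones are a JSON config and gzip-compressed tarballs.
config_bytes = json.dumps({"architecture": "amd64", "os": "linux"}).encode()
layer_bytes = b"stand-in for a compressed layer tarball"

manifest_bytes = json.dumps({
    "schemaVersion": 2,
    "config": {"digest": sha256_digest(config_bytes), "size": len(config_bytes)},
    "layers": [{"digest": sha256_digest(layer_bytes), "size": len(layer_bytes)}],
}).encode()

# The value used in 'repository@sha256:...' is the hash of the manifest itself.
image_digest = sha256_digest(manifest_bytes)

# config.digest is only one field inside that manifest.
config_digest = json.loads(manifest_bytes)["config"]["digest"]

print("image digest: ", image_digest)
print("config.digest:", config_digest)
```

The two values differ, so a pull that passes config.digest where the image SHA is expected asks the registry for a manifest that does not exist under that digest.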

Mechanism Used by Docker to Pull and Verify Images

Docker employs a well-structured and secure mechanism to pull and verify container images, ensuring integrity, authenticity, and compatibility. Below is a detailed explanation of how this process works:

1. Image Reference Resolution

Docker begins by parsing the image reference provided in the docker pull command. This reference can include:

Registry: Defaults to Docker Hub if unspecified.
Repository: The namespace and name of the image (e.g., nginx).
Tag or Digest: Identifies the specific image version (e.g., :latest or @sha256:<digest>).
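The three parts above can be pulled apart with a few string operations. This is a simplified sketch, not Docker's actual reference parser: it assumes the first path component is a registry only if it contains a dot, a colon, or is "localhost", and that single-name Docker Hub repositories live under library/. ImageReference and parse_reference are hypothetical names.

```python
from typing import NamedTuple, Optional


class ImageReference(NamedTuple):
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_reference(ref: str) -> ImageReference:
    """Split an image reference into registry, repository, tag, and digest."""
    # Anything after '@' is a digest such as 'sha256:<hex>'.
    name, _, digest = ref.partition("@")

    # Treat the first component as a registry host only if it looks like one.
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", name

    # A tag is whatever follows the last ':' in the final path component.
    repository, tag = remainder, None
    last_colon = remainder.rfind(":")
    if last_colon > remainder.rfind("/"):
        repository, tag = remainder[:last_colon], remainder[last_colon + 1:]

    # Official Docker Hub images are stored under the 'library' namespace.
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository

    # With neither tag nor digest, the tag defaults to 'latest'.
    if tag is None and not digest:
        tag = "latest"

    return ImageReference(registry, repository, tag, digest or None)


print(parse_reference("nginx"))
print(parse_reference("inductiveautomation/ignition:8.1.19"))
```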

2. Authentication

If the registry requires authentication, Docker retrieves credentials stored locally (e.g., in ~/.docker/config.json). These credentials are used to acquire an authentication token, which is sent with subsequent requests to the registry.
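In the token flow, the registry answers an unauthenticated request with a 401 and a WWW-Authenticate header that tells the client where to get a token. The sketch below parses such a header and builds the token URL without making any network calls. The sample header mirrors what Docker Hub typically returns, but treat its exact values as an assumption; parse_bearer_challenge and token_url are hypothetical helpers.

```python
import re
from urllib.parse import urlencode


def parse_bearer_challenge(header: str) -> dict:
    """Parse 'Bearer key="value",...' into a dict of its parameters."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"unsupported auth scheme: {scheme!r}")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def token_url(challenge: dict) -> str:
    """Build the URL the client would GET to obtain a bearer token."""
    query = {key: challenge[key] for key in ("service", "scope") if key in challenge}
    return challenge["realm"] + "?" + urlencode(query)


# Sample challenge header (assumed values, shown for illustration).
SAMPLE_HEADER = (
    'Bearer realm="https://auth.docker.io/token",'
    'service="registry.docker.io",'
    'scope="repository:library/nginx:pull"'
)

challenge = parse_bearer_challenge(SAMPLE_HEADER)
url = token_url(challenge)
print(url)
```

For a private repository, the stored credentials would accompany the request to this URL; the returned token then goes into an Authorization: Bearer header on later registry requests.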

3. Fetching the Manifest

Docker fetches the image manifest, which is a JSON document describing the image’s metadata, from the registry’s /v2/<repository>/manifests/<reference> endpoint. The manifest provides:

Schema Version: Defines the structure of the manifest.
Image Configuration Blob: The digest of the blob that specifies the OS, architecture, and other metadata.
Layer Digests: A list of cryptographic digests representing the image layers.
Manifest List (multi-platform images only): Points to the platform-specific manifests.
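As a rough sketch of this step, the code below builds the manifest request (without sending it) and picks a platform-specific manifest out of a manifest list. The Accept header lists the Docker and OCI manifest media types; registry-1.docker.io is assumed as Docker Hub's API host, the sample manifest list uses placeholder digests, and both function names are hypothetical.

```python
import urllib.request

# Media types for single-platform manifests and multi-platform lists/indexes.
MANIFEST_MEDIA_TYPES = [
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
]


def build_manifest_request(registry_host, repository, reference, token=None):
    """Prepare a GET for /v2/<repository>/manifests/<reference>."""
    url = f"https://{registry_host}/v2/{repository}/manifests/{reference}"
    headers = {"Accept": ", ".join(MANIFEST_MEDIA_TYPES)}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def select_platform_digest(manifest_list, os_name, architecture):
    """Return the digest of the manifest matching the given platform."""
    for entry in manifest_list["manifests"]:
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == architecture:
            return entry["digest"]
    raise LookupError(f"no manifest for {os_name}/{architecture}")


request = build_manifest_request("registry-1.docker.io", "library/nginx", "latest")
print(request.full_url)

# A hand-written manifest list with placeholder digests.
sample_list = {"manifests": [
    {"digest": "sha256:" + "a" * 64, "platform": {"os": "linux", "architecture": "amd64"}},
    {"digest": "sha256:" + "b" * 64, "platform": {"os": "linux", "architecture": "arm64"}},
]}
chosen = select_platform_digest(sample_list, "linux", "arm64")
print(chosen)
```

The chosen digest would then be used as the <reference> in a second manifest request to fetch the platform-specific manifest.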

4. Layer Resolution and Download

Docker uses the digests provided in the manifest to locate the image layers. It requests these layers from the /v2/<repository>/blobs/<digest> endpoint. If a layer is already present locally, Docker skips its download, leveraging caching to optimise performance.
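The skip-if-present logic can be sketched as follows. The local cache is modeled as a plain set of digests, which is a simplification of how Docker tracks stored layers; plan_layer_downloads, the host name, and the placeholder digests are assumptions for illustration.

```python
def plan_layer_downloads(manifest, local_digests, registry_host, repository):
    """Return blob URLs for the layers that are not already stored locally."""
    urls = []
    for layer in manifest["layers"]:
        digest = layer["digest"]
        if digest in local_digests:
            # Already cached: nothing to fetch for this layer.
            continue
        urls.append(f"https://{registry_host}/v2/{repository}/blobs/{digest}")
    return urls


# Placeholder digests standing in for real layer digests.
layer_a = "sha256:" + "a" * 64
layer_b = "sha256:" + "b" * 64

manifest = {"layers": [{"digest": layer_a}, {"digest": layer_b}]}
cached = {layer_a}

to_fetch = plan_layer_downloads(manifest, cached, "registry-1.docker.io", "library/nginx")
print(to_fetch)
```

Because layers are addressed by digest, a layer shared between two images only ever needs to be downloaded once.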

5. Integrity Verification

Docker verifies the integrity of each downloaded component:

Layer Verification: Each layer’s SHA256 hash is calculated and compared to the digest provided in the manifest.
Configuration Blob Verification: The configuration blob is also verified against its digest.
Image SHA Verification: For images pulled by digest (e.g., @sha256:<digest>), Docker ensures the manifest’s digest matches the specified value.

Any mismatch during these checks causes the pull operation to fail, ensuring the image’s integrity.
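The check itself is small: hash the bytes you received and compare with the digest you were promised. A minimal sketch, assuming SHA-256 digests in the sha256:<hex> form; DigestMismatch and verify_blob are hypothetical names.

```python
import hashlib


class DigestMismatch(Exception):
    """Raised when downloaded content does not hash to the expected digest."""


def verify_blob(data: bytes, expected_digest: str) -> None:
    """Raise DigestMismatch unless data hashes to expected_digest."""
    algorithm, _, expected_hex = expected_digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    actual_hex = hashlib.sha256(data).hexdigest()
    if actual_hex != expected_hex:
        raise DigestMismatch(f"expected {expected_hex}, got {actual_hex}")


blob = b"stand-in for layer content"
good_digest = "sha256:" + hashlib.sha256(blob).hexdigest()

verify_blob(blob, good_digest)  # passes silently

try:
    verify_blob(b"tampered content", good_digest)
except DigestMismatch as error:
    print("pull would fail:", error)
```

The same function works for layers, the configuration blob, and the manifest, since all three are identified by the hash of their bytes.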

6. Reconstruction of the Image

After all layers and the configuration blob are verified, Docker reconstructs the image locally by:

Extracting and assembling the layers as described in the manifest.
Applying the configuration blob to create a container-ready image.
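To illustrate how layers stack, the sketch below applies in-memory tar archives in order onto a dict that stands in for the root filesystem. Later layers overwrite earlier files, and a file named .wh.<name> (a whiteout) deletes <name>. This is a toy model: real extraction writes to disk, handles directories, permissions, and links, and decompresses gzip layers first. make_layer and apply_layers are hypothetical helpers.

```python
import io
import tarfile


def make_layer(files: dict) -> bytes:
    """Build an uncompressed tar archive in memory from {path: content}."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path, content in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def apply_layers(layers: list) -> dict:
    """Apply layers in manifest order; return {path: content} for the result."""
    rootfs = {}
    for layer in layers:
        with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                directory, _, base = member.name.rpartition("/")
                if base.startswith(".wh."):
                    # Whiteout marker: remove the file it names.
                    hidden = base[len(".wh."):]
                    target = f"{directory}/{hidden}" if directory else hidden
                    rootfs.pop(target, None)
                    continue
                rootfs[member.name] = archive.extractfile(member).read()
    return rootfs


base_layer = make_layer({"etc/motd": b"hello", "tmp/scratch": b"x"})
top_layer = make_layer({"etc/motd": b"updated", "tmp/.wh.scratch": b""})

rootfs = apply_layers([base_layer, top_layer])
print(rootfs)
```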

7. Optional Signature Verification

If Docker Content Trust (DCT) is enabled, Docker uses Notary to verify the image’s cryptographic signature. This ensures the image was signed by a trusted publisher and hasn’t been tampered with.

8. Final Storage

The verified image layers and metadata are stored in Docker’s local storage backend (e.g., /var/lib/docker on Linux). The image is now ready to be used.

Summary

Docker’s pull and verification process involves several layers of checks to ensure that images are:

Authentic: With Docker Content Trust enabled, the image's signature is checked against a trusted publisher.
Intact: No tampering occurred during transmission.
Compatible: The correct platform-specific image is chosen when applicable.

[Figure: hash tree for the inductiveautomation/ignition:8.1.19 image]

Learnings from the Pull/Verification Mechanism

Image as a Sequence of Layers and Configuration

Through this mechanism, it becomes clear that a Docker image is a sequence of layers along with a configuration blob. Each layer represents a part of the filesystem, while the configuration defines runtime properties such as environment variables and the command to execute. Images are reconstructed locally by fetching or caching these blobs, enabling efficient storage and transfer.

Hash Tree Implementation for Integrity Verification

Docker’s verification process shows how a hash tree (Merkle tree) can be used to ensure image integrity. Each component (layer or config blob) has a cryptographic hash that is verified against the manifest, and the manifest itself is verified against the image SHA. If you are planning to write a package registry, this approach is worth borrowing: verifying the hash of each piece of content ensures package integrity over the network.
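Here is a minimal sketch of that idea, using a dict as a content-addressed store (digest to bytes). Starting from a single trusted root digest, the walk verifies the manifest and then every blob it references. The blob contents are stand-ins, and put and verify_tree are hypothetical helpers; with a manifest list on top, the same walk would simply gain one more level.

```python
import hashlib
import json


def digest_of(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def put(store: dict, data: bytes) -> str:
    """Store data under its own digest and return that digest."""
    digest = digest_of(data)
    store[digest] = data
    return digest


def verify_tree(root_digest: str, store: dict) -> list:
    """Verify the manifest and all referenced blobs; return verified digests."""
    manifest_bytes = store[root_digest]
    if digest_of(manifest_bytes) != root_digest:
        raise ValueError("manifest does not match the root digest")
    manifest = json.loads(manifest_bytes)

    verified = [root_digest]
    for descriptor in [manifest["config"], *manifest["layers"]]:
        blob = store[descriptor["digest"]]
        if digest_of(blob) != descriptor["digest"]:
            raise ValueError(f"blob {descriptor['digest']} failed verification")
        verified.append(descriptor["digest"])
    return verified


# Build a toy image: one config blob and two layers (stand-in content).
store = {}
config_digest = put(store, b'{"os": "linux", "architecture": "amd64"}')
layer_digests = [put(store, b"layer one"), put(store, b"layer two")]
manifest_bytes = json.dumps({
    "schemaVersion": 2,
    "config": {"digest": config_digest},
    "layers": [{"digest": d} for d in layer_digests],
}).encode()
root = put(store, manifest_bytes)

print(len(verify_tree(root, store)), "objects verified from one root digest")
```

Trusting the single root digest is enough: changing any byte of any blob changes a digest somewhere on the path to the root, and the walk fails.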

Insights into Docker Hub Storage and Client Interaction

Docker Hub stores image manifests, configuration blobs, and compressed layer blobs. The Docker client interacts with these components through the Docker Registry HTTP API V2. The manifest acts as a blueprint for pulling the correct blobs and reconstructing the image. This interaction ensures images can be efficiently fetched, verified, and stored with minimal overhead.

References

Forum where the mechanism is described: https://forum.inductiveautomation.com/t/how-to-verify-integrity-of-docker-image/64536/2

Image Spec by Open Container Initiative: https://github.com/opencontainers/image-spec

Docs for Image Manifests: https://github.com/opencontainers/image-spec/blob/main/manifest.md
