
External disks for storing data

Data processed in ClickHouse is usually stored in the local file system of the machine on which ClickHouse server is running. That requires large-capacity disks, which can be expensive. To avoid storing data locally, various storage options are supported:

  1. Amazon S3 object storage.
  2. Azure Blob Storage.
  3. Unsupported: The Hadoop Distributed File System (HDFS)

Note

ClickHouse also supports external table engines, which are different from the external storage option described on this page: they allow reading data stored in a general file format (like Parquet). This page describes storage configuration for ClickHouse MergeTree family or Log family tables.

  1. To work with data stored on Amazon S3 disks, use the S3 table engine.
  2. To work with data stored in Azure Blob Storage, use the AzureBlobStorage table engine.
  3. To work with data in the Hadoop Distributed File System (unsupported), use the HDFS table engine.

Configure external storage

MergeTree and Log family table engines can store data in S3, Azure Blob Storage, or HDFS (unsupported) using a disk of type s3, azure_blob_storage, or hdfs (unsupported), respectively.

Disk configuration requires:

  1. A type section, equal to one of s3, azure_blob_storage, hdfs (unsupported), local_blob_storage, web.
  2. Configuration of a specific external storage type.

Starting from ClickHouse version 24.1, it is possible to use a new configuration option. It requires specifying:

  1. A type equal to object_storage
  2. object_storage_type, equal to one of s3, azure_blob_storage (or just azure from 24.3), hdfs (unsupported), local_blob_storage (or just local from 24.3), web.

Optionally, metadata_type can be specified (it is local by default), but it can also be set to plain, web and, starting from 24.4, plain_rewritable. Usage of the plain metadata type is described in the plain storage section. The web metadata type can be used only with the web object storage type. The local metadata type stores metadata files locally (each metadata file contains a mapping to files in object storage and some additional meta information about them).

For example:
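A minimal sketch of such a disk declaration (the bucket URL below is a placeholder; credentials are read from the environment here):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <!-- placeholder bucket URL -->
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
        </disks>
    </storage_configuration>
</clickhouse>
```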

is equal to the following configuration (from version 24.1):
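A sketch of the equivalent declaration using the object_storage type (same placeholder endpoint):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>local</metadata_type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
        </disks>
    </storage_configuration>
</clickhouse>
```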

The following configuration:
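(a sketch of an s3_plain disk; the endpoint is a placeholder)

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_plain>
                <type>s3_plain</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3_plain>
        </disks>
    </storage_configuration>
</clickhouse>
```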

is equal to:
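(same placeholder endpoint, expressed via the object_storage type with plain metadata)

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_plain>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>plain</metadata_type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3_plain>
        </disks>
    </storage_configuration>
</clickhouse>
```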

An example of a full storage configuration looks like this:
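(a sketch: the disk plus a storage policy that uses it; the endpoint is a placeholder)

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
        </disks>
        <policies>
            <s3>
                <volumes>
                    <main>
                        <disk>s3</disk>
                    </main>
                </volumes>
            </s3>
        </policies>
    </storage_configuration>
</clickhouse>
```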

Starting with version 24.1, it can also look like:
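(same sketch, with the disk declared via the object_storage type)

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>local</metadata_type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
        </disks>
        <policies>
            <s3>
                <volumes>
                    <main>
                        <disk>s3</disk>
                    </main>
                </volumes>
            </s3>
        </policies>
    </storage_configuration>
</clickhouse>
```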

To make a specific kind of storage a default option for all MergeTree tables, add the following section to the configuration file:
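For example (a sketch; s3 here refers to a storage policy defined as above):

```xml
<clickhouse>
    <merge_tree>
        <!-- default storage policy for all MergeTree tables -->
        <storage_policy>s3</storage_policy>
    </merge_tree>
</clickhouse>
```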

If you want to configure a specific storage policy for a specific table, you can define it in settings while creating the table:
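For example (a sketch; the table schema is arbitrary and s3 is assumed to be a policy from the configuration above):

```sql
CREATE TABLE test (a Int32, b String)
ENGINE = MergeTree()
ORDER BY a
SETTINGS storage_policy = 's3';
```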

You can also use disk instead of storage_policy. In this case it is not necessary to have the storage_policy section in the configuration file, and a disk section is enough.
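For example (the same sketch, referencing the disk directly):

```sql
CREATE TABLE test (a Int32, b String)
ENGINE = MergeTree()
ORDER BY a
SETTINGS disk = 's3';
```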

Dynamic Configuration

It is also possible to specify a storage configuration without a predefined disk in the configuration file; instead, the disk can be configured in the CREATE/ATTACH query settings.
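For example, a disk can be declared directly in an ATTACH query. The table name, schema, UUID, and endpoint below are placeholders; the UUID must match the directory name of the prepared data:

```sql
ATTACH TABLE web_table UUID 'aaaaaaaa-bbbb-4ccc-8ddd-eeeeeeeeeeee'
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
-- the disk is defined inline in the query settings instead of the config file
SETTINGS disk = disk(
    type = web,
    endpoint = 'https://example.com/static-table-files/'
);
```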

The following example query builds on the above dynamic disk configuration and shows how to use a local disk to cache data from a table stored at a URL.

The example below adds a cache to external storage.

In the settings below, notice that the disk of type=web is nested within the disk of type=cache.
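A sketch of such a nested configuration (same placeholders as above; the cache path and size are assumptions):

```sql
ATTACH TABLE web_table UUID 'aaaaaaaa-bbbb-4ccc-8ddd-eeeeeeeeeeee'
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS disk = disk(
    type = cache,
    max_size = '1Gi',
    path = '/var/lib/clickhouse/custom_disk_cache/',
    -- the web disk is nested inside the cache disk
    disk = disk(
        type = web,
        endpoint = 'https://example.com/static-table-files/'
    )
);
```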

Note

The example uses type=web, but any disk type can be configured as dynamic, including local disk. Local disks require a path argument to be inside the server config parameter custom_local_disks_base_directory, which has no default, so that parameter must also be set when using a local disk.

A combination of config-based configuration and sql-defined configuration is also possible:
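A sketch (same placeholders as above); here the cache disk is defined in the query, while the underlying web disk comes from the configuration file:

```sql
ATTACH TABLE web_table UUID 'aaaaaaaa-bbbb-4ccc-8ddd-eeeeeeeeeeee'
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS disk = disk(
    type = cache,
    max_size = '1Gi',
    path = '/var/lib/clickhouse/custom_disk_cache/',
    -- 'web' refers to a disk declared in the server configuration file
    disk = 'web'
);
```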

where web is from the server configuration file:
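For example (a sketch; the endpoint is a placeholder):

```xml
<storage_configuration>
    <disks>
        <web>
            <type>web</type>
            <endpoint>https://example.com/static-table-files/</endpoint>
        </web>
    </disks>
</storage_configuration>
```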

Using S3 Storage

Required parameters

| Parameter | Description |
|---|---|
| endpoint | S3 endpoint URL in path or virtual hosted styles. Should include the bucket and root path for data storage. |
| access_key_id | S3 access key ID used for authentication. |
| secret_access_key | S3 secret access key used for authentication. |

Optional parameters

| Parameter | Description | Default Value |
|---|---|---|
| region | S3 region name. | - |
| support_batch_delete | Controls whether to check for batch delete support. Set to false when using Google Cloud Storage (GCS) as GCS doesn't support batch deletes. | true |
| use_environment_credentials | Reads AWS credentials from environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN if they exist. | false |
| use_insecure_imds_request | If true, uses insecure IMDS request when obtaining credentials from Amazon EC2 metadata. | false |
| expiration_window_seconds | Grace period (in seconds) for checking if expiration-based credentials have expired. | 120 |
| proxy | Proxy configuration for S3 endpoint. Each uri element inside the proxy block should contain a proxy URL. | - |
| connect_timeout_ms | Socket connect timeout in milliseconds. | 10000 (10 seconds) |
| request_timeout_ms | Request timeout in milliseconds. | 5000 (5 seconds) |
| retry_attempts | Number of retry attempts for failed requests. | 10 |
| single_read_retries | Number of retry attempts for connection drops during read. | 4 |
| min_bytes_for_seek | Minimum number of bytes to use seek operation instead of sequential read. | 1 MB |
| metadata_path | Local filesystem path to store S3 metadata files. | /var/lib/clickhouse/disks/<disk_name>/ |
| skip_access_check | If true, skips disk access checks during startup. | false |
| header | Adds specified HTTP header to requests. Can be specified multiple times. | - |
| server_side_encryption_customer_key_base64 | Required headers for accessing S3 objects with SSE-C encryption. | - |
| server_side_encryption_kms_key_id | Required headers for accessing S3 objects with SSE-KMS encryption. Empty string uses AWS managed S3 key. | - |
| server_side_encryption_kms_encryption_context | Encryption context header for SSE-KMS (used with server_side_encryption_kms_key_id). | - |
| server_side_encryption_kms_bucket_key_enabled | Enables S3 bucket keys for SSE-KMS (used with server_side_encryption_kms_key_id). | Matches bucket-level setting |
| s3_max_put_rps | Maximum PUT requests per second before throttling. | 0 (unlimited) |
| s3_max_put_burst | Maximum concurrent PUT requests before hitting RPS limit. | Same as s3_max_put_rps |
| s3_max_get_rps | Maximum GET requests per second before throttling. | 0 (unlimited) |
| s3_max_get_burst | Maximum concurrent GET requests before hitting RPS limit. | Same as s3_max_get_rps |
| read_resource | Resource name for scheduling read requests. | Empty string (disabled) |
| write_resource | Resource name for scheduling write requests. | Empty string (disabled) |
| key_template | Defines object key generation format using re2 syntax. Requires the storage_metadata_write_full_object_key flag. Incompatible with a root path in endpoint. Requires key_compatibility_prefix. | - |
| key_compatibility_prefix | Required with key_template. Specifies the previous root path from endpoint for reading older metadata versions. | - |
Note

Google Cloud Storage (GCS) is also supported using the type s3. See GCS backed MergeTree.

Using Plain Storage

In 22.10 a new disk type s3_plain was introduced, which provides write-once storage. Its configuration parameters are the same as for the s3 disk type. Unlike the s3 disk type, it stores data as is: instead of using randomly generated blob names, it uses normal file names (the same way ClickHouse stores files on a local disk) and does not store any metadata locally; instead, the metadata is derived from the data on S3.

This disk type allows keeping a static version of the table, as it does not allow executing merges on the existing data and does not allow inserting of new data. A use case for this disk type is to create backups on it, which can be done via BACKUP TABLE data TO Disk('plain_disk_name', 'backup_name'). Afterward, you can do RESTORE TABLE data AS data_restored FROM Disk('plain_disk_name', 'backup_name') or use ATTACH TABLE data (...) ENGINE = MergeTree() SETTINGS disk = 'plain_disk_name'.

Configuration:
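A sketch (placeholder endpoint; a policy is included so the disk can be referenced from tables):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_plain>
                <type>s3_plain</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3_plain>
        </disks>
        <policies>
            <s3_plain>
                <volumes>
                    <main>
                        <disk>s3_plain</disk>
                    </main>
                </volumes>
            </s3_plain>
        </policies>
    </storage_configuration>
</clickhouse>
```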

Starting from 24.1 it is possible to configure any object storage disk (s3, azure, hdfs (unsupported), local) using the plain metadata type.

Configuration:
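A sketch of the same disk expressed with the object_storage type and plain metadata (placeholder endpoint):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_plain>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>plain</metadata_type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3_plain>
        </disks>
    </storage_configuration>
</clickhouse>
```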

Using S3 Plain Rewritable Storage

A new disk type s3_plain_rewritable was introduced in 24.4. Similar to the s3_plain disk type, it does not require additional storage for metadata files. Instead, metadata is stored in S3. Unlike the s3_plain disk type, s3_plain_rewritable allows executing merges and supports INSERT operations. Mutations and replication of tables are not supported.

A use case for this disk type is for non-replicated MergeTree tables. Although the s3 disk type is suitable for non-replicated MergeTree tables, you may opt for the s3_plain_rewritable disk type if you do not require local metadata for the table and are willing to accept a limited set of operations. This could be useful, for example, for system tables.

Configuration:
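A sketch (the endpoint is a placeholder):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <disk_s3_plain_rewritable>
                <type>s3_plain_rewritable</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </disk_s3_plain_rewritable>
        </disks>
    </storage_configuration>
</clickhouse>
```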

is equal to:
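(same placeholder endpoint, with the metadata type set to plain_rewritable)

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <disk_s3_plain_rewritable>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>plain_rewritable</metadata_type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </disk_s3_plain_rewritable>
        </disks>
    </storage_configuration>
</clickhouse>
```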

Starting from 24.5 it is possible to configure any object storage disk (s3, azure, local) using the plain_rewritable metadata type.

Using Azure Blob Storage

MergeTree family table engines can store data to Azure Blob Storage using a disk with type azure_blob_storage.

Configuration markup:
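A sketch, with placeholder account, container, and key values:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <blob_storage_disk>
                <type>azure_blob_storage</type>
                <!-- placeholder account URL and credentials -->
                <storage_account_url>http://account.blob.core.windows.net</storage_account_url>
                <container_name>data-container</container_name>
                <account_name>account</account_name>
                <account_key>REPLACE_WITH_ACCOUNT_KEY</account_key>
                <metadata_path>/var/lib/clickhouse/disks/blob_storage_disk/</metadata_path>
                <skip_access_check>false</skip_access_check>
            </blob_storage_disk>
        </disks>
        <policies>
            <blob_storage_policy>
                <volumes>
                    <main>
                        <disk>blob_storage_disk</disk>
                    </main>
                </volumes>
            </blob_storage_policy>
        </policies>
    </storage_configuration>
</clickhouse>
```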

Connection parameters

| Parameter | Description | Default Value |
|---|---|---|
| storage_account_url (Required) | Azure Blob Storage account URL. Examples: http://account.blob.core.windows.net or http://azurite1:10000/devstoreaccount1. | - |
| container_name | Target container name. | default-container |
| container_already_exists | Controls container creation behavior: false creates a new container, true connects directly to an existing container, and if unset the disk checks whether the container exists and creates it if needed. | - |

Authentication parameters (the disk will try all available methods and Managed Identity Credential):

| Parameter | Description |
|---|---|
| connection_string | For authentication using a connection string. |
| account_name | For authentication using Shared Key (used with account_key). |
| account_key | For authentication using Shared Key (used with account_name). |

Limit parameters

| Parameter | Description |
|---|---|
| s3_max_single_part_upload_size | Maximum size of a single block upload to Blob Storage. |
| min_bytes_for_seek | Minimum size of a seekable region. |
| max_single_read_retries | Maximum number of attempts to read a chunk of data from Blob Storage. |
| max_single_download_retries | Maximum number of attempts to download a readable buffer from Blob Storage. |
| thread_pool_size | Maximum number of threads for IDiskRemote instantiation. |
| s3_max_inflight_parts_for_one_file | Maximum number of concurrent put requests for a single object. |

Other parameters

| Parameter | Description | Default Value |
|---|---|---|
| metadata_path | Local filesystem path to store metadata files for Blob Storage. | /var/lib/clickhouse/disks/<disk_name>/ |
| skip_access_check | If true, skips disk access checks during startup. | false |
| read_resource | Resource name for scheduling read requests. | Empty string (disabled) |
| write_resource | Resource name for scheduling write requests. | Empty string (disabled) |
| metadata_keep_free_space_bytes | Amount of free metadata disk space to reserve. | - |

Examples of working configurations can be found in the integration tests directory (see e.g. test_merge_tree_azure_blob_storage or test_azure_blob_storage_zero_copy_replication).

Zero-copy replication is not ready for production

Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.

Using HDFS storage (Unsupported)

In the sample configuration shown below:

  • the disk is of type hdfs (unsupported)
  • the data is hosted at hdfs://hdfs1:9000/clickhouse/
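A sketch of such a configuration:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <hdfs>
                <type>hdfs</type>
                <endpoint>hdfs://hdfs1:9000/clickhouse/</endpoint>
                <skip_access_check>true</skip_access_check>
            </hdfs>
        </disks>
        <policies>
            <hdfs>
                <volumes>
                    <main>
                        <disk>hdfs</disk>
                    </main>
                </volumes>
            </hdfs>
        </policies>
    </storage_configuration>
</clickhouse>
```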

Note that HDFS is unsupported and therefore there might be issues when using it. Feel free to make a pull request with a fix if any issue arises.

Keep in mind that HDFS may not work in corner cases.

Using Data Encryption

You can encrypt the data stored on S3 or HDFS (unsupported) external disks, or on a local disk. To turn on encryption mode, in the configuration file you must define a disk with the type encrypted and choose a disk on which the data will be saved. An encrypted disk ciphers all written files on the fly, and when you read files from an encrypted disk it deciphers them automatically, so you can work with an encrypted disk like with a normal one.

Example of disk configuration:
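A sketch matching the paths used below: disk1 is a plain local disk and disk2 wraps it with encryption (the 16-character key is a placeholder):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <disk1>
                <type>local</type>
                <path>/path1/</path>
            </disk1>
            <disk2>
                <type>encrypted</type>
                <disk>disk1</disk>
                <path>path2/</path>
                <!-- placeholder 16-character key for AES_128_CTR -->
                <key>_16_ascii_chars_</key>
            </disk2>
        </disks>
    </storage_configuration>
</clickhouse>
```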

For example, when ClickHouse writes data from some table to the file store/all_1_1_0/data.bin on disk1, this file is in fact written to the physical disk at the path /path1/store/all_1_1_0/data.bin.

When writing the same file to disk2, it is actually written to the physical disk at the path /path1/path2/store/all_1_1_0/data.bin in encrypted form.

Required Parameters

| Parameter | Type | Description |
|---|---|---|
| type | String | Must be set to encrypted to create an encrypted disk. |
| disk | String | Type of disk to use for underlying storage. |
| key | Uint64 | Key for encryption and decryption. Can be specified in hexadecimal using key_hex. Multiple keys can be specified using the id attribute. |

Optional Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| path | String | Root directory | Location on the disk where data will be saved. |
| current_key_id | String | - | The key ID used for encryption. All specified keys can be used for decryption. |
| algorithm | Enum | AES_128_CTR | Encryption algorithm. Options: AES_128_CTR (16-byte key), AES_192_CTR (24-byte key), AES_256_CTR (32-byte key). |

Example of disk configuration:
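For illustration, a sketch that wraps an S3 disk with encryption and rotates between two throwaway hex keys (the endpoint and keys are placeholders):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <disk_s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </disk_s3>
            <disk_s3_encrypted>
                <type>encrypted</type>
                <disk>disk_s3</disk>
                <algorithm>AES_128_CTR</algorithm>
                <!-- placeholder 16-byte keys in hexadecimal; key with id 1 is used for writes -->
                <key_hex id="0">00112233445566778899aabbccddeeff</key_hex>
                <key_hex id="1">ffeeddccbbaa99887766554433221100</key_hex>
                <current_key_id>1</current_key_id>
            </disk_s3_encrypted>
        </disks>
    </storage_configuration>
</clickhouse>
```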

Using local cache

It is possible to configure a local cache over disks in the storage configuration starting from version 22.3. For versions 22.3 - 22.7 cache is supported only for the s3 disk type. For versions >= 22.8 cache is supported for any disk type: S3, Azure, Local, Encrypted, etc. For versions >= 23.5 cache is supported only for remote disk types: S3, Azure, HDFS (unsupported). The cache uses an LRU cache policy.

Example of configuration for versions later or equal to 22.8:
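A sketch: a separate cache disk wraps the s3 disk, and the policy points at the cache (the endpoint, cache path, and size are placeholders):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
            </s3>
            <cache>
                <type>cache</type>
                <disk>s3</disk>
                <path>/var/lib/clickhouse/disks/s3_cache/</path>
                <max_size>10Gi</max_size>
            </cache>
        </disks>
        <policies>
            <s3_cache>
                <volumes>
                    <main>
                        <disk>cache</disk>
                    </main>
                </volumes>
            </s3_cache>
        </policies>
    </storage_configuration>
</clickhouse>
```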

Example of configuration for versions earlier than 22.8:
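A sketch of the pre-22.8 layout, where the cache settings lived inside the disk itself (the endpoint and the cache settings shown are assumptions for illustration):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3>
                <type>s3</type>
                <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/data/</endpoint>
                <use_environment_credentials>1</use_environment_credentials>
                <!-- pre-22.8 style: cache configured inside the disk -->
                <data_cache_enabled>1</data_cache_enabled>
                <data_cache_max_size>10737418240</data_cache_max_size>
            </s3>
        </disks>
        <policies>
            <s3_cache>
                <volumes>
                    <main>
                        <disk>s3</disk>
                    </main>
                </volumes>
            </s3_cache>
        </policies>
    </storage_configuration>
</clickhouse>
```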

File Cache disk configuration settings:

These settings should be defined in the disk configuration section.

| Parameter | Type | Default | Description |
|---|---|---|---|
| path | String | - | Required. Path to the directory where cache will be stored. |
| max_size | Size | - | Required. Maximum cache size in bytes or readable format (e.g., 10Gi). Files are evicted using LRU policy when the limit is reached. Supports ki, Mi, Gi formats (since v22.10). |
| cache_on_write_operations | Boolean | false | Enables write-through cache for INSERT queries and background merges. Can be overridden per query with enable_filesystem_cache_on_write_operations. |
| enable_filesystem_query_cache_limit | Boolean | false | Enables per-query cache size limits based on max_query_cache_size. |
| enable_cache_hits_threshold | Boolean | false | When enabled, data is cached only after being read multiple times. |
| cache_hits_threshold | Integer | 0 | Number of reads required before data is cached (requires enable_cache_hits_threshold). |
| enable_bypass_cache_with_threshold | Boolean | false | Skips cache for large read ranges. |
| bypass_cache_threshold | Size | 256Mi | Read range size that triggers cache bypass (requires enable_bypass_cache_with_threshold). |
| max_file_segment_size | Size | 8Mi | Maximum size of a single cache file in bytes or readable format. |
| max_elements | Integer | 10000000 | Maximum number of cache files. |
| load_metadata_threads | Integer | 16 | Number of threads for loading cache metadata at startup. |

Note: Size values support units like ki, Mi, Gi, etc. (e.g., 10Gi).

File Cache Query/Profile Settings

| Setting | Type | Default | Description |
|---|---|---|---|
| enable_filesystem_cache | Boolean | true | Enables/disables cache usage per query, even when using a cache disk type. |
| read_from_filesystem_cache_if_exists_otherwise_bypass_cache | Boolean | false | When enabled, uses cache only if data exists; new data won't be cached. |
| enable_filesystem_cache_on_write_operations | Boolean | false (Cloud: true) | Enables write-through cache. Requires cache_on_write_operations in cache config. |
| enable_filesystem_cache_log | Boolean | false | Enables detailed cache usage logging to system.filesystem_cache_log. |
| max_query_cache_size | Size | false | Maximum cache size per query. Requires enable_filesystem_query_cache_limit in cache config. |
| skip_download_if_exceeds_query_cache | Boolean | true | Controls behavior when max_query_cache_size is reached: true stops downloading new data, false evicts old data to make space for new data. |

Caution

Cache configuration settings and cache query settings correspond to the latest ClickHouse version; some of them might not be supported in earlier versions.

Cache system tables

| Table Name | Description | Requirements |
|---|---|---|
| system.filesystem_cache | Displays the current state of the filesystem cache. | None |
| system.filesystem_cache_log | Provides detailed cache usage statistics per query. | Requires enable_filesystem_cache_log = true |

Cache commands

SYSTEM DROP FILESYSTEM CACHE (<cache_name>) (ON CLUSTER)

ON CLUSTER is only supported when no <cache_name> is provided.

SHOW FILESYSTEM CACHES

Show a list of filesystem caches which were configured on the server. (For versions less than or equal to 22.8 the command is named SHOW CACHES)

DESCRIBE FILESYSTEM CACHE '<cache_name>'

Show cache configuration and some general statistics for a specific cache. Cache name can be taken from SHOW FILESYSTEM CACHES command. (For versions less than or equal to 22.8 the command is named DESCRIBE CACHE)

| Cache current metrics | Cache asynchronous metrics | Cache profile events |
|---|---|---|
| FilesystemCacheSize | FilesystemCacheBytes | CachedReadBufferReadFromSourceBytes, CachedReadBufferReadFromCacheBytes |
| FilesystemCacheElements | FilesystemCacheFiles | CachedReadBufferReadFromSourceMicroseconds, CachedReadBufferReadFromCacheMicroseconds |
| | | CachedReadBufferCacheWriteBytes, CachedReadBufferCacheWriteMicroseconds |
| | | CachedWriteBufferCacheWriteBytes, CachedWriteBufferCacheWriteMicroseconds |

Using static Web storage (read-only)

This is a read-only disk. Its data is only read and never modified. A new table is loaded onto this disk via an ATTACH TABLE query (see the example below). The local disk is not actually used: each SELECT query results in an HTTP request to fetch the required data. Any modification of the table data results in an exception, i.e. the following types of queries are not allowed: CREATE TABLE, ALTER TABLE, RENAME TABLE, DETACH TABLE and TRUNCATE TABLE. Web storage can be used for read-only purposes, for example for hosting sample data or for migrating data. There is a tool clickhouse-static-files-uploader, which prepares a data directory for a given table (SELECT data_paths FROM system.tables WHERE name = 'table_name'). For each table you need, you get a directory of files. These files can be uploaded to, for example, a web server with static files. After this preparation, you can load this table into any ClickHouse server via DiskWeb.

In the sample configuration shown below:

  • the disk is of type web
  • the data is hosted at http://nginx:80/test1/
  • a cache on local storage is used
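A sketch of such a configuration (the cache path and size are placeholders):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <web>
                <type>web</type>
                <endpoint>http://nginx:80/test1/</endpoint>
            </web>
            <cached_web>
                <type>cache</type>
                <disk>web</disk>
                <path>cached_web_cache/</path>
                <max_size>100000000</max_size>
            </cached_web>
        </disks>
        <policies>
            <web>
                <volumes>
                    <main>
                        <disk>web</disk>
                    </main>
                </volumes>
            </web>
            <cached_web>
                <volumes>
                    <main>
                        <disk>cached_web</disk>
                    </main>
                </volumes>
            </cached_web>
        </policies>
    </storage_configuration>
</clickhouse>
```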
Tip

Storage can also be configured temporarily within a query, if a web dataset is not expected to be used routinely, see dynamic configuration and skip editing the configuration file.

A demo dataset is hosted on GitHub. To prepare your own tables for web storage, see the tool clickhouse-static-files-uploader.

In this ATTACH TABLE query the UUID provided matches the directory name of the data, and the endpoint is the URL for the raw GitHub content.
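A sketch of such a query. The table name, columns, and UUID below are placeholders; the real demo table's schema and UUID (which matches its data directory name) must be used, and the endpoint is assumed to be the raw GitHub URL of the demo repository:

```sql
ATTACH TABLE demo_table UUID 'aaaaaaaa-bbbb-4ccc-8ddd-eeeeeeeeeeee'
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS disk = disk(
    type = web,
    -- assumed raw GitHub endpoint of the demo dataset; replace as appropriate
    endpoint = 'https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'
);
```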

A ready test case. You need to add this configuration to the config:
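A sketch of the configuration (the endpoint is a placeholder for a server hosting the prepared static files):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <web>
                <type>web</type>
                <endpoint>https://example.com/prepared-table-files/</endpoint>
            </web>
        </disks>
        <policies>
            <web>
                <volumes>
                    <main>
                        <disk>web</disk>
                    </main>
                </volumes>
            </web>
        </policies>
    </storage_configuration>
</clickhouse>
```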

And then execute this query:
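A sketch (placeholder table, schema, and UUID; they must match the data that was uploaded to the endpoint):

```sql
ATTACH TABLE demo_table UUID 'aaaaaaaa-bbbb-4ccc-8ddd-eeeeeeeeeeee'
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 'web';
```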

Required parameters

| Parameter | Description |
|---|---|
| type | web. Otherwise the disk is not created. |
| endpoint | The endpoint URL in path format. The endpoint URL must contain the root path where the data was uploaded. |

Optional parameters

| Parameter | Description | Default Value |
|---|---|---|
| min_bytes_for_seek | The minimal number of bytes to use seek operation instead of sequential read | 1 MB |
| remote_fs_read_backoff_threashold | The maximum wait time when trying to read data for remote disk | 10000 seconds |
| remote_fs_read_backoff_max_tries | The maximum number of attempts to read with backoff | 5 |

If a query fails with an exception DB::Exception: Unreachable URL, then you can try to adjust the settings: http_connection_timeout, http_receive_timeout, keep_alive_timeout.

To get files for upload run: clickhouse static-files-disk-uploader --metadata-path <path> --output-dir <dir> (--metadata-path can be found in query SELECT data_paths FROM system.tables WHERE name = 'table_name').

When loading files by endpoint, they must be loaded into the <endpoint>/store/ path, but the config must contain only the endpoint.

If the URL is not reachable while the server is loading tables at startup, all errors are caught. If there were errors in this case, tables can be reloaded (made visible) via DETACH TABLE table_name -> ATTACH TABLE table_name. If metadata was successfully loaded at server startup, then tables are available straight away.

Use http_max_single_read_retries setting to limit the maximum number of retries during a single HTTP read.

Zero-copy Replication (not ready for production)

Zero-copy replication is possible, but not recommended, with S3 and HDFS (unsupported) disks. Zero-copy replication means that if the data is stored remotely on several machines and needs to be synchronized, then only the metadata is replicated (paths to the data parts), but not the data itself.

Zero-copy replication is not ready for production

Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.