After my previous articles on backup security and on-prem object storage, today we'll examine best practices for using them with Veeam.
In Why go with on-premises object storage and The choice of storage, I explained my ideas about new-generation storage as a backup repository.
So we’ll check out the best configuration for creating a backup job and explain why we chose it.
Object Storage as the main repository in Veeam v12.3
As we all know, since version 12, it’s been possible to use an object storage repository as a primary destination for backups. This makes the architecture simpler and offers greater flexibility.
When you're getting started with object storage as a repository, it's important to keep in mind the block size used for writing backups.
The size you choose under 'Storage Optimization' when configuring a backup job affects how the data is split up and stored.
How do we calculate the space we need for our repository?
I started using a calculation tool released by Object First. Its slogan says: "Get secure, simple, and powerful backup storage with out-of-the-box immutability optimized specifically for Veeam". Starting from the source size and the retention policy, its calculator gives you the right size for your appliance.
For our example I’ve set the following parameters:
- Data Source: 15 TB
- Compression: optimal (estimated 40%)
- Retention policy: 30 days without GFS.
- Job Settings: Monthly health check and Data Encryption enabled
- No immutability.
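Before looking at the calculator output, here's a rough back-of-envelope sketch (in Python) of how I think about sizing: one full plus daily incrementals over the retention window. The 3% daily change rate is a made-up assumption for illustration, and this is not the formula the Object First calculator actually uses.

```python
# Rough back-of-envelope repository sizing for a forever forward incremental chain.
# This is NOT the Object First calculator's formula; the 3% daily change rate and
# the simple "one full + daily incrementals" model are assumptions for illustration.

def estimate_repo_tb(source_tb: float,
                     reduction: float = 0.40,    # ~40% saved by compression/dedup
                     retention_days: int = 30,
                     daily_change_rate: float = 0.03) -> float:
    """Return an approximate repository footprint in TB."""
    full = source_tb * (1 - reduction)                                    # one full copy
    incrementals = source_tb * daily_change_rate * (1 - reduction) * retention_days
    return full + incrementals

if __name__ == "__main__":
    print(f"~{estimate_repo_tb(15):.1f} TB for a 15 TB source with 30-day retention")
```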
Now, let’s take a look at what changes depending on the block size you choose:
[Calculator results: 1 MB block size vs. 4 MB block size]
Link to the calculator here
As you can see, for standard VMs, comparing the two sizes shows higher repository space usage with the larger block size.
But this isn’t always true. Thanks to a script created by Matthias Mehrtens called GET-RPstatistics, we can get a summary of all the restore points written to our repository.
Here’s the link to the script on GitHub: Get-RPSstatistics
| VMName | BackupJob | Duration | BackupType | ProcessedData (bytes) | DataSize (bytes) | DataRead (bytes) | BackupSize (bytes) |
|---|---|---|---|---|---|---|---|
| drs-k8s | BJ_ootbi_imm_HIGH_1MB | 00:07:25 | Full | 42969595904 | 42949697298 | 42969595904 | 5586733304 |
| drs-k8s | BJ_ootbi_imm_HIGH_4MB | 00:05:08 | Full | 42969595904 | 42949697298 | 42969595904 | 5523810000 |
It's pretty clear that, for the same 40 GB VM and a backup of just over 5 GB, the number of blocks read and written, as well as the average block size, changes with the block size setting.
| DedupRatio | ComprRatio | Reduction | Blocksize (bytes) | NumOfBlocksRead | NumOfBlocksWritten | AvgBlocksizeWritten (bytes) |
|---|---|---|---|---|---|---|
| 2.941176471 | 2.631578947 | 7.73993808 | 1048576 | 40979 | 13933 | 400971.313 |
| 2.941176471 | 2.702702703 | 7.949125596 | 4194304 | 10244.75 | 3483 | 1585934.539 |
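To make the columns above concrete, here's a minimal sketch that recomputes the derived values for the 1 MB run from the raw numbers in the tables. The relationships below are my interpretation of the script's output, not a specification of Get-RPstatistics itself.

```python
# Recompute the derived columns for the 1 MB run from the raw values in the tables.
# These formulas are how I read the script's output, not a spec of the script.

backup_size = 5_586_733_304      # bytes written to the repository
blocks_written = 13_933
data_read = 42_969_595_904       # bytes read from the source
block_size = 1_048_576           # 1 MB storage optimization setting
dedup_ratio = 2.941176471
compr_ratio = 2.631578947

avg_block_written = backup_size / blocks_written   # ~400,971 bytes
blocks_read_est = data_read / block_size           # ~40,981 (table reports 40,979)
overall_reduction = dedup_ratio * compr_ratio      # ~7.74x

print(f"avg block written : {avg_block_written:,.0f} B")
print(f"blocks read (est.): {blocks_read_est:,.0f}")
print(f"overall reduction : {overall_reduction:.2f}x")
```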
But this is just an example from a lab, so it's not that representative of real workloads. Generally, the default, smaller block size increases the compression and deduplication ratios, saving more space for backups.
Larger blocks work better for backing up and restoring files, especially big ones.
On the other hand, smaller block sizes result in more fragmented storage, which affects performance over time.
Unfortunately, there's no single right answer to which block size to use.
In my opinion, the best approach is to look at the content of your workloads and see which setting fits the job best.
It's important to test to find the best block size for your specific needs.
Start with the default block size (1 MB) and regularly check performance and space usage. If you run into performance issues or excessive space consumption, simply change the size.
Writing Data to the Repository
Now let's look at the process Veeam uses to write backup data to object storage.
Checkpoint creation: First of all, Veeam creates checkpoints, which are like markers showing the 'state' of the backup chain at a given time. They also contain information about the backup structure and the files. These checkpoints are stored in the repository together with the backup data. Checkpoints are very important because they let Veeam rebuild the backup chain even if the actual data is spread across many different objects. Here's an example of a tree that can be 'read' using any S3 browser:
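As a rough stand-in, here's a minimal sketch of how any S3 client can walk that prefix tree. The endpoint URL, credentials, and bucket name are placeholders, and the exact layout Veeam creates is not reproduced here.

```python
# Minimal sketch: walk the bucket "folder" tree the way an S3 browser would.
# Endpoint, credentials and bucket name are placeholders; the actual prefix
# layout Veeam creates is not reproduced here.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstorage.lab.local",  # your on-prem S3 endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

def walk(bucket: str, prefix: str = "", depth: int = 0, max_depth: int = 3):
    """Print the prefix hierarchy up to max_depth levels."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
    for cp in resp.get("CommonPrefixes", []):
        print("  " * depth + cp["Prefix"])
        if depth + 1 < max_depth:
            walk(bucket, cp["Prefix"], depth + 1, max_depth)

walk("veeam-backups")  # placeholder bucket name
```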
Splitting into objects: Backup data is split into smaller objects based on the chosen block size, which lets Veeam upload the data in parallel and improves performance.
Here's the million-dollar question again: what's the right block size to use with on-premises object storage?
It’s usually best to stick with the default settings and the 1 MB option (don’t use smaller blocks with on-site storage).
If your VMs contain a lot of big files, such as databases or media files, you might want to try a bigger block size, maybe 2 or 4 MB, or even 8 MB, as this can improve performance, but it’s not the only thing to think about. It also depends on the hardware you’re using, so you should check out two key things: latency and throughput.
Concurrent tasks: Thanks to its parallelism, Veeam can use several simultaneous connections to upload objects to the object storage repository.
Latency is the main factor affecting backup performance, so it's important to have a low-latency storage system that keeps performance acceptable. Throughput also matters: it needs to be enough to handle the volume of data in the backups. You can adjust the number of connections in the repository settings. As you can see, the 'Concurrent Tasks' are handled by the Proxy role (remember that it's integrated into the File System agent), so it's also important to correctly size the component that carries the data.
Proxies usually need more CPU, while Repositories use more RAM. Gateway servers basically do the same job as repositories, so you can think of them as being the same.
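If you want a quick sanity check of those two numbers from the gateway/proxy side, here's a rough probe. The endpoint, credentials, and bucket name are placeholders; it's a crude single-stream measurement, not what Veeam itself reports.

```python
# Crude latency/throughput probe against the S3 endpoint (placeholder names).
# Single-stream only; this is a sanity check, not a benchmark.
import time
import boto3

s3 = boto3.client("s3", endpoint_url="https://objectstorage.lab.local",
                  aws_access_key_id="ACCESS_KEY", aws_secret_access_key="SECRET_KEY")
bucket = "veeam-backups"   # placeholder

# Latency: time a tiny PUT (round trip dominated by latency, not bandwidth).
t0 = time.perf_counter()
s3.put_object(Bucket=bucket, Key="probe/tiny", Body=b"x")
print(f"small PUT latency : {(time.perf_counter() - t0) * 1000:.1f} ms")

# Throughput: time a single 64 MB PUT (very rough, one stream only).
payload = b"\0" * (64 * 1024 * 1024)
t0 = time.perf_counter()
s3.put_object(Bucket=bucket, Key="probe/large", Body=payload)
elapsed = time.perf_counter() - t0
print(f"single-stream PUT : {len(payload) / elapsed / 1e6:.0f} MB/s")
```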
Compression and deduplication: Before data is uploaded, it’s compressed and deduplicated to save storage space.
Immutability: With Veeam you can use immutability, provided you've configured the bucket correctly on your storage, to protect backups from being tampered with or deleted.
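Here's a minimal sketch of how you might verify that a bucket was created with Object Lock enabled before pointing Veeam at it; the endpoint, credentials, and bucket name are placeholders for your environment.

```python
# Quick check that the bucket has Object Lock (immutability) enabled.
# Endpoint, credentials and bucket name are placeholders.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", endpoint_url="https://objectstorage.lab.local",
                  aws_access_key_id="ACCESS_KEY", aws_secret_access_key="SECRET_KEY")

try:
    cfg = s3.get_object_lock_configuration(Bucket="veeam-backups")
    print(cfg["ObjectLockConfiguration"]["ObjectLockEnabled"])  # "Enabled" if OK
except ClientError as err:
    # If the bucket was created without Object Lock, it can't host
    # immutable Veeam backups.
    print("Object Lock not configured:", err.response["Error"]["Code"])
```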
Immutability
We need a better understanding of how data immutability works, because the logic used in object storage is different from that of file or block storage. When we apply immutability, we need to calculate the actual retention using the rule from the Help Center page.
Actual retention = job retention policy + immutability period + Block Generation period
The Block Generation period is fixed, so bear that in mind; you can find the details on the relevant Help Center page. For local storage, this period is normally fixed at 10 days.
NOTE: We can also think about changing this value by making registry changes, but we really don’t recommend it.
By default, Veeam uses a forever forward incremental chain when writing to object storage. If you want periodic full backups, only Active Fulls are possible (there's no synthetic full option), but they lead to a significant increase in space usage.
I did some tests with a 20TB source, 30-day policy, and GFS + immutability without Active Full setup, and I got a usage of 33TB.
In the second scenario, setting an Active full every 7 days results in a usage of 218 TB.
Also, when you go into the advanced options, the same calculator warns you that Active Fulls take up a lot of space.
So it's best to use an FFI chain without Active Fulls in these cases. But what if you want the equivalent of a full backup? It's very simple: set up long-term retention with GFS.
This way, you'll just be creating checkpoints, which reference the blocks needed to reconstruct a full backup without writing a whole new backup chain from scratch.
Let’s go back to our rule.
Actual retention = job retention policy + immutability period + Block Generation period
What does this mean in practice? The actual retention on the storage will be the total of the 3 items. Assuming:
- Job retention: 20 days
- Immutability: 20 days
- Block gen: 10 days
So we end up with a total of 50 days retained in our repository. You might wonder why we have to keep 50 days even though we've only set 20 days of retention.
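For reference, here's the same rule as a tiny snippet you can plug your own numbers into (the 10-day Block Generation default is taken from the rule above):

```python
# Actual retention = job retention + immutability + Block Generation (per the rule above).
def actual_retention(job_retention_days: int,
                     immutability_days: int,
                     block_generation_days: int = 10) -> int:
    return job_retention_days + immutability_days + block_generation_days

print(actual_retention(20, 20))  # 50 days kept on the repository
```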
Retention policy
With a forever forward incremental chain, the first run creates a full backup, and the following runs are always incremental.
In Veeam job details we can see this:
When the retention limit is reached, the oldest incremental is merged (injected) into the full backup, and in the Veeam job details we can see this:
At the end, in the Veeam job details, we can see this:
So the first full will no longer be on Sunday, as it used to be; it will be on Monday, as you can see in the picture below.
Then the merged incremental won’t be needed anymore and will be deleted.
Based on various experiences shared on the Veeam R&D Forums and Community Hub, it is highly recommended to set the retention policy to at least Immutability + Block Generation.
For example:
- Immutability: 10 days
- Block Generation: 10 days
- Retention: 20 days (as recommended).
Day 1: a full backup is created. Veeam generates a checkpoint that includes all the blocks in the backup. If the immutability period is 10 days and Block Generation adds another 10 days, this restore point (and its blocks) will be protected for 20 days.
Day 2: there’s an incremental backup. Veeam creates a new restore point and updates the checkpoint. If some blocks from day 1 need to be ‘reused’ or referenced, immutability is extended to maintain the integrity of the entire chain.
Day 3 and after: the checkpoint just gets updated with each new backup. When the protection chain is finished, Veeam makes a new checkpoint that represents the ‘consolidated’ state of the chain and releases the old versions. These are deleted once their immutability period (including Block Generation) has expired.
NOTE: When the 20-day retention period for the Veeam job ends, it removes both the initial Full and any subsequent backups from the backup database. However, on object storage, data blocks will remain protected until the immutability period (including Block Generation) also ends. Only then will objects that are no longer referenced be physically deleted.
Conclusion: how should we configure our backup jobs?
To answer this question we must use the experience we’ve gained doing our lab tests, working with customers, and monitoring storage occupancy trends.
This experience has led us to recommend a list of settings that will optimize the use of on-prem object storage.
- Set the backup job retention to the sum of the immutability period and the block generation period.
- Do not assume that immutability must be set for very long periods.
- Use a forever forward incremental backup chain
- Do not use periodic Active Full.
- Use GFS retention to get full restore points without running Active Fulls
- Activate the health check at least once a month
- Set an appropriate number of concurrent tasks after correctly sizing the gateway servers.
I don’t have any other thoughts to add right now, but if you’ve got any other suggestions, I’m all ears. Thanks for reading!