Deduplication has been around for some time now, but its range of properties is still not always fully understood.
The technology, sometimes referred to as data deduping or data dedupe, is essentially a space-saving method that works by dividing data into segments or chunks. These segments are compared against one another, and if two segments are identical, only one copy is stored while the duplicate is replaced with a reference to it. New, unique segments are stored and will be compared against future data in the same way.
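To make the mechanism concrete, here is a minimal sketch of hash-based chunk deduplication. It is an illustration only, not how any particular product implements it: the fixed chunk size, SHA-256 hashing and in-memory dictionary store are all assumptions made for the example (real systems typically use variable-size chunking and a persistent chunk store).

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; many products use variable-size chunking


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into chunks, store each unique chunk once, and
    return the list of chunk hashes that reconstructs the data."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:   # new, unique segment: keep the actual bytes
            store[digest] = chunk
        recipe.append(digest)     # duplicates only add a reference, not a copy
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original data from its chunk references."""
    return b"".join(store[digest] for digest in recipe)


# Two backups of largely identical data end up sharing most of their chunks.
store: dict[str, bytes] = {}
backup1 = deduplicate(b"A" * 10000 + b"B" * 2000, store)
backup2 = deduplicate(b"A" * 10000 + b"C" * 2000, store)
print(len(store))  # far fewer stored chunks than the two backups reference
```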
This method of backing up a server saves a great deal of space. It avoids multiple copies of the same file piling up after every full backup, and it prevents new copies of unchanged information from being written during an incremental backup when only a limited amount of new data actually needs to be stored.
Deduplication differs from compression in that compression is typically applied manually to specific files, whereas deduplication is applied automatically to all files being backed up.
It has been hailed as a way to eliminate redundant data, optimise IT departments’ backup environments, reduce costs and, perhaps most importantly, recover data faster. Deduplication can have an enormous impact on a company’s disaster recovery and data storage costs, since deduplication ratios can range anywhere from 3:1 to 200:1 or more.
Companies that perform more frequent full backups normally see higher ratios. Generally, space savings in primary storage range from 50% to 60% or more for typical data, and can reach 90% or more for things like virtual desktop images.
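To put those ratios in perspective, the space saving follows directly from the ratio itself: a ratio of N:1 means N units of logical data occupy one unit on disk. The figures below are simple arithmetic, not measurements from any particular product.

```python
def space_savings(dedup_ratio: float) -> float:
    """Fraction of storage saved for a given deduplication ratio.
    A ratio of N:1 means N units of logical data occupy 1 unit on disk."""
    return 1 - 1 / dedup_ratio


for ratio in (2, 3, 10, 200):
    print(f"{ratio}:1 ratio -> {space_savings(ratio):.1%} space saved")
# 2:1 -> 50.0%, 3:1 -> 66.7%, 10:1 -> 90.0%, 200:1 -> 99.5%
```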
However, for all its pros, deduplication has cons that keep it from being an all-round answer to disaster recovery needs such as fast recovery. The dedupe process can add complexity and overhead, which can work against the very goal of recovering data faster.
There is also an increased risk of data corruption if anything goes wrong in the process. Because many backups share the same stored segments, if one segment goes bad, every backup that references it is affected too. This is why it is important to keep an additional backup when using deduplication, and for some companies that extra effort may not be worth the trouble.
Another drawback for some companies is that a deduplication system must maintain metadata for every segment, along with an index that ensures data is stored and identified correctly. Companies that cannot afford the staff hours or resources needed to maintain such a system may decide against implementing deduplication. The sketch below illustrates the kind of bookkeeping involved.
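The following sketch shows, in simplified form, the per-segment metadata a dedup store has to track and why a single corrupted chunk can affect many backups. The field names and structure are illustrative assumptions for this example, not any vendor’s actual format.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    """Metadata the dedup store must keep for every unique segment (illustrative)."""
    location: str                       # where the chunk lives on disk (hypothetical field)
    ref_count: int = 0                  # how many references point at this chunk
    backups: set = field(default_factory=set)  # which backups depend on it


# hypothetical index: chunk hash -> metadata record
index: dict[str, ChunkRecord] = {}


def reference(chunk_hash: str, backup_id: str, location: str) -> None:
    """Record that a backup references a chunk; create the record if it is new."""
    record = index.setdefault(chunk_hash, ChunkRecord(location))
    record.ref_count += 1
    record.backups.add(backup_id)


def affected_backups(corrupt_hash: str) -> set:
    """Every backup that depends on a corrupted chunk is affected by it."""
    record = index.get(corrupt_hash)
    return record.backups if record else set()
```

Keeping an index like this consistent across millions of segments is the maintenance burden described above, and the `affected_backups` lookup shows why one bad segment can ripple through many restore points.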
ActiveImage has a tested and proven deduplication method that delivers the benefits of state-of-the-art deduplication while mitigating the risks around data corruption and recovery speed. In fact, you can check out ActiveImage’s backup speed comparisons against industry solutions such as ShadowProtect (from StorageCraft), Arcserve, Veritas and Acronis.
Depending on your business, data deduping may be the best choice for you. Results vary, and the fact that it works for your neighbour or competitor does not mean it will work for you. Different methods and processes suit different types of data and usage patterns. If a company’s data is mostly dissimilar, deduplication is not worth the investment, but it can work wonders for a company whose data contains a great deal of repetition.
Results are highly variable depending on the type of data and the number of duplicate segments it contains. It is therefore best to perform a full proof-of-concept test before you commit to deduplication or any other backup strategy.