A Collision Attack On Sdhash Similarity Hashing

Share

Summary:

Digital forensic investigators can take advantage of tools and techniques that have the capability of finding similar files out of thousands of files up for investigation in a particular case. Finding similar files could significantly reduce the volume of data that needs to be investigated. Sdhash is a well-known fuzzy hashing scheme used for finding similarity among files. This digest produces a ‘score of similarity’ on a scale of 0 to 100. In a prior analysis of sdhash, Breitinger et al. claimed that 20% contents of a file can be modified without influencing the final sdhash digest of that file. They suggested that the file can be modified in certain regions, termed ‘gaps’, and yet the sdhash digest will remain unchanged. In this work, we show that their claim is not entirely correct. In particular, we show that even if 2% of the file contents in the gaps are changed randomly, then the sdhash gets changed with probability close to 1. We then provide an algorithm to modify the file contents within the gaps such that the sdhash remains unchanged even when the modifications are about 12% of the gap size. On the attack side, the proposed algorithm can deterministically produce collisions by generating many di↵erent files corresponding to a given file with maximal similarity score of 100.

Publication Type: Conference

Publication Date: September 30th, 2015

Publisher: 10th International Conference on Systematic Approaches to Digital Forensic Engineering

Author(s): Donghoon Chang, Somitra Sanadhya, Monka Singh, Dr. Robin Verma

Recent Releases