SENSITIVE DATA! It’s an interesting topic! In this post I’m trying to explain how to hash data to increase security during ETL. Assume that we have sensitive data stored in several secured source systems. The source systems are located in different countries and different regions. As the source systems themselves are secured, how we can cover data security needs during ETL process to read data from source systems and load into staging area? Apart from using secured network infrastructure, VPN, network tunnelling etc. we need to cover data layer security to extract sensitive data. One of the best ways is hashing data when it is extracting from source databases. Hashbytes is a T-SQL function that is available in SQL Server 2005 and later. As you might know there are many hashing algorithms, but, different SQL Server versions are supporting different range of hashing algorithms. For instance SHA1 is supported by SQL Server 2005 and later, but, if you are looking more secure hashing systems like SHA2, 256 (32 bytes) or 512 (64 bytes), you should use SQL Server 2012. Actually the hashbytes function will return null in earlier versions of SQL Server. If you are looking for a higher level of security like SHA3 that is originally known as “Keccak” you should wait for it for a long time as based on my investigations it is not supported even in SQL Server 2014 OR you can write your own SHA3 code OR just rely on some third party codes available on the Internet! So let’s get our hands dirty with using hashbytes in different versions of SQL Server.