Hashing

I find it hard to believe that any developer could deny how important and useful hashes can be in their applications when used well.

From security and data management to object comparisons, hashes are used extensively throughout the framework and should, in the right circumstances, be used by you as a developer too. But what exactly is a hash? Simply speaking, it’s the result of a function that maps data of arbitrary length to a value of fixed size (a number, a string, etc.). Note that hashing is not encryption: a hash is one-way and cannot be decrypted back into the original data. If you want to get all technical you can read this article on Wikipedia, which explains hashes and their use through hash functions:

A hash function is any function that can be used to map digital data of arbitrary size to digital data of fixed size. The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes. One use is a data structure called a hash table, widely used in computer software for rapid data lookup. Hash functions accelerate table or database lookup by detecting duplicated records in a large file. An example is finding similar stretches in DNA sequences. They are also useful in cryptography. A cryptographic hash function allows one to easily verify that some input data maps to a given hash value, but if the input data is unknown, it is deliberately difficult to reconstruct it (or equivalent alternatives) by knowing the stored hash value. This is used for assuring integrity of transmitted data, and is the building block for HMACs, which provide message authentication.

When it comes to development, I find hashes are useful in a few scenarios:

  • Protection of sensitive user data when stored in databases, such as: passwords, verification codes, banking information, etc. (this should really be considered a must…)
  • Transmission of sensitive user information through web-services, like authentication credentials (again, a must…)
  • Storage of resources based on unique keys, e.g. an image downloaded from a URL should be stored on the device under a hash of that URL. (optional, but recommended if you want brownie points)

So how does hashing work on iOS?

Let’s say we start with an arbitrary data item, something that we can easily convert into an NSData object, in this case, a String:


let string = "helloworld"

We could easily get an integer-based hash value out of the box by calling:


let string = "helloworld"
print(string.hash) //prints out -5723890270157574074 in iOS8

Or:


let string = "helloworld"
print(string.hashValue) //prints out 4799450060296076391 in iOS8

Just bear in mind the following, as stated in the documentation:

The hash value is not guaranteed to be stable across different invocations of the same program. Do not persist the hash value across program runs.

Indeed, if we run the simulator in iOS 7 we will notice that the hashes returned are different:


let string = "helloworld"
print(string.hash) //prints out 12736182381 in iOS7
print(string.hashValue) //prints out 39782312032 in iOS7
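
The same behaviour shows up in current Swift, where the standard library seeds its hashing randomly for each process. The lines below are a minimal sketch in modern Swift syntax (newer than the listings in this post): within one run the value is consistent, but the integer itself should not be expected to survive a relaunch.

```swift
// Both calls happen in the same process, so they share the same hashing seed.
let first = "helloworld".hashValue
let second = "helloworld".hashValue

print(first == second) // prints "true"

// The integer itself typically changes between launches,
// so it is deliberately not printed or stored here.
```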

Unfortunately, the three scenarios I mentioned at the top of this article need persistable, cryptographic hashes, so these particular hash values are useless for those purposes. They still have their uses at runtime though, especially as a first check when comparing two objects. Just remember that two different values can share a hash, so a match still has to be confirmed against the real data, i.e.:


func ==(left:Message, right:Message) -> Bool {
 // different hashes prove the texts differ; equal hashes can collide, so confirm with the text itself
 return left.text.hashValue == right.text.hashValue && left.text == right.text
}
struct Message : Equatable {
 var text : String
}

let message1 = Message(text: "helloworld")
let message2 = Message(text: "helloworld")
print(message1 == message2) //prints true
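
A matching hash on its own does not prove that two values are equal. The sketch below, in modern Swift syntax, uses a hypothetical CollidingMessage type whose hash is deliberately weak (only the text length is hashed) to make a collision easy to see. The type name and the weak hash are assumptions made purely for illustration.

```swift
// Hypothetical type with a deliberately weak hash: only the text length is
// fed to the hasher, so any two messages of equal length collide.
struct CollidingMessage: Hashable {
    var text: String

    func hash(into hasher: inout Hasher) {
        hasher.combine(text.count)
    }

    // Equality looks at the real data, not at the hash.
    static func == (left: CollidingMessage, right: CollidingMessage) -> Bool {
        return left.text == right.text
    }
}

let a = CollidingMessage(text: "hello")
let b = CollidingMessage(text: "world")

print(a.hashValue == b.hashValue) // prints "true": same length, same hash
print(a == b)                     // prints "false": the texts differ

// Set uses the hash to find candidates and == to confirm them,
// so the colliding values are still kept apart.
let unique = Set([a, b, CollidingMessage(text: "hello")])
print(unique.count)               // prints "2"
```

This is why collections such as Set and Dictionary ask for both a hash and an equality check: the hash narrows the search, and equality has the final word.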

So then, what we really need is…

Persistent cryptographic hashing

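Specifically, we want to use the CommonCrypto library. It’s the expert when it comes to this, and it comes packed with utilities for hashing. Bear in mind that MD2, MD4, MD5 and SHA1 are no longer considered secure against collisions, so keep them for non-security jobs such as cache keys and prefer the SHA-2 family (SHA224 to SHA512) for anything sensitive. The supported functions are: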

  • MD2
  • MD4
  • MD5
  • SHA1
  • SHA224
  • SHA256
  • SHA384
  • SHA512

But… it’s a bit daunting the first time you use it, mainly because the library is written and thought out in low-level C. So we’re going to write a small wrapper to simplify working with it. And we’re going to start by writing an enumeration containing all valid hash types:


enum HashAlgorithm {
  case MD2
  case MD4
  case MD5
  case SHA1
  case SHA224
  case SHA256
  case SHA384
  case SHA512
}

With this enum in place we can write two utility functions inside it: one for forking logic when we want to hash data, and one for fetching the digest length depending on the hash function being used. We are also going to mark these functions as private so they can’t be called from outside this file:


enum HashAlgorithm {
  case MD2
  case MD4
  case MD5
  case SHA1
  case SHA224
  case SHA256
  case SHA384
  case SHA512

  //MARK: - private utilities
  private func digestLength() -> Int {
    switch(self) {
    case .MD2:
      return Int(CC_MD2_DIGEST_LENGTH)
    case .MD4:
      return Int(CC_MD4_DIGEST_LENGTH)
    case .MD5:
      return Int(CC_MD5_DIGEST_LENGTH)
    case .SHA1:
      return Int(CC_SHA1_DIGEST_LENGTH)
    case .SHA224:
      return Int(CC_SHA224_DIGEST_LENGTH)
    case .SHA256:
      return Int(CC_SHA256_DIGEST_LENGTH)
    case .SHA384:
      return Int(CC_SHA384_DIGEST_LENGTH)
    case .SHA512:
      return Int(CC_SHA512_DIGEST_LENGTH)
    }
  }

  private func hashData(data:NSData, inout digest:[UInt8]) {
    switch(self) {
    case .MD2:
      CC_MD2(data.bytes, CC_LONG(data.length), &digest)
    case .MD4:
      CC_MD4(data.bytes, CC_LONG(data.length), &digest)
    case .MD5:
      CC_MD5(data.bytes, CC_LONG(data.length), &digest)
    case .SHA1:
      CC_SHA1(data.bytes, CC_LONG(data.length), &digest)
    case .SHA224:
      CC_SHA224(data.bytes, CC_LONG(data.length), &digest)
    case .SHA256:
      CC_SHA256(data.bytes, CC_LONG(data.length), &digest)
    case .SHA384:
      CC_SHA384(data.bytes, CC_LONG(data.length), &digest)
    case .SHA512:
      CC_SHA512(data.bytes, CC_LONG(data.length), &digest)
    }
  }
}

You’ll notice your code doesn’t compile. That’s because we need to import the CommonCrypto library into the project. At the time of writing this library can’t be imported directly in Swift, so you will need to create a bridging header and import the library the good old Obj-C way (newer toolchains, from Xcode 10 onwards, let you write import CommonCrypto in Swift instead):


//
// Use this file to import your target's public headers that you would like to expose to Swift.
//

#import <CommonCrypto/CommonCrypto.h>

Once this is in place, you can add the following NSData extension to the same file where we declared our hash algorithm enumeration. This lets the extension method access the private functions on the enumeration without those functions ever being exposed to the outside world, giving us one, and only one, point of entry for managing our hash functions (how convenient is that?)


extension NSData {

  //MARK: - hashing
  func hashUsingAlgorithm(algorithm: HashAlgorithm) -> String {
    let hashLength = algorithm.digestLength()
    var digest = [UInt8](count:hashLength, repeatedValue: 0)
    algorithm.hashData(self, digest: &digest)

    let output = NSMutableString(capacity: hashLength * 2) // two hex characters per digest byte
    for byte in digest {
        output.appendFormat("%02x", byte)
    }

    return output as String
  }

}

Notice how we are leveraging our enum value to make sure we fetch the right digest length and call the right hash function depending on our requirements. I’m already feeling our development efforts shrinking. Now all we need to do is create a convenience function for string objects, which is fairly straightforward:


extension String {

  //MARK: - hashing
  func hashUsingAlgorithm(algorithm: HashAlgorithm) -> String? {
    if let data = self.dataUsingEncoding(NSUTF8StringEncoding) {
      return data.hashUsingAlgorithm(algorithm)
    }
    return nil
  }

}
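
The listings in this post use Swift 2 era syntax. As a rough sketch of how the same wrapper might look in current Swift, assuming a toolchain where import CommonCrypto works without a bridging header, something like the following could do. The names ModernHashAlgorithm and hexDigest(using:) are hypothetical, and only three algorithms are included to keep it short.

```swift
import Foundation
import CommonCrypto

// Hypothetical modern counterpart of HashAlgorithm (three cases for brevity).
enum ModernHashAlgorithm {
    case sha1, sha256, sha512

    // Size of the digest in bytes for the selected algorithm.
    fileprivate var digestLength: Int {
        switch self {
        case .sha1:   return Int(CC_SHA1_DIGEST_LENGTH)
        case .sha256: return Int(CC_SHA256_DIGEST_LENGTH)
        case .sha512: return Int(CC_SHA512_DIGEST_LENGTH)
        }
    }

    // Runs the matching CommonCrypto function and writes the result into `digest`.
    fileprivate func hash(_ bytes: UnsafeRawBufferPointer, into digest: inout [UInt8]) {
        switch self {
        case .sha1:   _ = CC_SHA1(bytes.baseAddress, CC_LONG(bytes.count), &digest)
        case .sha256: _ = CC_SHA256(bytes.baseAddress, CC_LONG(bytes.count), &digest)
        case .sha512: _ = CC_SHA512(bytes.baseAddress, CC_LONG(bytes.count), &digest)
        }
    }
}

extension Data {
    // Returns the digest as a lowercase hex string.
    func hexDigest(using algorithm: ModernHashAlgorithm) -> String {
        var digest = [UInt8](repeating: 0, count: algorithm.digestLength)
        withUnsafeBytes { (bytes: UnsafeRawBufferPointer) in
            algorithm.hash(bytes, into: &digest)
        }
        return digest.map { String(format: "%02x", $0) }.joined()
    }
}

extension String {
    // UTF-8 conversion of a Swift String cannot fail, so no optional is needed.
    func hexDigest(using algorithm: ModernHashAlgorithm) -> String {
        return Data(utf8).hexDigest(using: algorithm)
    }
}

print("helloworld".hexDigest(using: .sha256))
// 936a185caaa266bb9cbe981e9e05cb78cd732b0b3280eb944412bb6f8f8f07af
```

The shape is the same as the Swift 2 version: the enum hides the C calls, and the two extensions are the only entry points.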

Now our hashing functions will return persistable and consistent values for our data, that we can use across multiple program runs, as follows:


let string = "helloworld"
print(string.hashUsingAlgorithm(HashAlgorithm.MD2)!) //0ae06562456c6e0e9736f4feda3a477b
print(string.hashUsingAlgorithm(HashAlgorithm.MD4)!) //793033db97268fc9ceebde269797e54b
print(string.hashUsingAlgorithm(HashAlgorithm.MD5)!) //fc5e038d38a57032085441e7fe7010b0
print(string.hashUsingAlgorithm(HashAlgorithm.SHA1)!) //6adfb183a4a2c94a2f92dab5ade762a47889a5a1
print(string.hashUsingAlgorithm(HashAlgorithm.SHA224)!) //b033d770602994efa135c5248af300d81567ad5b59cec4bccbf15bcc
print(string.hashUsingAlgorithm(HashAlgorithm.SHA256)!) //936a185caaa266bb9cbe981e9e05cb78cd732b0b3280eb944412bb6f8f8f07af
print(string.hashUsingAlgorithm(HashAlgorithm.SHA384)!) //97982a5b1414b9078103a1c008c4e3526c27b41cdbcf80790560a40f2a9bf2ed4427ab1428789915ed4b3dc07c454bd9
print(string.hashUsingAlgorithm(HashAlgorithm.SHA512)!) //1594244d52f2d8c12b142bb61f47bc2eaf503d6d9ca8480cae9fcf112f66e4967dc5e8fa98285e36db8af1b8ffa8b84cb15e0fbcf836c3deb803c13f37659a60

Notice that the stronger the hashing algorithm, the longer the resulting hash: from 16 bytes (32 hex characters) for the MD family up to 64 bytes (128 hex characters) for SHA512. The longer digests cost more to store and can cost more to compute. Choose your hashing functions carefully!
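
Because these digests stay the same across runs, they fit the resource storage scenario from the start of the article. Here is a minimal sketch in modern Swift, assuming import CommonCrypto is available. The helper names sha256Hex and cacheFileURL(for:), the example URL and the choice of the caches directory are all assumptions for illustration.

```swift
import Foundation
import CommonCrypto

// Hypothetical helper: hex-encoded SHA-256 of a string's UTF-8 bytes.
func sha256Hex(_ string: String) -> String {
    let data = Data(string.utf8)
    var digest = [UInt8](repeating: 0, count: Int(CC_SHA256_DIGEST_LENGTH))
    data.withUnsafeBytes { (bytes: UnsafeRawBufferPointer) in
        _ = CC_SHA256(bytes.baseAddress, CC_LONG(bytes.count), &digest)
    }
    return digest.map { String(format: "%02x", $0) }.joined()
}

// Hypothetical cache layout: one file per remote URL, named after the URL's hash.
func cacheFileURL(for remoteURL: URL) -> URL {
    let caches = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask)[0]
    return caches.appendingPathComponent(sha256Hex(remoteURL.absoluteString))
}

let imageURL = URL(string: "https://example.com/images/avatar.png")!
let fileURL = cacheFileURL(for: imageURL)

// The file name is a fixed-length hex string with no slashes or query
// characters, so it is safe to use on disk whatever the URL looks like.
print(fileURL.lastPathComponent.count) // prints "64"
```

The same URL always maps to the same file name, so a later launch can check the cache before downloading again.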

As always, I’ve published a copy of the utility class developed here on my GitHub page, free for all of you to download and use.

Happy coding! 😉

Author: Danny Bravo

Director @ EPIC