Benchmarking memcmp() for timing attacks

We were interested in timing attacks against memcmp() for various reasons (CVE-2017-2624) and decided to explore the topic a bit further.

Timing attacks against memcmp() have been successful in the past against light bulbs or even the Xbox 360.

The basic idea behind these kinds of attacks is, that memcmp() takes longer if the bytes of the memory areas match, since an additional byte will be compared. If this difference could be measured, an attacker could brute force secrets much more efficient by testing all values for the first byte, taking the one which took the longest to compare and then go on with testing all values for the second byte. The difference in timing is pretty small since only a few instructions differ in contrast to other widely abused timing attacks, like the timing attack against OpenSSH which allowed to enumerate the users. In the OpenSSH case, the password send to the daemon was only verified against the locally stored hash, if the user exists, if the user did not exists, there was no hash to compare against and the login attempt was rejected instantly. In order to exploit this issue efficiently a very long password used, which takes a long time to hash. This increased the time between rejection of a existing user and a non existing one enough to be easily measurable over the network.

In order to protect against this, especially in cryptographically relevant code, there are several implementations of a timing constant memory comparisons.

How big is the timing difference when comparing “AA” to “BA”? This highly depends on the implementation of memcmp(). Some implementations use a block size compare to speed things up, others use modern instructions like SSE for faster comparisons. Take a look at the glibc implementation which is quite lengthy to achieve the best speed for all use cases. We decided to do a quick benchmark.

This of course highly depends on the system and the load on the machine. In order to reduce side influences we used nice to give the process the highest priority and pinned it to a single CPU using taskset. Furthermore you want to disable powersaving during the test to make sure that the CPU speed is not changing.

Lets take a look of two runs, one comparing two and the other comparing 64 bytes. We run each comparison a 10000000 times and add the timings. To reduce the impact of other tasks even further, we interleave the tests:

for (i = 0; i < count; i++) {
	a = get_ticks();
	comp(cookiegood, cookie, comparesize);
	b = get_ticks();
	totalg += (b - a);

	a = get_ticks();
	comp(cookiebad, cookie, comparesize);
	b = get_ticks();
	totalb += (b - a);
}

How to measure the timing? We have opted to use rdtsc, which returns the processors time stamp counter, which is usually considered a high-resolution timer with low overhead. This is another reason why we bind the program to a single CPU using taskset, since the counters on various CPUs could differ.

Disclaimer: In our benchmark, not all functions have the exactly the same
behaviour, some just return that there is a difference, others return
the difference between the first non-matching bytes.

What are the results of a run when only two bytes are compared? Note: our source can be found here

Version	Equal	Unequal	Difference	Faster Version
glibc	001395356192	001394972053	000000384139	b
naiive	001450446420	001399492504	000050953916	b
blockbased	001477100272	001407210544	000069889728	b
builtin	001393116852	001393273220	000000156368	g
fastmemeq	001517470488	001447338372	000070132116	b
consttime	001803111820	001803071588	000000040232	b
timingsafe	001783485212	001783893148	000000407936	g
crypto	001524643216	001524825004	000000181788	g
openssl	001382549144	001382676276	000000127132	g
sse2	001579218680	001583962136	000004743456	g

How does this differ for 64 bytes?

Version	Equal	Unequal	Difference	Faster Version
glibc	001459868600	001449436793	000010431807	b
naiive	006893595988	001406745332	005486850656	b
blockbased	002008822664	001523502340	000485320324	b
builtin	001453912568	001443792076	000010120492	b
fastmemeq	002505249908	001411098452	001094151456	b
consttime	015470300784	015470952360	000000651576	g
timingsafe	013019553596	012996265528	000023288068	b
crypto	006730822580	006727159920	000003662660	b
openssl	002673272920	002692763332	000019490412	g
sse2	002487218616	001835804324	000651414292	b

What we can easily see, even for the naiive implementation, we do not get a time that is 32 times longer when comparing equal strings. This means there is a high calling/setup overhead.

The second view reveals, that for some implementations, the equal memory areas are compared faster than the differing ones. This is for example the case for the consttime and openssl implementation in the 64 Byte run. The other timing safe implementations flicker. If several runs are performed sometimes the equal and sometimes the unequal cases take longer, depending on context switches and other load happening on the box. Furthermore it is clearly visible, that it is possible to perform timing attacks against memcmp() if the implementation is not fully optimised but not on all systems.

In conclusion, it highly depends on the circumstances if a memcmp() is subject to a timing attack.

Update: We were asked by @PaulM, if we did disable hyperthreading. In our experiments, HT did not make a difference, we assume because we bound the process to a single CPU.

NEWS

Benchmarking memcmp() for timing attacks