Cheese as a Number

In which we have fun with C’s type system.

Building Blocks

C is a basic building block of modern computer systems. Operating systems are written using C. Database software is written using C. Drivers are written in C.

What’s problematic about C as a language is the nearly total lack of type safety. (I like to joke that C has two types: byte and array of byte.) In effect, we can take bytes from any location in memory and treat them like any other thing we want, as long as they’re the same length. In this example, we’re going to take a string (an array of characters) and pretend it’s a 64-bit number. Our compiler isn’t even going to complain about this, because the cast tells it that we know what we’re doing. (Strictly speaking, the C standard calls reading a char array through a uint64_t pointer undefined behavior, but common compilers will happily play along.)

Say 111546296789091

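#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void) {
  char c[8] = "cheese";          // 6 letters, then zero bytes
  uint64_t* chz = (uint64_t*)c;  // same address, new type
  uint64_t result = *chz / 2;

  // PRIx64 and PRIu64 are the portable format macros for uint64_t
  printf("chz/x:  0x%" PRIx64 "\n", *chz);
  printf("chz:    %" PRIu64 "\n", *chz);
  printf("result: %" PRIu64 "\n", result);
  return 0;
}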

If I compile and execute this program, I get some exciting output:

rule$ clang cheese.c && ./a.out
chz/x:  0x657365656863
chz:    111546296789091
result: 55773148394545

wat?

I know, right? Here’s what’s going on in our stupid program:

  • We create an array of 8 single-byte memory locations (you might know them as chars). The array is called c.
  • We cram the characters “cheese” into those 8 bytes of memory. The six letters take up six bytes, and C fills the last two with zeros.
  • We create a pointer to a 64-bit unsigned number and we call it chz (because it’s a cheese-like memory location, OK?).
  • Being a big ol’ dummy head, we then say “You know what? When I stored ‘cheese’, I really wanted a big ass number. Let’s make a big ass number.”
  • And then we tell C that chz is a pointer to an unsigned 64-bit number that just happens to live at the same location as the variable c. And now we have a pointer to a big ass number.
  • Which we divide by 2 and store in result.

“Type System”

C doesn’t have much of a type system. It’s possible to take any arbitrary memory and tell the computer that the memory at some address is actually a completely different data type.

Some people see this as a flaw in C.

Frankly, it’s pretty reasonable to view this as a flaw. After all, we probably shouldn’t be taking random memory and cramming it into other memory locations like some kind of byte-goop. A more robust type system might perform some kind of checks on pointer types and ensure that pointers are actually looking at data with the same type as the pointer. This would prevent us from treating “cheese” as a large number.

On the other hand, sometimes you really do need to read an arbitrary amount of memory and assume that it’s some completely different data structure. And, for better or worse, this also mirrors the computer’s understanding of memory. For low-level programming, this is almost certainly what we want: an approximation of how computers manipulate memory.


Photo by Wyron A on Unsplash