from __future__ import division
import os
import sys
import glob
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
%precision 4
plt.style.use('ggplot')
from IPython.display import Image
C functions are typically split into header files (.h) where things
are declared but not defined, and implementation files (.c) where
they are defined. When we run the C compiler, a complex sequence of
events is triggered, with the usual successful outcome being an
executable file, as illustrated at http://www.codingunit.com/
The preprocessor merges the contents of the header and implementation
files, and also expands any macros. The compiler then translates these
into low level object code (.o) for each file, and the linker then
joins together the newly generated object code with pre-compiled object
code from libraries to form an executable. Sometimes we just want to
generate object code and save it as a library (e.g. so that we can use
it in Python).
%%file hello.c
#include <stdio.h>
int main() {
printf("Hello, world!");
}
! gcc hello.c -o hello
! ./hello
%%file fib.h
double fib(int n);
%%file fib.c
double fib(int n) {
double a = 0, b = 1;
for (int i=0; i<n; i++) {
double tmp = b;
b = a;
a += tmp;
}
return a;
}
%%file main.c
#include <stdio.h> // for printf()
#include <stdlib.h> // for atoi()
#include "fib.h" // for fib()
int main(int argc, char* argv[]) {
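if (argc < 2) { // no argument given, so argv[1] is not usable
fprintf(stderr, "usage: %s n\n", argv[0]);
return 1;
}
int n = atoi(argv[1]);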
printf("%f", fib(n));
}
%%file Makefile
CC=clang
CFLAGS=-Wall
fib: main.o fib.o
	$(CC) $(CFLAGS) -o fib main.o fib.o
main.o: main.c fib.h
	$(CC) $(CFLAGS) -c main.c
fib.o: fib.c
	$(CC) $(CFLAGS) -c fib.c
clean:
	rm -f *.o
! make
! ./fib 100
The basic types are very simple - use int, float and double for numbers. In general, avoid float for plain C code as its lack of precision may bite you unless you are writing CUDA code. Strings are quite nasty to use in C - I would suggest doing all your string processing in Python ...
Structs are sort of like classes in Python
struct point {
double x;
double y;
double z;
};
struct point p1 = {.x = 1, .y = 2, .z = 3};
struct point p2 = {1, 2, 3};
struct point p3;
p3.x = 1;
p3.y = 2;
p3.z = 3;
You can define your own types using typedef, e.g.
#include <stdio.h>
struct point {
double x;
double y;
double z;
};
typedef struct point point;
int main() {
point p = {1, 2, 3};
printf("%.2f, %.2f, %.2f", p.x, p.y, p.z);
}
Most of the operators in C are the same as in Python, but an important difference is the increment/decrement operator. That is
int c = 10;
c++; // same as c = c + 1, i.e., c is now 11
c--; // same as c = c - 1, i.e., c is now 10 again
There are two forms of the increment operator - postfix c++ and
prefix ++c. Both increment the variable, but in an expression, the
postfix version returns the value before the increment and the prefix
version returns the value after the increment.
%%file increment.c
#include <stdio.h>
#include <stdlib.h>
int main()
{
int x = 3, y;
y = x++; // x is incremented and y takes the value of x before incrementation
printf("x = %d, y = %d\n", x, y);
y = ++x; // x is incremented and y takes the value of x after incrementation
printf("x = %d, y = %d\n", x, y);
}
%%bash
clang -Wall increment.c -o increment
./increment
The ternary operator expr = condition ? expr1 : expr2 allows an
if-else statement to be put in a single line. In English, this says that
if condition is true, expr1 is assigned to expr, otherwise expr2 is
assigned to expr. We used it in the tutorial code to print a comma
between elements in a list unless the element was the last one, in
which case we printed a newline '\n'.
Note: There is a similar ternary construct in Python:
expr = expr1 if condition else expr2.
Very similar to Python or R. The examples below should be self-explanatory.
// Interpretation of grades by Asian parent
if (grade == 'A') {
printf("Acceptable\n");
} else if (grade == 'B') {
printf("Bad\n");
} else if (grade == 'C') {
printf("Catastrophe\n");
} else if (grade == 'D') {
printf("Disowned\n");
} else {
printf("Missing child report filed with local police\n");
}
// Looping variants
// the for loop in C consists of the keyword for followed by
// (initializing statement; loop condition statement; loop update statement)
// followed by the body of the loop in curly braces
int arr[3] = {1, 2, 3};
for (int i=0; i<sizeof(arr)/sizeof(arr[0]); i++) {
printf("%d\n", i);
}
// the while loop
int i = 3;
while (i > 0) {
i--;
}
// the do loop is similar to the while loop but will execute the body at least once
int i = 3;
do {
i--;
} while (i > 0);
The C standard does not require braces if the body is a single statement, but I think it is safer to always include them. Note that whitespace is not significant in C (unlike Python), so
int i = 10;
while (i > 0)
i--;
i++;
actually means
int i = 10;
while (i > 0) {
i--;
}
i++;
and the use of braces even for single statement bodies prevents such errors.
If you know the size of the arrays at initialization (i.e. when the program is first run), you can usually get away with the use of fixed size arrays for which C will automatically manage memory for you.
int len = 3;
// Giving an explicit size
double xs[len];
for (int i=0; i<len; i++) {
xs[i] = 0.0;
}
// C can infer size if initializer is given
double ys[] = {1, 2, 3};
Otherwise, we have to manage memory ourselves using pointers. Basically, memory in C can be automatic, static or dynamic. Variables in automatic memory are managed by the computer on the stack; when such a variable goes out of scope, it disappears. Static variables essentially live forever. Dynamic memory is allocated in the heap, and you manage its lifetime.
Mini-glossary:
* scope: Where a variable is visible - basically C variables have block scope - variables either live within a pair of curly braces (this includes variables in parentheses just before a block, such as function arguments and the counter in a for loop), or they are visible throughout the file.
* stack: Computer memory is divided into a stack (small) and a heap (big). Automatic variables are put on the stack; dynamic variables are put in the heap. Hence if you have a very large array, you would use dynamic memory allocation even if you knew its size at initialization.
Any variable in memory has an address represented as a 64-bit integer in
most operating systems. A pointer is basically an integer containing the
address of a block of memory. This is what is returned by functions such
as malloc. In C, a pointer is denoted by *. However, the * notation is
confusing because its interpretation depends on whether you are using it
in a declaration or not. In a declaration, * marks the variable as a
pointer; anywhere else, it dereferences the pointer to give the value
stored at that address:
int *p = malloc(sizeof(int)); // p is a pointer to an integer
*p = 5; // *p is an integer
To get the address of a variable, we can use the & address operator.
This is often used so that a function can alter the value of an argument
passed in (e.g. see by_ref.c below).
%%file pointers.c
#include <stdio.h>
int main()
{
int i = 2;
int j = 3;
int *p;
int *q;
p = &i; // point p at i (writing through an uninitialized pointer is undefined behavior)
q = &j;
printf("p = %p\n", p);
printf("*p = %d\n", *p);
printf("&p = %p\n", &p);
printf("q = %p\n", q);
printf("*q = %d\n", *q);
printf("&q = %p\n", &q);
}
%%bash
clang -Wall -Wno-uninitialized pointers.c -o pointers
./pointers
%%file by_val.c
#include <stdio.h>
void change_arg(int p) {
p *= 2;
}
int main()
{
int x = 5;
change_arg(x);
printf("%d\n", x);
}
%%bash
clang -Wall by_val.c -o by_val
./by_val
%%file by_ref.c
#include <stdio.h>
void change_arg(int *p) {
*p *= 2;
}
int main()
{
int x = 5;
change_arg(&x);
printf("%d\n", x);
}
%%bash
clang -Wall by_ref.c -o by_ref
./by_ref
%%file ptr.c
#include <stdio.h>
int main() {
int x = 2;
int *p = &x;
int **q = &p;
int ***r = &q;
printf("%d, %p, %p, %p, %p, %p, %p, %d", x, &x, p, &p, q, &q, r, ***r);
}
%%bash
gcc ptr.c -o ptr
./ptr
If we want to store a whole sequence of ints, we can do so by simply allocating more memory:
int *ps = malloc(5 * sizeof(int)); // ps is a pointer to an integer
for (int i=0; i<5; i++) {
ps[i] = i;
}
The computer will find enough space in the heap to store 5 consecutive
integers in a contiguous way. Since C arrays are all of the same
type, this allows us to do pointer arithmetic - i.e. the pointer
ps is the same as &ps[0] and ps + 2 is the same as &ps[2]. An example
at this point is helpful.
%%file pointers2.c
#include <stdio.h>
#include <stdlib.h>
int main()
{
int *ps = malloc(5 * sizeof(int));
for (int i =0; i < 5; i++) {
ps[i] = i + 10;
}
printf("%d, %d\n", *ps, ps[0]); // remember that *ptr is just a regular variable outside of a declaration, in this case, an int
printf("%d, %d\n", *(ps+2), ps[2]);
printf("%d, %d\n", *(ps+4), *(&ps[4])); // * and & are inverses
free(ps); // avoid memory leak
}
%%bash
clang -Wall pointers2.c -o pointers2
./pointers2
An array name is actually just a constant pointer to the address of the beginning of the array. Hence, we can dereference an array name just like a pointer. We can also do pointer arithmetic with array names - this leads to the following legal but weird syntax:
arr[i] = *(arr + i) = i[arr]
%%file array_pointer.c
#include <stdio.h>
int main()
{
int arr[] = {1, 2, 3};
printf("%d\t%d\t%d\t%d\t%d\t%d\n", *arr, arr[0], 0[arr], *(arr + 2), arr[2], 2[arr]);
}
%%bash
clang -Wall array_pointer.c -o array_pointer
./array_pointer
%%file array_2d.c
#include <stdio.h>
#include <stdlib.h>
int main()
{
int r = 3, c = 4;
// first allocate space for the pointers to all rows
int **arr = malloc(r * sizeof(int *));
// then allocate space for the number of columns in each row
for (int i=0; i<r; i++) {
arr[i] = malloc(c * sizeof(int));
}
// fill array with integer values
for (int i = 0; i < r; i++) {
for (int j = 0; j < c; j++) {
arr[i][j] = i*c + j; // each row holds c entries, so this numbers the cells 0, 1, 2, ...
}
}
for (int i = 0; i < r; i++) {
for (int j = 0; j < c; j++) {
printf("%d ", arr[i][j]);
}
}
// every malloc should have a free to avoid memory leaks
for (int i=0; i<r; i++) {
free(arr[i]);
}
free(arr);
}
%%bash
gcc -Wall array_2d.c -o array_2d
./array_2d
Different kinds of nothing: There is a special null pointer, written
with the macro NULL, that points to nothing. It is typically used for
pointer comparisons, since a null pointer is guaranteed to compare
unequal to a pointer to any object or function (two null pointers,
however, compare equal to each other). In particular, it is often used
as a sentinel value to mark the end of a list. In contrast, a void
pointer (void *) points to a memory location whose type is not declared.
It is used in C for generic operations - for example, malloc returns a
void pointer. To totally confuse the beginning C student, there is also
NUL, the name of the '\0' character used to terminate C strings. NUL and
NULL are totally different beasts.
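Deciphering pointer idioms: A common C idiom that you should get
used to is *q++ = *p++ where p and q are both pointers. In English,
this says: copy the value that p points to into the location that q
points to, then advance both p and q to the next element.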
%%file pointers3.c
#include <stdio.h>
#include <stdlib.h>
int main()
{
// example 1
typedef char* string;
char *s[] = {"mary ", "had ", "a ", "little ", "lamb", NULL};
for (char **sp = s; *sp != NULL; sp++) {
printf("%s", *sp);
}
printf("\n");
// example 2
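char *src = "abcde";
char *dest = malloc(6); // 5 chars plus the terminating '\0'; char is always 1 byte
char *p = src + 5; // one past the last character of src
char *q = dest;
while (p > src) { // put the string in src into dest in reverse order
*q++ = *--p; // step p back, then copy the char it now points to
}
*q = '\0'; // terminate dest so that it is a valid C string
for (int i = 0; i < 5; i++) {
printf("i = %d, src[i] = %c, dest[i] = %c\n", i, src[i], dest[i]);
}
free(dest); // avoid memory leak
}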
%%bash
clang -Wall pointers3.c -o pointers3
./pointers3
%%file square.c
#include <stdio.h>
double square(double x)
{
return x * x;
}
int main()
{
double a = 3;
printf("%f\n", square(a));
}
%%bash
clang -Wall square.c -o square
./square
How to make a nice function pointer: Start with a regular function declaration func. For example, here func is a function that takes a pair of ints and returns an int
int func(int, int);
To turn it into a function pointer, just add a * and wrap the function
name in parentheses like so
int (*func)(int, int);
Now func is a pointer to a function that takes a pair of ints and
returns an int. Finally, add a typedef so that we can use func as a
new type
typedef int (*func)(int, int);
which allows us to create arrays of function pointers, higher order functions etc as shown in the following example.
%%file square2.c
#include <stdio.h>
#include <math.h>
// Create a function pointer type that takes a double and returns a double
typedef double (*func)(double x);
// A higher order function that takes just such a function pointer
double apply(func f, double x)
{
return f(x);
}
double square(double x)
{
return x * x;
}
double cube(double x)
{
return pow(x, 3);
}
int main()
{
double a = 3;
func fs[] = {square, cube, NULL};
for (func *f=fs; *f; f++) {
printf("%.1f\n", apply(*f, a));
}
}
%%bash
clang -Wall square2.c -o square2 -lm
./square2
As you have seen, the process of C program compilation can be quite
messy, with all sorts of different compiler and linker flags to specify,
libraries to add and so on. For this reason, most C programs are
compiled using the make build tool that you are already familiar
with. Here is a simple generic makefile that you can customize to
compile your own programs, adapted from the book 21st Century C by Ben
Klemens (O’Reilly Media).
In addition, there are traditional dummy targets:
* all: Builds all targets (for example, you may also have html and pdf targets that are optional)
* clean: Removes intermediate and final products generated by the makefile
%%file makefile
TARGET =
OBJECTS =
CFLAGS = -g -Wall -O3
LDLIBS =
CC = c99
all: $(TARGET)
clean:
	rm $(TARGET) $(OBJECTS)
$(TARGET): $(OBJECTS)
Just fill in the blanks with whatever is appropriate for your program.
Here is a simple example where the main file test_make.c uses a
function from stuff.c with declarations in stuff.h and also
depends on the libm C math library.
%%file stuff.h
#include <stdio.h>
#include <math.h>
void do_stuff();
%%file stuff.c
#include "stuff.h"
void do_stuff() {
printf("The square root of 2 is %.2f\n", sqrt(2));
}
%%file test_make.c
#include "stuff.h"
int main()
{
do_stuff();
}
%%file makefile
TARGET = test_make
OBJECTS = stuff.o
CFLAGS = -g -Wall -O3
LDLIBS = -lm
CC = clang
all: $(TARGET)
clean:
	rm $(TARGET) $(OBJECTS)
$(TARGET): $(OBJECTS)
! make
! ./test_make
# Make is clever enough to recompile only what has been changed since the last time it was called
! make
! make clean
! make
Try to fix the following buggy program.
%%file buggy.c
# Create a function pointer type that takes a double and returns a double
double *func(double x);
# A higher order function that takes just such a function pointer
double apply(func f, double x)
{
return f(x);
}
double square(double x)
{
return x * x;
}
double cube(double x)
{
return pow(3, x);
}
double mystery(double x)
{
double y = 10;
if (x < 10)
x = square(x);
else
x += y;
x = cube(x);
return x;
}
int main()
{
double a = 3;
func fs[] = {square, cube, mystery, NULL}
for (func *f=fs, f != NULL, f++) {
printf("%d\n", apply(f, a));
}
}
! clang -g -Wall buggy.c -o buggy
What other language has an annual Obfuscated Code Contest http://www.ioccc.org/? In particular, the following features of C are very conducive to writing unreadable code:
* array[index] is the same as *(array+index), which is the same as index[array]
Here is one winning entry from the 2013 IOCCC that should warm the heart of statisticians - it displays sparklines (invented by Tufte).
%%file sparkl.c
main(a,b)char**b;{int c=1,d=c,e=a-d;for(;e;e--)_(e)<_(c)?c=e:_(e)>_(d)?d=e:7;
while(++e<a)printf("\xe2\x96%c",129+(**b=8*(_(e)-_(c))/(_(d)-_(c))));}
! gcc -Wno-implicit-int -include stdio.h -include stdlib.h -D'_(x)=strtof(b[x],0)' sparkl.c -o sparkl
import numpy as np
np.set_printoptions(linewidth=np.inf)
print(' '.join(map(str, (100*np.sin(np.linspace(0, 8*np.pi, 30))).astype('int'))))
%%bash
./sparkl 0 76 98 51 -31 -92 -88 -21 60 99 68 -10 -82 -96 -41 41 96 82 10 -68 -99 -60 21 88 92 31 -51 -98 -76 0
If you have too much time on your hands and really want to know how not to write C code (unless you are crafting an entry for the IOCCC), I recommend this tutorial http://www.dreamincode.net/forums/topic/38102-obfuscated-code-a-simple-introduction/