CSC230 Homework 3

This homework is to be done individually.

Only use ANSI C99 standard library functions (excluding system()). Other libraries, like GNU libraries, etc., are not allowed.

Updates:

6/10/2015: Fix starter kit link and github URL
6/16/2015: A student spotted an unspecified aspect of the program. I did not describe the behavior if multiple filename arguments are provided. As such, we'll call the behavior "undefined", which generally means "no specification except that it shouldn't crash" -- this means you can use your best judgment as to the program's behavior in this case.

Learning Outcomes

Write small to medium C programs having several separately-compiled modules.
Explain what happens to a program during preprocessing, lexical analysis, parsing, code generation, code optimization, linking, and execution, and identify errors that occur during each phase. In particular, they will be able to describe the differences in this process between C and Java.
Correctly identify error messages and warnings from the preprocessor, compiler, and linker, and avoid them.
Find and eliminate runtime errors using a combination of logic, language understanding, trace printout, and gdb or a similar command-line debugger.
Interpret and explain data types, conversions between data types, and the possibility of overflow and underflow.
Explain, inspect, and implement programs using structures such as enumerated types, unions, and constants and arithmetic, logical, relational, assignment, and bitwise operators.
Trace and reason about variables and their scope in a single function, across multiple functions, and across multiple modules.
Allocate and deallocate memory in C programs while avoiding memory leaks and dangling pointers. In particular, they will be able to implement dynamic arrays and singly-linked lists using allocated memory.
Use the C preprocessor to control tracing of programs, compilation for different systems, and write simple macros.
Write, debug, and modify programs using library utilities, including, but not limited to assert, the math library, the string library, random number generation, variable number of parameters, standard I/O, and file I/O.
Use simple command-line tools to design, document, debug, and maintain their programs.
Use an automatic packaging tool, such as make or ant, to distribute and maintain software that has multiple compilation units.
Use version control tools, such as subversion (svn) or git, to track changes and do parallel development of software.
Distinguish key elements of the syntax (what’s legal), semantics (what does it do), and pragmatics (how is it used) of a programming language.

Introduction

Developers and sysadmins often need to analyze the content of binary files (i.e. files that are not comprised solely of ASCII text). Directly displaying such files on screen is a bad idea, since many byte values refer to control codes rather than human-readable characters, and the user would not be able to identify byte offsets of data being shown. Therefore, users typically use a hexadecimal dump to show the content of a file. The standard UNIX tool hexdump is one such tool (manpage here).

Here is an example of hexdump being used to display a file called test_allbytes.dat, a 256-byte file which contains the byte values 0 through 255. The -C option indicates that we want a traditional hex output, as this tool can output in various formats. The -v option turns off a feature that combines similar lines (don't worry about it). This output has been colorized for explanation purposes.

$ hexdump -C -v test_allbytes.dat
00000000  00 01 02 03 04 05 06 07  08 09 0a 0b 0c 0d 0e 0f  |................|
00000010  10 11 12 13 14 15 16 17  18 19 1a 1b 1c 1d 1e 1f  |................|
00000020  20 21 22 23 24 25 26 27  28 29 2a 2b 2c 2d 2e 2f  | !"#$%&'()*+,-./|
00000030  30 31 32 33 34 35 36 37  38 39 3a 3b 3c 3d 3e 3f  |0123456789:;<=>?|
00000040  40 41 42 43 44 45 46 47  48 49 4a 4b 4c 4d 4e 4f  |@ABCDEFGHIJKLMNO|
00000050  50 51 52 53 54 55 56 57  58 59 5a 5b 5c 5d 5e 5f  |PQRSTUVWXYZ[\]^_|
00000060  60 61 62 63 64 65 66 67  68 69 6a 6b 6c 6d 6e 6f  |`abcdefghijklmno|
00000070  70 71 72 73 74 75 76 77  78 79 7a 7b 7c 7d 7e 7f  |pqrstuvwxyz{|}~.|
00000080  80 81 82 83 84 85 86 87  88 89 8a 8b 8c 8d 8e 8f  |................|
00000090  90 91 92 93 94 95 96 97  98 99 9a 9b 9c 9d 9e 9f  |................|
000000a0  a0 a1 a2 a3 a4 a5 a6 a7  a8 a9 aa ab ac ad ae af  |................|
000000b0  b0 b1 b2 b3 b4 b5 b6 b7  b8 b9 ba bb bc bd be bf  |................|
000000c0  c0 c1 c2 c3 c4 c5 c6 c7  c8 c9 ca cb cc cd ce cf  |................|
000000d0  d0 d1 d2 d3 d4 d5 d6 d7  d8 d9 da db dc dd de df  |................|
000000e0  e0 e1 e2 e3 e4 e5 e6 e7  e8 e9 ea eb ec ed ee ef  |................|
000000f0  f0 f1 f2 f3 f4 f5 f6 f7  f8 f9 fa fb fc fd fe ff  |................|
00000100

The format of this output is 16 bytes per line, with each line having:

The offset into the file, in hex, followed by two spaces. (Shown in blue above.)
The bytes themselves, in two-digit hex format. Note that there are two spaces between the 8th and 9th bytes on the line -- this helps the user count bytes. Two spaces follow the last byte. (Shown in red above.)
The bytes rendered as ASCII text (shown in green above). However, for unprintable characters (such as control codes), a period is displayed. (These special characters are shown with a yellow highlight above.) The rendered bytes are contained within a leading and trailing pipe ('|') character. The trailing pipe is immediately followed by a new line character ('\n').
At the end of the file, an additional address line is printed to show the total size of the file, again in hex, followed by a new line character ('\n'). (Shown in purple above.)

NOTE: The last row of bytes may have fewer than 16 bytes that can be printed. If so, the bytes portion of the output is filled with spaces so that the leading pipe for the ASCII text lines up with the other leading pipes. The trailing pipe for the ASCII text will follow the last ASCII value, which means the trailing pipe will not be aligned with the other trailing pipes.

NOTE: The actual output is not in color -- the color above is just for explanatory purposes.
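
The row layout described above can be sketched in C as follows. This is only an illustrative sketch, not a required design: the helper name format_hex_line, its parameters, and the two constants are assumptions made for this example, and the caller is assumed to supply a buffer of at least 80 bytes.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define BYTES_PER_LINE 16
#define HALF_LINE 8

/* Hypothetical helper (not part of the assignment): formats up to 16 bytes
 * as one row in the layout described above. The caller must provide an
 * output buffer of at least 80 bytes. Returns the number of characters
 * written, not counting the terminating NUL. */
int format_hex_line(char *out, unsigned long offset,
                    const unsigned char *bytes, size_t count)
{
    /* Offset: eight hex digits followed by two spaces. */
    int len = sprintf(out, "%08lx  ", offset);

    for (size_t i = 0; i < BYTES_PER_LINE; i++) {
        if (i < count) {
            len += sprintf(out + len, "%02x ", (unsigned int) bytes[i]);
        } else {
            len += sprintf(out + len, "   ");  /* pad a missing byte */
        }
        if (i == HALF_LINE - 1) {
            len += sprintf(out + len, " ");    /* extra gap after the 8th byte */
        }
    }

    /* One more space completes the two-space gap before the leading pipe. */
    len += sprintf(out + len, " |");
    for (size_t i = 0; i < count && i < BYTES_PER_LINE; i++) {
        out[len++] = isprint(bytes[i]) ? (char) bytes[i] : '.';
    }
    len += sprintf(out + len, "|\n");
    return len;
}
```

Because every missing byte is padded with three spaces, the leading pipe lands in the same column on a short final row, while the trailing pipe directly follows the last rendered character.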

The object of this homework is to develop a similar tool. Full requirements are described below.

Program Requirements

Program Input

This will be a traditional UNIX-style tool. This means that it takes an optional command line switch, followed by a filename¹. It will support two output modes:

Hexadecimal output, as described above. This output will be exactly equivalent to the output of hexdump -C -v.
Binary output, similar to hex, but showing four bytes per line, with binary instead of hex.

Formally, the usage message is:

"Usage:\n"
" binary_dump [option] <file>\n"
"\n"
"Options:\n"
" -b binary display\n"
" -x canonical hex+ASCII display\n"
"\n"

The option flags work as follows:

If the command is run without a filename argument, the above usage message is printed verbatim, and the program exits (status 1, EXIT_FAILURE). Note that this is the case even if some options are present.
$ ./binary_dump
Usage: ...
$ echo $?
1
$ ./binary_dump -b
Usage: ...
$ echo $?
1
If the program is run with one filename argument and no options, then the hex output mode is used on the given filename, then the program exits (status 0 (EXIT_SUCCESS) if no errors, 1 (EXIT_FAILURE) otherwise).
$ ./binary_dump test.dat
(Hex output of test.dat)
$ echo $?
0
If the command entered is of the form "binary_dump -b test.dat", then the binary output mode is used for test.dat, then the program exits (status 0 (EXIT_SUCCESS) if no errors, 1 (EXIT_FAILURE) otherwise).
$ ./binary_dump -b test.dat
(Binary output of test.dat)
$ echo $?
0
If the command entered is of the form "binary_dump -x test.dat", then the hex output mode is used for test.dat, then the program exits (status 0 (EXIT_SUCCESS) if no errors, 1 (EXIT_FAILURE) otherwise).
$ ./binary_dump -x test.dat
(Hex output of test.dat)
$ echo $?
0
If an option flag other than those listed above is used, the program will print "binary_dump: unknown option -- %s\n", where %s is replaced by the option flag attempted. This will be followed by the usage message, then an exit (status 1, EXIT_FAILURE).
$ ./binary_dump -q
binary_dump: unknown option -- q
Usage: ...
$ echo $?
1
Multiple flags are not supported -- if the user attempts multiple flags in either the compound form "binary_dump -bx ..." or the separate form "binary_dump -b -x ...", then the program will print "binary_dump: multiple options are unsupported\n", followed by the usage message, then an exit (status 1, EXIT_FAILURE). Note that this error supersedes the unknown option error.
$ ./binary_dump -bx
binary_dump: multiple options are unsupported
Usage: ...
$ echo $?
1
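
One possible way to organize the option rules above is to classify the command line first and print or dump afterwards. The sketch below is an assumption-laden illustration, not the required design: the ParseResult enum and the classify_args function are hypothetical names, a lone "-" is treated as a filename, and when several filenames are given (behavior the handout leaves undefined) the first one is used.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
typedef enum {
    PARSE_HEX,       /* hex dump of *filename */
    PARSE_BINARY,    /* binary dump of *filename */
    PARSE_NO_FILE,   /* print usage, exit 1 */
    PARSE_UNKNOWN,   /* print "unknown option" message, usage, exit 1 */
    PARSE_MULTIPLE   /* print "multiple options" message, usage, exit 1 */
} ParseResult;

/* Hypothetical helper: decides what the program should do.
 * On PARSE_UNKNOWN, *bad_option points at the offending flag text. */
ParseResult classify_args(int argc, char *argv[],
                          const char **filename, const char **bad_option)
{
    const char *option = NULL;  /* text after '-' of the first option seen */
    int option_letters = 0;     /* total flag letters across all options */

    *filename = NULL;
    *bad_option = NULL;

    for (int i = 1; i < argc; i++) {
        if (argv[i][0] == '-' && argv[i][1] != '\0') {
            if (option == NULL) {
                option = argv[i] + 1;
            }
            option_letters += (int) strlen(argv[i] + 1);
        } else if (*filename == NULL) {
            *filename = argv[i];  /* assumption: first filename wins */
        }
    }

    /* The multiple-options error supersedes the unknown-option error. */
    if (option_letters > 1) {
        return PARSE_MULTIPLE;
    }
    if (option_letters == 1 && option[0] != 'b' && option[0] != 'x') {
        *bad_option = option;
        return PARSE_UNKNOWN;
    }
    if (*filename == NULL) {
        return PARSE_NO_FILE;
    }
    return (option_letters == 1 && option[0] == 'b') ? PARSE_BINARY : PARSE_HEX;
}
```

Counting flag letters rather than arguments lets the compound form "-bx" and the separate form "-b -x" take the same path.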

If a file IO error is encountered (file not found, etc.), then the system will print the filename in question, followed by ": ", then the standard UNIX error message for the error. (See "File IO and error messages" below.)

¹ I'm simplifying the UNIX model a bit for this homework. A traditional UNIX tool usually takes zero or more filename arguments. If no file argument is given, stdin is used. If one or more files are given, they are processed in sequence. Note that this is NOT a requirement for this homework. For more information, see the Art of UNIX Programming, specifically the Basics of UNIX Philosophy and the filter design pattern.

Hexadecimal output format

Simply put, your output must be byte-for-byte identical to the output of "hexdump -C -v" as described above.

The only exception is for an empty file -- hexdump prints nothing, whereas your binary_dump should still print the "last" line of output: the size of the file in hex ("00000000").

Binary output format

Similar to the hex output described above, except the hex portion is instead presented in binary, with 4 bytes per line:

$ binary_dump -b test_allbytes.dat
00000000  00000000 00000001 00000010 00000011  |....|
00000004  00000100 00000101 00000110 00000111  |....|
00000008  00001000 00001001 00001010 00001011  |....|
0000000c  00001100 00001101 00001110 00001111  |....|
00000010  00010000 00010001 00010010 00010011  |....|
00000014  00010100 00010101 00010110 00010111  |....|
00000018  00011000 00011001 00011010 00011011  |....|
0000001c  00011100 00011101 00011110 00011111  |....|
00000020  00100000 00100001 00100010 00100011  | !"#|
00000024  00100100 00100101 00100110 00100111  |$%&'|
00000028  00101000 00101001 00101010 00101011  |()*+|
0000002c  00101100 00101101 00101110 00101111  |,-./|
00000030  00110000 00110001 00110010 00110011  |0123|
00000034  00110100 00110101 00110110 00110111  |4567|
00000038  00111000 00111001 00111010 00111011  |89:;|
0000003c  00111100 00111101 00111110 00111111  |<=>?|
00000040  01000000 01000001 01000010 01000011  |@ABC|
00000044  01000100 01000101 01000110 01000111  |DEFG|
00000048  01001000 01001001 01001010 01001011  |HIJK|
0000004c  01001100 01001101 01001110 01001111  |LMNO|
00000050  01010000 01010001 01010010 01010011  |PQRS|
00000054  01010100 01010101 01010110 01010111  |TUVW|
00000058  01011000 01011001 01011010 01011011  |XYZ[|
0000005c  01011100 01011101 01011110 01011111  |\]^_|
00000060  01100000 01100001 01100010 01100011  |`abc|
00000064  01100100 01100101 01100110 01100111  |defg|
00000068  01101000 01101001 01101010 01101011  |hijk|
0000006c  01101100 01101101 01101110 01101111  |lmno|
00000070  01110000 01110001 01110010 01110011  |pqrs|
00000074  01110100 01110101 01110110 01110111  |tuvw|
00000078  01111000 01111001 01111010 01111011  |xyz{|
0000007c  01111100 01111101 01111110 01111111  ||}~.|
00000080  10000000 10000001 10000010 10000011  |....|
00000084  10000100 10000101 10000110 10000111  |....|
00000088  10001000 10001001 10001010 10001011  |....|
0000008c  10001100 10001101 10001110 10001111  |....|
00000090  10010000 10010001 10010010 10010011  |....|
00000094  10010100 10010101 10010110 10010111  |....|
00000098  10011000 10011001 10011010 10011011  |....|
0000009c  10011100 10011101 10011110 10011111  |....|
000000a0  10100000 10100001 10100010 10100011  |....|
000000a4  10100100 10100101 10100110 10100111  |....|
000000a8  10101000 10101001 10101010 10101011  |....|
000000ac  10101100 10101101 10101110 10101111  |....|
000000b0  10110000 10110001 10110010 10110011  |....|
000000b4  10110100 10110101 10110110 10110111  |....|
000000b8  10111000 10111001 10111010 10111011  |....|
000000bc  10111100 10111101 10111110 10111111  |....|
000000c0  11000000 11000001 11000010 11000011  |....|
000000c4  11000100 11000101 11000110 11000111  |....|
000000c8  11001000 11001001 11001010 11001011  |....|
000000cc  11001100 11001101 11001110 11001111  |....|
000000d0  11010000 11010001 11010010 11010011  |....|
000000d4  11010100 11010101 11010110 11010111  |....|
000000d8  11011000 11011001 11011010 11011011  |....|
000000dc  11011100 11011101 11011110 11011111  |....|
000000e0  11100000 11100001 11100010 11100011  |....|
000000e4  11100100 11100101 11100110 11100111  |....|
000000e8  11101000 11101001 11101010 11101011  |....|
000000ec  11101100 11101101 11101110 11101111  |....|
000000f0  11110000 11110001 11110010 11110011  |....|
000000f4  11110100 11110101 11110110 11110111  |....|
000000f8  11111000 11111001 11111010 11111011  |....|
000000fc  11111100 11111101 11111110 11111111  |....|
00000100

Each line has:

The offset into the file, in hex, followed by two spaces. (Shown in blue above.)
The bytes themselves, in eight-digit binary format. There's no extra space between columns as there was in hex -- just one space between each byte. Two spaces follow the last byte. (Shown in red above.)
The bytes rendered as ASCII text (shown in green above). However, for unprintable characters (such as control codes), a period is displayed. (These special characters are shown with a yellow highlight above.) The rendered bytes are contained within a leading and trailing pipe ('|') character. The trailing pipe is immediately followed by a new line character ('\n').
At the end of the file, an additional address line is printed to show the total size of the file, again in hex, followed by a new line character ('\n'). (Shown in purple above.)

NOTE: The last row of bytes may have fewer than 4 bytes that can be printed. If so, the bytes portion of the output is filled with spaces so that the leading pipe for the ASCII text lines up with the other leading pipes. The trailing pipe for the ASCII text will follow the last ASCII value, which means the trailing pipe will not be aligned with the other trailing pipes.

Design

No template is provided for this homework. However, it is recommended (not required) that you have functions similar to the following:

void char_to_binary(char buf[], unsigned char c) - Fills the given character buffer with the binary equivalent of the provided byte, terminated by the NUL character (\0). The buffer must be at least 9 bytes long (8 bits plus the terminating NUL).
void hex_dump(const char filename[]) - Outputs the hex dump of the given file as described above.
void binary_dump(const char filename[]) - Outputs the binary dump of the given file as described above.
void usage() - A function which prints the usage message, then exits with status 1 (EXIT_FAILURE).
int main(int argc, char* argv[]) - Entry point into the program, processes command line arguments as described above.

We have provided a number of test inputs and outputs -- these are described in the testing section below.

Implementation

Get your environment set up

Note: If you successfully cloned your GitHub repo for HW2, you can skip to the section "Get the starter files". Make sure that you're developing in the 3_homework directory!

Start by cloning the provided CSC230 GitHub repo in your local AFS (or development space) using the following commands:

$ setenv SSH_ASKPASS
$ git clone https://<unity_id>@github.ncsu.edu/engr-csc230-summer2015/<repo_name>.git

This will create a directory with your repo's name. If you cd into the directory, you should see directories for each of the homeworks for the class. You'll want to do all of your development in 3_homework. If you do NOT see the 3_homework directory, enter the following command to make the directory:

$ mkdir 3_homework

Get the starter files

Next, you will need to copy the hw3_starter.tgz from the course locker into your 3_homework directory. Use the following command:

$ wget

Untar using the command:

$ tar xzvf hw3_starter.tgz

Set up your `Makefile`

No Makefile is provided either, so you will need to write one. For automated grading to work, the Makefile must:

Be written such that when make is executed, your binary_dump.c code is compiled with the -Wall -std=c99 options to an executable called binary_dump.
The "clean" target must also be specified, such that it removes all compilation results (any executables and object files).
Compiling to an intermediate object file (*.o) is optional.

Write your code: `binary_dump.c`

Your code should be in a single file called binary_dump.c. You must use this exact name for automated grading to work.

File IO and error messages

Use the C standard IO functions for file reading (fopen, fread, fclose, etc.). Because we're doing binary IO, be sure to include the "b" flag in the mode string of your fopen call (for example, "rb").

If an IO error is encountered (e.g. file not found), the UNIX standard thing to do is to call perror(filename), then exit with status 1 (EXIT_FAILURE). This will print the filename in question, followed by ": ", then the standard UNIX error message. See the perror manpage. Example:
$ binary_dump nonexistant.bin
nonexistant.bin: No such file or directory
$ echo $?
1
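
The open, read, and report pattern described above might look like the following minimal sketch. The function name read_in_chunks, the CHUNK_SIZE constant, and the choice to return -1 instead of exiting are assumptions made for illustration; a real program would call exit(EXIT_FAILURE) after perror. Note also that the C standard does not promise that fread sets errno, so the second perror call relies on typical UNIX behavior.

```c
#include <stdio.h>

#define CHUNK_SIZE 16

/* Hypothetical helper: reads a file in 16-byte chunks and returns the
 * total number of bytes read, or -1 after reporting an error via perror. */
long read_in_chunks(const char *filename)
{
    FILE *fp = fopen(filename, "rb");  /* "b" requests binary mode */
    if (fp == NULL) {
        perror(filename);              /* prints "<filename>: <message>" */
        return -1;
    }

    unsigned char chunk[CHUNK_SIZE];
    long total = 0;
    size_t got;

    /* fread returns fewer than CHUNK_SIZE items on the last, short row. */
    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        /* A real dump would format chunk[0] through chunk[got - 1] here. */
        total += (long) got;
    }

    if (ferror(fp)) {
        perror(filename);
        fclose(fp);
        return -1;
    }

    fclose(fp);
    return total;
}
```

The running total doubles as the file offset, so it is also the value to print on the final size line.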

Identifying printable characters

You do NOT have to write your own function to identify if a character is printable. C has a function for that. In the ctype.h header file, there is a function called isprint() that returns a nonzero value if a given character is printable. For more information, see the manpage for the ctype functions.
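
As a small illustration of isprint() in this setting, the sketch below maps a byte to the character shown in the ASCII column. The helper name printable_or_dot is hypothetical. The byte is taken as an unsigned char because passing a negative plain char value to the ctype functions is undefined behavior.

```c
#include <ctype.h>

/* Hypothetical helper: returns the byte itself if it is printable,
 * or a period otherwise. Assumes the default "C" locale. */
char printable_or_dot(unsigned char byte)
{
    return isprint(byte) ? (char) byte : '.';
}
```

In the default "C" locale this keeps the space character and the values 0x21 through 0x7e, and replaces control codes and bytes above 0x7e with a period.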

Printing hex output

To ensure that all leading 0s are included for any hex output, put a leading 0 in the format specifier. For example, to create hex output for 1a that has three leading 0s, the format specifier would be "%05x".
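
Applied to the eight-digit file offset, the same zero-padding idea could be sketched as below. The function name format_offset and the OFFSET_BUF_SIZE constant are assumptions for this example; snprintf is used so that an unexpectedly large offset is truncated rather than overflowing the buffer.

```c
#include <stdio.h>

#define OFFSET_BUF_SIZE 9   /* 8 hex digits plus the terminating NUL */

/* Hypothetical helper: writes the offset as eight zero-padded,
 * lowercase hex digits into buf. */
void format_offset(char buf[OFFSET_BUF_SIZE], unsigned long offset)
{
    snprintf(buf, OFFSET_BUF_SIZE, "%08lx", offset);
}
```

The width in the specifier counts total digits, not added zeros, which is why "%05x" turns 1a into 0001a.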

Creating binary numbers

Also, recall from lecture that there is no built-in way to format a number as binary in C. You will need to code a new routine to do that, and it's going to involve bitwise operations.
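
One way such a routine could be written, using the char_to_binary signature suggested in the Design section, is sketched below. Treat it as one possible approach rather than the expected solution; the BITS_PER_BYTE constant is an assumption added to avoid a magic number.

```c
#define BITS_PER_BYTE 8

/* Fills buf with the eight-character binary form of c, most significant
 * bit first, followed by a terminating NUL. buf must hold at least
 * BITS_PER_BYTE + 1 characters. */
void char_to_binary(char buf[], unsigned char c)
{
    for (int i = 0; i < BITS_PER_BYTE; i++) {
        /* Mask selects one bit, starting from the leftmost (bit 7). */
        unsigned int mask = 1u << (BITS_PER_BYTE - 1 - i);
        buf[i] = (c & mask) ? '1' : '0';
    }
    buf[BITS_PER_BYTE] = '\0';
}
```

For instance, the byte 0x41 ('A') comes out as 01000001, matching the row for offset 00000040 in the binary example.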

Testing

When you are ready to run the full suite of tests, execute:

$ chmod +x test.sh
$ ./test.sh

The first command, chmod, sets the permissions of test.sh to allow for execution. You only have to run this command once.

Pushing changes

When you are ready to commit to your local repository, execute:

$ git add .
$ git commit -am "a meaningful message for future you"

When you are ready to submit to the remote repository for automated grading or teaching staff feedback, execute:

$ git push

Testing

When compiling your program, you must use the make command.

The program output must match the provided output exactly for automated grading.

Note that the echo $? command will only print the exit value of the immediately preceding command. Make sure that you call the echo command immediately after executing your program to ensure that your exit code is correct.

Twelve test cases for the binary_dump program are provided. We have also provided a script, test.sh, that will execute all of the tests and report the results. There should be no difference between your actual output and the expected outputs. Grading will test additional inputs and outputs, so you should test your program with inputs and outputs beyond the provided examples. The test cases and testing script (test.sh) are provided for you in hw3_starter.tgz. See the Implementation section for details about setting up your work environment and how to use hw3_starter.tgz.

test.sh: If your program does NOT pass the tests in test.sh then it will not be able to pass the teaching staff test cases. Use test.sh as a method to verify that at least the minimum requirements have been met for this assignment.

The details of the provided tests are:

Test ID	Description	Command line	Exit Code	Expected Output
Test 01	No args.	./binary_dump	1	test01_eout
Test 02	Hex dump: All bytes from 0 to 255 in one file. Default mode.	./binary_dump test_allbytes.dat	0	test02_eout
Test 03	Hex dump: All bytes from 0 to 255 in one file. Explicit -x mode.	./binary_dump -x test_allbytes.dat	0	test03_eout
Test 04	Binary dump: All bytes from 0 to 255 in one file. Explicit -b mode.	./binary_dump -b test_allbytes.dat	0	test04_eout
Test 05	Hex dump: empty file.	./binary_dump test_empty.dat	0	test05_eout
Test 06	Binary dump: empty file.	./binary_dump -b test_empty.dat	0	test06_eout
Test 07	Hex dump: File with 2766 (0xACE) random bytes.	./binary_dump -x test_0xACE_random.dat	0	test07_eout
Test 08	Binary dump: File with 2766 (0xACE) random bytes.	./binary_dump -b test_0xACE_random.dat	0	test08_eout
Test 09	Attempt to do a binary dump of a nonexistent file.	./binary_dump -b nonexistant.dat	1	test09_eout
Test 10	Invalid option.	./binary_dump -q test_allbytes.dat	1	test10_eout
Test 11	Separate multiple options (unsupported).	./binary_dump -b -x test_allbytes.dat	1	test11_eout
Test 12	Compound multiple options (unsupported).	./binary_dump -bx test_allbytes.dat	1	test12_eout

Checking Jenkins Feedback

We have created a Jenkins build job for you. Jenkins is a continuous integration server that is used in industry to compile, build, and test applications under development. We will be using Jenkins to compile, build, and test your homework submissions to GitHub, which will provide early feedback on the completeness (does your implementation meet the requirements) and quality (does your implementation correctly implement the requirements).

Your Jenkins job is associated with your GitHub repository and will poll or query GitHub every two minutes for changes. After you have pushed code to GitHub, Jenkins will notice the change and automatically start a build process on your code. The following actions will occur:

Code will be pulled from your GitHub repository
The style checker vera++ will run on your code and report notifications that you should fix.
A magic number checker will run on your code and report notifications that you should fix.
A check for required files (Makefile, test.sh, and binary_dump.c) will run.
test.sh will execute the following items using your Makefile (so make sure you push test.sh to GitHub!)
- Run make clean
- Compile your program into the executable binary_dump, using make
- Run the provided test cases on the binary_dump executable

Jenkins will record the results of each execution. To obtain your Jenkins feedback, do the following tasks:

Go to Jenkins for CSC230
Click the project named HW3-<unityid>
There will be a table called Build History in the lower left, click the link for the latest build
Click the Console Output link in the left menu (4th item)
The console output provides the feedback from static analysis (style and magic number checks), compiling your program, and executing the test cases

NOTE: Jenkins will NOT execute test.sh without evidence that you have run the tests locally. We require that at least one file matching the pattern *aout is pushed to GitHub for automated testing to run.

Instructions for Submission

The following files MUST be pushed to your assigned CSC230 GitHub repository:

binary_dump.c
Makefile
test.sh
All test files, including actual results of execution

By submitting the actual results from the tests, you will prove that you have tested your program with the minimum set of the provided acceptance tests. Automated grading will not run on your program without at least one generated actual result file (it doesn't mean your tests have to pass, just that you've attempted the tests locally before pushing to your repo).

Additional Considerations

Follow the CSC 230 Style Guidelines. Make sure your program compiles on the common platform and Jenkins cleanly (no errors or warnings), with the required compile options.

There are certain learning outcomes and basic software engineering skills that this assignment is assessing. See the rubric below for deductions that may be applied to your submission as enforcement of good software engineering practices and assignment intentions.

Make sure that you push your code to the GitHub repository provided for you! Pushing your code to GitHub is your submission!

There is a 24 hour window for late submissions. After the main deadline, continue to submit to GitHub. We will use the last commit to GitHub before the late deadline for grading and the timestamp of that commit will determine a deduction, if any.

Rubric

binary_dump.c

+20 for compiling on the common platform with gcc -Wall -std=c99 options

+70 for passing teaching staff test cases (all tests will be done using a diff as demonstrated above and checking the exit codes), which demonstrates a correct implementation.

+20 for comments, style, and constants

-5 for meaningless or missing comments

-5 for inconsistent formatting

-5 for magic numbers

-5 for no name in comments

Total: 110 points

Global deductions FROM the score you earn above:

-15 points for any compilation warnings. Your program must compile cleanly! (if it doesn't compile, you will not receive any credit for test cases or compilation)

-60 points for using library functions outside of ANSI C99 or for using system() -- This means that you may have circumvented the intention of the assignment and receive no points for a correct implementation.

-10 points for files (binary_dump.c, Makefile, test.sh, and all of the actual outputs generated through testing (test*_aout)) that are named incorrectly or missing.

-20 points for late work, even if you submit part of the assignment on time.

You cannot receive less than a 0 for this assignment unless an academic integrity violation occurs.
