CSC230 Homework 3

This homework is to be done individually.

Only use ANSI C99 standard library functions (excluding system()). Other libraries, like GNU libraries, etc., are not allowed.

Updates:

Learning Outcomes

Introduction

Developers and sysadmins often need to analyze the content of binary files (i.e., files that do not consist solely of ASCII text). Directly displaying such files on screen is a bad idea, since many byte values are control codes rather than human-readable characters, and the user would not be able to identify the byte offsets of the data being shown. Therefore, users typically use a hexadecimal dump to show the content of a file. The standard UNIX tool hexdump is one such tool (see the hexdump manpage).

For example, here is hexdump being used to display a file called test_allbytes.dat, a 256-byte file that contains the byte values 0 through 255 in order. The -C option requests the traditional (canonical) hex+ASCII output, as this tool can output in various formats. The -v option turns off a feature that collapses runs of identical lines (don't worry about it). This output has been colorized for explanation purposes.

$ hexdump -C -v test_allbytes.dat
00000000  00 01 02 03 04 05 06 07  08 09 0a 0b 0c 0d 0e 0f  |................|
00000010  10 11 12 13 14 15 16 17  18 19 1a 1b 1c 1d 1e 1f  |................|
00000020  20 21 22 23 24 25 26 27  28 29 2a 2b 2c 2d 2e 2f  | !"#$%&'()*+,-./|
00000030  30 31 32 33 34 35 36 37  38 39 3a 3b 3c 3d 3e 3f  |0123456789:;<=>?|
00000040  40 41 42 43 44 45 46 47  48 49 4a 4b 4c 4d 4e 4f  |@ABCDEFGHIJKLMNO|
00000050  50 51 52 53 54 55 56 57  58 59 5a 5b 5c 5d 5e 5f  |PQRSTUVWXYZ[\]^_|
00000060  60 61 62 63 64 65 66 67  68 69 6a 6b 6c 6d 6e 6f  |`abcdefghijklmno|
00000070  70 71 72 73 74 75 76 77  78 79 7a 7b 7c 7d 7e 7f  |pqrstuvwxyz{|}~.|
00000080  80 81 82 83 84 85 86 87  88 89 8a 8b 8c 8d 8e 8f  |................|
00000090  90 91 92 93 94 95 96 97  98 99 9a 9b 9c 9d 9e 9f  |................|
000000a0  a0 a1 a2 a3 a4 a5 a6 a7  a8 a9 aa ab ac ad ae af  |................|
000000b0  b0 b1 b2 b3 b4 b5 b6 b7  b8 b9 ba bb bc bd be bf  |................|
000000c0  c0 c1 c2 c3 c4 c5 c6 c7  c8 c9 ca cb cc cd ce cf  |................|
000000d0  d0 d1 d2 d3 d4 d5 d6 d7  d8 d9 da db dc dd de df  |................|
000000e0  e0 e1 e2 e3 e4 e5 e6 e7  e8 e9 ea eb ec ed ee ef  |................|
000000f0  f0 f1 f2 f3 f4 f5 f6 f7  f8 f9 fa fb fc fd fe ff  |................|
00000100

The format of this output is 16 bytes per line, with each line having:

NOTE: The last row of bytes may have fewer than 16 bytes that can be printed. If so, the bytes portion of the output is filled with spaces so that the leading pipe for the ASCII text lines up with the other leading pipes. The trailing pipe for the ASCII text will follow the last ASCII value, which means the trailing pipe will not be aligned with the other trailing pipes.
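The padding rule in the note above can be sketched as follows. This is only an illustration, not a required design: the helper name format_hex_row and the two constants are hypothetical, and the exact spacing (two spaces after the offset, one extra space after the eighth byte, two spaces before the leading pipe) is an assumption based on the usual hexdump -C column layout, so check it against the provided expected outputs.

```c
#include <ctype.h>
#include <stdio.h>

#define BYTES_PER_ROW 16  /* bytes shown on one hex row */
#define HALF_ROW 8        /* an extra space follows this many bytes */

/* Hypothetical helper: formats one hex row (no newline) into out.
 * Assumes 1 <= count <= BYTES_PER_ROW and that out holds at least 80 chars.
 * Column spacing is an assumption; verify it against hexdump -C -v. */
int format_hex_row(char *out, unsigned long offset,
                   const unsigned char *bytes, size_t count)
{
    int pos = sprintf(out, "%08lx  ", offset);

    for (size_t i = 0; i < BYTES_PER_ROW; i++) {
        if (i < count) {
            pos += sprintf(out + pos, "%02x ", (unsigned)bytes[i]);
        } else {
            pos += sprintf(out + pos, "   ");  /* pad a missing byte */
        }
        if (i == HALF_ROW - 1) {
            pos += sprintf(out + pos, " ");    /* gap between the two halves */
        }
    }

    /* The leading pipe always lands in the same column. */
    pos += sprintf(out + pos, " |");
    for (size_t i = 0; i < count; i++) {
        out[pos++] = isprint(bytes[i]) ? (char)bytes[i] : '.';
    }
    /* The trailing pipe follows the last character actually printed. */
    out[pos++] = '|';
    out[pos] = '\0';
    return pos;
}
```

Because missing bytes are replaced by three spaces each, a short final row still puts the leading pipe in the same column as a full row, while the trailing pipe moves left.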

NOTE: The actual output is not in color -- the color above is just for explanatory purposes.

The object of this homework is to develop a similar tool. Full requirements are described below.

Program Requirements

Program Input

This will be a traditional UNIX-style tool: it takes an optional command-line switch followed by a filename (see footnote 1 below). It will support two output modes:

  1. Hexadecimal output, as described above. This output will be exactly equivalent to the output of hexdump -C -v.
  2. Binary output, similar to hex, but showing four bytes per line, with binary instead of hex.

Formally, the usage message is:

"Usage:\n"
" binary_dump [option] <file>\n"
"\n"
"Options:\n"
" -b binary display\n"
" -x canonical hex+ASCII display\n"
"\n"

The option flags work as follows:

If a file IO error is encountered (file not found, etc.), the program prints the filename in question, followed by ": ", then the standard UNIX error message for the error. (See "File IO and error messages" below.)

Footnote 1: I'm simplifying the UNIX model a bit for this homework. A traditional UNIX tool usually takes zero or more filename arguments. If no file argument is given, stdin is used. If one or more files are given, they are processed in sequence. Note that this is NOT a requirement for this homework. For more information, see The Art of UNIX Programming, specifically the Basics of UNIX Philosophy and the filter design pattern.

Hexadecimal output format

Simply put, your output must be byte-for-byte identical to the output of "hexdump -C -v" as described above.

The only exception is for an empty file -- hexdump prints nothing, whereas your binary_dump should still print the "last" line of output: the size of the file in hex ("00000000").

Binary output format

Similar to the hex output described above, except that the hex portion is instead presented in binary, with 4 bytes per line:

$ binary_dump -b test_allbytes.dat
00000000 00000000 00000001 00000010 00000011 |....|
00000004 00000100 00000101 00000110 00000111 |....|
00000008 00001000 00001001 00001010 00001011 |....|
0000000c 00001100 00001101 00001110 00001111 |....|
00000010 00010000 00010001 00010010 00010011 |....|
00000014 00010100 00010101 00010110 00010111 |....|
00000018 00011000 00011001 00011010 00011011 |....|
0000001c 00011100 00011101 00011110 00011111 |....|
00000020 00100000 00100001 00100010 00100011 | !"#|
00000024 00100100 00100101 00100110 00100111 |$%&'|
00000028 00101000 00101001 00101010 00101011 |()*+|
0000002c 00101100 00101101 00101110 00101111 |,-./|
00000030 00110000 00110001 00110010 00110011 |0123|
00000034 00110100 00110101 00110110 00110111 |4567|
00000038 00111000 00111001 00111010 00111011 |89:;|
0000003c 00111100 00111101 00111110 00111111 |<=>?|
00000040 01000000 01000001 01000010 01000011 |@ABC|
00000044 01000100 01000101 01000110 01000111 |DEFG|
00000048 01001000 01001001 01001010 01001011 |HIJK|
0000004c 01001100 01001101 01001110 01001111 |LMNO|
00000050 01010000 01010001 01010010 01010011 |PQRS|
00000054 01010100 01010101 01010110 01010111 |TUVW|
00000058 01011000 01011001 01011010 01011011 |XYZ[|
0000005c 01011100 01011101 01011110 01011111 |\]^_|
00000060 01100000 01100001 01100010 01100011 |`abc|
00000064 01100100 01100101 01100110 01100111 |defg|
00000068 01101000 01101001 01101010 01101011 |hijk|
0000006c 01101100 01101101 01101110 01101111 |lmno|
00000070 01110000 01110001 01110010 01110011 |pqrs|
00000074 01110100 01110101 01110110 01110111 |tuvw|
00000078 01111000 01111001 01111010 01111011 |xyz{|
0000007c 01111100 01111101 01111110 01111111 ||}~.|
00000080 10000000 10000001 10000010 10000011 |....|
00000084 10000100 10000101 10000110 10000111 |....|
00000088 10001000 10001001 10001010 10001011 |....|
0000008c 10001100 10001101 10001110 10001111 |....|
00000090 10010000 10010001 10010010 10010011 |....|
00000094 10010100 10010101 10010110 10010111 |....|
00000098 10011000 10011001 10011010 10011011 |....|
0000009c 10011100 10011101 10011110 10011111 |....|
000000a0 10100000 10100001 10100010 10100011 |....|
000000a4 10100100 10100101 10100110 10100111 |....|
000000a8 10101000 10101001 10101010 10101011 |....|
000000ac 10101100 10101101 10101110 10101111 |....|
000000b0 10110000 10110001 10110010 10110011 |....|
000000b4 10110100 10110101 10110110 10110111 |....|
000000b8 10111000 10111001 10111010 10111011 |....|
000000bc 10111100 10111101 10111110 10111111 |....|
000000c0 11000000 11000001 11000010 11000011 |....|
000000c4 11000100 11000101 11000110 11000111 |....|
000000c8 11001000 11001001 11001010 11001011 |....|
000000cc 11001100 11001101 11001110 11001111 |....|
000000d0 11010000 11010001 11010010 11010011 |....|
000000d4 11010100 11010101 11010110 11010111 |....|
000000d8 11011000 11011001 11011010 11011011 |....|
000000dc 11011100 11011101 11011110 11011111 |....|
000000e0 11100000 11100001 11100010 11100011 |....|
000000e4 11100100 11100101 11100110 11100111 |....|
000000e8 11101000 11101001 11101010 11101011 |....|
000000ec 11101100 11101101 11101110 11101111 |....|
000000f0 11110000 11110001 11110010 11110011 |....|
000000f4 11110100 11110101 11110110 11110111 |....|
000000f8 11111000 11111001 11111010 11111011 |....|
000000fc 11111100 11111101 11111110 11111111 |....|
00000100

Each line has:

NOTE: The last row of bytes may have fewer than 4 bytes that can be printed. If so, the bytes portion of the output is filled with spaces so that the leading pipe for the ASCII text lines up with the other leading pipes. The trailing pipe for the ASCII text will follow the last ASCII value, which means the trailing pipe will not be aligned with the other trailing pipes.

Design

No template is provided for this homework. However, it is recommended (not required) that you have functions similar to the following:

We have provided a number of test inputs and outputs -- these are described in the testing section below.

Implementation

Get your environment set up

Note: If you successfully cloned your GitHub repo for HW2, you can skip to the section "Get the starter files". Make sure that you're developing in the 3_homework directory!

Start by cloning the provided CSC230 GitHub repo in your local AFS (or development space) using the following commands:

$ setenv SSH_ASKPASS
$ git clone https://<unity_id>@github.ncsu.edu/engr-csc230-summer2015/<repo_name>.git

This will create a directory with your repo's name. If you cd into the directory, you should see directories for each of the homeworks for the class. You'll want to do all of your development in 3_homework. If you do NOT see the 3_homework directory, enter the following command to make the directory:

$ mkdir 3_homework

Get the starter files

Next, you will need to copy the hw3_starter.tgz from the course locker into your 3_homework directory. Use the following command:

$ wget

Untar using the command:

$ tar xzvf hw3_starter.tgz

Set up your Makefile

In addition, no Makefile is provided -- you will also need to write this. For automated grading to work, the Makefile must:

Write your code: binary_dump.c

Your code should be in a single file called binary_dump.c. You must use the appropriate name for automated grading.

File IO and error messages

Use the C standard IO functions for file reading (fopen, fread, fclose, etc.). Because we're doing binary IO, be sure to include the "b" flag in the mode string of your fopen call.

If an IO error is encountered (e.g. file not found), the UNIX standard thing to do is to call perror(filename), then exit with status 1 (EXIT_FAILURE). This will print the filename in question, followed by ": ", then the standard UNIX error message. See the perror manpage. Example:

$ binary_dump nonexistant.bin
nonexistant.bin: No such file or directory
$ echo $?
1
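A minimal sketch of this pattern is shown below. The helper name count_bytes and the constant CHUNK_SIZE are hypothetical and chosen only for illustration; the helper merely counts bytes, where a real binary_dump would format each chunk instead.

```c
#include <stdio.h>

#define CHUNK_SIZE 16  /* hypothetical read size: one hex row at a time */

/* Hypothetical helper: returns the number of bytes in the file, or -1
 * after reporting an IO error with perror(filename). */
long count_bytes(const char *filename)
{
    FILE *in = fopen(filename, "rb");  /* "b" requests binary mode */
    if (in == NULL) {
        perror(filename);              /* prints "<filename>: <message>" */
        return -1;
    }

    unsigned char chunk[CHUNK_SIZE];
    long total = 0;
    size_t got;
    while ((got = fread(chunk, 1, sizeof chunk, in)) > 0) {
        total += (long)got;            /* a real tool would dump chunk here */
    }

    if (ferror(in)) {                  /* distinguish a read error from EOF */
        perror(filename);
        fclose(in);
        return -1;
    }
    fclose(in);
    return total;
}
```

A caller such as main could then return EXIT_FAILURE (from stdlib.h) whenever the helper returns a negative value, which gives the exit status of 1 shown in the example.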

Identifying printable characters

You do NOT have to write your own function to identify if a character is printable. C has a function for that. In the ctype.h header file, there is a function called isprint() that returns true if a given character is printable. For more information, see the manpage for the ctype functions.
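As a small illustration, the ASCII column can be built by mapping each byte through isprint(). The helper name display_char and the constant PLACEHOLDER are hypothetical; the '.' substitute matches what the sample outputs show for non-printable bytes.

```c
#include <ctype.h>

#define PLACEHOLDER '.'  /* shown in place of non-printable bytes */

/* Hypothetical helper: returns the character to show in the ASCII column.
 * Taking an unsigned char keeps the value in the range isprint() accepts. */
char display_char(unsigned char byte)
{
    return isprint(byte) ? (char)byte : PLACEHOLDER;
}
```

In the default "C" locale this yields '.' for control codes, for 0x7f, and for every byte from 0x80 upward.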

Printing hex output

To ensure that all leading 0s are included for any hex output, put a leading 0 in the format specifier. For example, to create hex output for 1a that has three leading 0s, the format specifier would be "%05x".
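For instance, the sample output uses 8 hex digits for the offset and 2 for each byte, which correspond to the widths 08 and 02. The helper names below are hypothetical; snprintf is used here only so that the result can be inspected, and printf accepts the same format specifiers.

```c
#include <stdio.h>

/* Hypothetical helper: writes an offset as 8 zero-padded hex digits. */
int format_offset(char *out, size_t size, unsigned long offset)
{
    return snprintf(out, size, "%08lx", offset);
}

/* Hypothetical helper: writes one byte as 2 zero-padded hex digits. */
int format_byte(char *out, size_t size, unsigned char byte)
{
    return snprintf(out, size, "%02x", (unsigned)byte);
}
```

With these, an offset of 0x100 is written as 00000100 and the byte 0x0a as 0a.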

Creating binary numbers

Also, recall from lecture that there is no built-in way to format a number as binary in C. You will need to code a new routine to do that, and it's going to involve bitwise operations.
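One possible shape for such a routine is sketched below, using a shift and a mask to test each bit from most significant to least significant. The name byte_to_binary and the constant BITS_PER_BYTE are hypothetical, and other approaches would work equally well.

```c
#define BITS_PER_BYTE 8  /* binary digits printed for each byte */

/* Hypothetical helper: writes the 8 binary digits of byte into out,
 * most significant bit first, followed by a terminating '\0'. */
void byte_to_binary(unsigned char byte, char out[BITS_PER_BYTE + 1])
{
    for (int bit = BITS_PER_BYTE - 1; bit >= 0; bit--) {
        /* Shift the wanted bit down to position 0, then mask it off. */
        out[BITS_PER_BYTE - 1 - bit] = ((byte >> bit) & 1) ? '1' : '0';
    }
    out[BITS_PER_BYTE] = '\0';
}
```

For the byte 0x0b this produces 00001011, matching the corresponding entry in the sample binary output.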

Testing

When you are ready to run the full suite of tests, execute:

$ chmod +x test.sh
$ ./test.sh

The first command, chmod, sets the permissions of test.sh to allow for execution. You only have to run this command once.

Pushing changes

When you are ready to commit to your local repository, execute:

$ git add .
$ git commit -am "a meaningful message for future you"

When you are ready to submit to the remote repository for automated grading or teaching staff feedback, execute:

$ git push

Testing

When compiling your program, you must use the make command.

The program output must match the provided output exactly for automated grading. 

Note that the echo $? command prints the exit value of only the immediately preceding command. Make sure that you call the echo command immediately after executing your program to ensure that you are checking the correct exit code.

Twelve test cases for the binary_dump program are provided. We have also provided a script, test.sh, that will execute all of the tests and report the results. There should be no difference between your actual outputs and the expected outputs. Grading will test additional inputs and outputs, so you should test your program with inputs beyond the provided examples. The test cases and the testing script (test.sh) are provided for you in hw3_starter.tgz. See the Implementation section for details about setting up your work environment and how to use hw3_starter.tgz.

test.sh: If your program does NOT pass the tests in test.sh then it will not be able to pass the teaching staff test cases. Use test.sh as a method to verify that at least the minimum requirements have been met for this assignment.

The details of the provided tests are:

Test ID | Description | Command line | Exit Code | Expected Output
Test 01 | No args. | ./binary_dump | 1 | test01_eout
Test 02 | Hex dump: All bytes from 0 to 255 in one file. Default mode. | ./binary_dump test_allbytes.dat | 0 | test02_eout
Test 03 | Hex dump: All bytes from 0 to 255 in one file. Explicit -x mode. | ./binary_dump -x test_allbytes.dat | 0 | test03_eout
Test 04 | Binary dump: All bytes from 0 to 255 in one file. Explicit -b mode. | ./binary_dump -b test_allbytes.dat | 0 | test04_eout
Test 05 | Hex dump: empty file. | ./binary_dump test_empty.dat | 0 | test05_eout
Test 06 | Binary dump: empty file. | ./binary_dump -b test_empty.dat | 0 | test06_eout
Test 07 | Hex dump: File with 2766 (0xACE) random bytes. | ./binary_dump -x test_0xACE_random.dat | 0 | test07_eout
Test 08 | Binary dump: File with 2766 (0xACE) random bytes. | ./binary_dump -b test_0xACE_random.dat | 0 | test08_eout
Test 09 | Attempt to do a binary dump of a nonexistent file. | ./binary_dump -b nonexistant.dat | 1 | test09_eout
Test 10 | Invalid option. | ./binary_dump -q test_allbytes.dat | 1 | test10_eout
Test 11 | Separate multiple options (unsupported). | ./binary_dump -b -x test_allbytes.dat | 1 | test11_eout
Test 12 | Compound multiple options (unsupported). | ./binary_dump -bx test_allbytes.dat | 1 | test12_eout
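The command lines in the table suggest a simple argument-handling rule: a lone filename selects hex mode, exactly one of -x or -b may precede the filename, and anything else is rejected. The sketch below is one way to express that; the names dump_mode and parse_args are hypothetical, and the behavior for cases the table does not list (such as a lone option with no filename, which is treated here as a filename) is an assumption.

```c
#include <string.h>

/* Hypothetical result type for command-line parsing. */
typedef enum { MODE_HEX, MODE_BINARY, MODE_INVALID } dump_mode;

/* Hypothetical helper: picks the output mode and stores the filename.
 * On MODE_INVALID, *filename is set to NULL. */
dump_mode parse_args(int argc, char *argv[], const char **filename)
{
    if (argc == 2) {                       /* binary_dump <file> */
        *filename = argv[1];
        return MODE_HEX;                   /* hex is the default mode */
    }
    if (argc == 3) {                       /* binary_dump <option> <file> */
        *filename = argv[2];
        if (strcmp(argv[1], "-x") == 0) {
            return MODE_HEX;
        }
        if (strcmp(argv[1], "-b") == 0) {
            return MODE_BINARY;
        }
    }
    /* No args, unknown option, or more than one option. */
    *filename = NULL;
    return MODE_INVALID;
}
```

On MODE_INVALID the caller would exit with status 1, as the table requires, presumably after printing the usage message.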

Checking Jenkins Feedback

We have created a Jenkins build job for you. Jenkins is a continuous integration server that is used in industry to compile, build, and test applications under development. We will be using Jenkins to compile, build, and test your homework submissions to GitHub, which will provide early feedback on the completeness (does your implementation meet the requirements) and quality (does your implementation correctly implement the requirements).

Your Jenkins job is associated with your GitHub repository and will poll or query GitHub every two minutes for changes. After you have pushed code to GitHub, Jenkins will notice the change and automatically start a build process on your code. The following actions will occur:

Jenkins will record the results of each execution. To obtain your Jenkins feedback, do the following tasks:

NOTE: Jenkins will NOT execute test.sh without evidence that you have run the tests locally. We require that at least one file matching the file-name pattern *aout (such as the test*_aout files generated by testing) is pushed to GitHub for automated testing to run.

Instructions for Submission

The following files MUST be pushed to your assigned CSC230 GitHub repository:

By submitting the actual results from the tests, you will prove that you have tested your program with the minimum set of the provided acceptance tests. Automated grading will not run on your program without at least one generated actual result file (it doesn't mean your tests have to pass, just that you've attempted the tests locally before pushing to your repo).

Additional Considerations

Follow the CSC 230 Style Guidelines.  Make sure your program compiles on the common platform and Jenkins cleanly (no errors or warnings), with the required compile options.

There are certain learning outcomes and basic software engineering skills that this assignment is assessing. See the rubric below for deductions that may be applied to your submission to enforce good software engineering practices and the intent of the assignment.

Make sure that you push your code to the GitHub repository provided for you! Pushing your code to GitHub is your submission!

There is a 24 hour window for late submissions.  After the main deadline, continue to submit to GitHub. We will use the last commit to GitHub before the late deadline for grading and the timestamp of that commit will determine a deduction, if any.

Rubric

binary_dump.c

+20 for compiling on the common platform with gcc -Wall -std=c99 options

+70 for passing teaching staff test cases (all tests will be done using a diff as demonstrated above and checking the exit codes), which demonstrates a correct implementation.

+20 for comments, style, and constants

-5 for meaningless or missing comments

-5 for inconsistent formatting

-5 for magic numbers

-5 for no name in comments

Total: 110 points

Global deductions FROM the score you earn above:

-15 points for any compilation warnings. Your program must compile cleanly! (if it doesn't compile, you will not receive any credit for test cases or compilation)

-60 points for using library functions outside of ANSI C99 or for using system() -- This means that you may have circumvented the intention of the assignment and receive no points for a correct implementation.

-10 points for files (binary_dump.c, Makefile, test.sh, and all of the actual outputs generated through testing (test*_aout)) that are named incorrectly or missing.

-20 points for late work, even if you submit part of the assignment on time.

You cannot receive less than a 0 for this assignment unless an academic integrity violation occurs.