CSC230 Homework 3
This homework is to be done individually.
Only use ANSI C99 standard library functions (excluding system()). Other libraries, like GNU libraries, etc., are not allowed.
- 6/10/2015: Fix starter kit link and github URL
- 6/16/2015: A student spotted an unspecified aspect of the program. I did not describe the behavior if multiple filename arguments are provided. As such, we'll call the behavior "undefined", which generally means "no specification except that it shouldn't crash" -- this means you can use your best judgment as to the program's behavior in this case.
Learning Outcomes
- Write small to medium C programs having several separately-compiled modules.
- Explain what happens to a program during preprocessing, lexical analysis, parsing, code generation, code optimization, linking, and execution, and identify errors that occur during each phase. In particular, they will be able to describe the differences in this process between C and Java.
- Correctly identify error messages and warnings from the preprocessor, compiler, and linker, and avoid them.
- Find and eliminate runtime errors using a combination of logic, language understanding, trace printout, and gdb or a similar command-line debugger.
- Interpret and explain data types, conversions between data types, and the possibility of overflow and underflow.
- Explain, inspect, and implement programs using structures such as enumerated types, unions, and constants and arithmetic, logical, relational, assignment, and bitwise operators.
- Trace and reason about variables and their scope in a single function, across multiple functions, and across multiple modules.
- Allocate and deallocate memory in C programs while avoiding memory leaks and dangling pointers. In particular, they will be able to implement dynamic arrays and singly-linked lists using allocated memory.
- Use the C preprocessor to control tracing of programs, compilation for different systems, and write simple macros.
- Write, debug, and modify programs using library utilities, including, but not limited to assert, the math library, the string library, random number generation, variable number of parameters, standard I/O, and file I/O.
- Use simple command-line tools to design, document, debug, and maintain their programs.
- Use an automatic packaging tool, such as make or ant, to distribute and maintain software that has multiple compilation units.
- Use version control tools, such as subversion (svn) or git, to track changes and do parallel development of software.
- Distinguish key elements of the syntax (what’s legal), semantics (what does it do), and pragmatics (how is it used) of a programming language.
Introduction
Developers and sysadmins often need to analyze the content of binary files (i.e., files that do not consist solely of ASCII text). Directly displaying such files on screen is a bad idea: many byte values are control codes rather than human-readable characters, and the user would not be able to identify the byte offsets of the data being shown. Therefore, users typically use a hexadecimal dump to show the content of a file. The standard UNIX tool hexdump is one such tool (see its manpage).
For example, here is hexdump being used to display a file called test_allbytes.dat, a 256-byte file which contains the byte values 0 through 255. The -C option indicates that we want a traditional hex output, as this tool can output in various formats. The -v option turns off a feature that combines similar lines (don't worry about it).
This output has been colorized for explanation purposes.
The format of this output is 16 bytes per line, with each line having:
- The offset into the file, in hex, followed by two spaces. (Shown in blue above.)
- The bytes themselves, in two-digit hex format. Note that there are two spaces between the 8th and 9th bytes on the line -- this helps the user count bytes. Two spaces follow the last byte. (Shown in red above.)
- The bytes rendered as ASCII text (shown in green above). However, for unprintable characters (such as control codes), a period is displayed. (These special characters are shown with a yellow highlight above.) The rendered bytes are contained within a leading and trailing pipe character (i.e., '|'). The trailing pipe is immediately followed by a newline character (i.e., '\n').
- At the end of the file, an additional address line is printed to show the total size of the file, again in hex, followed by a newline character (i.e., '\n'). (Shown in purple above.)
NOTE: The last row of bytes may have fewer than 16 bytes that can be printed. If so, the bytes portion of the output is filled with spaces so that the leading pipe for the ASCII text lines up with the other leading pipes. The trailing pipe for the ASCII text will follow the last ASCII value, which means the trailing pipe will not be aligned with the other trailing pipes.
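The line layout described above can be sketched as one formatting routine. This is only an illustration under stated assumptions, not the required design: the name format_hex_line, the enum constants, and the choice to build the line in a caller-supplied buffer are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum {
    BYTES_PER_LINE = 16, /* bytes shown on a full line */
    HALF_LINE = 8,       /* an extra space precedes the 9th byte */
    PAD_WIDTH = 3,       /* width of one "xx " cell */
    HEX_LINE_MAX = 96    /* roomy upper bound for one line plus '\0' */
};

/* Hypothetical helper: formats up to BYTES_PER_LINE bytes as one dump line.
 * out must hold at least HEX_LINE_MAX chars. Returns the line length. */
int format_hex_line(char out[], unsigned long offset,
                    const unsigned char bytes[], int count)
{
    assert(count >= 0 && count <= BYTES_PER_LINE);

    /* Offset column: 8 hex digits followed by two spaces. */
    int pos = sprintf(out, "%08lx  ", offset);

    /* Hex column: always 16 cells wide so the leading pipe lines up. */
    for (int i = 0; i < BYTES_PER_LINE; i++) {
        if (i == HALF_LINE) {
            out[pos++] = ' ';
        }
        if (i < count) {
            pos += sprintf(out + pos, "%02x ", (unsigned int)bytes[i]);
        } else {
            memset(out + pos, ' ', PAD_WIDTH);
            pos += PAD_WIDTH;
        }
    }

    /* ASCII column: unprintable bytes are shown as a period. */
    out[pos++] = ' ';
    out[pos++] = '|';
    for (int i = 0; i < count; i++) {
        out[pos++] = isprint(bytes[i]) ? (char)bytes[i] : '.';
    }
    out[pos++] = '|';
    out[pos++] = '\n';
    out[pos] = '\0';
    return pos;
}
```

A caller could read the file 16 bytes at a time, pass each chunk to this helper along with the running offset, and print the result with fputs.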
NOTE: The actual output is not in color -- the color above is just for explanatory purposes.
The object of this homework is to develop a similar tool. Full requirements are described below.
Program Requirements
Program Input
This will be a traditional UNIX-style tool. This means that it takes an optional command-line switch, then a filename (see footnote 1 below). It will support two output modes:
- Hexadecimal output, as described above. This output will be exactly equivalent to the output of hexdump -C -v.
- Binary output, similar to hex, but showing four bytes per line, with binary instead of hex.
Formally, the usage message is:
The option flags work as follows:
- If the command is run without a filename argument, the above usage message is printed verbatim, and the program exits (status 1, EXIT_FAILURE). Note that this is the case even if some options are present.
    $ ./binary_dump
    Usage:...
    $ echo $?
    1
    $ ./binary_dump -b
    Usage:...
    $ echo $?
    1
- If the program is run with one filename argument and no options, then the hex output mode is used on the given filename, then the program exits (status 0 (EXIT_SUCCESS) if no errors, 1 (EXIT_FAILURE) otherwise).
    $ ./binary_dump test.dat
    (Hex output of test.dat)
    $ echo $?
    0
- If the command entered is of the form "binary_dump -b test.dat", then the binary output mode is used for test.dat, then the program exits (status 0 (EXIT_SUCCESS) if no errors, 1 (EXIT_FAILURE) otherwise).
    $ ./binary_dump -b test.dat
    (Binary output of test.dat)
    $ echo $?
    0
- If the command entered is of the form "binary_dump -x test.dat", then the hex output mode is used for test.dat, then the program exits (status 0 (EXIT_SUCCESS) if no errors, 1 (EXIT_FAILURE) otherwise).
    $ ./binary_dump -x test.dat
    (Hex output of test.dat)
    $ echo $?
    0
- If an option flag other than those listed above is used, the program will print "binary_dump: unknown option -- %s\n", where %s is the format specifier for printing the option flag attempted. This will be followed by the usage message, then an exit (status 1, EXIT_FAILURE).
    $ ./binary_dump -q
    binary_dump: unknown option -- q
    Usage:...
    $ echo $?
    1
- Multiple flags are not supported -- if the user attempts multiple flags, either in the compound form "binary_dump -bx ..." or the separate form "binary_dump -b -x ...", then the program will print "binary_dump: multiple options are unsupported\n", followed by the usage message, then an exit (status 1, EXIT_FAILURE). Note that this error supersedes the unknown option error.
    $ ./binary_dump -bx
    binary_dump: multiple options are unsupported
    Usage:...
    $ echo $?
    1
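The precedence of these rules (multiple options first, then unknown option, then missing filename) can be sketched as a classification step that runs before any output is produced. The names ParseResult and parse_args are hypothetical, and since the behavior for several filename arguments is undefined, this sketch simply keeps the first one.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    PARSE_HEX,              /* dump in hex (default, or -x) */
    PARSE_BINARY,           /* dump in binary (-b) */
    PARSE_NO_FILE,          /* no filename: print usage only */
    PARSE_UNKNOWN_OPTION,   /* unknown flag: message, then usage */
    PARSE_MULTIPLE_OPTIONS  /* more than one flag: message, then usage */
} ParseResult;

/* Hypothetical helper: classifies the command line without printing.
 * On PARSE_UNKNOWN_OPTION, *bad_flag holds the first unknown flag. */
ParseResult parse_args(int argc, char *argv[],
                       const char **filename, char *bad_flag)
{
    assert(filename != NULL && bad_flag != NULL);

    int flag_count = 0;
    int unknown = 0;
    char mode = 'x';
    *filename = NULL;

    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];
        if (arg[0] == '-' && arg[1] != '\0') {
            /* Count every flag letter, so "-bx" counts as two flags. */
            for (const char *p = arg + 1; *p != '\0'; p++) {
                flag_count++;
                if (*p == 'b' || *p == 'x') {
                    mode = *p;
                } else if (!unknown) {
                    unknown = 1;
                    *bad_flag = *p;
                }
            }
        } else if (*filename == NULL) {
            *filename = arg; /* assumption: keep the first filename */
        }
    }

    if (flag_count > 1) {
        return PARSE_MULTIPLE_OPTIONS; /* supersedes unknown option */
    }
    if (unknown) {
        return PARSE_UNKNOWN_OPTION;
    }
    if (*filename == NULL) {
        return PARSE_NO_FILE;
    }
    return (mode == 'b') ? PARSE_BINARY : PARSE_HEX;
}
```

Keeping the classification separate from the printing means main only needs a switch over the result to decide which message to print and which exit status to use.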
If a file IO error is encountered (file not found, etc.), then the system will print the filename in question, followed by ": ", then the standard UNIX error message for the error. (See "File IO and error messages" below.)
Footnote 1: I'm simplifying the UNIX model a bit for this homework. A traditional UNIX tool usually takes zero or more filename arguments. If no file argument is given, stdin is used. If one or more files are given, they are processed in sequence. Note that this is NOT a requirement for this homework. For more information, see The Art of UNIX Programming, specifically the Basics of UNIX Philosophy and the filter design pattern.
Hexadecimal output format
Simply put, your output must be byte-for-byte identical to the output of "hexdump -C -v" as described above.
The only exception is for an empty file -- hexdump prints nothing, whereas your binary_dump should still print the "last" line of output: the size of the file in hex ("00000000").
Binary output format
Similar to the hex output described above, except the hex portion is instead presented in binary, with 4 bytes per line:
Each line has:
- The offset into the file, in hex, followed by two spaces. (Shown in blue above.)
- The bytes themselves, in eight-digit binary format. There's no extra space between columns as there was in hex -- just one space between each byte. Two spaces follow the last byte. (Shown in red above.)
- The bytes rendered as ASCII text (shown in green above). However, for unprintable characters (such as control codes), a period is displayed. (These special characters are shown with a yellow highlight above.) The rendered bytes are contained within a leading and trailing pipe character (i.e., '|'). The trailing pipe is immediately followed by a newline character (i.e., '\n').
- At the end of the file, an additional address line is printed to show the total size of the file, again in hex, followed by a newline character (i.e., '\n'). (Shown in purple above.)
NOTE: The last row of bytes may have fewer than 4 bytes that can be printed. If so, the bytes portion of the output is filled with spaces so that the leading pipe for the ASCII text lines up with the other leading pipes. The trailing pipe for the ASCII text will follow the last ASCII value, which means the trailing pipe will not be aligned with the other trailing pipes.
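Read literally, the binary layout above could be produced along the following lines. This is a sketch based only on the written description (the provided expected-output files remain the authority), and the name format_binary_line and the constants are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum {
    BITS_PER_BYTE = 8,
    BIN_BYTES_PER_LINE = 4, /* bytes shown on a full line */
    BIN_LINE_MAX = 64       /* roomy upper bound for one line plus '\0' */
};

/* Hypothetical helper: formats up to BIN_BYTES_PER_LINE bytes as one line.
 * out must hold at least BIN_LINE_MAX chars. Returns the line length. */
int format_binary_line(char out[], unsigned long offset,
                       const unsigned char bytes[], int count)
{
    assert(count >= 0 && count <= BIN_BYTES_PER_LINE);

    int pos = sprintf(out, "%08lx  ", offset);

    for (int i = 0; i < BIN_BYTES_PER_LINE; i++) {
        if (i < count) {
            /* Most significant bit first. */
            for (int bit = BITS_PER_BYTE - 1; bit >= 0; bit--) {
                out[pos++] = ((bytes[i] >> bit) & 1) ? '1' : '0';
            }
        } else {
            /* Pad missing bytes so the leading pipe stays aligned. */
            memset(out + pos, ' ', BITS_PER_BYTE);
            pos += BITS_PER_BYTE;
        }
        out[pos++] = ' ';
    }

    out[pos++] = ' ';
    out[pos++] = '|';
    for (int i = 0; i < count; i++) {
        out[pos++] = isprint(bytes[i]) ? (char)bytes[i] : '.';
    }
    out[pos++] = '|';
    out[pos++] = '\n';
    out[pos] = '\0';
    return pos;
}
```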
Design
No template is provided for this homework. However, it is recommended (not required) that you have functions similar to the following:
void char_to_binary(char buf[], unsigned char c)
- Fills the given character buffer with the binary equivalent of the provided byte, terminating with the null character (\0). The buffer must be at least 9 bytes long (8 binary digits plus the null terminator).
void hex_dump(const char filename[])
- Outputs the hex dump of the given file as described above.
void binary_dump(const char filename[])
- Outputs the binary dump of the given file as described above.
void usage()
- A function which prints the usage message, then exits with status 1 (EXIT_FAILURE).
int main(int argc, char* argv[])
- Entry point into the program, processes command line arguments as described above.
Implementation
Get your environment set up
Note: If you successfully cloned your GitHub repo for HW2, you can skip to the section "Get the starter files". Make sure that you're developing in the 3_homework directory!
Start by cloning the provided CSC230 GitHub repo in your local AFS (or development space) using the following commands:
$
This will create a directory with your repo's name. If you cd into the directory, you should see directories for each of the homeworks for the class. You'll want to do all of your development in 3_homework. If you do NOT see the 3_homework directory, enter the following command to make the directory:
Get the starter files
Next, you will need to copy hw3_starter.tgz from the course locker into your 3_homework directory. Use the following command:
Untar using the command:
Set up your Makefile
In addition, no Makefile is provided -- you will also need to write this. For automated grading to work, the Makefile must:
- Be written such that when make is executed, your binary_dump.c code is compiled with the -Wall -std=c99 options to an executable called binary_dump.
- Specify a "clean" target that removes all compilation results (any executables and object files).
Compiling to an intermediate object file (*.o) is optional.
Write your code: binary_dump.c
Your code should be in a single file called binary_dump.c. You must use the appropriate name for automated grading.
File IO and error messages
Use the C standard IO functions for file reading (fopen, fread, fclose, etc.). Because we're doing binary IO, be sure to include the "b" flag in your fopen call.
If an IO error is encountered (e.g. file not found), the UNIX standard thing to do is to call perror(filename), then exit with status 1 (EXIT_FAILURE). This will print the filename in question, followed by ": ", then the standard UNIX error message.
See the perror manpage. Example:
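A minimal sketch of this pattern, assuming a hypothetical helper named count_file_bytes that only counts bytes (a real dump would format and print each chunk inside the loop). The helper reports the error with perror and returns -1; the caller would then exit with EXIT_FAILURE.

```c
#include <assert.h>
#include <stdio.h>

enum { CHUNK_SIZE = 16 }; /* one hex-dump line's worth of bytes */

/* Hypothetical helper: reads the file in CHUNK_SIZE pieces.
 * Returns the total byte count, or -1 after calling perror(). */
long count_file_bytes(const char *filename)
{
    assert(filename != NULL);

    FILE *fp = fopen(filename, "rb"); /* "b": binary mode */
    if (fp == NULL) {
        perror(filename); /* prints "<filename>: <error message>" */
        return -1;
    }

    unsigned char chunk[CHUNK_SIZE];
    long total = 0;
    size_t got;
    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        /* A real dump would format chunk[0..got-1] here. */
        total += (long)got;
    }

    if (ferror(fp)) {
        perror(filename);
        fclose(fp);
        return -1;
    }

    fclose(fp);
    return total;
}
```

The returned total is also the value that the final address line reports, so an empty file naturally yields 00000000.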
Identifying printable characters
You do NOT have to write your own function to identify if a character is printable. C has a function for that. In the ctype.h header file, there is a function called isprint() that returns a nonzero value (true) if a given character is printable. For more information, see the manpage for the ctype functions.
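As a small illustration, the substitution of a period for unprintable bytes might look like the following. The name printable_or_dot is hypothetical. Note that the argument is an unsigned char: passing a negative plain char value to isprint() is undefined behavior.

```c
#include <ctype.h>

/* Hypothetical helper: returns the byte itself if it is printable,
 * or '.' otherwise (control codes, DEL, and so on). */
char printable_or_dot(unsigned char byte)
{
    return isprint(byte) ? (char)byte : '.';
}
```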
Printing hex output
To ensure that all leading 0s are included for any hex output, put a leading 0 in the format specifier. For example, to create hex output for 1a that has three leading 0s, the format specifier would be "%05x".
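The same idea, written as a tiny helper so the width can vary. The name format_hex is hypothetical; it relies on the standard "*" width in the format string, so a width of 5 behaves like "%05x" and a width of 8 produces the eight-digit offsets used in the dump.

```c
#include <stdio.h>

/* Hypothetical helper: writes value as lowercase hex, zero-padded to at
 * least `width` digits. buf must be large enough for the digits plus '\0'.
 * Returns the number of characters written. */
int format_hex(char buf[], int width, unsigned long value)
{
    return sprintf(buf, "%0*lx", width, value);
}
```

For instance, format_hex(buf, 5, 0x1a) fills buf with 0001a.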
Creating binary numbers
Also, recall from lecture that there is no built-in way to format a number as binary in C. You will need to code a new routine to do that, and it's going to involve bitwise operations.
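One possible shape for such a routine, using the char_to_binary signature suggested in the Design section. This is a sketch rather than the expected solution; the mask-based approach and the BITS_PER_BYTE constant are assumptions, and shifting the byte instead would work equally well.

```c
enum { BITS_PER_BYTE = 8 };

/* Fills buf with the 8 binary digits of c, most significant bit first,
 * followed by '\0'. buf must hold at least BITS_PER_BYTE + 1 chars. */
void char_to_binary(char buf[], unsigned char c)
{
    for (int i = 0; i < BITS_PER_BYTE; i++) {
        /* Mask selecting bit 7 for i == 0, down to bit 0 for i == 7. */
        unsigned char mask = (unsigned char)(1u << (BITS_PER_BYTE - 1 - i));
        buf[i] = (c & mask) ? '1' : '0';
    }
    buf[BITS_PER_BYTE] = '\0';
}
```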
Testing
When you are ready to run the full suite of tests, execute:
$
The first command, chmod, sets the permissions of test.sh to allow for execution. You only have to run this command once.
Pushing changes
When you are ready to commit to your local repository, execute:
$
When you are ready to submit to the remote repository for automated grading or teaching staff feedback, execute:
Testing
When compiling your program, you must use the make command.
The program output must match the provided output exactly for automated grading.
Note that the echo $? command will only print the exit value of the immediately preceding command. Make sure that you call the echo command immediately after executing your program to ensure that your exit code is correct.
Twelve test cases for the binary_dump program are provided.
We have also provided a script, test.sh, that will execute all of the tests and report the results.
There should be no difference between your actual output and the expected outputs.
Grading will test additional inputs and outputs, so you should test your program with inputs and outputs beyond the provided examples.
The test cases and testing script (i.e., test.sh) are provided for you in hw3_starter.tgz.
See the Implementation section for details about setting up your work environment and how to use hw3_starter.tgz.
test.sh: If your program does NOT pass the tests in test.sh then it will not be able to pass the teaching staff test cases. Use test.sh as a method to verify that at least the minimum requirements have been met for this assignment.
The details of the provided tests are:
Test ID | Description | Command line | Exit Code | Expected Output |
---|---|---|---|---|
Test 01 | No args. | ./binary_dump | 1 | test01_eout |
Test 02 | Hex dump: All bytes from 0 to 255 in one file. Default mode. | ./binary_dump test_allbytes.dat | 0 | test02_eout |
Test 03 | Hex dump: All bytes from 0 to 255 in one file. Explicit -x mode. | ./binary_dump -x test_allbytes.dat | 0 | test03_eout |
Test 04 | Binary dump: All bytes from 0 to 255 in one file. Explicit -b mode. | ./binary_dump -b test_allbytes.dat | 0 | test04_eout |
Test 05 | Hex dump: empty file. | ./binary_dump test_empty.dat | 0 | test05_eout |
Test 06 | Binary dump: empty file. | ./binary_dump -b test_empty.dat | 0 | test06_eout |
Test 07 | Hex dump: File with 2766 (0xACE) random bytes. | ./binary_dump -x test_0xACE_random.dat | 0 | test07_eout |
Test 08 | Binary dump: File with 2766 (0xACE) random bytes. | ./binary_dump -b test_0xACE_random.dat | 0 | test08_eout |
Test 09 | Attempt to do a binary dump of a nonexistent file. | ./binary_dump -b nonexistant.dat | 1 | test09_eout |
Test 10 | Invalid option. | ./binary_dump -q test_allbytes.dat | 1 | test10_eout |
Test 11 | Separate multiple options (unsupported). | ./binary_dump -b -x test_allbytes.dat | 1 | test11_eout |
Test 12 | Compound multiple options (unsupported). | ./binary_dump -bx test_allbytes.dat | 1 | test12_eout |
Checking Jenkins Feedback
We have created a Jenkins build job for you. Jenkins is a continuous integration server that is used in industry to compile, build, and test applications under development. We will be using Jenkins to compile, build, and test your homework submissions to GitHub, which will provide early feedback on the completeness (does your implementation meet the requirements) and quality (does your implementation correctly implement the requirements).
Your Jenkins job is associated with your GitHub repository and will poll or query GitHub every two minutes for changes. After you have pushed code to GitHub, Jenkins will notice the change and automatically start a build process on your code. The following actions will occur:
- Code will be pulled from your GitHub repository
- The style checker vera++ will run on your code and report notifications that you should fix.
- A magic number checker will run on your code and report notifications that you should fix.
- A check for required files (Makefile, test.sh, and binary_dump.c)
- test.sh will execute the following items using your Makefile (so make sure you push test.sh to GitHub!)
  - Run make clean
  - Compile your program into the executable binary_dump, using make
  - Run the provided test cases on the binary_dump executable
Jenkins will record the results of each execution. To obtain your Jenkins feedback, do the following tasks:
- Go to Jenkins for CSC230
- Click the project named HW3-<unityid>
- There will be a table called Build History in the lower left, click the link for the latest build
- Click the Console Output link in the left menu (4th item)
- The console output provides the feedback from static analysis (style and magic number checks), compiling your program, and executing the test cases
NOTE: Jenkins will NOT execute test.sh without evidence that you have run the tests locally. We require that at least one file matching the pattern *aout is pushed to GitHub for automated testing to run.
Instructions for Submission
The following files MUST be pushed to your assigned CSC230 GitHub repository:
- binary_dump.c
- Makefile
- test.sh
- All test files, including actual results of execution
By submitting the actual results from the tests, you will prove that you have tested your program with the minimum set of the provided acceptance tests. Automated grading will not run on your program without at least one generated actual result file (it doesn't mean your tests have to pass, just that you've attempted the tests locally before pushing to your repo).
Additional Considerations
Follow the CSC 230 Style Guidelines. Make sure your program compiles on the common platform and Jenkins cleanly (no errors or warnings), with the required compile options.
There are certain learning outcomes and basic software engineering skills that this assignment is assessing. See the rubric below for deductions that may be applied to your submission as enforcement of good software engineering practices and assignment intentions.
Make sure that you push your code to the GitHub repository provided for you! Pushing your code to GitHub is your submission!
There is a 24 hour window for late submissions. After the main deadline, continue to submit to GitHub. We will use the last commit to GitHub before the late deadline for grading and the timestamp of that commit will determine a deduction, if any.
Rubric
binary_dump.c
+20 for compiling on the common platform with the gcc -Wall -std=c99 options
+70 for passing teaching staff test cases (all tests will be done using a diff as demonstrated above and checking the exit codes), which demonstrates a correct implementation.
+20 for comments, style, and constants
-5 for meaningless or missing comments
-5 for inconsistent formatting
-5 for magic numbers
-5 for no name in comments
Total: 110 points
Global deductions FROM the score you earn above:
-15 points for any compilation warnings. Your program must compile cleanly! (If it doesn't compile, you will not receive any credit for test cases or compilation.)
-60 points for using library functions outside of ANSI C99 or for using system() -- this means that you may have circumvented the intention of the assignment and will receive no points for a correct implementation.
-10 points for files (binary_dump.c, Makefile, test.sh, and all of the actual outputs generated through testing (test*_aout)) that are named incorrectly or missing.
-20 points for late work, even if you submit part of the assignment on time.
You cannot receive less than a 0 for this assignment unless an academic integrity violation occurs.