Please send all questions & assignments to:
dsolarek@utnet.utoledo.edu

UNIX and Program Development

The process of developing a program[1] to satisfy a particular need requires access to a comprehensive program development environment. The UNIX system provides an exceptional programming environment. Because the operating system was written in C by highly talented programmers who had their own needs in mind, UNIX provides an ideal environment for program creation using C/C++. Operating system services are readily accessible to the C/C++ programmer in the form of function libraries and system calls[2]. In addition, there are a variety of tools for making the development and maintenance of programs easier.

C/C++ and UNIX

This course teaches C/C++ under the UNIX operating system. C/C++ programs will look similar under any other system (such as VMS or DOS), but some features will differ from system to system. In particular, the method of compiling a program to produce a file of runnable code will be different on each system.

As mentioned above, the UNIX operating system is written in C. In fact the C language was invented specifically to implement UNIX in a manner which made it machine independent. All of the UNIX commands which you type, plus the other system facilities such as password checking, lineprinter queues or magnetic tape controllers are written in C.

In the course of the development of UNIX, hundreds of functions were written to provide access to various facets of the system. These functions are available to the C/C++ programmer in libraries. By writing in C/C++ and using the UNIX system libraries, very powerful programs can be created. These libraries are very difficult to access using any other language; C/C++ is therefore the natural language for writing UNIX system programs.

What is a Compiler?

A compiler is a special program that receives statements written in a particular programming language and translates them into the machine language or "object code" that a computer's processor understands. Once it has been linked, this machine language code becomes an executable program file. Typically, a programmer writes program statements in a high-level programming language such as C or C++ one line at a time using a text editor. These statements are then saved to a file. This file contains what are called source statements or collectively the "source code." The programmer then runs the appropriate programming language compiler, specifying the name of the text file that contains the source statements.

When executing (running), the compiler first parses (or analyzes) all of the input (source code) statements to assure that their syntax is correct. Then, in one or more successive passes, it creates the output (object code), making sure that statements which reference other statements are correct in the final version of the object code[3]. The object code version of the program contains the string of 0s and 1s (called machine language) that the processor understands.

A preprocessor is a program invoked by various compilers to process code before compilation. For example, the C preprocessor, cpp, handles textual macro substitution, conditional compilation and inclusion of other files. A preprocessor may be used to transform a program into a simpler language, e.g., to transform C++ into C. A compiler works with what are sometimes called third-generation[4], fourth-generation, and fifth-generation languages. An assembler[5] works on programs written using a processor's assembly language. A link editor (or linker) is a computer program which accepts the object code files of one or more separately compiled program modules[6], and links them together into a complete executable[7] program file, resolving references from one module to another.

Compiling a C Program

Once you understand the purpose and functioning of a compiler, the next step is to use a specific compiler with a program written in the C language. The following three commands can be used to compile a C program. In this example, the gcc utility (the GNU version of the traditional cc compiler command) is used for a C program named prog.c:

et791:~$ gcc prog.c
et791:~$ mv a.out prog
et791:~$ chmod 755 prog

The gcc utility calls the C preprocessor, the C compiler, the assembler, and the link editor. The link editor creates an executable file named (by default) a.out. The second command renames a.out to prog. If you fail to rename the a.out file, the next use of the compiler will overwrite it. The last command makes sure that the file (now named prog) has execute permission so that you can run it and test it for logic and/or runtime errors. The link editor normally sets this permission itself, so this step is usually just a precaution.

The -o argument (or parameter) can be used to shorten this process. The following two commands achieve the same results as the three above:

et791:~$ gcc -o prog prog.c
et791:~$ chmod 755 prog

With this approach, there is no need to rename the a.out executable file.

GNU C and C++ Compiler

For a C++ program, we will be using the GNU[8] C++ compiler which is called gcc or g++. Actually, the C and C++ compilers are integrated under the gcc and g++ utilities. Both utilities process C or C++ input source code files through one or more of the four stages mentioned above: preprocessing, compilation, assembly, and linking. Source code filename extensions indicate the specific language:

.c      C source code
.C      C++ source code
.cc     C++ source code
.cxx    C++ source code
.cpp    C++ source code
.c++    C++ source code

For example, to compile and run the C++ program grades.cpp we could use the g++ command as shown below:

et791:~$ g++ grades.cpp
et791:~$ mv a.out grades
et791:~$ chmod 755 grades
et791:~$ grades

In the example command sequence above, the a.out file created by the g++ compiler is an executable (binary) file. We rename it to grades so that we do not lose our executable file when the next use of the compiler overwrites the a.out file. As with the cc utility, a slightly more complex form of the g++ command is both faster and more useful:

et791:~$ g++  -Wall -g grades.cpp  -o  grades 

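Or, you may use the gcc command, which recognizes the .cpp extension and compiles the file as C++. Unlike g++, however, gcc does not link the C++ standard library automatically, so you must ask for it with -lstdc++:

et791:~$ gcc  -Wall -g grades.cpp  -o  grades  -lstdc++

Both of these versions call the same compiler.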

Note:

  1. The g++ -Wall -g file_name.cpp -o file_name command requires that the main function of your program be of type int. Don't forget to include return 0 at the end of main when using int main.
  2. Note the uppercase W in -Wall.

Each of the arguments in this command is explained as follows:

-Wall
instructs the C++ compiler to report a broad set of warning messages. These warning messages usually indicate programming errors, and we will ask for them as clues to what we might have done wrong.
-g
tells the compiler to include debugging information that will allow us to use a debugger. We will introduce the debugger at a later time.
-o grades
indicates that the compiler should put its output (the "executable code") in a file called grades. If you do not specify a file name for the executable code, then the compiler places its result in a file named a.out.
Note: When working with C++ programs, be certain to use the g++ compiler. This compiler assumes C++ code and libraries. The gcc compiler does not. Check out the man pages for both g++ and gcc to see the differences.

Running a Program

It is important to remember that the compiler creates a file which is executable, meaning that you can run it by simply typing its name at the UNIX prompt.

et791:~$ program_name 

If you type the name of an executable file and it does not run (i.e., you get an error message) try typing a "./" (dot slash) followed by the program_name, and then press the [Enter] key.

et791:~$ ./program_name 

One of these two methods should work for you. Once the program is finished, you will see the UNIX prompt.

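Note: The "./" is needed when the current directory is not on your "path", the list of directories (stored in the PATH environment variable) that the shell searches when you type a command by name alone. The "./" tells the shell to look for the program in the current directory.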

Finding and Correcting Errors

The process of finding and correcting program errors is known as debugging. Debugging is the process of attempting to determine and correct the cause of the symptoms of program errors identified by compilation, testing or by frustrated users.

Syntax errors. It is quite possible (in fact almost certain) for the compiler to detect some types of problems or errors in your program the first time you compile it. The types of errors that the compiler can find are called syntax errors. These are errors that simply mean the program you have given the compiler is not completely correct C code. For example, in arithmetic expressions, you must have a right parenthesis " ) " for every left one " ( ". If you do not, the compiler will detect this mistake and issue a message indicating that there is a syntax error present. It cannot correct errors, because it does not know where you meant the other one to go. The output you get will usually look something like:

foo.c:82: syntax error
foo.c:109: undeclared identifier num_scores

The compiler will try to produce useful and informative messages that tell you when you've failed to use the C language correctly. However, in reality, they are often somewhat baffling. Don't hesitate to ask what an error means; we have more practice interpreting these messages and can often translate them for you. Also don't get too discouraged if you type in a 100-line program, try to compile it and two or three screenfuls of error messages fly by. This is part and parcel of the process of programming. Tracking down these problems is often perversely pleasurable and it will hopefully give you a greater appreciation of what the people out there building VCRs and airline reservation systems are going through.

The syntax errors that the compiler can detect must all be corrected before it will produce an executable file. So continue to correct and recompile your program until the compiler runs without issuing any error messages.

Logic errors. Here you may discover another type of error. We usually call these logic errors to distinguish them from the syntax errors that the compiler finds. These are generally errors that are mistakes in how you've designed your program or in how you've translated your design to the C programming language. The compiler cannot find these types of errors for you since it cannot read your mind and doesn't know what you want the program to do. As a result these errors are generally more difficult to track down and fix. The process of doing so is what we call debugging.

Runtime errors. One type of logic error that is often encountered is the infinite loop. An infinite loop is when a program continues running (forever) when you really wanted it to stop at some point. In order to deal with this error, you need some way of interrupting the program and stopping it from running. To kill a running program, type [Ctrl]+c to send it the interrupt signal. This approach will work for most of the programs that you will be developing in this course.

As the figure above illustrates, finding and correcting these errors is really the cycle a programmer usually follows. First you edit your program adding new features and/or fixing errors. Then you compile the program creating an executable machine language program. Finally, you run the machine language program and examine the output to see if it is working correctly. If you discover that it is not working correctly, you must repeat the process until it does.

Assignment

Complete the following before beginning the next lesson:


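#include <iostream>   // standard C++ header (replaces the old <iostream.h>)

int main()
{
  // print a greeting followed by a newline
  std::cout << "Hello World!" << std::endl;
  return 0;   // report success to the shell
}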

  1. telnet (or rlogin) to the class server
  2. use one of the UNIX editors to enter the simple C++ program above, save it as "hello.cpp"
  3. use g++ to compile the hello.cpp program
  4. run the program from the command prompt
  5. send me an email message noting your experiences in performing this task


Footnotes:

1. A program is a specific set of ordered operations for a computer to perform. A program contains a one-at-a-time sequence of instructions that the computer follows. Typically, the program is put into a storage area accessible to the computer (memory). The computer gets one instruction and executes (performs) it and then gets the next instruction and executes it ... repeating the process until the program ends.
2. A system call is the mechanism used by an application program to request service from the operating system. System calls often use a special machine code instruction which causes the processor to change mode (e.g. to "supervisor mode" or "protected mode"). This allows the OS to perform restricted actions such as accessing hardware devices or the memory management unit.
3. The term object code as used here is not related to the concept of object-oriented programming. The object code is machine code that the processor can process or "execute" one instruction at a time.
4. A third-generation language is a "high-level" programming language, such as PL/I, C, or Java. Fourth-generation language is designed to be closer to natural language than a 3GL language. Languages for accessing databases are often described as fourth-generation languages. Fifth-generation language is programming that uses a visual or graphical development interface to create source language that is usually compiled with a third-generation or fourth-generation language compiler. Microsoft, Borland, IBM, and other companies make fifth-generation language visual programming products for developing applications in Java, for example. Visual programming allows you to easily envision object-oriented class hierarchies and drag icons to assemble program components.
5. An assembler is a program that takes basic computer instructions and converts them into a pattern of bits that the computer's processor can use to perform its basic operations.
6. A module is an independent piece of software which forms part of one or more larger programs.
7. An executable file is a binary file containing a program in machine language which is ready to be executed (run) by typing the name of the file at the UNIX prompt. Filenames for executable files are not restricted to any specific pattern and may or may not have an extension. In UNIX an executable file must, as a minimum, have the execute permission bit set for the owner.
8. GNU (GNU's Not UNIX) is a UNIX-like operating system that can be freely copied, modified, and redistributed.

Added to the Web: August 31, 2007.

Web page design by Dan Solarek.

http://cset.sp.utoledo.edu/cset3150/