UNIX and Program Development
The process of developing a program[1] to satisfy a particular need
requires access to a comprehensive
program development environment. The UNIX system provides an exceptional
programming environment. Because the operating system was written in
C by highly talented programmers who had their own needs in mind,
UNIX provides an ideal environment for program creation using C/C++.
Operating system services are readily accessible
to the C/C++ programmer in the form of function libraries and
system calls[2].
In addition, there are a variety of tools for making the development
and maintenance of programs easier.
C/C++ and UNIX
This course teaches C/C++ under the UNIX operating system.
Although C/C++ programs will look similar under any other
system (such as VMS or DOS), some features will
differ from system to system. In particular the
method of compiling a program to produce a file of
runnable code will be different on each system.
As mentioned above, the UNIX operating system is written in C.
In fact, the C language was created specifically to implement
UNIX in a manner that made it machine independent. Nearly all of the
UNIX commands which you type, plus other system facilities
such as password checking, line printer queues, or magnetic tape
controllers, are written in C.
In the course of the development of UNIX, hundreds
of functions were written to provide access to various
facets of the system. These functions are available
to the C/C++ programmer in libraries. By writing in C/C++ and
using the UNIX system libraries, very powerful
programs can be created. These libraries are
very difficult to access using any other language,
so C/C++ is the natural language for writing UNIX
system programs.
What is a Compiler?
A compiler is a special program that receives statements written in
a particular programming language and translates them into the machine
language or "object code" that a computer's processor understands.
Once linked, this machine language object code becomes an executable program file.
Typically, a programmer writes program statements in a high-level
programming language such as C or C++ one line at a time using a text
editor. These statements are then saved to a file. This file contains
what are called source statements or collectively the "source code."
The programmer then runs the appropriate programming language compiler,
specifying the name of the text file that contains the source statements.
When executing (running), the compiler first parses (or analyzes) all
of the input (source code) statements to assure that their syntax is
correct. Then, in one or more successive passes, it creates the output
(object code), making sure that statements
which reference other statements are correct in the final version of the
object code[3].
The object code version of the program contains the string of 0s and 1s
(called machine language) that the processor understands.
A preprocessor is a program invoked by various compilers to process code
before compilation. For example, the C preprocessor, cpp, handles textual
macro substitution, conditional compilation and inclusion of other files.
A preprocessor may be used to transform a program into a simpler language,
e.g., to transform C++ into C.
A compiler works with what are sometimes called third-generation[4],
fourth-generation, and fifth-generation languages.
An assembler[5] works on programs written using a processor's assembly language.
A link editor (or linker) is a computer program which accepts the object
code files of one or more separately compiled program modules[6], and links
them together into a complete executable[7] program
file, resolving references from one module to another.
Compiling a C Program
Once you understand the purpose and functioning of a compiler, the next step is
to use a specific compiler with a program written in the C language. The following
three commands can be used to compile a C program. In this example, the
gcc compiler (the traditional UNIX name for the C compiler is cc) is used
for a C program named prog.c:
et791:~$ gcc prog.c
et791:~$ mv a.out prog
et791:~$ chmod 755 prog
The cc utility
calls the C preprocessor, the C compiler, the assembler, and the link
editor. The link editor creates an executable file named (by default)
a.out.
The second command renames a.out
to prog.
If you fail to rename the a.out
file, the next use of the cc
utility will overwrite the executable file.
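The last command sets the permissions on the executable file (now named
prog) so that you can run it and test it for logic and/or runtime errors.
The link editor normally marks its output as executable already, so this
command mainly serves as a safeguard.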
The -o argument (or parameter)
can be used to shorten this process. The following two commands achieve
the same results as the three above:
et791:~$ gcc -o prog prog.c
et791:~$ chmod 755 prog
With this approach, there is no need to rename the
a.out
executable file.
GNU C and C++ Compiler
For a C++ program, we will be using the GNU[8] C++ compiler,
which is invoked as gcc or g++. In fact, the C and C++
compilers are integrated under the gcc and g++ utilities.
Both utilities process C or C++ input source code files through
one or more of the four stages mentioned above: preprocessing,
compilation, assembly, and linking. Source code
filename extensions indicate the specific language:
- .c
- C source code
- .C
- C++ source code
- .cc
- C++ source code
- .cxx
- C++ source code
- .cpp
- C++ source code
- .c++
- C++ source code
For example, to compile and run the C++ program grades.cpp we could use
the g++ command as shown below:
et791:~$ g++ grades.cpp
et791:~$ mv a.out grades
et791:~$ chmod 755 grades
et791:~$ grades
In the example command sequence above, the a.out file created
by the g++ compiler is an executable (binary) file. We rename it to grades
so that we do not lose our executable file when the next use of the compiler
overwrites the a.out file.
As with the cc
utility, a slightly more complex form of the
g++
command is both faster and more useful:
et791:~$ g++ -Wall -g grades.cpp -o grades
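Or, you may use the following version of the command:
et791:~$ gcc -Wall -g grades.cpp -o grades
Both versions compile a .cpp file as C++, but gcc does not link the C++
standard library automatically, so this version can fail at the link step
for programs that use that library. Prefer g++ for C++ programs.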
Note:
- The gcc -Wall -g file_name.cpp -o file_name command requires that the
main function of your program be of type int. Don’t forget to include
return 0 at the end of your program when using int main.
- Note the uppercase W in -Wall.
Each of the arguments in this command is explained as follows:
- -Wall
- instructs the C++ compiler to report a broad set of common warning
messages (despite its name, not every possible warning). These warning
messages usually indicate programming errors, and we will ask for
them as clues to what we might have done wrong.
- -g
- tells the compiler to include debugging information in its output so
that we can use a debugger. We will introduce the debugger at a later
time.
- -o grades
- indicates that the compiler should put its output (the "executable code")
in a file called grades. If you do not specify a file name for the
executable code, then the compiler places its result in a file named
a.out.
Note: When working with C++ programs, be certain to use the g++ compiler.
The g++ command assumes C++ code and links the C++ libraries; the gcc command
does not link them by default. Check out the
man pages for both g++ and gcc to see the differences.
Running a Program
It is important to remember that the compiler creates a file which is executable,
meaning that you can run it by simply typing its name at the UNIX prompt.
et791:~$ program_name
If you type the name of an executable file and it does not run (i.e., you get an error
message) try typing a "./" (dot slash) followed by the program_name, and
then press the [Enter] key.
et791:~$ ./program_name
One of these two methods should work for you. Once the program is finished,
you will see the UNIX prompt.
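Note: The "./" is needed when the current directory is not listed in your
PATH environment variable, which names the directories the shell searches
for a command typed without a directory name. The "./" prefix tells the
shell to run the file in the current directory.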
Finding and Correcting Errors
The process of finding and correcting program errors is known as
debugging: determining and correcting the cause of the symptoms of
program errors identified by compilation, testing, or
frustrated users.
Syntax errors. It is quite possible (in fact almost certain) for the
compiler to detect some types of problems or errors in your program the first
time you compile it. The types of errors that the compiler can find are called
syntax errors. These are errors that simply mean the program you have given
the compiler is not completely correct C code. For example, in arithmetic
expressions, you must have a right parenthesis " ) " for every left one " ( ".
If you do not, the compiler will detect this mistake and issue a message indicating
that there is a syntax error present. It cannot correct the error itself, because it does
not know where you meant the missing parenthesis to go. The output you get will usually
look something like:
foo.c:82: syntax error
foo.c:109: undeclared identifier num_scores
The compiler will try to produce useful and informative
messages that tell you when you've failed to use the C language
correctly. However, in reality, they are often somewhat
baffling. Don't hesitate to ask what an error means; we have
more practice interpreting these messages and can often
translate them for you. Also don't get too discouraged if you
type in a 100-line program, try to compile it and two or
three screenfuls of error messages fly by. This is part and
parcel of the process of programming. Tracking down these
problems is often perversely pleasurable and it will
hopefully give you a greater appreciation of what the
people out there building VCRs and airline reservation systems
are going through.
The syntax errors that the compiler can detect must all be corrected
before it will produce an executable file. So continue to correct and
recompile your program until the compiler runs without issuing any
error messages.
Logic errors. Once the program compiles and you run it, you may discover another type of error. We usually
call these logic errors to distinguish them from the syntax
errors that the compiler finds. These are generally errors
that are mistakes in how you've designed your program or in
how you've translated your design to the C programming
language. The compiler cannot find these types of errors for
you since it cannot read your mind and doesn't know what
you want the program to do. As a result these errors are
generally more difficult to track down and fix. The
process of doing so is what we call debugging.
Runtime errors. One type of error that is often encountered at run time is
the infinite loop. An infinite loop occurs when a program continues running
(forever) when you really wanted it to stop at some point. In order to
deal with this error, you need some way of interrupting the program and
stopping it from running. To kill a running program, type [Ctrl]+c to send
it the interrupt signal. This approach will work for most of the programs
that you will be developing in this course.
As the figure above illustrates, finding and correcting these errors
is really the cycle a programmer usually follows. First you edit your
program adding new features and/or fixing errors. Then you compile
the program creating an executable machine language program. Finally,
you run the machine language program and examine the output to see if
it is working correctly. If you discover that it is not working
correctly, you must repeat the process until it does.
Assignment
Complete the following before beginning the next lesson:
#include <iostream>
using namespace std;

// main must return int; returning 0 reports success to the shell
int main()
{
    cout << "Hello World!" << endl;
    return 0;
}
- telnet (or rlogin) to the class server
- use one of the UNIX editors to enter the simple C++ program above, save it as "hello.cpp"
- use g++ to compile the hello.cpp program
- run the program from the command prompt
- send me an email message noting your experiences in performing this task
Footnotes:
- 1.
- A program is a specific set of ordered operations for a computer
to perform. A program contains a one-at-a-time sequence of instructions
that the computer follows. Typically, the program is put into a storage
area accessible to the computer (memory). The computer gets one instruction
and executes (performs) it and then gets the next instruction and executes
it ... repeating the process until the program ends.
- 2.
- A system call is the mechanism used by an application program to request
service from the operating system. System calls often use a special machine code
instruction which causes the processor to change mode (e.g. to "supervisor mode"
or "protected mode"). This allows the OS to perform restricted actions such as
accessing hardware devices or the memory management unit.
- 3.
- The term object code as used here is not related to the concept
of object-oriented programming. The object code is machine code that the
processor can process or "execute" one instruction at a time.
- 4.
- A third-generation language is a "high-level" programming language,
such as PL/I, C, or Java. A fourth-generation language is designed to be closer
to natural language than a third-generation language. Languages for accessing databases
are often described as fourth-generation languages. Fifth-generation language, as the term is used here,
is programming that uses a visual or graphical development interface to create
source language that is usually compiled with a third-generation or
fourth-generation language compiler. Microsoft, Borland, IBM, and other companies
make fifth-generation language visual programming products for developing
applications in Java, for example. Visual programming allows you to easily
envision object-oriented class hierarchies and drag icons to assemble
program components.
- 5.
- An assembler is a program that takes basic computer instructions and converts
them into a pattern of bits that the computer's processor can use to perform its
basic operations.
- 6.
- A module is an independent piece of software which forms part of one
or more larger programs.
- 7.
- An executable file is a binary file containing a program in machine
language which is ready to be executed (run) by typing the name of the file at
the UNIX prompt. Filenames for executable files are not restricted to any specific
pattern and may or may not have an extension. In UNIX an executable file must, as
a minimum, have the execute permission bit set for the owner.
- 8.
- GNU (GNU's Not UNIX) is a UNIX-like operating system that can be
freely copied, modified, and redistributed.