Please send all questions and assignments to
your instructor:
daniel.solarek@utoledo.edu

 

Introduction to Algorithms

Learning Objectives

The purpose of this lesson is to introduce you to the study of algorithms. At the end of this lesson, you should be able to:
  • understand what an algorithm is
  • start thinking of ways to create algorithms
  • understand the process of developing algorithms through the study of union and find algorithms for the Connectivity Problem
  • Analyze the worst case performance of algorithms

Algorithms

An algorithm is a method that is used to describe and solve a problem. The method should be suitable for implementation as a computer program. The study of computer science is the study of algorithms.

Many algorithms deal with the organizing and rearranging of data involved into a more useful form. The data-objects created thus are called ‘data structures’. Thus algorithms operate on, and create data-structures. Whatever be the problem, it is always desirable to find an algorithm that solves the problem in the smallest time, and consumes the least memory space.

Why do we study algorithms? The industry always looks for more and more efficient ways to do business. There is a cost associated with the running of algorithms too, and this cost becomes apparent when we are processing thousands or millions of objects. Thus, an understanding of algorithm design gives us the potential to reap huge savings.

Look at the following table which shows the relative timing for two algorithms which search for a number within a given list of numbers. Assume that 1 time unit costs 1 cent. The difference in costs between the two algorithms is apparent.

Number of objects in the list Number of searches = 10000 Number of searches = 100000
Sequential search Binary Search Sequential search Binary search
125 13 2 130 20
250 25 2 251 22
500 49 3 492 23
1250 128 3 1276 25
2500 267 3 2601 28
5000 533 3 5233 30
12500 1337 3 11222 33
25000 2601 3 22445 35
50000 5331 4 45212 39
100000 10000 5 92345 47

Thus, a reason to study algorithms is to learn the skill to be able to devise methods that use time and space as efficiently as possible. Often, algorithms are available in algorithm libraries. An understanding of the theory behind algorithms is required to gain an insight into these algorithms to be able to use them effectively. The requirement to re-implement simple algorithms also arises frequently when new implementations are required in new computing environments (where the software or hardware has changed, and hence we must re-implement the algorithm). Sometimes, it may simply not be convenient to use an algorithm implementation from a library, and you will need to tailor it to your particular task and environment.

A sample problem: Connectivity

Problem description: Given an input sequence of pairs of integers, where the pair p-q means that the integer p is "connected to" the integer q.

Desired output: To filter the input pairs. An input pair p-q will be accepted only if the pairs previously seen in the input do not imply that p is connected to q (as seen till this input). Thus an input pair p-q will be accepted only if it represents a new connected not already implied by the previous input.

Areas of application: This problem is called the connectivity problem, and arises in a number of areas. The integers p-q may be representations of computers in a network, and we need to find out if two computers are connected and can communicate with each other. The integers might represent contact points in an electrical network, and the pairs represent that two points are connected by a wire.

Specify the problem description in clear and unambiguous terms.

Once a problem has been specified, we can add requirements and modifications to the algorithm. For example, in our connectivity problem we can require our program to also find out all ways by which a pair p-q is connected. This can make our problem more difficult. So it is best always to begin from a simpler problem and work towards the more complex.

Some ideas to solve the problem:

Identify the fundamental operation in the problem – each time we add a pair, first determine if it represents a new connection, then to incorporate that information about a (possibly) new connection into a ‘database’.

Find a representation or data-structure to model a connected group of integers – each connected group of integers can represented by its own ‘set’. This is the data-structure which defines the connectivity problem.

Identify the operations to be considered on the data-structure which models the problem –

Given a new integer pair, find out which set each of the integers belong to. If both integers in the pair do not belong to any existing set, we create a new set which contains just this pair of integers. If one integer from the new pair already belongs to a set, add the other integer in the pair to that set. If both integers in the pair belong to two different already existing sets, replace the two sets by the union of the two sets(since all the integers in the set are now known to be connected, they will be part of just one set).

Develop more powerful layers of abstraction over the fundamental operations already described.

The applet given below represents a version of the connectivity problem. The marked objects represent pairs of points in a maze, and the objective is to find out if the pair of points have a simple path between each other in the maze. A path between two points is always guaranteed in this maze. The problem is to connect the two points.

Click on ‘Show Gen’ to generate and solve the maze.

Implementing Union Find Algorithms for the Connectivity Problem

We will look at implementations of the union and find operations to solve the connectivity problem. We will begin with an implementation and try to modify it to arrive at a better implementation.

Method 1 – Quick-find solution to the connectivity problem

The basis of this algorithm is an array of integers with the property that p and q are connected if and only if the pth and qth array entries are equal. The ith array entry is initialized to i for i in the range 0 .. N. This implies that before any connected pairs are observed (which force union between sets), each entry of the set is in its own disconnected set. Once a pair is found which includes that array entry with another array entry, the two entries will form a larger set by the union of their two sets. To implement the union operation for entries p and q, we go through the array and change all the entries with the same name as p to have the same name as q (i.e. we form a new set in which all entries are the same – equal to the entry of element q).

Code for Quick-find –

The C version:

#include <stdio.h>
#define N 10000
int main()
{
  int i, p, q, t, id[N];
  for (i = 0; i < N; i++) id[i] = i;
  while (scanf("%d %d\n", &p, &q) == 2)
  { 
   if (id[p] == id[q]) continue;
   for (t = id[p], i = 0; i < N; i++)
     if (id[i] == t) id[i] = id[q];
   printf(" %d %d\n", p, q);
  }
  return (0);
}

The C++ version:

#include <iostream.h>
static const int N = 10000;
int main()
{ 
  int i, p, q, id[N];
  for (i = 0; i < N; i++) id[i] = i;
  while (cin >> p >> q)
  { 
    int t = id[p];
    if (t == id[q]) continue;
    for (i = 0; i < N; i++)
      if (id[i] == t) id[i] = id[q];
    cout << " " << p << " " << q << endl;
  }
  return (0);
}

Analysis of Quick Find:

Find operation – With this algorithm, the find operation take unit time which is the time taken to compare two entries in the set.

Union operation – The union operation for each pair takes N time units, since we have to scan the entire array of size N. Hence, to solve a connectivity problem on M pairs takes MN time units (N time units for each pair).

Method 2 – Quick-union solution to the connectivity problem

The quick union algorithm is designed to implement union faster than the quick-find algorithm. The algorithm begins with the same data structure – an array of integers with the ith array entry is initialized to i for i in the range 0 .. N. If the ith array entry is j, this means that this entry is a ‘pointer’ that points to the jth array entry, both of which are in the same set. To determine if two objects are in the same set, we follow pointers for each until we reach an object that points to itself. The objects are in the same set only if this process leads them to the same object. If they are not in the same set, they wind up at different objects. To form the union, we just link the root of one set to the root of the other.

Code for Quick-union –

The C version:

#include <stdio.h>
#define N 10000
int main()
{ 
  int i, p, q, t, id[N];
  for (i = 0; i < N; i++) id[i] = i;
  while (scanf("%d %d\n", &p, &q) == 2)
  { 
    for (i = p; i != id[i]; i = id[i]) ;
    for (j = q; j != id[j]; j = id[j]) ;
    if (i == j) continue;
    id[i] = j;
    printf(" %d %d\n", p, q);
  }
  return (0);
}

The C++ version:

#include <iostream.h>
static const int N = 10000;
int main()
{ 
  int i, p, q, id[N];
  for (i = 0; i < N; i++) id[i] = i;
  while (cin >> p >> q)
  {
    for (i = p; i != id[i]; i = id[i]) ;
    for (j = q; j != id[j]; j = id[j]) ;
    if (i == j) continue;
    id[i] = j;
    cout << " " << p << " " << q << endl;
  }
  return (0);
}

Analysis of Quick Union:

Each union operation takes a unit time as we just link the root of one set to the root of the other set.

Each find operation could take on an average (N-1)/2 operations in the worst case. Suppose that the input pairs come in the order 1-2, then 2-3, 3-4 and so on, After N-1 such pairs, we form a tree with N pointing to N-1, which points to N-2 and so on. The find operation for object N has to follow N-1 pointers. The average number of pointers for the first N pairs is

 (0 + 1 + ……. + (N-2) + (N-1))/N = (N – 1)/2

Now suppose that the remainder of pairs all connect N to some other object (in the range (0 .. N-1). The find operation on each of these pairs involves at least (N-1) pointers. Thus the average number of pointers followed for each find operation is greater that N/2 (on the average). M find operations, on M pairs hence take MN/2 operations on the average.

The main deficiency of quick-union is the formation of lopsided trees in which one side of the tree is bigger than the other side. This deficiency is improved in the weighted quick-union algorithm which tries to form balanced trees.

Method 3 – Weighted Quick-union solution to the connectivity problem

The basic algorithm for weighted Quick-union is the same as that for quick union. The modification is in the union operation. In this algorithm, we keep track of the number of nodes in each tree and always connect the smaller tree to the larger. This ensures that the height of the tree does not increase by more than 1 at each union operation.

Code for weighted quick-union –

The C version:

#include <stdio.h>
#define N 10000
int main()
{ 
  int i, j, p, q, id[N], sz[N];
  for (i = 0; i < N; i++) 
  {
    id[i] = i; 
    sz[N] = 1;
  }
  while (scanf("%d %d\n", &p, &q) == 2)
  { 
    for (i = p; i != id[i]; i = id[i]);
    for (j = q; j != id[j]; j = id[j]);
    if (i == j) continue;
    if (sz[i] < sz[j])
    { 
      id[i] = j; sz[j] += sz[i]; 
    }
    else 
    { 
      id[j] = i; sz[i] += sz[j]; 
    }
    printf(" %d %d\n", p, q);
  }
  return (0);
}

The C++ version:

#include <iostream.h>
static const int N = 10000;
int main()
{ 
  int i, j, p, q, id[N], sz[N];
  for (i = 0; i < N; i++) 
  { 
    id[i] = i; 
    sz[i] = 1; 
  }
  while (cin >> p >> q)
  { 
    for (i = p; i != id[i]; i = id[i]);
    for (j = q; j != id[j]; j = id[j]);
    if (i == j) continue;
    if (sz[i] < sz[j])
    { 
      id[i] = j; sz[j] += sz[i]; 
    }
    else 
    { 
      id[j] = i; sz[i] += sz[j]; 
    }
    cout << " " << p << " " << q << endl;
  }
  return (0);
}

Analysis of weighted quick-sort:

We will learn later that the number of pointers needed to follow to get to the root in a tree of 2n nodes in weighted quick-union is at most n. Or, put another way the number of pointers followed from any node to the root in a set of k objects is not greater than lg2k.

When we merge two trees of 2n nodes each, we get a tree of size 2n+1 nodes, and we increase the maximum distance to the root to n+1.We can prove that even after the union operation, the tree preserves the property that the number of pointers followed from any node to the root in a set of k objects is not greater than lg2k. Say we combine a set of I nodes with a set of j nodes with i<j: the operation increases the number of pointers to be followed by elements in the smaller set to the root by 1, but now we are in a set of size i+j. So, the number of pointers followed to the root is at most 1+ lg2i = lg(2i) = lg(i+i) < lg(i+j). Hence, in the new set of size i+j, the number of pointers followed to the root is at most lg(i+j). Since the property was true for the original tree, it will be true for each tree formed by the weighted quick-union algorithm.

The weighed quick union takes at most a constant time M lg N instructions to solve the connectivity problem to process M pairs on N objects(since it takes at most lgN time units to solve the operation on each pair). The weighted quick union is an improvement over the quick-union algorithm which took at least MN/2 operations in the worst case. Now, we ask the natural question – is there an algorithm that takes guaranteed linear time to solve the connectivity problem? This is a difficult question, and a lot of research has been done on it. We will talk of one easy way to improve the weighted quick union algorithm.

Method 4 – Path compression with weighted quick-union

Ideally, we want every node in the tree to point to the root. In that way not only union but the find operation will also take a constant unit time. We can implement this using path compression, by adding another pass to the union algorithm, setting the entry corresponding to each vertex encountered in the union loop to point towards the root. The net result is to flatten the trees almost completely.

Another way to implement partial path compression is called path compression by halving. This method compresses paths by making each link skip up to the next node on the way up the tree. The code for this is given below:

The C version:

#include <stdio.h>
#define N 10000

int main()
{ 
  int i, j, p, q, id[N], sz[N];
  for (i = 0; i < N; i++) 
  {
    id[i] = i; sz[N] = 1
  }
  while (scanf("%d %d\n", &p, &q) == 2)
  { 
    for (i = p; i != id[i]; i = id[i]) 
    {
      int t = i; 
      i = id[id[t]]; id[t] = i; 
    } 
    for (j = q; j != id[j]; j = id[j]) ;
    {
      int t = j; j = id[id[t]]; id[t] = j; 
    } 
    if (i == j) continue;
    if (sz[i] < sz[j]) 
    { 
      id[i] = j; sz[j] += sz[i]; 
    }
    else 
    { 
      id[j] = i; sz[i] += sz[j]; 
    }
    printf(" %d %d\n", p, q);
  }
  return (0);
}

The C++ version:

#include 
static const int N = 10000;

int main()
{ 
  int i, j, p, q, id[N], sz[N];
  for (i = 0; i < N; i++) 
  { 
    id[i] = i; sz[i] = 1; 
  }
  while (cin >> p >> q)
  { 
    for (i = p; i != id[i]; i = id[i]) 
    {
      id[i] = id[id[i]]; 
    }
    for (j = q; j != id[j]; j = id[j]) 
    {
      id[j] = id[id[j]]; 
    }
    if (i == j) continue;
    if (sz[i] < sz[j])
    { 
      id[i] = j; sz[j] += sz[i]; 
    }
    else 
    { 
      id[j] = i; sz[i] += sz[j]; 
    }
    cout << " " << p << " " << q << endl;
  }
}

An approach to the study of algorithms

We have presented methods that successively improve the performance of the connectivity problem. We found algorithms that gave us a guaranteed running time. This is the way that most work in algorithms is done – by successive refinement of previous algorithms. Thus, the connectivity problem gives us perspective on the study of algorithms that will be used throughout this course. The method we followed is the same that will be followed throughout:

  • Decide on a complete and unambiguous problem statement
  • Carefully develop a data structure to model and implement the problem
  • Develop improved implementations through a process of step-wise refinement, validating the efficiency of algorithms through empirical and mathematical analysis.
  • Strive for worst case performance guarantees wherever possible

Click on the button at left to return to the calling page.

 

There have been visitors since 9/29/07.

Added to the Web: September 29, 2007.

Web page design by Dan Solarek.

http://cset.sp.utoledo.edu/cset3150/