Programming Tips!

DISCLAIMER!: I was asked to write this page a good while ago and it is possible that some details are out of date. Most of this page should still be useful.

This page describes some of the methods and tools available to avoid and fix bugs, and introduces a few tricks to speed up code. Many of the most useful ideas are surprisingly simple, and the page is aimed at relative beginners to programming.


Avoiding Bugs

Bugs can be terribly difficult to track down, and minimising them should be a priority! With a few simple principles it's possible to reduce the number of bugs that occur in a code, and with a bit of luck maybe even get it running correctly first time.

Planning

It's tempting to dive straight into the programming, but all the rest of the program will probably rely on those first few lines, and the code may become rather cumbersome. A few moments scribbling on paper can save a lot of time in the long run. Some of the most important things to consider might be these:

  1. Firstly, do I understand the problem to be solved!? It'll be pretty hard to solve it otherwise, and the computer code will need to know all the details.
  2. All elements of the program rely heavily on the data structure. Can the data be broken down into logical groups (derived types, modules...)?
  3. How does the problem break down into smaller simpler problems (subroutines), and can they be arranged in logical groups (modules, files)?
  4. As the data is passed between the routines set out in item 3, it may be worth reconsidering item 2 to see if there's a more suitable data structure.
I really cannot emphisise the importance of DATA STRUCTURE enough! A well chosen data arrangement eases programming enormously, reducing the chance of error. A poorly chosen (or no) data structure leads to excessive argument lists to functions, and/or longer functions, lack of clarity, and is often unworkable by anyone other than the author. The following section on Programming introduces some of the basic ways to add structure to your data.


Programming

Once the aim is clearly set out in the planning stage the programming should seem relatively straight forward. The following describes a few of the features of modern programming languages designed to make the code more readable, to ease data passing, and to help avoid bugs.

Subroutines

Whilst the problem was broken down into subproblems in the planning, it's worthwhile breaking down each of the subproblems further. If the program is sufficiently divided into subroutines, then each...

Documentation (comments) should explain what a routine does, but the code itself within a short routine is often enough to explain how it works. Using indentation and leaving blank lines helps as it's a form of commenting.

The typical number of errors in a subroutine is proportional to it's length, with a sharp increase once it is larger than a page. Aim for routines roughly smaller than one screen size so that the whole routine can be seen at once.

In C, all variables must be declared before they can be used. In Fortran, if the variable silly has not been declared, then the statement silly = x creates a real variable and assigns the value of x to it. This is very dangerous -- if the variable name is misspelled, e.g. si11y = 2, then a new variable is created with the incorrect name and there is no compile time error message. Implicit typing can be stopped by compiling with the flag '-u', or better still, by placing the statement 'implicit none' at the beginning of the program, and at the beginning of each subroutine or module.

Fortran 90 also allows a specification of the intentions for each variable with the 'intent' statement, to help avoid accessing them incorrectly. For example:

   subroutine add(a,b,c)
      implicit none
      integer, intent(in)  :: a,b
      integer, intent(out) :: c
      c = a + b
   end subroutine add   

If a variable has the property intent(in), then the variable cannot be changed within the subroutine. A variable with the property intent(out) must be assigned a value before a value can be read from it later in the subroutine. The property intent(inout) is slightly redundant, but does indicate that the variable is an argument to the subroutine or function and not just a local variable.

Derived types

As it is possible to get by with just integers, floating point numbers and arrays, derived data types are in some sense non-essential (but very useful) and someone new to programming is unlikely to use a derived type without being told about them first!

A derived type is a logical grouping of data types which allows the data to be passed easily between subroutines as a whole, rather than passing each element of the group individually. Passing the data together reduces the number of arguments to subroutines, making it more readable, and reduces the risk of making a mistake.

The following example defines an odd shaped matrix with an associated tag:

   Fortran 90 
	definition:
			type tmatrix
			   integer :: tag
			   real    :: m(-8:8, 0:4)
			end type tmatrix
	declaration:
			type (tmatrix) :: tmat
	usage:
			tmat%tag = 5
			tmat%m(1,2) = 3.0
   C
	definition:
			typedef struct{
			   int tag;
			   float m[8][6];
			} tmatrix;
	declaration:
			tmatrix tmat;
	usage:
			tmat.tag = 5;
			tmat.m[1][2] = 3.0;
   

The variable tmat is declared as a tmatrix type, and its contents are accessed with a '%' in Fortran or a '.' (dot) in C. The variable tmat can be passed to subroutines as a single argument and saves a re-declaration of the tag and the matrix shape:

   Fortran 90
	call:
			call usetmatrix(tmat)
	prototype:
			subroutine usetmatrix(tm)
			type (tmatrix) :: tm
   C
	call:
			usetmatrix(tmat);
	prototype:
			void usetmatrix(tmatrix tm);
   

In C, structs are usually defined within a header file so that the definition can be #-included by other parts of the code. In Fortran, types are normally defined within a module which is then used by subroutines that require the definition.

Modularity

C is a very versatile language, and subroutines can easily be grouped into separate files with required type definitions provided in header files. The C++ class and the Fortran module go a step further, grouping variables, types and routines together as part of the language in a way that make data passing easier.

The simplest use of a Fortran module is as a pseudo common block. This is useful when only one instance of a data set is required (use 'types' for multiple instances). For example:

   module mydata
      implicit none
      save
      integer, parameter :: N = 16
      real :: x(N)
      real :: density(N,N)
      real :: temperature(N,N)
   end module mydata


   subroutine getpoints()
      use mydata
      implicit none
      integer :: i

      do i = 1, N
         x(i) = 0.01 * i
      end do

   end subroutine getpoints
   

The keyword save indicates that the contents of the mydata module should be saved, remaining static in memory. The mydata module is then included like a common block in the subroutine getpoints with the use statement. As the getpoints subroutine is closely related to the data, it could be moved into the module after a contains statement. Also, as it is likely there will be many variables of dimension(N,N) it could be defined as a type. The example above could be re-written:

   module mydata
      implicit none
      save
      integer, parameter :: N = 16

      type grid
         real :: pt(N,N)
      end type grid

      real :: x(N)
      type (grid) :: density
      type (grid) :: temperature

   contains

      subroutine getpoints()
         integer :: i
         do i = 1, N
            x(i) = 0.01 * i
         end do
      end subroutine getpoints

   end module mydata
   

The use statement to access variables x, density and temperature, and a repetition of the implicit none statement are not necessary within the getpoints subroutine, as their scope is over the whole module.



Fixing Bugs

Even taking all the precautions described above it's unlikely that a large code will work first time. This section looks at the most typical errors and methods for finding them.

Run-time errors

Run-time error messages that are returned when the program stops before it's supposed to (crashes!) can be very vague. The f95 flag '-C' compiles code which performs checks at run-time to catch errors such as accessing outside an array or reading uninitialised variables. The program may run slower but returns the point at which the crash occurred, although this is not necessarily where the error lies. The most common run-time error messages and causes are:

Maximum and minimum values for floating point numbers can be found in Fortran with print*, huge(x) or tiny(x), and in C with the values defined in float.h by FLT_MAX, DBL_MIN, etc.

Warning!: Overflow with integers does not usually give an explicit runtime error. The values wrap around at the limit so that huge(i)+1 will result in approximately -huge(i), a negative number! The ranges for integers are found in limit.h, INT_MIN, LONG_MAX etc. Note that this limits the size of an array that can be indexed by the integer, in paricular with the C int.

Limits for the default numeric types are approximately:

   C		Fortran
   int		-		     +- 32787
   long		integer		+- 2147483647
   float	real		       1E+-37
   double	double precision      1E+-308
   

Debuggers

If the program is not crashing, but gives incorrect results then there are fewer clues to work with and the error can be difficult to trace. The program needs to be stopped manually at various points to check if the data values calculated so far are correct. A debugger enables the programmer to specify a set of break points and to inspect the data without modifying the source code. The source needs to be compiled with degugging information, however, which turns off optimisations with most compilers.

Debuggers can also be started with a core file dumped at the time of a crash. The core file contains the status of the program at the time of the crash. The coresize is limited and no cores are dumped if the limit is set to 0. To change the limit (where SIZE is the maximum in kilobytes):

   tcsh:   > limit coredumpsize SIZE
   bash:   $ ulimit -c SIZE

gdb - The GNU Debugger

The GNU debugger can be used to examine any C or Fortran codes that have been compiled with the '-g' flag.

Fortran compilers append an underscore to symbol name, so that i becomes i_ (the g77 compiler appends two underscores if the symbol name contains an underscore, a_1 becomes a_1__). All arrays begin at 0, so that the fortran array x(1:n) becomes x_[0..n-1]. For example, element x(4) from the array x(1:6), corresponds to x_[3] .

The following lists some of the most useful commands for the GNU debugger. Square brackets [...] indicate optional arguments.

	gdb [executable [corefile] [processid]]
		Start the debugger.  Starting with the corefile loads
                the data as it was at the time of the coredump.  The  
		debugger can be started on a currently running job by
		specifying the process id.
		Examples:  gdb a.out,  gdb a.out core,  gdb a.out 23891
 
	list [[sourcefile:]function|linenumber[,lastlinenum]]
		List lines of source code after a function/subroutine  
		declaration or around a particular linenumber.
		Examples:  list 7,   list 8,12 ,   list myfunction,
                	   list source.f:15

	break [[sourcefile:]function|linenumber]
		Set a break point.  The program stops just before
		executing the function or line specified.

        clear [[sourcefile:]function|linenumber]
		Delete a break point.

	watch variable
		Set a (hardware) watch point.  Stop when the value 
		(stored at the address) of the variable changes.
		Examples:  watch i,  watch x[2]

	run [arglist]
		Start the program, with the argument list if specified.

	cont
		Continue running after having stopped.

	next
		Execute next line of the current function.

	step
		Execute next line, stepping into any functions
		referenced in the line.

	print [expression]
		Display the value of an expression.  
		(If Fortran, append underscores and shift arrays.)
		Examples:  print x,   print y[5],
			 ( print x_,  print y_[6-1] )
	
	set expression
		Assign a value to a symbol.
		Example:  set z=5

        where
		Display the stack.  Lists the current routine (first)
		and where is was called from.

	info [breakpoints] [local] ...
		info - on its own gives a list of subcommands.
    		info breakpoints - lists all breakpoints
    		info local - lists local variables in current frame

	help [command]
		Show information about a command.

	quit
		Exit.

Sample output: gdb, Watchpoints

In the following example a subroutine is called with the intention of updating a variable, a := sqrt(2)*a + b. The expected result is sqrt(2)*3+5=9.24, but the value 7.0 is returned.

(gdb) list 1,21
1       program main
2          implicit none
3          real :: a,b
4
5          b = 5.0
6          a = 3.0
7          call sub(a,b, a)
8          print*, a
9
10      end program main
11
12
13      subroutine sub(a,b, c)
14         implicit none
15         real, intent(in)  :: a,b
16         real, intent(out) :: c
17
18         c = sqrt(2.0)
19         c = c*a + b
20
21      end subroutine sub

A watchpoint is set on the address of variable a in program main. The program stops after line 6, and the previously uninitialised variable is set a=3. Continuing, the program is stopped after line 18, which sets the variable c in the subroutine. The watchpoint reveals that since the input and output variables a and c share the same memory location, using c as a temporary variable corrupts the value of a before it is read in line 19.

(gdb) break 5
Breakpoint 1 at 0x80485b9: file buggy.f90, line 5.
(gdb) run
Starting program: ./a.out 

Breakpoint 1, main (argc=1, argv=0xbffffbd4) at buggy.f90:5
5          b = 5.0
(gdb) watch a_
Hardware watchpoint 2: a_
(gdb) cont
Continuing.
Hardware watchpoint 2: a_

Old value = -1.99987078
New value = 3
main (argc=1, argv=0xbffffbd4) at buggy.f90:6
6          b = 5.0
(gdb) list 5,5
5          a = 3.0
(gdb) cont
Continuing.
Hardware watchpoint 2: a_

Old value = 3
New value = 1.41421354
sub_ (a_=0xbffffbac, b_=0xbffffba8, c_=0xbffffbac) at buggy.f90:18
19         c = c*a + b
(gdb) list 18,18
18         c = sqrt(2.0)


Optimisation

The optimisation flag '-O4' turns on many optimisations, but compilers find it very difficult to determine whether the adjustments described below are possible, and often does not  make them. By knowing a little about the compiler it sometimes possible to get order of magnitude improvements in runtime, even with a serial code.  Profiling finds which subroutines of the code are time-consuming. Once identified, rewriting sections of code can help to significantly improve performance; be careful however, as one should not significantly obscure the clarity of the code!

Profiling

The profiling utilities give information on how much time is spent in each function, and the number of function calls. The functions in which the code spends most time can then be targeted for optimisation.

The source code to be profiled is compiled with the '-pg' flag (or '-p'). After running the code, information is written to the file gmon.out (or mon.out). The profile can then be generated using the gprof (or prof) command, e.g.

   > f90 -pg opt.f90
   > ./a.out
   > gprof a.out | less 

The output is diplayed in various tables, the most important information is located in the first and last columns of the first table:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 34.88     46.27    46.27    21999     2.10     2.10  advance_rt_
 11.72     85.84    39.57                             c06fru_
 10.41    121.34    35.50   397037     0.09     0.09  spectral2real_
 10.11    152.79    31.45   134015     0.23     0.23  collocate_
  7.12    174.95    22.16                             c06fuz_
  5.61    195.51    20.56    88479     0.23     0.23  collocate_r_der_
  4.50    212.63    17.12    65997     0.26     0.26  uncollocate_
  2.09    228.47    15.84   197991     0.08     0.08  real2spectral_
  1.11    238.14     9.67    21999     0.44     0.44  r_der_
   ... 

In the sample more time is spent in the function advance_rt_ than any other, about 35% of the run time, and it might be a good target for optimisation. The lines with missing data in some columns correspond to library functions, and cannot be optimised.

Changing operations

The binary operators are not all executed at the same speed. The differences between the operations depend on the compiler and the processor, but are significant. Generally the operators + and - are much faster to calculate than *, which is usually faster than /, which is faster than ^. In C, multiplying or dividing integers by powers of 2 is very quick with the << and >> operators, which shift bits to the left or right. This suggests that it may be worth making changes similar to:

   x*2  ->  x+x   
   x/2  ->  x*0.5 
   x^2  ->  x*x
   n*2^p  ->  n<<p 
   n/2^p  ->  n>>p
It may be possible reduce the number of operations in the first place. For example:
   a*x^3 + b*x^2 + c*x + d  ->  ((a*x + b)*x + c)*x + d
or:
   do i=1,n			c = a*b
      y(i) = a*b*x(i)	->	do i=1,n
   end do			   y(i) = c*x(i)
				end do

Accessing memory

The processor performs operations on memory in the processor cache very quickly, but fetching memory for the cache is quite slow. Accessing and reusing variables that are already in the processor caches reduces the amount of fetching that is required and can improve performance significantly. As data is fetched into the processor cache in strips, accessing data that is close in memory reuses the cache.

In Fortran, arrays are stored in column major order, so the array A(3,2) is arranged in memory: A11, A21, A31, A12, A22, A32. In other words, data in the first index is contiguous in memory, forming a column of data. The second index labels the columns, which together form a matrix. If present, then a third index labels the matrices, and so on. In C the opposite is true, arrays are stored in row-major order, so data in the last index is contiguous. The second-last index labels the rows of data. The arrangement of data affects the way in which the data should be accessed.

Loop order

Assuming the array is contiguous, the Fortran loop on the left belowm accesses memory that is separated by n*n units at each iteration. Changing the order of the loops, the array is accessed along the memory and the calculation is performed in a fraction of the time. In C, the loop order on the left is preferable.

   do i=1,n					do k=1,n
      do j=1,n					   do j=1,n
         do k=1,n				      do i=1,n
            a(i,j,k) = i+j+k         ->			 a(i,j,k) = i+j+k
         end do					      end do
      end do					   end do
   end do					end do

Loop Fisson

In loop on the left below, the array a is read into the cache and a value set. Then the array b is read, then a value is set. This process occurs at each iteration. By splitting the loop each array is read into the cache only once and the variables are accessed along the memory.

   do i=1,m				do i=1,m
      a(i) = i				   a(i) = i
      b(i) = i		->		end do
   end do				do i=1,m
					   b(i) = i
					end do

Loop Fusion

The array in the split loops below is read into the cache twice if it is too large to fit in all at once. This is not unlikely, as the processor cache is usually small, O(1/1000), of the system memory. Joining the loops means that the array only has to be loaded once.

   do i=1,m				do i=1,m
      a(i) = i				   a(i) = i
   end do			->	   a(i) = a(i) + i
   do i=1,m				end do
      a(i) = a(i) + i
   end do

Inline functions

Setting up a function call takes a little time, especially if the routine requires some memory to be allocated. This time is usually insignificant compared to the work performed within the subroutine. However, the time taken is much longer than a single floating point operation, and if the subroutine is very short then the function calls can take up most of the time. In C a function can be given the keyword inline which instructs the compiler to replace each call to the function with its actual contents, eliminating the function call itself but making the executable larger. The '-O4' flag for both C and Fortran compilers attempt to inline functions automatically.

Calls are usually only problematic if they occur in an innermost loop. For example:

   do i=1,n
      c(i) = add(a(i),b(i))
   end do

   real function add(a,b)
      add = a + b
   end function add
   

As there are far fewer function calls, the following is much better:

   call add(a,b,c)

   subroutine add(a,b,c)
      do i=1,n
         c(i) = a(i) + b(i)
      end do
   end subroutine add
   



Ashley P. Willis