$$ \newcommand{\D}{\displaystyle} \renewcommand{\eqref}[1]{Eq.~(\ref{#1})} $$

 

 

 

6.4 Iterative methods for linear algebraic equation systems

We will in this section seek to illustrate how classical iterative methods for linear algebraic systems of equations, such as Jacobi, Gauss-Seidel or SOR, may be applied for the numerical solution of linear, elliptic PDEs. Criteria for convergence of such iterative schemes may be found in Section 7.3.2 of Numeriske Beregninger.

Discretization of PDEs, in particular linear elliptic PDEs, will result in a system of linear, algebraic equations of the form \( \mathbf{ A\cdot x=b} \), which may be presented in component form for a system of three equations as: $$ \begin{align} &a_{11}x_1+a_{12}x_2+a_{13}x_3=b_1 \tag{6.36}\\ &a_{21}x_1+a_{22}x_2+a_{23}x_3=b_2 \tag{6.37}\\ &a_{31}x_1+a_{32}x_2+a_{33}x_3=b_3 \tag{6.38} \end{align} $$

We rewrite (6.36)-(6.38) as an iterative scheme, normally referred to as:

Jacobi's method $$ \begin{align} &x_1^{m+1}=x_1^{m} +\frac{1}{a_{11}}[b_1-(a_{11}x_1^{m}+a_{12}x_2^{m}+a_{13}x_3^{m})] \tag{6.39}\\ &x_2^{m+1}=x_2^{m} +\frac{1}{a_{22}}[b_2-(a_{21}x_1^{m}+a_{22}x_2^{m}+a_{23}x_3^{m})] \tag{6.40}\\ &x_3^{m+1}=x_3^{m} +\frac{1}{a_{33}}[b_3-(a_{31}x_1^{m}+a_{32}x_2^{m}+a_{33}x_3^{m})] \tag{6.41} \end{align} $$
where we have introduced \( m \) as an iteration counter. In the case when \( x_i^{m} \) is a solution of (6.36)-(6.38), the expressions in the brackets of (6.39)-(6.41) will be zero and thus \( x_i^{m+1}=x_i^{m} \), i.e. convergence is obtained. Whenever \( x_i^{m} \) is not a solution of (6.36)-(6.38), the expressions in the brackets of (6.39)-(6.41) represent a correction to the previous guess at iteration \( m \).

A more compact representation of (6.39)-(6.41) may be used for a system of \( n \) equations: $$ \begin{align} &x_i^{m+1}=x_i^m+\delta x_i \tag{6.42}\\ &\delta x_i=\frac{1}{a_{ii}} \left[b_i-\sum^n_{j=1}a_{ij}x^m_j\right],\ i=1,2,\dots,n,\ m=0,1,\dots \tag{6.43} \end{align} $$

To start the iteration process one must choose a value \( x=x_0 \) at iteration \( m=0 \).
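To make the scheme concrete, a minimal NumPy sketch of (6.42) and (6.43) for a general \( n\times n \) system could look as follows; the names jacobi, A, b, x0, tol and max_iter are illustrative, and we assume non-zero diagonal entries \( a_{ii} \):

import numpy as np

def jacobi(A, b, x0, tol=1.0e-8, max_iter=1000):
    # Jacobi iteration (6.42)-(6.43): delta x_i = [b_i - sum_j a_ij x_j^m]/a_ii
    x = np.asarray(x0, dtype=float).copy()
    d = np.diag(A)                    # the diagonal entries a_ii
    for m in range(max_iter):
        dx = (b - A @ x)/d            # all corrections use values from iteration m
        x += dx
        if np.max(np.abs(dx)) < tol:  # simple stop criterion on the corrections
            break
    return x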

From (6.39)-(6.41) we see that Jacobi's method may be improved by substituting \( x_1^{m+1} \) in the second equation, and \( x_1^{m+1} \) and \( x_2^{m+1} \) in the third equation, to yield:

Gauss-Seidel's method $$ \begin{align} &x_1^{m+1}=x_1^{m} +\frac{1}{a_{11}}[b_1-(a_{11}x_1^{m}+a_{12}x_2^{m}+a_{13}x_3^{m})] \tag{6.44}\\ &x_2^{m+1}=x_2^{m} +\frac{1}{a_{22}}[b_2-(a_{21}x_1^{m+1}+a_{22}x_2^{m}+a_{23}x_3^{m})] \tag{6.45}\\ &x_3^{m+1}=x_3^{m} +\frac{1}{a_{33}}[b_3-(a_{31}x_1^{m+1}+a_{32}x_2^{m+1}+a_{33}x_3^{m})] \tag{6.46} \end{align} $$

This successively updated version of Jacobi's method is normally referred to as Gauss-Seidel's method.

The incremental change may be multiplied by a factor \( \omega \), to yield another variant of the iterative scheme:

SOR method $$ \begin{align} &x_1^{m+1}=x_1^{m} +\frac{\omega}{a_{11}}[b_1-(a_{11}x_1^{m}+a_{12}x_2^{m}+a_{13}x_3^{m})] \nonumber \\ &x_2^{m+1}=x_2^{m} +\frac{\omega}{a_{22}}[b_2-(a_{21}x_1^{m+1}+a_{22}x_2^{m}+a_{23}x_3^{m})] \tag{6.47} \\ &x_3^{m+1}=x_3^{m} +\frac{\omega}{a_{33}}[b_3-(a_{31}x_1^{m+1}+a_{32}x_2^{m+1}+a_{33}x_3^{m})] \nonumber \end{align} $$

A general version of (6.47) may be presented: $$ \begin{align} &x_i^{m+1}=x_i^m+\delta x_i \tag{6.48}\\ &\delta x_i=\frac{\omega}{a_{ii}} \left[b_i-\left(\sum^{i-1}_{k=1}a_{ik}x^{m+1}_k+\sum^n_{k=i}a_{ik}x^m_k\right)\right],\ i=1,2,\dots,n \tag{6.49} \end{align} $$

The factor \( \omega \) is denoted the relaxation parameter or the relaxation factor. The method in (6.49) is commonly referred to as the successive over-relaxation method when \( \omega>1 \), or simply abbreviated to the SOR method. With \( \omega=1 \) Gauss-Seidel's method is retrieved.
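A minimal sketch of the general scheme (6.48) and (6.49) is given below; with \( \omega=1 \) it reduces to Gauss-Seidel's method. The names sor, A, b and x0 and the stop criterion are illustrative:

import numpy as np

def sor(A, b, x0, omega=1.5, tol=1.0e-8, max_iter=1000):
    # SOR sweeps (6.48)-(6.49); since x is updated in place, A[i,:] @ x
    # automatically uses x_k^{m+1} for k < i and x_k^m for k >= i
    x = np.asarray(x0, dtype=float).copy()
    n = len(b)
    for m in range(max_iter):
        dx_max = 0.0
        for i in range(n):
            dx = omega*(b[i] - A[i, :] @ x)/A[i, i]
            x[i] += dx
            dx_max = max(dx_max, abs(dx))
        if dx_max < tol:
            break
    return x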

The relaxation factor \( \omega \) may be shown to lie in the range \( (0, 2) \) for Laplace/Poisson equations, and naturally \( \omega >1 \) (over-relaxation) is the most efficient choice. We will not use the SOR method in the matrix form presented in (6.49), but rather apply it directly to the difference equations.

Let us first consider Poisson's equation in two physical dimensions: $$ \begin{equation} \tag{6.50} \frac{\partial^2u}{\partial x^2}+\frac{\partial^2u}{\partial y^2}=f(x,y) \end{equation} $$

which, after discretization with central differences and \( \Delta x=\Delta y = h \), results in the following difference equation: $$ \begin{align} u_{i-1,j}+u_{i+1,j}+u_{i,j+1}+u_{i,j-1}-4u_{i,j}-h^2\cdot f_{i,j} = 0 \tag{6.51} \end{align} $$

We discretize our two-dimensional domain in the normal manner as: $$ \begin{align} &x_i=x_0+i\cdot h,\qquad i=0,1,2,\dots,n_{x} \nonumber \\ &y_j=y_0+j\cdot h, \qquad j=0,1,2,\dots,n_{y} \tag{6.52} \end{align} $$

where \( n_x \) and \( n_y \) denote the number of grid cells in the \( x- \) and \( y \)-direction, respectively (see 6.3 Direct numerical solution).

By using the general SOR method (6.49) on (6.51) and (6.52) we get the following iterative scheme: $$ \begin{align} &u_{i,j}^{m+1}=u_{i,j}^m+\delta u_{i,j} \tag{6.53}\\ &\delta u_{i,j}=\frac{\omega}{4} \left[u^{m+1}_{i-1,j}+u^{m+1}_{i,j-1}+u^{m}_{i+1,j}+u^{m}_{i,j+1}-4u^{m}_{i,j}-h^2\cdot f_{i,j}\right] \tag{6.54} \end{align} $$

The scheme in (6.54) may be reformulated by introducing the residual \( R_{i,j} \) $$ \begin{equation} R_{i,j}=\left[u^{m+1}_{i-1,j}+u^{m+1}_{i,j-1}+u^{m}_{i+1,j}+u^{m}_{i,j+1}-4u^{m}_{i,j}-h^2\cdot f_{i,j}\right] \tag{6.55} \end{equation} $$

Note that the residual \( R_{i,j} \) is what is left when an incorrect solution is plugged into the difference equation (6.51) in a discrete point \( (i,j) \). By introducing the residual (6.55) into (6.54), the following iterative scheme may be obtained: $$ \begin{align} u_{i,j}^{m+1}&=u_{i,j}^m+\frac{\omega}{4}R_{i,j} \tag{6.56} \end{align} $$

We will now solve the example in Figure 76, 6.3 Direct numerical solution, with the iterative SOR-scheme in (6.54).

Figure 83: Rectangular domain as in Figure 76 but boundary nodes are unknown due to Neumann boundary conditions.

In Figure 83 we have introduced a new index notation as compared to the problem in Figure 76, and we do not explicitly account for symmetry.

The boundary values are imposed at the boundaries, whereas the rest of the unknown values are initialized to zero:

# Initialize T and impose boundary values
T = np.zeros_like(X)

T[-1,:] = Ttop
T[0,:] = Tbottom
T[:,0] = Tleft
T[:,-1] = Tright

tic=time.time()
omega = 1.5
for iteration in range(20):
    for j in range(1,ny+1):
        for i in range(1, nx + 1):
            R = (T[j,i-1]+T[j-1,i]+T[j,i+1]+T[j+1,i]-4.0*T[j,i]) # residual (6.55) with f = 0 (Laplace)
            dT = 0.25*omega*R                                    # relaxed update, cf. (6.56)
            T[j,i]+=dT

toc=time.time()
print('GS solver time:',toc-tic)

# visualize solutions

from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
surf = ax.plot_surface(X, Y, T, rstride=1, cstride=1, cmap=cm.coolwarm,
                       linewidth=0, antialiased=False)
ax.set_zlim(0, Ttop+10)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('T [$^o$C]')


nx=4
xticks=np.linspace(0.0,xmax,nx+1)
ax.set_xticks(xticks)

ny=8
yticks=np.linspace(0.0,ymax,ny+1)
ax.set_yticks(yticks)

nTicks=5
dT=int(Ttop/nTicks)
Tticklist=list(range(0,Ttop+1,dT))
ax.set_zticks(Tticklist)

#fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()

In this first simple program we use \( \omega=1.5 \) and perform a fixed number of 20 iterations, as shown in the program laplace_sor.py below with \( h = x_{max}/N_x = 0.025 \):

# src-ch7/laplace_Diriclhet2.py
import numpy as np
import scipy 
import scipy.linalg
import scipy.sparse
import scipy.sparse.linalg
import matplotlib.pylab as plt
import time
from math import sinh

#import matplotlib.pyplot as plt

# Change some default values to make plots more readable on the screen
LNWDT=2; FNT=15
plt.rcParams['lines.linewidth'] = LNWDT; plt.rcParams['font.size'] = FNT

# Set temperature at the top
Ttop=10
Tbottom=0.0
Tleft=0.0
Tright=0.0

xmax=1.0
ymax=1.5
 
# Set simulation parameters
#need hx=(1/nx)=hy=(1.5/ny)
Nx = 40
h=xmax/Nx
Ny = int(ymax/h)

nx = Nx-1
ny = Ny-1
n = (nx)*(ny) #number of unknowns

# surfaceplot:
x = np.linspace(0, xmax, Nx + 1)
y = np.linspace(0, ymax, Ny + 1)

X, Y = np.meshgrid(x, y)



# Initialize T and impose boundary values
T = np.zeros_like(X)

T[-1,:] = Ttop
T[0,:] = Tbottom
T[:,0] = Tleft
T[:,-1] = Tright

tic=time.time()
omega = 1.5
for iteration in range(20):
    for j in range(1,ny+1):
        for i in range(1, nx + 1):
            R = (T[j,i-1]+T[j-1,i]+T[j,i+1]+T[j+1,i]-4.0*T[j,i])
            dT = 0.25*omega*R
            T[j,i]+=dT

toc=time.time()
print('GS solver time:',toc-tic)

As opposed to the above scheme with a fixed number of iterations, we seek an iteration scheme with a proper stop criterion, reflecting that our iterative solution approximates the solution with a certain accuracy. Additionally, we would like to find a relaxation parameter \( \omega \) which reduces the number of required iterations for a given accuracy. These topics will be addressed in the following sections.

6.4.1 Stop criteria

Examples of possible stop criteria are:

$$ \begin{equation} \frac{1}{N} \sum_i\sum_j|\delta T_{i,j}| < tol_a,\ \frac{1}{N}\sum_i\sum_j\left|\frac{\delta T_{i,j}}{T_{i,j}} \right| < tol_r,\ |T_{i,j}|\neq 0 \tag{6.57} \end{equation} $$ where \( N=n_x \, n_y \) is the total number of unknowns. In (6.57) the residual may be used rather than \( \delta T_{i,j} \).

In the first expression we use an absolute tolerance, whereas a relative tolerance is used in the latter. We choose to use one of the following alternatives to (6.57): $$ \begin{equation} \frac{\sum_i\sum_j\left|\delta T_{i,j}\right|}{\sum_i\sum_j\left|T_{i,j}\right|} < tol_r,\qquad \frac{\max\left(\left|\delta T_{i,j} \right|\right)}{\max\left(\left| T_{i,j} \right|\right)} < tol_r \tag{6.58} \end{equation} $$

The first expression in (6.58) represents a kind of averaged relative stop criterion, since we sum over all computational grid points, whereas the second is based on the maximum change relative to the maximum value.
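As a small sketch of how the two criteria in (6.58) may be evaluated in practice, assuming the increments \( \delta T_{i,j} \) and the current solution are stored in arrays named dT and T:

import numpy as np

def rel_change_sum(dT, T):
    # first criterion in (6.58): summed absolute change relative to summed absolute value
    return np.sum(np.abs(dT))/np.sum(np.abs(T))

def rel_change_max(dT, T):
    # second criterion in (6.58): maximum absolute change relative to maximum absolute value
    return np.max(np.abs(dT))/np.max(np.abs(T))

# iterate until e.g. rel_change_max(dT, T) < tol_r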

From Figure 84, we observe that the number of iterations is a function of both the relaxation parameter \( \omega \) and the grid size \( h \).

Figure 84: Number of iterations as a function of \( \omega \) and \( h \) with \( tol_r=10^{-5} \).

A more generic program for the Poisson problem (6.50) with the stop criterion \( \frac{\max\left(\left|\delta T_{i,j} \right|\right)}{\max\left(\left| T_{i,j} \right|\right)} < tol_r \) and variable grid size \( h \) is included below. In contrast to the previous code with a fixed number of iterations, both a Jacobi scheme with array-slicing and a Gauss-Seidel scheme are included for comparison.

# src-ch7/laplace_Diriclhet2.py
import numpy as np
import scipy 
import scipy.linalg
import scipy.sparse
import scipy.sparse.linalg
import matplotlib.pylab as plt
import time
from math import sinh

#import matplotlib.pyplot as plt

# Change some default values to make plots more readable on the screen
LNWDT=2; FNT=15
plt.rcParams['lines.linewidth'] = LNWDT; plt.rcParams['font.size'] = FNT

# Set temperature at the top
Ttop=10
Tbottom=0.0
Tleft=0.0
Tright=0.0

xmax=1.0
ymax=1.5
 
# Set simulation parameters
#need hx=(1/nx)=hy=(1.5/ny)
Nx = 20
h=xmax/Nx
Ny = int(ymax/h)

nx = Nx-1
ny = Ny-1
n = (nx)*(ny) #number of unknowns

# surfaceplot:
x = np.linspace(0, xmax, Nx + 1)
y = np.linspace(0, ymax, Ny + 1)

X, Y = np.meshgrid(x, y)

T = np.zeros_like(X)

# set the imposed boundary values
T[-1,:] = Ttop
T[0,:] = Tbottom
T[:,0] = Tleft
T[:,-1] = Tright

T2 = T.copy()

reltol=1.0e-3

omega = 1.5
iteration = 0
rel_res=1.0

# Gauss-Seidel iterative solution
tic=time.time()
while (rel_res > reltol):
    dTmax=0.0
    for j in range(1,ny+1):
        for i in range(1, nx + 1):
            R = (T[j,i-1]+T[j-1,i]+T[j,i+1]+T[j+1,i]-4.0*T[j,i])
            dT = 0.25*omega*R
            T[j,i]+=dT
            dTmax=np.max([np.abs(dT),dTmax])
        
    rel_res=dTmax/np.max(np.abs(T))
    iteration+=1
    
toc=time.time()
print("Gauss-Seidel solver time:\t{0:0.2f}. \t Nr. iterations {1}".format(toc-tic,iteration))


iteration = 0
rel_res=1.0
# Jacobi iterative solution
tic=time.time()
while (rel_res > reltol):
    R2 = (T2[1:-1,0:-2]+T2[0:-2,1:-1]+T2[1:-1,2:]+T2[2:,1:-1]-4.0*T2[1:-1,1:-1])
    dT2 = 0.25*R2
    T2[1:-1,1:-1]+=dT2
    rel_res=np.max(np.abs(dT2))/np.max(np.abs(T2))
    iteration+=1    

toc=time.time()
print("Jacobi solver time:\t\t{0:0.2f}. \t Nr. iterations {1}".format(toc-tic,iteration))

6.4.2 Optimal relaxation parameter

By optimal, we mean the relaxation parameter value that results in the lowest possible number of iterations when \( \omega \) is held constant throughout the computation. For Laplace and Poisson equations in a rectangular domain it is possible to calculate an optimal \( \omega \), which we will refer to as the theoretically optimal value.

Let \( L_x \) and \( L_y \) be the extent of the rectangular domain in the \( x \)- and \( y \)-directions. Taking the same spatial discretization \( h \) in both directions, we set \( n_x=\dfrac{L_x}{h},\ n_y=\dfrac{L_y}{h} \), with \( n_x \) and \( n_y \) being the number of intervals in the \( x \)- and \( y \)-directions, respectively; \( n_x \) and \( n_y \) must be integers. The theoretically optimal \( \omega \) is then given by: $$ \begin{align} &\rho = \frac{1}{2}[\cos(\pi/n_x)+\cos(\pi/n_y)] \tag{6.59}\\ &\omega = \frac{2}{1+\sqrt{1-\rho^2}} \tag{6.60} \end{align} $$

If the interval length is different in the \( x \)- and \( y \)-directions, with \( h=h_x \) in the \( x \)-direction and \( h=h_y \) in the \( y \)-direction, the following holds instead of (6.59): $$ \begin{equation} \rho = \frac{\cos(\pi/n_x)+(h_x/h_y)^2\cdot \cos(\pi/n_y)}{1+(h_x/h_y)^2} \tag{6.61} \end{equation} $$

Moreover, if the domain is not rectangular, one can use Garabedian's estimate: $$ \begin{equation} \omega = \frac{2}{1+3.014\cdot h / \sqrt{A}}, \tag{6.62} \end{equation} $$ where \( A \) is the area of the domain.

Let us compute some numerical values for these formulas for the case in which \( L_x=1,\ L_y=1.5 \) and \( A=L_x\cdot L_y=1.5 \).

\( h \)   \( \omega \) from (6.60)   \( \omega \) from (6.62)
0.25   1.24   1.24
0.125   1.51   1.53
0.0625   1.72   1.74
0.03125   1.85   1.86

We see that Garabedian's estimate is well in line with the theoretically exact values in this case. Results reported in Figure 84 are also in good agreement with the values in this table.
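Values of this kind are easily computed with a few lines of Python implementing (6.59), (6.60) and (6.62); the function names below are illustrative:

import numpy as np

def omega_theoretical(h, Lx=1.0, Ly=1.5):
    # theoretically optimal omega from (6.59) and (6.60)
    nx, ny = Lx/h, Ly/h
    rho = 0.5*(np.cos(np.pi/nx) + np.cos(np.pi/ny))
    return 2.0/(1.0 + np.sqrt(1.0 - rho**2))

def omega_garabedian(h, A=1.5):
    # Garabedian's estimate (6.62)
    return 2.0/(1.0 + 3.014*h/np.sqrt(A))

print(omega_theoretical(0.25), omega_garabedian(0.25))  # both approximately 1.24 for h = 0.25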

6.4.3 Example using SOR

We now solve the temperature problem shown in Figure 79 (see 6.3.1 Neumann boundary conditions). The numbering convention used here is depicted in Figure 85.

Figure 85: Ghost-cells are used to implement Neumann boundary conditions.

Here we use ghost points as indicated by the dashed lines. Since the homogeneous Neumann conditions prescribe a zero normal derivative along \( x=0 \) and \( y=0 \), the ghost values mirror the neighbouring interior values, and for \( T_{1,1} \) we get: $$ \begin{equation} T_{1,1}=\frac{1}{4}(T_{1,2}+T_{1,2}+T_{2,1}+T_{2,1})=\frac{1}{2}(T_{1,2}+T_{2,1}) \tag{6.63} \end{equation} $$

The calculation is started by iterating along \( y=0 \), starting with \( T_{2,1} \). Then iteration along \( x=0 \) follows, starting with \( T_{1,2} \). Afterwards we iterate in a double loop over the inner points. Finally, \( T_{1,1} \) is computed from (6.63). The algorithm is shown in the code lapsor2 below. We have used the stopping criterion \( \frac{\sum_i\sum_j\left|\delta T_{i,j}\right|}{\sum_i\sum_j\left|T_{i,j}\right|} < tol_r \) and the optimal \( \omega \) from (6.60), and use \( T=0.5 \) as initial guess for the entire domain, except where boundary conditions are prescribed. The accuracy is as in the printout for lap2v3 in 6.3.1 Neumann boundary conditions.

  % program lapsor2
  clear
  net = 1;
  h = 0.25;
  hn = h/2^(net - 1);
  nx = 1/hn; ny = nx;
  imax = nx + 1; % points in x-direction
  jmax = ny + 1; % points in y-direction
  T = 0.5*ones(imax,jmax); % temperatures
  % --- Compute optimal omega ---
  ro = cos(pi/nx);
  omega = 2/(1 + sqrt(1 - ro^2));
  T(1:imax,jmax) = 1;   % boundary values along y = 1
  T(imax,1:jmax-1) = 0; % boundary values along x = 1
  reltol = 1.0e-5; % relative iteration error
  relres = 1.0; it = 0;
  % --- Start iteration ---
  while relres > reltol
      it = it + 1;
      Tsum = 0.0; dTsum = 0.0;
      % --- boundary values along y = 0 ---
      for i = 2:imax-1
          resid = 2*T(i,2) + T(i-1,1) + T(i+1,1) - 4*T(i,1);
          dT = 0.25*omega*resid;
          dTsum = dTsum + abs(dT);
          T(i,1) = T(i,1) + dT;
          Tsum = Tsum + abs(T(i,1));
      end
      % --- boundary values along x = 0 ---
      for j = 2:jmax-1
          resid = 2*T(2,j) + T(1,j-1) + T(1,j+1) - 4*T(1,j);
          dT = 0.25*omega*resid;
          dTsum = dTsum + abs(dT);
          T(1,j) = T(1,j) + dT;
          Tsum = Tsum + abs(T(1,j));
      end
      for i = 2:imax-1
          for j = 2:jmax-1
              resid = T(i-1,j) + T(i,j-1) + T(i+1,j) + T(i,j+1) - 4*T(i,j);
              dT = 0.25*omega*resid;
              dTsum = dTsum + abs(dT);
              T(i,j) = T(i,j) + dT;
              Tsum = Tsum + abs(T(i,j));
          end
      end
      T(1,1) = 0.5*(T(2,1) + T(1,2));
      relres = dTsum/Tsum;
  end

6.4.4 Initial guess and boundary conditions

We expect faster convergence if we use initial guesses that are close to the solution of the problem. This is particularly important for nonlinear equations, whereas we are freer in the choice of initial guesses when solving linear equations, without affecting the convergence rate much. For example, for the temperature problem in Figure 76, there is little difference in the number of iterations whether we start the iteration by choosing \( T=0 \) in the entire domain or \( T=1004 \). The optimal \( \omega \) for this case is also independent of the starting values. The situation is completely different for the case in Figure 79. Here we also solve a linear equation, but we have more complicated boundary conditions, where we prescribe the temperature along two sides of the domain (Dirichlet conditions) and the derivative of the temperature along the two other sides (Neumann conditions). In addition, the temperature is discontinuous in the corner \( x = 1,\ y = 1 \). In the corner \( x = 0,\ y = 0 \) the correct solution is \( T=0.5 \), as shown by the analytical solution. If we choose \( T=0.5 \) as the initial guess throughout the domain, we get fast convergence with an optimal \( \omega \) equal to the theoretical value obtained using (6.60). If we deviate slightly from \( T=0.5 \) as the initial guess, the optimal \( \omega \) is no longer the theoretical optimum. The situation is as shown in Figure 86 below.

Figure 86: Effect of initial guesses on the number of iterations for a problem with sub-optimal \( \omega \).

The table below shows the theoretically optimal \( \omega \) obtained with (6.60) and (6.62).

\( h \)   \( \omega \) from (6.60)   \( \omega \) from (6.62)
0.25   1.17   1.14
0.125   1.45   1.45
0.0625   1.67   1.68
0.03125   1.82   1.83

We see that the values reported in the table match values shown in Figure 86 when the initial guess for the iterative process is 0.5.

6.4.5 Example: A non-linear elliptic PDE

In this example we will solve a non-linear elliptic Poisson equation for a square 2D-domain (see Figure 87) given by: $$ \begin{align} \frac{\partial^2u}{\partial x^2}+\frac{\partial^2u}{\partial y^2}+ u^2=-1 \tag{6.64} \end{align} $$

with Dirichlet boundary conditions, i.e. prescribed values \( u=0 \) on all boundaries (see Figure 87).

Figure 87: Solution domain and boundary conditions for a non-linear Poisson equation.

The PDE in (6.64) is only weakly non-linear, so-called semi-linear, but the approach for how to solve a non-linear elliptic PDE is still illustrated.

We discretize (6.64) with central differences over the domain in Figure 87, using a constant stepsize \( h \) in both physical directions, and get the following system of equations when the source term on the right hand side has been moved to the left hand side: $$ \begin{align*} &\frac{1}{h^2}[u_{i+1,j}+u_{i-1,j}+u_{i,j-1}+u_{i,j+1}-4u_{i,j}]+u_{i,j}^2+1=0 \end{align*} $$

or equivalently, by introducing the function \( f_{i,j} \): $$ \begin{align} f_{i,j}=u_{i+1,j}+u_{i-1,j}+u_{i,j-1}+u_{i,j+1}-4u_{i,j}+h^2(u_{i,j}^2+1)=0 \tag{6.65} \end{align} $$

with \( x_i=h\cdot(i-1),\ i=1,2,\dots \) and \( y_j=h\cdot(j-1),\ j=1,2,\dots \). A computational mesh for \( h=0.25 \) is illustrated in Figure 88.

Figure 88: Computational mesh for the problem in Figure 87.

We will solve the non-linear equations in (6.65) with Newton's method, changing one variable at a time. In this case the iteration process becomes: $$ \begin{align} &u_{i,j}^{m+1}=u_{i,j}^{m}+\delta u_{i,j} \tag{6.66}\\ &\delta u_{i,j}= -\omega \frac{f(u_{k,l})}{\dfrac{\partial f(u_{k,l})}{\partial u_{i,j}}} \tag{6.67} \end{align} $$

where: $$ \begin{align} &u_{k,l}=u_{k,l}^{m+1} \text{ for } l < j, \text{ or } l=j \text{ and } k < i \tag{6.68}\\ &u_{k,l}=u_{k,l}^m \text{ otherwise} \tag{6.69} \end{align} $$ and $$ \begin{align} \frac{\partial f}{\partial u_{i,j}}=-4+2h^2\cdot u_{i,j} \qquad \text{and} \qquad \delta u_{i,j}=\omega \frac{f}{4-2h^2u_{i,j}} \tag{6.70} \end{align} $$

We have implemented (6.65) and (6.67) to (6.70) in the python code nonlin_poisson below:

# Python Gauss-Seidel method
tic=time.time()
while (rel_res > reltol):
    du_max=0.0
    for j in range(1,ny+1):
        for i in range(1, nx + 1):
            R = (U[j,i-1]+U[j-1,i]+U[j,i+1]+U[j+1,i]-4.0*U[j,i]) + h**2*(U[j,i]**2+1.0) # f_{i,j} in (6.65)
            df=4-2*h**2*U[j,i] # 4 - 2*h^2*u = -df/du, cf. (6.70)
            dU = omega*R/df    # relaxed Newton increment (6.67)
            U[j,i]+=dU
            du_max=np.max([np.abs(dU),du_max])

         
    rel_res=du_max/np.max(np.abs(U))

    iteration+=1
     
toc=time.time()
print("Python Gauss-Seidel CPU-time:\t{0:0.2f}. \t Nr. iterations {1}".format(toc-tic,iteration))

In this implementation we have used the following stop criterion: $$ \begin{align} \frac{\max \left|\delta{u}_{i,j}\right|}{\max \left|u_{i,j}\right|} \leq \mathrm{tol}_r \tag{6.71} \end{align} $$ and initialize the solution in the field (excluding boundaries) with \( u=0 \).

We have also used (6.60) to estimate an optimal \( \omega_{est} \). In the table below we compare the estimated optimal \( \omega_{est} \) with the real optimal \( \omega_{opt} \) in case of an initial field of \( u=0 \).

\( h \) \( \omega_{est} \) \( \omega_{opt} \)
0.25 1.17 1.19
0.125 1.45 1.46
0.0625 1.67 1.69
0.03125 1.82 1.83

We observe relatively good agreement between the two \( \omega \)-values for a range of grid sizes \( h \), even though the PDE is weakly non-linear.

As the Gauss-Seidel algorithm above involves a triple loop (the iterative while-construct, plus one loop in each physical direction), the naive Python implementation above must be expected to be computationally expensive.

For comparison we have also implemented another solution to the problem by making use of numpy's array slicing capabilities:

# Jacobi iterative solution
tic=time.time()
while (rel_res > reltol):
    R2 = (U2[1:-1,0:-2]+U2[0:-2,1:-1]+U2[1:-1,2:]+U2[2:,1:-1]-4.0*U2[1:-1,1:-1]) + h**2*(U2[1:-1,1:-1]**2+1.0)
    df=4-2*h**2*U2[1:-1,1:-1]
    dU2 = R2/df
    U2[1:-1,1:-1]+=dU2
    rel_res=np.max(np.abs(dU2))/np.max(np.abs(U2))
    iteration+=1    
 
toc=time.time()
print("Jacobi CPU-time:\t\t{0:0.2f}. \t Nr. iterations {1}".format(toc-tic,iteration))

Note that with this implementation all values on the right hand side are taken at the previous iteration, and thus the method must be denoted a Jacobi algorithm.

Finally, we have implemented a third version of the Gauss-Seidel method (6.67) with Cython. Cython is an optimizing static compiler (based on Pyrex) for both the Python programming language and the extended Cython programming language. The ambition is to make the writing of computationally superior C extensions for Python as easy as Python itself.

The expensive triple loop is implemented in a typed Cython function which looks very much like the Python implementation, save for the type declarations.

import cython
cimport cython

import numpy as np
cimport numpy as np

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cpdef gauss(np.ndarray[DTYPE_t, ndim=2] U, double reltol, double h, double omega):
    cdef Py_ssize_t i, j, it
    cdef double rel_res, dU_max, df, dU, R

    cdef unsigned int rows = U.shape[0]
    cdef unsigned int cols = U.shape[1]

    cdef int itmax = 100

    it = 0
    rel_res = 1.0

    while ((rel_res > reltol) and (it <= itmax)):
        dU_max = 0.0
        for j in range(1, rows - 1):       # interior rows
            for i in range(1, cols - 1):   # interior columns
                R = (U[j,i-1]+U[j-1,i]+U[j,i+1]+U[j+1,i]-4.0*U[j,i]) + h**2*(U[j,i]**2+1.0)
                df = 4.0-2*h**2*U[j,i]
                dU = omega*R/df
                U[j,i] += dU
                dU_max = max(abs(dU), dU_max)

        rel_res = dU_max/np.max(np.abs(U))
        it += 1

    if (it >= itmax):
        print('Terminated after max iterations')
    return U, rel_res, it
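One common way to build this extension module is a minimal setup.py using Cython.Build, assuming the code above is saved as gauss_seidel_sor.pyx (the file name listed for this chapter) and built in place with python setup.py build_ext --inplace, after which it may be imported, e.g. as import gauss_seidel_sor as gs, matching the call gs.gauss below:

# setup.py - minimal build script for the Cython module above (file name assumed)
from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("gauss_seidel_sor.pyx"),
    include_dirs=[np.get_include()],  # needed since the module cimports numpy
)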

After compilation the Cython module is easily imported into our Python code, which allows for comparison with the methods above, as illustrated in the code below:

# Python Gauss-Seidel method
tic=time.time()
while (rel_res > reltol):
    du_max=0.0
    for j in range(1,ny+1):
        for i in range(1, nx + 1):
            R = (U[j,i-1]+U[j-1,i]+U[j,i+1]+U[j+1,i]-4.0*U[j,i]) + h**2*(U[j,i]**2+1.0)
            df=4-2*h**2*U[j,i]
            dU = omega*R/df
            U[j,i]+=dU
            du_max=np.max([np.abs(dU),du_max])

         
    rel_res=du_max/np.max(np.abs(U))

    iteration+=1
     
toc=time.time()
print("Python Gauss-Seidel CPU-time:\t{0:0.2f}. \t Nr. iterations {1}".format(toc-tic,iteration))
 
 
iteration = 0
rel_res=1.0

# Second method
# Jacobi iterative solution
tic=time.time()
while (rel_res > reltol):
    R2 = (U2[1:-1,0:-2]+U2[0:-2,1:-1]+U2[1:-1,2:]+U2[2:,1:-1]-4.0*U2[1:-1,1:-1]) + h**2*(U2[1:-1,1:-1]**2+1.0)
    df=4-2*h**2*U2[1:-1,1:-1]
    dU2 = R2/df
    U2[1:-1,1:-1]+=dU2
    rel_res=np.max(np.abs(dU2))/np.max(np.abs(U2))
    iteration+=1    
 
toc=time.time()
print("Jacobi CPU-time:\t\t{0:0.2f}. \t Nr. iterations {1}".format(toc-tic,iteration))

# Third method
# Cython Gauss-Seidel method 
rel_res=1.0
tic=time.time() 
U3, relreturn, itsused=gs.gauss(U3,reltol,h, omega)  
toc=time.time()
print("Cython Gauss-Seidel CPU-time:\t{0:0.2f}. \t Nr. iterations {1}".format(toc-tic,itsused))

By running the code we get the following results for comparison:

omega=1.85
Python Gauss-Seidel CPU-time:	1.62. 	 Nr. iterations 56
Jacobi CPU-time:		0.04. 	 Nr. iterations 492
Cython Gauss-Seidel CPU-time:	0.97. 	 Nr. iterations 56

which illustrates the extreme efficiency obtained whenever a numerical scheme may be expressed by means of numpy array-slicing. Note that the array-sliced Jacobi scheme converges more slowly than the Gauss-Seidel scheme in terms of iterations, and needs approximately 10 times as many iterations as the Gauss-Seidel algorithm. But even so, the Jacobi scheme with array-slicing is approximately 40 times faster in CPU-time than the Gauss-Seidel scheme in pure Python. We also observe that the Cython implementation of the Gauss-Seidel scheme is approximately 1.7 times faster than its Python counterpart, but by far not as fast as the Jacobi scheme with array-slicing, which is approximately 24 times faster.


Exercise 9: Symmetric solution

Prove that the analytical solution of the temperature field \( T(x,y) \) in (6.26) is symmetric around \( x = 0.5 \).

Exercise 10: Stop criteria for the Poisson equation

Implement the various stop criteria outlined in 6.4.1 Stop criteria for the Poisson equation in two dimensions (6.50).