Iterative methods for solving linear systems of equations
Understand the basics of iterative methods, error analysis, and convergence of the solution
- Introduction
- First let's consider a single variable case
- Extrapolate the above to the multiple variable (matrix) case
- References
Introduction
Many articles and books rely on matrix decompositions, in particular eigendecomposition, to prove convergence of iterative methods like the Jacobi method. It is not always obvious how and why those proofs work or what the connection is. So, I wanted to start from fundamentals and introduce these tools progressively, so we can better understand these techniques and apply them to novel problems.
There are various reasons, outside the scope of this article, why we may want to solve equations using iterative methods rather than direct ones.
I first use a single variable case as a warmup to introduce the underlying idea, and then extend the techniques to the multiple variable case.
First let's consider a single variable case
Given $3x = 6$, find $x$. We start with some random initial value, say $x_0$ for $x$ and improve it iteratively until the difference between two successive values is below a threshold. So, we need an equation that gives new value for $x$ from an old value. Let's try this:
Split $3$ as $M+B$
$(M+B)x = 6$
$Mx = -Bx + 6$
This splitting allows us to start with a random $x_0$, compute $-Bx_0+6$ (which is $Mx_1$), and extract $x_1 = \frac{-Bx_0 + 6}{M}$ from it; hopefully, the extracted $x_1$ is better than the initial random guess. Now we can continue and perform the same step with the new value of $x$:
$$Mx_2 = -Bx_1+6 = -B( \frac{(-Bx_0+6)}{M} ) + 6$$
So, the pattern seems to be:
$$Mx_2 = -B\left(\frac{-Bx_0 + 6}{M}\right) + 6 = \frac{B^2}{M}x_0 - \frac{6B}{M} + 6$$
Let's separately group the coefficients:
Start with $x = x_0$
$Mx_1 = -Bx_0 + 6$
or
$x_1 = (\frac{-B}{M}) x_0 + (\frac{6}{M})$
Let
$\frac{-B}{M} = P$
$\frac{6}{M} = z$
Continuing,
$\begin{aligned} x_{i+1} &= P * x_{i} + z \\ &= P * [ P * x_{i-1} + z ] + z \\ &= P^2 x_{i-1} + P z + z \end{aligned}$
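As a side note, unrolling this recursion all the way down to $x_0$ gives a closed form (a geometric series in $P$):
$$x_{i+1} = P^{i+1} x_0 + (1 + P + P^2 + \dots + P^{i}) z$$
If $|P| \lt 1$ (the condition the error analysis below makes precise), the first term vanishes and the geometric series converges, so
$$x_{i+1} \to \frac{z}{1 - P} = \frac{6/M}{1 + B/M} = \frac{6}{M + B} = \frac{6}{3} = 2,$$
which is exactly the solution of $3x = 6$.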
Error Analysis:
Let the error in any step be distance from true value (which is not known and represented by $x$ in below equations):
$e_0 = x_0 - x$
$e_1 = x_1 - x$
...
$e_i = x_i - x$
Or
$x_i = e_i + x$
And so on. Note that this is for analysis purposes only; the error is not used to drive the algorithm in computing the approximation (the error depends on the true value, which we do not know, so we cannot use it to drive or stop the algorithm!).
$\begin{aligned} x_{i+1} &= P x_{i} + z \\ &= P (x + e_{i}) + z \\ &= P x + z + P e_{i} \\ &= x + P e_{i} \end{aligned}$
This is because Px + z = x (note that here, $x$ is the actual value, so there is no confusion about $x$ appearing on LHS and RHS).
So,
$\begin{aligned} e_{i+1} = x_{i+1} - x = P e_i + x - x = P e_{i} \end{aligned}$
So, the magnitude of the error keeps decreasing as long as $|P| \lt 1$. If $P$ is negative (but still $|P| \lt 1$), the error alternates in sign from step to step, yet its magnitude still shrinks, so convergence is not lost.
If we choose the values below, the condition is satisfied:
B = -1
M = 4
(verify that B + M = 3)
P = -B/M = 1/4
z = 6/M = 6/4 = 1.5
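With these values and the starting point $x_0 = 1.1$ (the same one used in the code below), the first few iterations worked by hand are:
$x_1 = 0.25 \cdot 1.1 + 1.5 = 1.775$
$x_2 = 0.25 \cdot 1.775 + 1.5 = 1.94375$
$x_3 = 0.25 \cdot 1.94375 + 1.5 = 1.9859375$
Each step cuts the error $e_i = x_i - 2$ by a factor of $P = 0.25$: $-0.9, -0.225, -0.05625, -0.0140625, \dots$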
Convergence of solution:
How about convergence of $x_i$ itself?
$\begin{aligned} x_{i+1} - x &= P * e_i \\ &= P^{i+1} * e_0 \\ &= P^{i+1} * (x_0 - x) \end{aligned}$
We see that the difference between the actual value and the iterate $x_i$ decreases as we continue the iteration.
Let's test this with some code:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
B = -1
M = 4
P = 0.25
z = 1.5
x0 = 1.1 # random starting point
# Note that x (the actual value) is used only for error analysis
x = 2
e0 = x0-x
el = [e0]
ec = e0
xl = [x0]
xc = x0
for i in range(10):
    en = P * ec
    el.append(en)
    ec = en
    xn = P * xc + z
    xl.append(xn)
    xc = xn
fig = plt.figure(figsize=(10, 7))
err_ax = fig.add_subplot(121)
sol_ax = fig.add_subplot(122)
err_ax.set_title('Error')
err_ax.plot(np.arange(len(el)), el)
sol_ax.set_title('Solution')
sol_ax.plot(np.arange(len(xl)), xl)
We clearly see that the error is going to zero and $x_i$ is converging to 2.0.
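As a quick sanity check of the sign discussion above (this is an extra experiment, with values chosen here just for illustration): a split such as $M = 2.4$, $B = 0.6$ (still $M + B = 3$) gives a negative $P = -B/M = -0.25$ and $z = 6/M = 2.5$. The iterates still converge to 2, with the error merely alternating in sign:
# illustrative split with a negative P (M = 2.4, B = 0.6 are my choices, not from the text above)
M2 = 2.4
B2 = 0.6
P2 = -B2 / M2      # -0.25
z2 = 6 / M2        # 2.5
xc2 = 1.1          # same starting point as before
for i in range(10):
    xc2 = P2 * xc2 + z2
    print(i, xc2, xc2 - 2)   # iterate and the signed error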
Now let's see how we can employ the same concepts in the multiple variable (matrix) case.
Extrapolate the above to the multiple variable (matrix) case
$2x + y = 7$
$x + 3y = 8$
$\begin{aligned} \begin{bmatrix} 2 & 1 \\ 1 & 3 \\ \end{bmatrix} \begin{bmatrix} x \\ y \\ \end{bmatrix} &= \begin{bmatrix} 7 \\ 8 \\ \end{bmatrix} \end{aligned}$
A x = b
A is square and, in general, may or may not be invertible.
Let's use the split trick we employed earlier for single variable case.
Split A = M + B
(M+B)x = b
Mx = -Bx + b
Let's use an M such that it is invertible (and easily invertible)
$x = M^{-1}(-Bx+b)$
Start with a random or zero vector for initial value, $x = x_0 = \begin{bmatrix}0 \\ 0\end{bmatrix}$
$\begin{aligned} x_1 &= M^{-1}(-Bx_0+b) \\ &= -M^{-1} B x_0 + M^{-1} b \end{aligned}$
$\begin{aligned} x_2 &= M^{-1} (-Bx_1+b) \\ &= -M^{-1} B x_1 + M^{-1} b \\ &= -M^{-1} B ( -M^{-1} B x_0 + M^{-1} b ) + M^{-1} b \\ &= (M^{-1} B)^2 x_0 - M^{-1} B M^{-1} b + M^{-1} b \end{aligned}$
The negative term is concerning: it is not clear whether convergence happens, and even if it does, the alternating signs make the convergence harder to analyze.
Let's try something different; maybe we could do subtraction instead of addition. Let's split like this:
A = M - N
Mx - Nx = b
$x = M^{-1} N x + M^{-1} b$
$x_1 = M^{-1} N x_0 + M^{-1} b$
Let $M^{-1} N = P$ and $M^{-1} b = z$ for ease.
$x_1 = P x_0 + z$
$\begin{aligned} x_2 = P (x_1) + z = P (P x_0 + z) + z = P^2 x_0 + P z + z \end{aligned}$
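Just as in the single variable case, unrolling gives $x_{i+1} = P^{i+1} x_0 + (I + P + \dots + P^{i}) z$. If the powers of $P$ go to zero (the condition is made precise below), the series converges and
$$x_{i+1} \to (I - P)^{-1} z = (I - M^{-1}N)^{-1} M^{-1} b = \left(M^{-1}(M - N)\right)^{-1} M^{-1} b = (M^{-1}A)^{-1} M^{-1} b = A^{-1} b$$
so the fixed point of the iteration is exactly the solution of $Ax = b$.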
Note: this splitting, with $M$ taken to be the diagonal of $A$ (as we will do below), is known as the Jacobi method.
Error analysis
Let's do an error analysis similar to how we did for single variable case.
$e_0 = x_0-x$
or
$e_i = x_i - x$
$\begin{aligned} x_{i+1} &= P x_i + z \\ &= P (e_i + x) + z \\ &= P e_i + P x + z \\ &= P e_i + x \end{aligned}$
$\begin{aligned} e_{i+1} &= x_{i+1} - x \\ &= P e_i \\ &= P^{i+1} e_0 \end{aligned}$
So, we want $P^k \to 0$ (as $k \to \infty$) for the error to converge to zero.
But $P^k \to 0$ iff $||P^k|| \to 0$, which holds iff $\rho(P) \lt 1$. Here $\rho$ is the spectral radius, the largest magnitude of the eigenvalues of $P$.
Now, how do we find M and N?
Since we want $M$ to be easily invertible, it would be great if it were a diagonal matrix. Maybe we can just take the diagonal elements of $A$ and put them in $M$; the remaining (off-diagonal) elements, with signs flipped, go into $N$, so that $A = M - N$ (i.e., $N = M - A$).
Then $P = M^{-1} N = M^{-1} (M-A) = I - M^{-1} A $
Or $M^{-1} A = I - P $.
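As a worked check on the $2 \times 2$ system above:
$$M = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}, \quad N = M - A = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}, \quad P = M^{-1}N = \begin{bmatrix} 0 & -\frac{1}{2} \\ -\frac{1}{3} & 0 \end{bmatrix}, \quad z = M^{-1}b = \begin{bmatrix} \frac{7}{2} \\ \frac{8}{3} \end{bmatrix}$$
The eigenvalues of this $P$ satisfy $\lambda^2 = \frac{1}{6}$, so $\rho(P) = \frac{1}{\sqrt{6}} \approx 0.41 \lt 1$ and the iteration converges.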
From linear algebra,
$||e_{i+1}|| = || P e_i || \le ||P||_{p} ||e_i||$ for any induced norm $p$. If $||P||_{p} \lt 1$, the error keeps decreasing. If $A$ is diagonally dominant, then the 'iteration matrix' $P$ has an $\infty$-norm that is strictly less than 1. Note that $||P||_{p}$ might be less than 1 for some norms $p$ and greater than 1 for others, so a norm being less than 1 is only a sufficient condition. The necessary and sufficient condition is $\rho(P) \lt 1$.
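A small illustration of the "sufficient but not necessary" point (the matrix below is made up purely for this illustration, it is not derived from any $A$ above): a matrix can have an $\infty$-norm greater than 1 and still have spectral radius less than 1, in which case its powers still decay to zero.
import numpy as np
from numpy import linalg as LA
from numpy.linalg import matrix_power
# illustrative matrix: infinity norm is 1.1 > 1, but spectral radius is 0.5 < 1
P_demo = np.array([[0.5, 0.6], [0.0, 0.5]])
print(LA.norm(P_demo, np.inf))        # max absolute row sum = 1.1
print(max(abs(LA.eigvals(P_demo))))   # spectral radius = 0.5
print(matrix_power(P_demo, 50))       # powers still decay towards zero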
Let's try it on the above example through code.
import numpy as np
from numpy import linalg as LA
# Converges when the spectral radius of P is < 1;
# diagonal dominance of A is a sufficient condition for that.
# x is used for error analysis only
def solve(A, b, x, num_iter = 60):
    variable_count = len(x)
    x0 = np.zeros((variable_count, 1))
    e0 = x0-x
    e = [e0]
    ec = e0
    xl = [x0]
    xc = x0
    M = np.diag(np.diag(A))
    N = M - A
    P = LA.inv(M).dot(N)
    z = LA.inv(M).dot(b)
    eig = LA.eig(P)
    print("Eigen stuff of P")
    print(eig)
    print("Initial")
    print("M inverse: ")
    print(LA.inv(M))
    print("P (inv(M)*N): ")
    print(P)
    print("z (inv(M) b)")
    print(z)
    for i in range(num_iter):
        en = P.dot(ec)
        e.append(en)
        ec = en
        xn = P.dot(xc) + z
        xl.append(xn)
        xc = xn
        norm = LA.norm(en, ord=2)
        if (norm < 1e-5):
            break
    return (e, xl)
import numpy as np
from numpy import linalg as LA
A = np.array([[2, 1], [1, 3]])
b = np.array([[7], [8]])
x = np.array([[1], [1]])
el, xl = solve(A, b, x)
print("Result after last iteration")
print(xl[-1])
import numpy as np
from numpy import linalg as LA
A = np.array([[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]])
b = np.array([[25], [-24], [21], [-15]])
x = np.array([[11], [-3], [7], [-4]])
el, xl = solve(A, b, x, 60)
print("Result after last iteration")
print(xl[-1])
import numpy as np
from numpy import linalg as LA
# The above method does not converge here, since A is not diagonally dominant
# (the spectral radius of P is 1.5 > 1)
A = np.array([[2, 3], [3, 2]])
b = np.array([[7], [8]])
x = np.array([[2], [1]])
el, xl = solve(A, b, x)
print("Result after last iteration")
print(xl[-1])
A = np.array([[4, 2, 3], [3, -5, 2], [-2, 3, 8]])
b = np.array([[8], [-14], [27]])
x = np.array([[-1], [3], [2]])
e, xl = solve(A, b, x)
print("Result after last iteration")
print(xl[-1])
Convergence Analysis of error continued
What if $e_0$ is an eigenvector of $P$, say $v_0$, with eigenvalue $\lambda_0$? Then after $k$ steps:
$\begin{aligned} e_{k} &= P^k v_0 \\ &= P^{k-1} P v_0 \\ &= P^{k-1} \lambda_0 v_0 \\ &= \lambda_0 P^{k-1} v_0 \\ &= \lambda^{k}_0 v_0 \end{aligned}$
If $|\lambda_0| \lt 1$, the error goes to zero.
What if $e_0$ is not an eigenvector?
It can be written as a linear combination of the eigenvectors of $P$ (assuming $P$ is diagonalizable, so that its eigenvectors form a basis; we collect them in a matrix $Q$).
Let $v_1, v_2, \dots, v_n$ be the eigenvectors of $P$ and let $\lambda_1, \lambda_2, \dots, \lambda_n$ be the corresponding eigenvalues.
$Q = [v_1 \dots v_n]$
$e_0 = c_1 v_1 + c_2 v_2 + \dots + c_n v_n$
$e_{i+1} = P^{i+1} e_0 = P^{i+1} (c_1 v_1 + c_2 v_2 + \dots + c_n v_n) = c_1 \lambda_1^{i+1} v_1 + c_2 \lambda_2^{i+1} v_2 + \dots + c_n \lambda_n^{i+1} v_n$.
So $e_k \to 0$ for every choice of $e_0$ if and only if $|\lambda_i| \lt 1$ for all $i$.
Also,
The spectral radius is $\rho = max_i(|\lambda_i|)$.
$||e_k||$ eventually behaves like $||c_j \lambda_j^k v_j|| \sim \rho^k$, where $j$ indexes the dominant eigenvalue, and it goes to zero if $\rho \lt 1$.
Using matrices:
$P = QDQ^{-1}$, where $D$ is the diagonal matrix of eigenvalues.
$e_{i+1} = (QDQ^{-1})^{i+1}e_0 = QD^{i+1}Q^{-1}e_0$. So, if all eigenvalues have magnitude less than 1, the error goes to zero.
If $P$ does not have a full set of independent eigenvectors (i.e., is not diagonalizable), you can use the Jordan form of $P$, since the argument via matrix similarity extends to the Jordan form as well (see this reference for example: https://math.la.asu.edu/~gardner/iter.pdf).
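A quick numerical check of this (the variables below are local to this check; the $3 \times 3$ matrix is the Jacobi iteration matrix from the earlier example):
import numpy as np
from numpy import linalg as LA
from numpy.linalg import matrix_power
A = np.array([[4, 2, 3], [3, -5, 2], [-2, 3, 8]])
M = np.diag(np.diag(A))
P = LA.inv(M).dot(M - A)
eigvals, Q = LA.eig(P)                   # P = Q D Q^{-1}; this P happens to be diagonalizable
D = np.diag(eigvals)
k = 10
e0 = np.array([[1.0], [-2.0], [0.5]])    # an arbitrary initial error, just for the check
direct = matrix_power(P, k).dot(e0)
via_eig = Q.dot(matrix_power(D, k)).dot(LA.inv(Q)).dot(e0)
print(np.allclose(direct, via_eig))      # True: P^k e0 equals Q D^k Q^{-1} e0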
Convergence Analysis of solution
How about convergence of $x_{i+1}$?
Let $x^*$ be the (unknown) solution. Then it satisfies $x^* = Px^* + z$ by definition.
Consider,
$x_{i+1} - x^* = (Px_i+z)-(Px^*+z) = P(x_i-x^*) = P((Px_{i-1}+z) - (Px^* + z)) = P^2(x_{i-1}-x^*) = \dots = P^{i+1}(x_0-x^*)$.
So, if $\rho(P) \lt 1$, then $||x_{i+1} - x^*|| = ||P^{i+1}(x_0-x^*)|| \to 0$, i.e., $x_{i+1} \to x^*$.
from numpy.linalg import matrix_power
A = np.array([[4, 2, 3], [3, -5, 2], [-2, 3, 8]])
b = np.array([[8], [-14], [27]])
x_solution = np.array([[-1], [3], [2]])
x_0 = np.zeros((3, 1))
M = np.diag(np.diag(A))
N = M - A
P = LA.inv(M).dot(N)
# consider 11th iteration
i = 11
# grab x_11 from the output above
x_11 = np.array([[-0.66699081], [ 2.85785307], [ 2.09930262]])
lhs = x_11 - x_solution
rhs = (matrix_power(P, i+1)).dot(x_0-x_solution)
print(lhs)
print(rhs)
Now you can see that the rate of convergence along each eigenvector direction of $P$ depends on the corresponding eigenvalue. For,
$x_{i+1} - x^* = QD^{i+1}Q^{-1} (x_{0} - {x^*})$
Let's simplify as we are using $x_0 = 0$ most of the time.
$x_{i+1} = x^* - QD^{i+1}Q^{-1} {x^*}$
Let's study how each individual component of $x_{i+1}$ progresses towards the corresponding component of the solution as we iterate.
Let $q_{(j)}$ be the $j$-th row of $Q$ and $q'_{(k)}$ be the $k$-th row of $Q^{-1}$. Then, after mundane manipulations of the above equation, we see that the $j$-th component is
${x_{i+1}}_{(j)} = {x^*}_{(j)} - \sum^{n}_{k=1} Q_{jk} \, {\lambda_k}^{i+1} \, \left( q'_{(k)} x^* \right)$
where $Q_{jk}$ is the $k$-th entry of $q_{(j)}$.
This is too complex to see what is going on. There is light at the end of the tunnel, though: we just need to change our frame of reference to the eigenbasis of $P$.
For any vector $v$, let $w = Q^{-1}v$ denote its coordinates in the eigenbasis of $P$.
Rewrite the above using $w$. First, multiply both sides by $Q^{-1}$:
$Q^{-1}x_{i+1} = Q^{-1}x^* - D^{i+1}Q^{-1} {x^*}$
So,
$w_{i+1} = w^* - D^{i+1} w^*$
Now, look at the j-th component of $w_{i+1}$:
${w_{i+1}}_{(j)} = {w^*}_{(j)} - {\lambda_j}^{i+1} {w^*}_{(j)} = (1-{\lambda_j}^{i+1}) {w^*}_{(j)}$.
Now we clearly see that each component of $w$ makes its own progression towards the corresponding component of $w^*$, as long as $|\lambda_j| \lt 1$. Also note that this is a closed form: you do not need to loop to find the value at any $i$, unlike $x_{i+1}$, which (in this formulation) you cannot compute without iterating.
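Before running the full analysis, here is a minimal sketch of that closed form (variable names here are mine, not from the utilities below): compute $w_{i+1} = (I - D^{i+1})\,w^*$ directly and compare it with $Q^{-1}x_{i+1}$ obtained by actually iterating from $x_0 = 0$.
import numpy as np
from numpy import linalg as LA
from numpy.linalg import matrix_power
A = np.array([[4, 2, 3], [3, -5, 2], [-2, 3, 8]])
b = np.array([[8], [-14], [27]])
M = np.diag(np.diag(A))
P = LA.inv(M).dot(M - A)
z = LA.inv(M).dot(b)
eigvals, Q = LA.eig(P)
Qinv = LA.inv(Q)
x_star = np.array([[-1.0], [3.0], [2.0]])   # known solution, used only to form w*
w_star = Qinv.dot(x_star)
i = 7                                        # check an arbitrary iteration index
w_closed = (np.eye(3) - matrix_power(np.diag(eigvals), i + 1)).dot(w_star)   # closed form
xc = np.zeros((3, 1))
for _ in range(i + 1):                       # iterate x_0 -> x_{i+1}
    xc = P.dot(xc) + z
print(np.allclose(Qinv.dot(xc), w_closed))   # True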
Now let's test the above theory in full. First, a couple of utilities:
def analyze(A, b, x, num_iter = 60):
    variable_count = len(x)
    x0 = np.zeros((variable_count, 1))
    e0 = x0-x
    e = [e0]
    ec = e0
    xl = [x0]
    xc = x0
    M = np.diag(np.diag(A))
    N = M - A
    P = LA.inv(M).dot(N)
    z = LA.inv(M).dot(b)
    eigvals, eigvecs = LA.eig(P)
    D = np.diag(eigvals)
    Q = np.array(eigvecs)
    Qinv = LA.inv(Q)
    wstar = Qinv.dot(x)
    d0 = Qinv.dot(e0)
    dc = d0
    dl = [d0]
    print("wstar")
    print(wstar)
    print("-----------")
    wl = []
    Dc = D
    for i in range(num_iter):
        en = P.dot(ec)
        e.append(en)
        ec = en
        dn = Qinv.dot(en)
        dl.append(dn)
        dc = dn
        xn = P.dot(xc) + z
        xl.append(xn)
        wl.append(wstar - Dc.dot(wstar))
        Dc = Dc * D
        xc = xn
    return (e, dl, xl, np.array(wl))
import matplotlib.pyplot as plt
def plot_points(points):
    plt.scatter(np.real(points), np.imag(points))
    plt.show()
    x = np.real(points)
    y = np.imag(points)
    n = [i for i in range(len(x))]
    fig, ax = plt.subplots(figsize=(20, 10))
    ax.scatter(x, y)
    for i, txt in enumerate(n):
        ax.annotate(txt, (x[i], y[i]))
    plt.xticks(np.arange(min(x)-0.2, max(x)+0.2, 0.1))
    plt.yticks(np.arange(min(y)-0.2, max(y)+0.2, 0.1))
    plt.show()
def print_error_contributions(dl, eigvals):
    c = len(dl[0])
    n = len(dl)
    x = [i for i in range(n)]
    y_prev_min = [0 for k in range(n)]
    y_prev_max = [0 for k in range(n)]
    colors = ['red', 'blue', 'black']
    fig, vax = plt.subplots(1, 1, figsize=(24, 12))
    for comp in range(c):
        y_min = y_prev_max
        y_max = [np.absolute(dl[k][comp])[0] + y_min[k] for k in range(n)]
        vax.vlines(x, y_min, y_max, color=colors[comp], label=np.absolute(eigvals[comp]))
        y_prev_min = y_min
        y_prev_max = y_max
    plt.legend()
    plt.show()
tol = 1e-13
A = np.array([[4, 2, 3], [3, -5, 2], [-2, 3, 8]])
b = np.array([[8], [-14], [27]])
x_solution = np.array([[-1], [3], [2]])
x_0 = np.zeros((3, 1))
M = np.diag(np.diag(A))
N = M - A
P = LA.inv(M).dot(N)
w, v = LA.eig(P)
D = np.diag(w)
print("Diagonal")
print(D)
print("---")
Q = np.array(v)
print("Eigenbasis")
print(Q)
print("---")
Qinv = LA.inv(Q)
e, dl, xl, wl = analyze(A, b, x_solution)
wl.real[abs(wl.real) < tol] = 0.0
wl.imag[abs(wl.imag) < tol] = 0.0
w_lastiter = wl[-1]
x_lastiter = Q.dot(w_lastiter)
print("value of x after last iteration")
print(x_lastiter)
print(xl[-1])
plot_points([item[0][0] for item in wl])
wl_2 = [item[1][0] for item in wl]
plot_points(wl_2)
Notice the spiraling movement of the w's towards convergence, due to the presence of imaginary components in the eigenvalues.
wl_3 = [item[2][0] for item in wl]
plot_points(wl_3)
print_error_contributions(dl, w)
print([np.absolute(item[0][0]) for item in dl])
print([np.absolute(item[1][0]) for item in dl])
print([np.absolute(item[2][0]) for item in dl])
# A is strictly diagonally dominant, so convergence is guaranteed.
# Since the eigenvalues of P are real here, the w's move along straight lines instead of spiraling.
# https://www.maa.org/press/periodicals/loci/joma/iterative-methods-for-solving-iaxi-ibi-jacobis-method
A = np.array([[4, -1, -1], [-2, 6, 1], [-1, 1, 7]])
b = np.array([[3], [9], [-6]])
x_solution = np.array([[1], [2], [-1]])
e, dl, xl, wl = analyze(A, b, x_solution, 50)
M = np.diag(np.diag(A))
N = M - A
P = LA.inv(M).dot(N)
z = LA.inv(M).dot(b)
w, v = LA.eig(P)
D = np.diag(w)
print("Eigenvalues in a diagonal matrix")
print(D)
print("---")
Q = np.array(v)
print("Eigenbasis")
print(Q)
print("---")
Qinv = LA.inv(Q)
print("value of x after last iteration:")
print(xl[-1])
print("-------")
w_lastiter = wl[-1]
x_from_w = Q.dot(w_lastiter)
print("value of x after last iteration via w")
print(x_from_w)
plot_points([item[0][0] for item in wl])
plot_points([item[1][0] for item in wl])
plot_points([item[2][0] for item in wl])
print_error_contributions(dl, w)
print([np.absolute(item[0][0]) for item in dl])
print([np.absolute(item[1][0]) for item in dl])
print([np.absolute(item[2][0]) for item in dl])
References
I referred to the following online resources while trying to understand these methods and to validate my thinking, and I used some of the example data from these sites to verify the formulas and the code.
- https://www.cis.upenn.edu/~cis515/cis515-12-sl5.pdf
- http://www.robots.ox.ac.uk/~sjrob/Teaching/EngComp/linAlg34.pdf (see page 68 for diagonal dominance)
- http://www.robots.ox.ac.uk/~sjrob/Teaching/EngComp/linAlg12.pdf
- https://math.unice.fr/~frapetti/CorsoF/cours2.pdf
- https://math.unice.fr/~frapetti/CorsoF/cours3.pdf
- Diagonal dominance: https://mathworld.wolfram.com/DiagonallyDominantMatrix.html
- https://math.la.asu.edu/~gardner/iter.pdf
- https://iuuk.mff.cuni.cz/~rakdver/linalg/lesson15-8.pdf
- https://www-users.cs.umn.edu/~saad/PS/iter2.pdf
- https://www-users.math.umn.edu/~olver/num_/lni.pdf
- https://math.stackexchange.com/questions/2812573/proving-the-jacobi-method-converges-for-diagonally-column-dominant-matrices/2813489
- Online tool to diagonalize: https://www.dcode.fr/matrix-diagonalization (you can copy/paste the result as MathJax; just replace '\&' with '&'. In the pasted output below, the tool's M is the iteration matrix we called P in the 3x3 example above, and the tool's P is the eigenvector matrix we called Q.)
$$ \begin{array}{l} \text{M}\to \left( \begin{array}{ccc} 0 && -0.5 && -0.75 \\ 0.6 && 0 && 0.4 \\ 0.25 && -0.375 && 0 \\ \end{array} \right) \\ M=P.D.P^{-1} \\ \text{D}\to \left( \begin{array}{ccc} -0.0887509+0.813099 i && 0.\, +0. i && 0.\, +0. i \\ 0.\, +0. i && -0.0887509-0.813099 i && 0.\, +0. i \\ 0.\, +0. i && 0.\, +0. i && 0.177502\, +0. i \\ \end{array} \right) \\ \text{P}\to \left( \begin{array}{ccc} 0.666668\, +0. i && 0.666668\, +0. i && -0.551765+0. i \\ -0.228481-0.58075 i && -0.228481+0.58075 i && -0.627745+0. i \\ 0.231211\, -0.33559 i && 0.231211\, +0.33559 i && 0.549082\, +0. i \\ \end{array} \right) \\ \end{array} $$