### Numerical Methods: Estimating Integrals and the Trapezoidal Rule

The magnitude of an integral in 2D can be geometrically interpreted as the area bounded by y=f(x) and the x-axis between an interval. The positivity/negativity of the resulting integral is determined by how much area is over/under the x-axis such that any area over the x-axis adds to the value of the integral, and vice-versa. Thus, using basic geometry we are able to estimate/determine the value of an integral using common area formulas we already know, even integrals whose answer cannot be represented by elementary functions.

Many methods to estimate the area under a function evaluates the integral of simpler sub-curves that resembles the original function. Essentially, the curve is broken up into different sub-intervals and a simpler curve that estimates the original curve is created for each sub-interval. The curves made are generally a polynomial since polynomials are easy to integrate. Then, the integral of sub-curve in each sub-interval is evaluated and summed together. The result is an approximation of the integral of the entire curve.

Some notable and widely used methods include the Riemann sum which uses horizontal lines to estimate the integral, trapezoidal rule which use lines, and Simpson's 1/3 and 3/8 rule which uses degree two and three polynomials, respectively. It is important to note that estimating the integral of the overall curve then boils down to the integration of the sub-curves. If the area bounded by the sub-curve in each sub-interval and the x-axis is a simple shape (i.e. rectangles for the Riemann sum or polygons for the trapezoidal rule) then the integral of each sub-curve can be geometrically represented by such area. Figure 1: The area under the curve in black is approximated using the (left) Riemann sum (in red), trapezoidal rule (in yellow) and the Simpson's 1/3 rule (in green). Note the number of sub-intervals in this case is two, and each different method uses different polymeric curves to resemble the shape of the original curve.
In figure 1 it is evident that the maximum error of the estimate decreases as we increase the degree of the polynomial used to estimate the curve. It is also the case that the larger the amount of sub-intervals, the smaller the maximum error. These properties are generally true for any function.

## The Trapezoidal Rule

The trapezoidal rule is one of the easiest methods to estimate the value of an integral. It uses lines to approximate the shape of a curve. Then, the integration of all such lines results in an estimation. The trapezoidal rule is also simple to use since the equations are simple and the number of sub-intervals chosen do not have additional restrictions other than being positive (e.g. Simpson's 1/3 rule requires the number of sub-intervals to be even). Figure 2: Graphical explanation of the trapezoidal rule: two sub-intervals $[x_0,x_1]$ and $[x_1,x_2]$ are created as $n=2$. A line is drawn from $f(x_0)$ to $f(x_1)$ and from $f(x_1)$ to $f(x_2)$. The sum of the integral of the lines (shaded in yellow) gives the estimated value of the function's integral.

### Derivation

We briefly discuss a geometric derivation of the trapezoidal rule. Suppose we have an integral that spans from $x_0=a$ to $x_n=b$. We assume the function is continuous over the interval. We can partition such interval $[a,b]$ into $n$ smaller sub-intervals such that $a = x_0 \lt x_1 \lt \dots \lt x_i \lt \dots \lt x_{n-1} \lt x_n=b$. Define $\Delta x_i$ to be $x_i - x_{i-1}$. Consider a sub-interval $[x_{i-1},x_i]$. The curve in the sub-interval is estimated to be a linear line between the bounds of the sub-interval evaluated at the function. If $f(x_i)$ and $f(x_{i-1}) \geq 0$ then the integral of the linear line is given by the area bounded by the line and x-axis in the interval. Thus, the integral and therefore the area in the subinterval is determined to be $$I_i \approx \frac{f(x_i)+f(x_{i-1})}{2}\Delta x_i \label{eq:1} \tag{1}$$
For $f(x_i)$ and $f(x_{i-1}) \leq 0$ the integral is estimated using the same method as above, and is the negative of the area. Thus it is $-\frac{|f(x_i)+f(x_{i-1})|}{2}\Delta x_i$. This equals to equation \ref{eq:1} since $f(x_i)$ and $f(x_{i-1}) \leq 0$. If $f(x_i)$ and $f(x_{i-1})$ have opposing signs, we can further break up the sub-interval. There only exists one point $x_k$ where $f(x_k)=0$ due to the intermediate value theorem and the fact that a line can only pass the x-axis once, never or an infinite amount of times. The estimated integral of the function in $[x_{i-1},x_i]$ is determined to be:
\begin{align*}I_i &\approx \frac{f(x_k)+f(x_{i-1})}{2}(x_k-x_{i-1})+\frac{f(x_i)+f(x_k)}{2}(x_i-x_k)\\
&=\frac{f(x_{i-1})}{2}(x_k-x_{i-1})+\frac{f(x_i)}{2}(x_i-x_k)\\\end{align*}By similar triangles we see that $x_k-x_{i-1}=-\frac{f(x_{i-1})\Delta x_i}{f(x_i)-f(x_{i-1})}$ and $x_i-x_k=\frac{f(x_i)\Delta x_i}{f(x_i)-f(x_{i-1})}$ so
\begin{align*}I_i &\approx -\frac{f^2(x_{i-1})}{2(f(x_i)-f(x_{i-1}))}\Delta x_i+\frac{f^2(x_i)}{2(f(x_i)-f(x_{i-1}))}\Delta x_i\\
&=\frac{f^2(x_i)-f^2(x_{i-1})}{2(f(x_i)-f(x_{i-1}))}\Delta x_i\\
&=\frac{f(x_i)+f(x_{i-1})}{2}\Delta x_i\\\end{align*}
Therefore regardless of the nature of the function in any sub-interval, the integral in $[x_{i-1},x_i]$ is estimated to be equation \ref{eq:1}.
The integral of the overall curve in the interval $[a,b]$ is the sum of all the estimated integrals in each sub-interval, explicitly:
$$I = \sum_{i=1}^{n} I_i$$
$$I \approx \sum_{i=1}^{n} \frac{f(x_{i-1})+f(x_{i})}{2}\Delta x_i \label{eq:2} \tag{2}$$

If the sub-intervals are evenly spaced, then
$$I \approx \frac{b-a}{2n}\sum_{i=1}^{n} \big(f(x_{i-1})+f(x_i)\big)$$
$$I \approx \frac{b-a}{2n} \bigg(f(x_0)+f(x_n)+2\sum_{i=1}^{n-1} f(x_{i})\bigg) \label{eq:3} \tag{3}$$

Equations \ref{eq:2} and \ref{eq:3} are the final forms used in this numerical method.

### Implementation

Implementation of the trapezoidal rule is relatively straightforward. One thing to note is that the higher the number of sub-intervals, the better the accuracy however it is important to realize that the higher the number, the more prominent round-off errors get. Generally the round-off errors associated to this method are negligible assuming the number of intervals don't get ridiculously large. It is recommended to calculate and use the minimum number of sub-intervals to achieve a desired maximum error using this formula. Then as the integration bounds change, it is generally recommended to triple the number of sub-intervals for the doubling of the integration interval; this will keep the maximum error relatively the same assuming the maximum concavity of the function in the new interval change stays constant compared to the old one.
function f(x)
return math.sin(x)/x
end

a=1 --lower bound
b=3 --upper bound
n=5000 --number of sub-intervals

sum=0
for i=1,n do
xi=((b-a)*i/n)+a
xi1=((b-a)*(i-1)/n)+a
sum=sum+(f(xi)+f(xi1))*((b-a)/(2*n))
end

print(sum)

Figure 3: Lua code for the implementation of the trapezoidal rule. Note the estimation of a non-elementary integral sin(x)/x.

The method above is adequate but relies on the user to implement a suitable subinterval count. A better method of implementation involves estimating the integral with progressively more subintervals until the answer converges to a single value (i.e. is numerically stable). This is done by calculating the estimated integral while increasing the amount of subintervals used, calculating the relative difference between the current estimate with the previous best estimate as we go. Only when this error is below a preset tolerance will the answer be outputted. In the example below, this method is implemented in MATLAB but is improved with Romberg's method - a way to decrease the number of iterations used with the trapezoidal method.
rombInt(@(x) sin(x)./x,1,3)

function I=rombInt(f,a,b)
maxIter=1000; % Max iterations
etol = 1e-6; % Tolerance for relative error
n=1;
if a>=b
error("Must provide unequal and smaller a value than b")
end
prev=Inf;
for j=1:maxIter
% Apply trapezoidal rule twice for Romberg's method
trap1=0;
xDif=(b-a)/n;
x=a:xDif:b;
y=f(x);
for i=1:length(x)-1
trap1=trap1+0.5*(y(i)+y(i+1))*xDif;
end
n=n*2; % Double the trapezoidal rule subinterval count
trap2=0;
xDif=(b-a)/n;
x=a:xDif:b;
y=f(x);
for i=1:length(x)-1
trap2=trap2+0.5*(y(i)+y(i+1))*xDif;
end
n=n*2;
% Calculate final integral
I=(4/3)*trap2-(1/3)*trap1;
if abs((prev-I)/I) <= etol
fprintf("Convergence in %.d iterations",j)
break
elseif j==maxIter
warning("Integral did not converge")
break
end
prev=I;
end
end

Figure 4: MATLAB code for the implementation of an adaptive trapezoidal method.

## Exercises

1. Using the error formula for the trapezoidal rule what is the equation that gives the maximum possible error for trapezoidal estimation of $\int^{x}_{0} t^3+e^{t}~dt$ using $n$ equal sub-intervals? What is the formula for the exact error?
2. Prove that the trapezoidal rule, for any number of sub-intervals greater or equal to 1,
1. overestimates a parabola's integral if it faces upward.
2. perfectly estimates the integral for functions of the form $y=ax+b$.
3. Use the trapezoidal rule with $n=4$ to estimate $\int_{0}^{5}5x-5~dx$.
4. Estimate the integral from $\pi$ to $2\pi$ of $\frac{\sin(x)}{x}$ with the minimum number of sub-intervals such that the maximum error is no more than 0.05.
5. For batch/simple distillation where a binary liquid mixture is boiled and condensed into a receiving flask, the equation $\ln\Big(\frac{W}{F}\Big)=-\int_{x_w}^{x_f}\frac{dx}{y-x}$ is important to quantify the ending amounts of liquid in both containers and its respective concentrations. $x$ is related to $y$ as provided in the table below:
6. Figure 5: Table of values for question 5.
If $x_w=0.02, x_f=0.20$, and $F=1$, estimate $W$, the remaining liquid in the feed pot in mols. 
7. Derive equation \ref{eq:1} using
1. a geometric approach.
2. a mathematical approach.
8. Using one sub-interval with the trapezoidal rule to estimate certain functions is sometimes more accurate than using two or more intervals. Provide a function that shows this.
9. Refer to Figure 1 and identify the numerical method that estimates the integral
1. the most poorly?
2. the most accurately?
3. the quickest, if calculated manually?
4. the slowest, if calculated manually?
10. Refer to equation \ref{eq:3} and the equation directly above it. Why is it more efficient to use equation \ref{eq:3} rather than the other one, especially when manually evaluating the formula?
11. We've seen different methods that can be used to estimate the integral of a curve. The ones mentioned use polynomials of varying degrees to resemble the shape of the original curve. Referring to Figure 5 and considering all the methods presented in this article, which method would be the best (efficiency and accuracy wise) at estimating the integral of the functions from $x=1$ to $x=5$? For each function,
1. is equation \ref{eq:2} or \ref{eq:3} more useful?
2. how many sub-intervals would you choose?
3. where would you partition the sub-intervals? Figure 6: Question 10 graphs - which method would be the best (efficiency and accuracy wise) at estimating the integral of the following functions?  Question adapted from Prof. R. Pal from the University of Waterloo, Ontario. Images belong to KrIsMa. HTML code format generated using hilite.me.