Many methods to estimate the area under a function evaluates the integral of simpler sub-curves that resembles the original function. Essentially, the curve is broken up into different sub-intervals and a simpler curve that estimates the original curve is created for each sub-interval. The curves made are generally a polynomial since polynomials are easy to integrate. Then, the integral of sub-curve in each sub-interval is evaluated and summed together. The result is an approximation of the integral of the entire curve.

Some notable and widely used methods include the Riemann sum which uses horizontal lines to estimate the integral, trapezoidal rule which use lines, and Simpson's 1/3 and 3/8 rule which uses degree two and three polynomials, respectively. It is important to note that estimating the integral of the overall curve then boils down to the integration of the sub-curves. If the area bounded by the sub-curve in each sub-interval and the x-axis is a simple shape (i.e. rectangles for the Riemann sum or polygons for the trapezoidal rule) then the integral of each sub-curve can be geometrically represented by such area.

## The Trapezoidal Rule

The trapezoidal rule is one of the easiest methods to estimate the value of an integral. It uses lines to approximate the shape of a curve. Then, the integration of all such lines results in an estimation. The trapezoidal rule is also simple to use since the equations are simple and the number of sub-intervals chosen do not have additional restrictions other than being positive (e.g. Simpson's 1/3 rule requires the number of sub-intervals to be even).### Derivation

We briefly discuss a geometric derivation of the trapezoidal rule. Suppose we have an integral that spans from $x_0=a$ to $x_n=b$. We assume the function is continuous over the interval. We can partition such interval $[a,b]$ into $n$ smaller sub-intervals such that $a = x_0 \lt x_1 \lt \dots \lt x_i \lt \dots \lt x_{n-1} \lt x_n=b$. Define $\Delta x_i$ to be $x_i - x_{i-1}$. Consider a sub-interval $[x_{i-1},x_i]$. The curve in the sub-interval is estimated to be a linear line between the bounds of the sub-interval evaluated at the function. If $f(x_i)$ and $f(x_{i-1}) \geq 0$ then the integral of the linear line is given by the area bounded by the line and x-axis in the interval. Thus, the integral and therefore the area in the subinterval is determined to be $$I_i \approx \frac{f(x_i)+f(x_{i-1})}{2}\Delta x_i \label{eq:1} \tag{1}$$For $f(x_i)$ and $f(x_{i-1}) \leq 0$ the integral is estimated using the same method as above, and is the negative of the area. Thus it is $-\frac{|f(x_i)+f(x_{i-1})|}{2}\Delta x_i$. This equals to equation \ref{eq:1} since $f(x_i)$ and $f(x_{i-1}) \leq 0$. If $f(x_i)$ and $f(x_{i-1})$ have opposing signs, we can further break up the sub-interval. There only exists one point $x_k$ where $f(x_k)=0$ due to the intermediate value theorem and the fact that a line can only pass the x-axis once, never or an infinite amount of times. The estimated integral of the function in $[x_{i-1},x_i]$ is determined to be:

\begin{align*}I_i &\approx \frac{f(x_k)+f(x_{i-1})}{2}(x_k-x_{i-1})+\frac{f(x_i)+f(x_k)}{2}(x_i-x_k)\\

&=\frac{f(x_{i-1})}{2}(x_k-x_{i-1})+\frac{f(x_i)}{2}(x_i-x_k)\\\end{align*}By similar triangles we see that $x_k-x_{i-1}=-\frac{f(x_{i-1})\Delta x_i}{f(x_i)-f(x_{i-1})}$ and $x_i-x_k=\frac{f(x_i)\Delta x_i}{f(x_i)-f(x_{i-1})}$ so

\begin{align*}I_i &\approx -\frac{f^2(x_{i-1})}{2(f(x_i)-f(x_{i-1}))}\Delta x_i+\frac{f^2(x_i)}{2(f(x_i)-f(x_{i-1}))}\Delta x_i\\

&=\frac{f^2(x_i)-f^2(x_{i-1})}{2(f(x_i)-f(x_{i-1}))}\Delta x_i\\

&=\frac{f(x_i)+f(x_{i-1})}{2}\Delta x_i\\\end{align*}

Therefore regardless of the nature of the function in any sub-interval, the integral in $[x_{i-1},x_i]$ is estimated to be equation \ref{eq:1}.

The integral of the overall curve in the interval $[a,b]$ is the sum of all the estimated integrals in each sub-interval, explicitly:

$$I = \sum_{i=1}^{n} I_i$$

$$I \approx \sum_{i=1}^{n} \frac{f(x_{i-1})+f(x_{i})}{2}\Delta x_i \label{eq:2} \tag{2}$$

If the sub-intervals are evenly spaced, then

$$I \approx \frac{b-a}{2n}\sum_{i=1}^{n} \big(f(x_{i-1})+f(x_i)\big)$$

$$I \approx \frac{b-a}{2n} \bigg(f(x_0)+f(x_n)+2\sum_{i=1}^{n-1} f(x_{i})\bigg) \label{eq:3} \tag{3}$$

Equations \ref{eq:2} and \ref{eq:3} are the final forms used in this numerical method.

### Implementation

Implementation of the trapezoidal rule is relatively straightforward. One thing to note is that the higher the number of sub-intervals, the better the accuracy

*however*it is important to realize that the higher the number, the more prominent round-off errors get. Generally the round-off errors associated to this method are negligible assuming the number of intervals don't get ridiculously large. It is recommended to calculate and use the minimum number of sub-intervals to achieve a desired maximum error using this formula. Then as the integration bounds change, it is generally recommended to triple the number of sub-intervals for the doubling of the integration interval; this will keep the maximum error relatively the same assuming the maximum concavity of the function in the new interval change stays constant compared to the old one.function f(x) return math.sin(x)/x end a=1 --lower bound b=3 --upper bound n=5000 --number of sub-intervals sum=0 for i=1,n do xi=((b-a)*i/n)+a xi1=((b-a)*(i-1)/n)+a sum=sum+(f(xi)+f(xi1))*((b-a)/(2*n)) end print(sum)

Figure 3: Lua code for the implementation of the trapezoidal rule. Note the estimation of a non-elementary integral sin(x)/x.

The method above is adequate but relies on the user to implement a suitable subinterval count. A better method of implementation involves estimating the integral with progressively more subintervals until the answer converges to a single value (i.e. is numerically stable). This is done by calculating the estimated integral while increasing the amount of subintervals used, calculating the relative difference between the current estimate with the previous best estimate as we go. Only when this error is below a preset tolerance will the answer be outputted. In the example below, this method is implemented in MATLAB but is improved with Romberg's method - a way to decrease the number of iterations used with the trapezoidal method.

rombInt(@(x) sin(x)./x,1,3) function I=rombInt(f,a,b) maxIter=1000; % Max iterations etol = 1e-6; % Tolerance for relative error n=1; if a>=b error("Must provide unequal and smaller a value than b") end prev=Inf; for j=1:maxIter % Apply trapezoidal rule twice for Romberg's method trap1=0; xDif=(b-a)/n; x=a:xDif:b; y=f(x); for i=1:length(x)-1 trap1=trap1+0.5*(y(i)+y(i+1))*xDif; end n=n*2; % Double the trapezoidal rule subinterval count trap2=0; xDif=(b-a)/n; x=a:xDif:b; y=f(x); for i=1:length(x)-1 trap2=trap2+0.5*(y(i)+y(i+1))*xDif; end n=n*2; % Calculate final integral I=(4/3)*trap2-(1/3)*trap1; if abs((prev-I)/I) <= etol fprintf("Convergence in %.d iterations",j) break elseif j==maxIter warning("Integral did not converge") break end prev=I; end end

Figure 4: MATLAB code for the implementation of an adaptive trapezoidal method.

## Exercises

- Using the error formula for the trapezoidal rule what is the equation that gives the
*maximum*possible error for trapezoidal estimation of $\int^{x}_{0} t^3+e^{t}~dt$ using $n$ equal sub-intervals? What is the formula for the*exact*error? - Prove that the trapezoidal rule, for any number of sub-intervals greater or equal to 1,
- overestimates a parabola's integral if it faces upward.
- perfectly estimates the integral for functions of the form $y=ax+b$.
- Use the trapezoidal rule with $n=4$ to estimate $\int_{0}^{5}5x-5~dx$.
- Estimate the integral from $\pi$ to $2\pi$ of $\frac{\sin(x)}{x}$ with the minimum number of sub-intervals such that the maximum error is no more than 0.05.
- For batch/simple distillation where a binary liquid mixture is boiled and condensed into a receiving flask, the equation $\ln\Big(\frac{W}{F}\Big)=-\int_{x_w}^{x_f}\frac{dx}{y-x}$
_{ }is important to quantify the ending amounts of liquid in both containers and its respective concentrations. $x$ is related to $y$ as provided in the table below: - Derive equation \ref{eq:1} using
- a geometric approach.
- a mathematical approach.
- Using one sub-interval with the trapezoidal rule to estimate certain functions is sometimes more accurate than using two or more intervals. Provide a function that shows this.
- Refer to Figure 1 and identify the numerical method that estimates the integral
- the most poorly?
- the most accurately?
- the quickest, if calculated manually?
- the slowest, if calculated manually?
- Refer to equation \ref{eq:3} and the equation directly above it. Why is it more efficient to use equation \ref{eq:3} rather than the other one, especially when manually evaluating the formula?
- We've seen different methods that can be used to estimate the integral of a curve. The ones mentioned use polynomials of varying degrees to resemble the shape of the original curve. Referring to Figure 5 and considering all the methods presented in this article, which method would be the best (efficiency and accuracy wise) at estimating the integral of the functions from $x=1$ to $x=5$? For each function,
- is equation \ref{eq:2} or \ref{eq:3} more useful?
- how many sub-intervals would you choose?
- where would you partition the sub-intervals?

Figure 5: Table of values for question 5. |

If $x_w=0.02, x_f=0.20$, and $F=1$, estimate $W$, the remaining liquid in the feed pot in mols. [1]