In this post we show how an integral can be computed using the Monte-Carlo method, and how combining it with bootstrapping can increase precision.
Reference
Let's say we want to integrate the cardinal sine over the interval [-4, 4]. Graphically, this amounts to finding the area under the curve while respecting sign: positive area (above the x axis) and negative area (below the x axis).
This integral can be computed numerically, by the trapezoid rule or by adaptive quadrature, or analytically.
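For the record, np.sinc implements the normalized cardinal sine $\sin(\pi x)/(\pi x)$, so the integral over $[-4, 4]$ has a closed form in terms of the sine integral (a quick derivation, not from the original post):

$$\int_{-4}^{4} \frac{\sin(\pi x)}{\pi x}\,\mathrm{d}x = \frac{2}{\pi}\,\operatorname{Si}(4\pi) \approx 0.94993934$$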
import numpy as np
from scipy import integrate

def model(x):
    return np.sinc(x)  # normalized cardinal sine: sin(pi * x) / (pi * x)

xlin = np.linspace(-4., 4., 20000)
ylin = model(xlin)

# Trapezoid rule over the sampled points:
Itrap = integrate.trapezoid(ylin, x=xlin)
# 0.9499393464346436

# Adaptive quadrature, returning the estimate and an error bound:
Iquad = integrate.quad(model, xlin.min(), xlin.max())
# (0.9499393397673102, 9.485684460130983e-11)
Both methods agree to the first seven significant figures, and the quadrature value agrees, up to its reported precision, with the closed-form value above and with the approximation given by Wolfram Alpha.
Method
The Monte-Carlo method works as follows:
- Uniformly sample N random points over the rectangular extent of the curve;
- Determine whether each point lies inside the area delimited by the function, and in which part (positive or negative);
- Compute the fraction of points lying in each of those areas and multiply by the area of the rectangular extent;
- Estimate the integral by subtracting the negative area from the positive area (see the estimator formula after this list);
- Repeat the process many times to bootstrap the procedure and let the Central Limit Theorem (CLT) come into action.
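In equation form, writing $N_+$ and $N_-$ for the numbers of points falling in the positive and negative areas, $N$ for the total number of points and $A$ for the area of the rectangular extent, the estimator implemented below reads:

$$\hat{I} = \frac{N_+ - N_-}{N}\,A$$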
The function below implements this procedure, except for the bootstrapping step:
def monte_carlo(x, y, model, N=1000, seed=12345):
    # Uniformly sample over [0;1]x[0;1]:
    np.random.seed(seed)
    u = np.random.uniform(size=N)
    v = np.random.uniform(size=N)
    # Function extent:
    xsize = x.max() - x.min()
    ysize = y.max() - y.min()
    # Transform the sample to span the extent:
    xmc = u * xsize + x.min()
    ymc = v * ysize + y.min()
    # Evaluate the reference function at the sampled x values:
    yref = model(xmc)
    # Identify points below the x axis:
    qyneg = ymc < 0.
    # Identify positive area points (above the axis, below the curve):
    qIpos = ymc <= yref
    qIpos[qyneg] = False
    Ipos = np.sum(qIpos)
    # Identify negative area points (below the axis, above the curve):
    qIneg = ymc >= yref
    qIneg[~qyneg] = False
    Ineg = np.sum(qIneg)
    # Identify points outside both areas:
    qInot = ~(qIpos | qIneg)
    # Estimate the integral from the hit counts and the extent area:
    A = xsize * ysize
    I = (Ipos - Ineg) / N * A
    return xmc, ymc, qInot, qIpos, qIneg, I
A typical output of such a Monte-Carlo trial is shown below:
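The original figure is not reproduced here; a minimal sketch along the following lines, assuming matplotlib (the post does not show its plotting code), draws the sampled points colored by category:

import matplotlib.pyplot as plt

xmc, ymc, qInot, qIpos, qIneg, I = monte_carlo(xlin, ylin, model, N=2000)

fig, ax = plt.subplots()
ax.plot(xlin, ylin, color="black", label="sinc")
ax.scatter(xmc[qIpos], ymc[qIpos], s=2, label="positive area")
ax.scatter(xmc[qIneg], ymc[qIneg], s=2, label="negative area")
ax.scatter(xmc[qInot], ymc[qInot], s=2, alpha=0.3, label="outside")
ax.legend()
plt.show()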
A single trial gives a poor estimate of the integral, hence the need for bootstrapping:
*_, I = monte_carlo(xlin, ylin, model, N=300_000, seed=1234567890)
# 0.9454496520667456
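Such a deviation is expected: treating each sample point as a hit-or-miss Bernoulli trial, the CLT gives a standard error that shrinks only as the square root of the sample size. As a rough back-of-the-envelope bound (not from the original post), with $p$ the fraction of points falling inside either area:

$$\sigma_{\hat{I}} \approx A\,\sqrt{\frac{p(1-p)}{N}}$$

With the extent used here this is of order a few $10^{-3}$ for N = 300,000, consistent with the deviation observed above, and it motivates repeating the trial many times.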
Now we repeat the procedure for different sample sizes, and for each sample size we bootstrap to enhance the estimation:
import pandas as pd

Nb = 3000
np.random.seed(12345)
estimates = []
for s in np.random.randint(0, 999999999, size=Nb):
    for n in np.logspace(3, 6, 25, base=10):
        n = int(n)
        xmc, ymc, qInot, qIpos, qIneg, I = monte_carlo(xlin, ylin, model, N=n, seed=s)
        estimates.append({"s": s, "n": n, "I": I})
data = pd.DataFrame(estimates)
After a few minutes the procedure finishes, and we can aggregate the bootstrap samples to see how bootstrapping helps convergence towards the expected integral value:
summary = data.groupby(["n"])["I"].quantile([0.1, 0.25, 0.5, 0.75, 0.9]).unstack()
summary = summary.merge(
    data.groupby(["n"])["I"].mean().rename("mean"),
    left_index=True, right_index=True,
)
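The percentile columns of this summary can be drawn as bands around the median; a minimal matplotlib sketch (again an assumption, the original plotting code is not shown) would be:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Spread of single trials, as inter-percentile bands:
ax.fill_between(summary.index, summary[0.1], summary[0.9], alpha=0.25, label="10-90%")
ax.fill_between(summary.index, summary[0.25], summary[0.75], alpha=0.25, label="25-75%")
ax.plot(summary.index, summary[0.5], label="median")
# Quadrature value as the reference:
ax.axhline(Iquad[0], color="black", linestyle="--", label="quadrature")
ax.set_xscale("log")
ax.set_xlabel("Sample size N")
ax.set_ylabel("Integral estimate")
ax.legend()
plt.show()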
Which graphically leads to:
This figure confirms that bootstrapping is necessary to make the Monte-Carlo process converge towards the expected value. The percentiles give us an estimate of the range within which a single Monte-Carlo trial can lie. Another observation is that the bootstrap distribution seems relatively symmetrical.
The figures below illustrate this observation:
So in this case we have shown that the bootstrapped Monte-Carlo method can estimate the value of the integral to 4 significant figures, with a reasonably symmetrical error distribution.