A statistical perspective
Assistant Professor in Statistics, Universiti Brunei Darussalam
June 14, 2025
Let \(f:\mathcal X \to \mathbb R\) be a real-valued function defined on an input set \(\mathcal X\).
Definition 1 (Differentiability) \(f(x)\) is said to be differentiable at a point \(x \in \mathcal X\) if the limit
\[ L = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \tag{1}\] exists. If \(L\) exists, we denote it by \(f'(x)\) or \(\frac{df}{dx}(x)\), and call it the derivative of \(f\) at \(x\). Further, \(f\) is said to be differentiable on \(\mathcal X\) if it is differentiable at every point in \(\mathcal X\).
For now, we assume \(\mathcal X \subseteq \mathbb R\), and will extend to higher dimensions later.
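To make the limit in Definition 1 concrete, here is a minimal numerical sketch (the function and step sizes are illustrative, not part of the lecture) showing the difference quotient approaching the derivative as \(h\) shrinks:

```python
# Minimal sketch of Definition 1: the difference quotient
# (f(x + h) - f(x)) / h should approach f'(x) as h shrinks.
# With f(x) = x**2 the limit at x = 3 should approach 6.

def f(x):
    return x ** 2

x = 3.0
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    quotient = (f(x + h) - f(x)) / h
    print(f"h = {h:.0e}  ->  difference quotient = {quotient:.6f}")
```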
Function | Derivative |
---|---|
\(f(x) = x^2\) | \(f'(x) = 2x\) |
\(f(x) = \sum_{n} a_n x^n\) | \(f'(x) = \sum_{n} n a_n x^{n-1}\) |
\(f(x) = \sin(x)\) | \(f'(x) = \cos(x)\) |
\(f(x) = \cos(x)\) | \(f'(x) = -\sin(x)\) |
\(f(x) = e^x\) | \(f'(x) = e^x\) |
\(f(x) = \ln(x)\) | \(f'(x) = \frac{1}{x}\) |
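As a quick sanity check, the table entries can be verified symbolically; a short sketch assuming SymPy is available:

```python
# Symbolic check of the derivative table above using SymPy.
import sympy as sp

x = sp.symbols("x")
for expr in [x**2, sp.sin(x), sp.cos(x), sp.exp(x), sp.log(x)]:
    print(expr, "->", sp.diff(expr, x))
```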
We can derive the first entry of the table “by hand” using the definition. Let \(f(x) = x^2\). Then,
\[ \begin{align} \lim_{h \to 0} & \frac{f(x + h) - f(x)}{h} \\ &= \lim_{h \to 0} \frac{(x + h)^2 - x^2}{h} \\ &= \lim_{h \to 0} \frac{x^2 + 2xh + h^2 - x^2}{h} \\ &= \lim_{h \to 0} \frac{2xh + h^2}{h} \\[0.5em] &= \lim_{h \to 0} (2x + h) \\[0.5em] &= 2x. \end{align} \]
The derivative of a function tells you its rate of change (the slope of the tangent line) at a point: whether \(f\) is increasing or decreasing there, and how quickly.
Example 1 Find the maximum of \(f(x) = -3x^4 + 4x^3 + 12x^2\).
\[ \begin{align*} f'(x) = -12x^3 + 12x^2 + 24x &= 0 \\ \Leftrightarrow 12x(2 + x - x^2) &= 0 \\ \Leftrightarrow -12x(x+1)(x-2) &= 0 \\ \Leftrightarrow x &= 0, -1, 2. \end{align*} \]
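The roots can also be confirmed symbolically; a minimal sketch assuming SymPy is available:

```python
# Confirming the critical points of Example 1 symbolically.
import sympy as sp

x = sp.symbols("x")
f = -3 * x**4 + 4 * x**3 + 12 * x**2
print(sp.solve(sp.diff(f, x), x))  # [-1, 0, 2]
```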
Are all of these critical points maxima? 🤔
Second derivative test: Measure the change in slope around the critical point \(\hat x\), i.e. \(f''(\hat x) = \frac{d}{dx}\!\left( \frac{df}{dx} \right)\!(\hat x) = \frac{d^2f}{dx^2}(\hat x)\).
Behaviour of \(f\) near \(\hat x\) | \(f''(\hat x)\) | Shape | Conclusion |
---|---|---|---|
Increasing → Decreasing | \(f''(\hat x) < 0\) | Concave (∩) | Local maximum |
Decreasing → Increasing | \(f''(\hat x) > 0\) | Convex (∪) | Local minimum |
No sign change / flat region | \(f''(\hat x) = 0\) | Unknown / flat | Inconclusive |
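Applied to Example 1, the test resolves the question above; a small numerical sketch (the code itself is illustrative):

```python
# Second-derivative test for Example 1:
# f(x) = -3x^4 + 4x^3 + 12x^2, so f''(x) = -36x^2 + 24x + 24.

def f(x):
    return -3 * x**4 + 4 * x**3 + 12 * x**2

def f2(x):
    return -36 * x**2 + 24 * x + 24

for x in [0, -1, 2]:
    kind = "local max" if f2(x) < 0 else "local min" if f2(x) > 0 else "inconclusive"
    print(f"x = {x:2d}:  f''(x) = {f2(x):4d},  f(x) = {f(x):3d}  ->  {kind}")

# x = 0 is a local minimum; x = -1 and x = 2 are local maxima,
# and f(2) = 32 > f(-1) = 5, so the maximum is attained at x = 2.
```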
Definition 2 Let \(\mathcal C_x\) denote the osculating circle at \(x\) with centre \(c\) and radius \(r\), i.e. the circle that best approximates the graph of \(f\) at \(x\). The curvature \(\kappa\) for a graph of a function \(f\) at a point \(x\) is defined as \(\kappa = \frac{1}{r}\).
Definition 3 (Curvature) The (signed) curvature for a graph \(y=f(x)\) is \[ \kappa = \frac{f''(x)}{\big(1 + [f'(x)]^2\big)^{3/2}}. \]
The second derivative \(f''(x)\) tells us how fast the slope is changing.
The sign of the curvature is the same as the sign of \(f''(x)\): negative curvature means the graph is concave (∩) near \(x\), and positive curvature means it is convex (∪).
At a critical point \(\hat x\) (where \(f'(\hat x) = 0\)), the curvature reduces to \(\kappa = f''(\hat x)\), so its magnitude tells us how sharply the graph bends around the optimum.
For reference, a straight line has zero curvature.
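Here is a small numerical sketch of Definition 3, evaluated at the two local maxima of Example 1 (the helper function and names are illustrative):

```python
# Signed curvature of y = f(x) from Definition 3.
# At a critical point f'(x) = 0, so kappa reduces to f''(x).

def curvature(f1, f2, x):
    """kappa = f''(x) / (1 + f'(x)**2) ** (3/2)."""
    return f2(x) / (1 + f1(x) ** 2) ** 1.5

# Example 1 again: f(x) = -3x^4 + 4x^3 + 12x^2
f1 = lambda x: -12 * x**3 + 12 * x**2 + 24 * x   # f'
f2 = lambda x: -36 * x**2 + 24 * x + 24          # f''

for x in [-1.0, 2.0]:                            # the two local maxima
    print(f"kappa at x = {x}: {curvature(f1, f2, x):.1f}")

# Both curvatures are negative (concave); |kappa| is larger at x = 2,
# the sharper of the two peaks.
```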
Derivatives represent rate of change (slope) of a function \(f:\mathcal X \to \mathbb R\).
Interested in optimising an objective function \(f(x)\) representing some kind of “reward” or “cost”.
Find critical points by solving \(f'(x) = 0\).
Use the second derivative test to classify critical points: \(f''(\hat x) < 0\) gives a local maximum, \(f''(\hat x) > 0\) a local minimum, and \(f''(\hat x) = 0\) is inconclusive.
Curvature tells us how sharply the function bends at its optima. In some sense, it tells us how hard or easy it is to find the optimum.
Show students how theoretical tools they’re learning now are used to derive estimators;
Plant the idea that Statistics isn’t just “data” or “Excel”, but has real mathematical depth;
Give a clear reason to care about derivatives, gradients, Hessians, and Jacobians in practice.
Goal: Shift mindset from high-school calculus to tools for optimization and multidimensional problems.
• Quick recap: derivative as slope, second derivative as curvature
• Extension to multivariable functions: partial derivatives, the gradient as direction of steepest ascent, the Hessian as local curvature matrix
• Applications in optimization: what it means to maximize or minimize a function with multiple variables
Example: A simple multivariate function \(f(x, y) = -x^2 - y^2 + 4x + 6y\). Find its critical point using partial derivatives, use the Hessian to classify.
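A minimal SymPy sketch of this example (the function is from the text; the code itself is illustrative):

```python
# Critical point and Hessian classification for f(x, y) = -x^2 - y^2 + 4x + 6y.
import sympy as sp

x, y = sp.symbols("x y")
f = -x**2 - y**2 + 4*x + 6*y

grad = [sp.diff(f, v) for v in (x, y)]      # gradient: [-2x + 4, -2y + 6]
crit = sp.solve(grad, [x, y])               # {x: 2, y: 3}
H = sp.hessian(f, (x, y))                   # [[-2, 0], [0, -2]]

print("critical point:", crit)
print("eigenvalues of Hessian:", H.eigenvals())  # both -2 < 0: negative definite -> local max
print("f at critical point:", f.subs(crit))      # 13
```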
Emphasize: the critical point is a maximum because the Hessian is negative definite there (concavity).
Second derivative = Fisher Information (intuition): the Fisher information is the expected negative second derivative (curvature) of the log-likelihood.
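To make the intuition concrete, here is a small sketch for a hypothetical example, the mean of a normal sample with known variance, where the negative second derivative of the log-likelihood equals the familiar Fisher information \(n/\sigma^2\):

```python
# Fisher information intuition for X_1, ..., X_n ~ N(theta, sigma^2), sigma known.
import sympy as sp

theta, sigma, n = sp.symbols("theta sigma n", positive=True)
xbar = sp.symbols("xbar")

# Log-likelihood, up to an additive constant not involving theta
loglik = -n * (xbar - theta) ** 2 / (2 * sigma ** 2)

# Negative curvature of the log-likelihood in theta
info = -sp.diff(loglik, theta, 2)
print(info)  # n/sigma**2, the Fisher information for the mean
```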
Show why Jacobians matter when transforming distributions.
• Simple change of variable in one dimension
• If \(Y = g(X)\) with \(g\) monotone, then \(f_Y(y) = f_X\big(g^{-1}(y)\big) \left| \frac{dx}{dy} \right|\)
• Multivariate case: use the Jacobian determinant
• Example: transforming from Cartesian to polar coordinates
• Example: bivariate normal to standard normal → Cholesky or whitening
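A quick numerical check of the one-dimensional formula, assuming SciPy is available and using the hypothetical transformation \(Y = e^X\) with \(X \sim N(0, 1)\):

```python
# Change-of-variable check: if X ~ N(0, 1) and Y = g(X) = exp(X),
# then x = log(y), |dx/dy| = 1/y, and
# f_Y(y) = f_X(log y) * (1/y), which is the standard lognormal density.
import numpy as np
from scipy import stats

y = np.array([0.5, 1.0, 2.0, 5.0])
by_formula = stats.norm.pdf(np.log(y)) / y      # f_X(g^{-1}(y)) |dx/dy|
reference = stats.lognorm.pdf(y, s=1)           # SciPy's lognormal density

print(np.allclose(by_formula, reference))       # True
```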
Application: Why this matters in simulations and Bayesian inference (brief mention of Metropolis-Hastings, normalizing flows, etc.)