Calculus: Differentiation and Its Application

A statistical perspective

Dr. Haziq Jamil

Assistant Professor in Statistics, Universiti Brunei Darussalam

https://haziqj.ml/uitm-calculus/

June 14, 2025

(Almost) Everything you ought to know…

…about calculus in the first year

Let \(f:\mathcal X \to \mathbb R\) be a real-valued function defined on an input set \(\mathcal X\).

Definition 1 (Differentiability) \(f(x)\) is said to be differentiable at a point \(x \in \mathcal X\) if the limit

\[ L = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \tag{1}\] exists. If \(L\) exists, we denote it by \(f'(x)\) or \(\frac{df}{dx}(x)\), and call it the derivative of \(f\) at \(x\). Further, \(f\) is said to be differentiable on \(\mathcal X\) if it is differentiable at every point in \(\mathcal X\).

For now, we assume \(\mathcal X \subseteq \mathbb R\), and will extend to higher dimensions later.

Some examples


| Function | Derivative |
|---|---|
| \(f(x) = x^2\) | \(f'(x) = 2x\) |
| \(f(x) = \sum_{n} a_n x^n\) | \(f'(x) = \sum_{n} n a_n x^{n-1}\) |
| \(f(x) = \sin(x)\) | \(f'(x) = \cos(x)\) |
| \(f(x) = \cos(x)\) | \(f'(x) = -\sin(x)\) |
| \(f(x) = e^x\) | \(f'(x) = e^x\) |
| \(f(x) = \ln(x)\) | \(f'(x) = \frac{1}{x}\) |
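These rules can be checked with a computer algebra system; a minimal sketch in Python using sympy (the tooling is my choice, not part of the slides):

```python
# Symbolically verify the derivative table above
import sympy as sp

x = sp.symbols("x")
for f in (x**2, sp.sin(x), sp.cos(x), sp.exp(x), sp.log(x)):
    print(f, "->", sp.diff(f, x))
# x**2 -> 2*x, sin(x) -> cos(x), cos(x) -> -sin(x), exp(x) -> exp(x), log(x) -> 1/x
```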


We can compute derivatives “by hand” using the definition. Let \(f(x) = x^2\). Then,



\[ \begin{align} \lim_{h \to 0} & \frac{f(x + h) - f(x)}{h} \\ &= \lim_{h \to 0} \frac{(x + h)^2 - x^2}{h} \\ &= \lim_{h \to 0} \frac{x^2 + 2xh + h^2 - x^2}{h} \\ &= \lim_{h \to 0} \frac{2xh + h^2}{h} \\[0.5em] &= \lim_{h \to 0} (2x + h) \\[0.5em] &= 2x. \end{align} \]
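The same answer can be approached numerically by shrinking \(h\) in the difference quotient of Definition 1; a minimal sketch (the evaluation point and step sizes are my own choices):

```python
# Difference quotient from Definition 1: (f(x + h) - f(x)) / h
def diff_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

f = lambda x: x**2
for h in (1e-1, 1e-3, 1e-6):
    print(h, diff_quotient(f, 3.0, h))  # tends to f'(3) = 2 * 3 = 6 as h -> 0
```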

Graphically…

What is a derivative?

The derivative of a function tells you:

  • 🚀 How fast the function is changing at any point
  • 📐 The slope of the tangent line at that point

The concept of optimisation

  • When \(f\) is some kind of “reward” function, the value of \(x\) that maximises \(f\) is of particular interest. Some examples:
    • 💰 Profit maximisation: Find the price that maximises profit.
    • 🧬 Biological processes: Find the conditions that maximise growth or reproduction rates.
    • 👷‍♂️ Engineering: Find the design parameters that maximise strength or efficiency.
  • Derivatives help us find so-called critical points: solve \(f'(x) = 0\).

Example 1 Find the maximum of \(f(x) = -3x^4 + 4x^3 + 12x^2\).

\[ \begin{align*} f'(x) = -12x^3 + 12x^2 + 24x &= 0 \\ \Leftrightarrow -12x(x^2 - x - 2) &= 0 \\ \Leftrightarrow -12x(x+1)(x-2) &= 0 \\ \Leftrightarrow x &= 0, -1, 2. \end{align*} \]

Are all of these critical points maxima? 🤔

Graphically…

How do we know if it’s a maximum or a minimum?

Second derivative test: measure the change in slope around the critical point \(\hat x\), i.e. compute \(f''(\hat x) = \frac{d}{dx}\left( \frac{df}{dx} \right)\Big|_{x = \hat x} = \frac{d^2f}{dx^2}(\hat x)\).

| Behaviour of \(f\) near \(\hat x\) | \(f''(\hat x)\) | Shape | Conclusion |
|---|---|---|---|
| Increasing → Decreasing | \(f''(\hat x) < 0\) | Concave (∩) | Local maximum |
| Decreasing → Increasing | \(f''(\hat x) > 0\) | Convex (∪) | Local minimum |
| No sign change / flat region | \(f''(\hat x) = 0\) | Unknown / flat | Inconclusive |
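Applying the test to Example 1 confirms which critical points are maxima; a minimal sketch with sympy (assumed tooling):

```python
# Second derivative test on f(x) = -3x^4 + 4x^3 + 12x^2 from Example 1
import sympy as sp

x = sp.symbols("x")
f = -3 * x**4 + 4 * x**3 + 12 * x**2
crit = sp.solve(sp.diff(f, x), x)   # the critical points: [-1, 0, 2]
f2 = sp.diff(f, x, 2)
for c in crit:
    print(c, f2.subs(x, c))         # the sign of f''(c) classifies c
# -1 -> -36 (local maximum), 0 -> 24 (local minimum), 2 -> -72 (local maximum)
```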

Curvature

Definition 2 Let \(\mathcal C_x\) denote the osculating circle at \(x\), with centre \(c\) and radius \(r\), i.e. the circle that best approximates the graph of \(f\) at \(x\). The curvature \(\kappa\) of the graph of \(f\) at the point \(x\) is defined as \(\kappa = \frac{1}{r}\).

Curvature and concavity

Definition 3 (Curvature) The (signed) curvature for a graph \(y=f(x)\) is \[ \kappa = \frac{f''(x)}{\big(1 + [f'(x)]^2\big)^{3/2}}. \]

  • The second derivative \(f''(x)\) tells us how fast the slope is changing.

  • The sign of the curvature is the same as the sign of \(f''(x)\). Hence,

    • If \(f''(x) > 0\), the graph is concave up (convex).
    • If \(f''(x) < 0\), the graph is concave down (concave).
  • The magnitude of the curvature increases with \(|f''(x)|\) (for a given slope). Hence,

    • If \(|f''(x)|\) is large, the graph bends sharply and looks “curvier”.
    • If \(|f''(x)|\) is small, the graph bends gently and looks “flatter”.
  • For reference, a straight line has zero curvature.
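Both points can be illustrated by computing the signed curvature of Definition 3 symbolically; a minimal sketch (the example functions are my own):

```python
# Signed curvature of the graph y = f(x): f'' / (1 + f'^2)^(3/2)
import sympy as sp

x = sp.symbols("x")

def signed_curvature(f):
    return sp.diff(f, x, 2) / (1 + sp.diff(f, x) ** 2) ** sp.Rational(3, 2)

parabola = signed_curvature(x**2)
line = signed_curvature(3 * x + 1)

print(sp.simplify(parabola))    # 2/(4*x**2 + 1)**(3/2)
print(parabola.subs(x, 0))      # 2 -> the bend is sharpest at the vertex
print(sp.simplify(line))        # 0 -> a straight line has zero curvature
```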

Summary so far

  • Derivatives represent rate of change (slope) of a function \(f:\mathcal X \to \mathbb R\).

  • Interested in optimising an objective function \(f(x)\) representing some kind of “reward” or “cost”.

  • Find critical points by solving \(f'(x) = 0\).

  • Use the second derivative test to classify critical points:

    • If \(f''(x) < 0\), then \(f\) is concave down at \(x\) and \(x\) is a local maximum.
    • If \(f''(x) > 0\), then \(f\) is concave up at \(x\) and \(x\) is a local minimum.
    • If \(f''(x) = 0\), then the test is inconclusive.
  • Curvature tells us how sharply the function bends at its optima. In some sense, it tells us how easy or hard the optimum is to find.

A statistical perspective

Motivation

  • Show students how the theoretical tools they’re learning now are used to derive estimators;
  • Plant the idea that Statistics isn’t just “data” or “Excel”, but has real mathematical depth;
  • Give a clear reason to care about derivatives, gradients, Hessians, and Jacobians in practice.

Functions

Derivatives


Goal: shift mindset from high-school calculus to tools for optimisation and multidimensional problems.

  • Quick recap: derivative as slope, second derivative as curvature
  • Extension to multivariable functions:
    • Partial derivatives
    • Gradient as the direction of steepest ascent
    • Hessian as a local curvature matrix
  • Applications in optimisation: what it means to maximise or minimise a function of multiple variables

Example: a simple multivariate function \(f(x, y) = -x^2 - y^2 + 4x + 6y\). Find its critical point by setting the partial derivatives to zero, then use the Hessian to classify it.
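A minimal sketch of this example with sympy (assumed tooling); the gradient gives the critical point and the Hessian’s eigenvalues classify it:

```python
# Critical point and Hessian of f(x, y) = -x^2 - y^2 + 4x + 6y
import sympy as sp

x, y = sp.symbols("x y")
f = -x**2 - y**2 + 4 * x + 6 * y

grad = [sp.diff(f, v) for v in (x, y)]
crit = sp.solve(grad, (x, y))       # {x: 2, y: 3}
H = sp.hessian(f, (x, y))           # [[-2, 0], [0, -2]]
print(crit, H.eigenvals())          # both eigenvalues are -2
```

Because both eigenvalues are negative, the Hessian is negative definite and \((x, y) = (2, 3)\) is a maximum.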

Maximum likelihood estimation

  • Define the likelihood: \(L(\theta) = \prod_{i=1}^n f(x_i \mid \theta)\)
    • Log-likelihood \(\ell(\theta) = \log L(\theta)\): easier to differentiate
    • MLE found by solving \(\frac{d}{d\theta} \ell(\theta) = 0\)
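For instance (a standard textbook example, not on the original slide), for i.i.d. Bernoulli(\(\theta\)) observations \(x_1, \dots, x_n\):

\[ \begin{align*} \ell(\theta) &= \sum_{i=1}^n \big[ x_i \log\theta + (1 - x_i)\log(1-\theta) \big], \\ \frac{d\ell}{d\theta} = \frac{\sum_i x_i}{\theta} - \frac{n - \sum_i x_i}{1-\theta} &= 0 \quad\Leftrightarrow\quad \hat\theta = \bar x. \end{align*} \]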

Examples

Emphasise: we know we have found a maximum because the second derivative of the log-likelihood is negative there (concavity).
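A minimal numerical sketch (an exponential model with simulated data; the model, seed, and bounds are my own choices):

```python
# Numerical MLE for an exponential model, plus a concavity check at the optimum
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100)      # simulated data, true mean 2

def negloglik(theta):
    # -ell(theta) for Exponential(mean theta): ell = -n log(theta) - sum(x)/theta
    return len(x) * np.log(theta) + x.sum() / theta

res = minimize_scalar(negloglik, bounds=(0.01, 10), method="bounded")
print(res.x, x.mean())                        # numerically, theta_hat = sample mean

# Central finite difference: ell''(theta_hat) should be negative (a maximum)
h = 1e-4
ell2 = -(negloglik(res.x + h) - 2 * negloglik(res.x) + negloglik(res.x - h)) / h**2
print(ell2)                                   # negative => concave at the MLE
```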

Fisher information

Second derivative = Fisher Information (intuition)

  • Negative expected second derivative of the log-likelihood: \(I(\theta) = -\operatorname{E}\left[ \frac{d^2}{d\theta^2} \ell(\theta) \right]\)
  • Measures how peaked the likelihood is → how much information the data carry about the parameter
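Continuing the exponential illustration above (my own example), with \(\ell(\theta) = -n\log\theta - \sum_i x_i/\theta\):

\[ \ell''(\theta) = \frac{n}{\theta^2} - \frac{2\sum_i x_i}{\theta^3}, \qquad I(\theta) = -\operatorname{E}\big[\ell''(\theta)\big] = -\frac{n}{\theta^2} + \frac{2n\theta}{\theta^3} = \frac{n}{\theta^2}. \]

More data (larger \(n\)) gives a more sharply peaked log-likelihood and hence more information about \(\theta\).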

Jacobians and change of variables

Show why Jacobians matter when transforming distributions.

  • Simple change of variable in one dimension
  • If \(Y = g(X)\) with \(g\) monotone, then \(f_Y(y) = f_X(x) \left| \frac{dx}{dy} \right|\), where \(x = g^{-1}(y)\)
  • Multivariate case: use the Jacobian determinant
  • Example: transforming from Cartesian to polar coordinates
  • Example: bivariate normal to standard normal → Cholesky or whitening
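The polar-coordinate example can be checked symbolically; a minimal sketch with sympy (assumed tooling):

```python
# Jacobian determinant of the polar transformation (x, y) = (r cos t, r sin t)
import sympy as sp

r, t = sp.symbols("r t", positive=True)
x = r * sp.cos(t)
y = r * sp.sin(t)

J = sp.Matrix([x, y]).jacobian([r, t])
print(sp.simplify(J.det()))   # r  -> hence dx dy = r dr dt
```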

Application: Why this matters in simulations and Bayesian inference (brief mention of Metropolis-Hastings, normalizing flows, etc.)

Conclusions

  • Calculus is not just background mathematics: it is the engine of statistical theory.
  • Beyond undergraduate statistics, everything from MLE and Bayesian posteriors to machine learning involves gradients and Hessians.
  • For students who love maths but are unsure about statistics: you are exactly the kind of person statistics needs.