A reference post testing every type of mathematical notation used in computer science and machine learning research, plus a couple of MDX components.
Basic Arithmetic & Algebra#
Inline math flows naturally in text. The quadratic formula is $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ and the Pythagorean theorem is $a^2 + b^2 = c^2$.
Block equations get their own space:
$$
(a + b)^2 = a^2 + 2ab + b^2
$$

$$
\sum_{i=1}^{n} i = \frac{n(n+1)}{2}
$$
Calculus#
The definition of a derivative:
$$
f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
$$
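The limit above can be approximated numerically by picking a small but nonzero $h$. The sketch below is a minimal illustration, not a robust differentiator; the helper names `forward_difference` and `square` and the step size `1e-6` are assumptions made for this example.

```python
def forward_difference(f, x, h=1e-6):
    """Approximate f'(x) with the difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


# The exact derivative of x^2 at x = 3 is 6; the approximation is close to it.
approx = forward_difference(square, 3.0)
```

Making `h` smaller does not help indefinitely: once `h` is tiny, floating-point round-off in `f(x + h) - f(x)` starts to dominate the error.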
The chain rule:
$$
\frac{d}{dx}[f(g(x))] = f'(g(x)) \cdot g'(x)
$$
Definite integral:
$$
\int_a^b f(x)\, dx = F(b) - F(a)
$$
Taylor series expansion:
$$
f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n
$$
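As a concrete instance of the expansion, take $f(x) = e^x$ around $a = 0$: every derivative equals 1 there, so the series reduces to $\sum_n x^n / n!$. The following sketch truncates that sum; the function name `exp_taylor` and the default of 20 terms are choices made for this illustration.

```python
import math


def exp_taylor(x, terms=20):
    """Truncated Taylor series of e^x around a = 0: sum of x^n / n!."""
    total = 0.0
    term = 1.0  # x^0 / 0!
    for n in range(terms):
        total += term
        term *= x / (n + 1)  # turn x^n / n! into x^(n+1) / (n+1)!
    return total


approximation = exp_taylor(1.0)
reference = math.e
```

For small `x` a handful of terms already agrees with `math.exp` to many digits; for large `|x|` more terms are needed.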
Linear Algebra#
Matrix multiplication, for $A \in \mathbb{R}^{m \times k}$ and $B \in \mathbb{R}^{k \times n}$:

$$
(AB)_{ij} = \sum_{l=1}^{k} A_{il} B_{lj}
$$
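The index formula translates almost literally into code. This is a plain-Python sketch using lists of rows, meant to mirror the summation rather than to be fast; the name `matmul` is just a label for this example.

```python
def matmul(A, B):
    """Multiply an m-by-k matrix A by a k-by-n matrix B, both lists of rows."""
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    # (AB)[i][j] is the sum over l of A[i][l] * B[l][j]
    return [
        [sum(A[i][l] * B[l][j] for l in range(k)) for j in range(n)]
        for i in range(m)
    ]


product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Here `product` is `[[19, 22], [43, 50]]`.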
Determinant of a 2×2 matrix:
$$
\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc
$$
Eigenvalue equation:
$$
A\mathbf{v} = \lambda \mathbf{v}
$$
L2 norm:
$$
\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}
$$
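For a quick sanity check of the norm, here is a short sketch in plain Python; `l2_norm` is a name chosen for this example.

```python
import math


def l2_norm(x):
    """Euclidean length of a vector: square root of the sum of squared entries."""
    return math.sqrt(sum(xi * xi for xi in x))


length = l2_norm([3.0, 4.0])
```

The vector `[3.0, 4.0]` has length `5.0`, the familiar 3-4-5 right triangle.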
Singular Value Decomposition:
$$
A = U \Sigma V^T
$$
Probability & Statistics#
Bayes’ theorem:
$$
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
$$
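A worked example makes the theorem more tangible. The numbers below are hypothetical (a condition with 1% prevalence, a test with 95% sensitivity and a 5% false-positive rate), and the sketch assumes $P(B)$ is obtained from the law of total probability, which the formula above leaves implicit.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A | B) from P(A), P(B | A) and P(B | not A)."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical inputs: P(A) = 0.01, P(B|A) = 0.95, P(B|not A) = 0.05
p_condition_given_positive = posterior(0.01, 0.95, 0.05)
```

With these inputs the posterior is 0.0095 / 0.059, roughly 0.16: even after a positive result, the condition remains unlikely because the prior is so small.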
Normal distribution (Gaussian):
$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$
KL Divergence:
$$
D_{KL}(P \| Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
$$
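The sum can be computed directly for discrete distributions. This sketch assumes `p` and `q` are probability lists over the same outcomes, that `q` is nonzero wherever `p` is, and that the logarithm is natural (so the result is in nats); terms with `p[i] == 0` are skipped by the usual convention.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions given as aligned lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.25, 0.75]
forward = kl_divergence(p, q)
reverse = kl_divergence(q, p)
```

The two directions give different values here, a reminder that KL divergence is not symmetric.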
Machine Learning#
Loss Functions#
Mean Squared Error:
$$
\mathcal{L}_{\text{MSE}} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$
Binary Cross-Entropy:
$$
\mathcal{L}_{\text{BCE}} = -\frac{1}{n}\sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \right]
$$
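Both losses are short enough to write out by hand. In the sketch below, `mse` and `bce` are illustrative names, and the clipping constant `eps` is an added assumption that is not part of the formula: it only keeps `log(0)` from occurring when a prediction is exactly 0 or 1.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error over paired targets and predictions."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


def bce(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy; predictions are clipped into [eps, 1 - eps]."""
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # assumption: guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / n


mse_value = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
bce_value = bce([1, 0], [0.5, 0.5])
```

A maximally uncertain prediction of 0.5 for every label gives a cross-entropy of $\log 2$.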
Activation Functions#
Sigmoid:
$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$
Softmax:
$$
\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}
$$
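Evaluated naively, both formulas can overflow for inputs of large magnitude. The sketch below shows one common way around that, offered as an illustration rather than the only approach: the sigmoid branches on the sign of `x`, and the softmax subtracts the maximum before exponentiating, which leaves the result unchanged because the common factor cancels in the ratio.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written so that exp never sees a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z lies in (0, 1)
    return z / (1.0 + z)


def softmax(xs):
    """Softmax over a list of scores, shifted by the maximum for stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
```

The outputs of `softmax` sum to 1 and preserve the ordering of the inputs.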
ReLU (inline): $\text{ReLU}(x) = \max(0, x)$

Scaled dot-product attention:

$$
\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right) V
$$
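To see how the pieces of the attention formula fit together, here is a dependency-free sketch. It treats `Q`, `K` and `V` as lists of row vectors, applies the softmax row by row, and takes $d_k$ to be the length of a key vector; it handles a single head with no masking or batching, and all names are chosen for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    scale = math.sqrt(len(K[0]))  # d_k is the key dimension
    output = []
    for q in Q:
        # One score per key: dot(q, k) / sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        # Weighted average of the value rows
        output.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return output


K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
uniform = attention([[0.0, 0.0]], K, V)
focused = attention([[100.0, 0.0]], K, V)
```

A zero query scores every key equally, so `uniform` is the plain average of the value rows, `[[2.0, 3.0]]`. A query strongly aligned with the first key makes `focused` almost exactly the first value row.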
Big-O Complexity#
Common complexities as inline math: $O(1)$, $O(\log n)$, $O(n)$, $O(n \log n)$, $O(n^2)$, $O(2^n)$, $O(n!)$
Master theorem for divide and conquer: if $T(n) = aT(n/b) + f(n)$, then for some constant $\varepsilon > 0$:

$$
T(n) = \begin{cases}
O(n^{\log_b a}) & \text{if } f(n) = O(n^{\log_b a - \varepsilon}) \\
O(n^{\log_b a} \log n) & \text{if } f(n) = \Theta(n^{\log_b a}) \\
O(f(n)) & \text{if } f(n) = \Omega(n^{\log_b a + \varepsilon})
\end{cases}
$$
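For the common special case where the driving function is a pure power, $f(n) = n^c$, choosing the case comes down to comparing $c$ with $\log_b a$. The sketch below covers only that special case; the function name, the tuple it returns, and the floating-point tolerance are assumptions made for illustration.

```python
import math


def master_theorem(a, b, c):
    """Classify T(n) = a*T(n/b) + n**c.

    Returns (case, exponent, has_log_factor), meaning
    T(n) = O(n**exponent * log n) if has_log_factor else O(n**exponent).
    """
    if a < 1 or b <= 1:
        raise ValueError("requires a >= 1 and b > 1")
    critical = math.log(a, b)  # the critical exponent log_b(a)
    if math.isclose(c, critical, abs_tol=1e-12):
        return 2, c, True  # f(n) matches n^(log_b a): extra log factor
    if c < critical:
        return 1, critical, False  # the recursion dominates
    return 3, c, False  # f(n) dominates


merge_sort = master_theorem(2, 2, 1)     # T(n) = 2T(n/2) + n
binary_search = master_theorem(1, 2, 0)  # T(n) = T(n/2) + 1
```

Merge sort lands in case 2 with exponent 1, giving $O(n \log n)$; binary search lands in case 2 with exponent 0, giving $O(\log n)$.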
Algorithm Complexity Visualized#
Image with Caption#
Binary search repeatedly halves the search space — O(log n) time complexity.
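The halving described in the caption can be made visible by counting loop iterations. This sketch assumes a sorted list and returns the number of steps alongside the index; the step counter exists only for illustration and is not part of a typical binary search.

```python
def binary_search(items, target):
    """Search a sorted list; return (index or -1, number of halving steps)."""
    lo, hi = 0, len(items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            lo = mid + 1  # discard the left half
        else:
            hi = mid - 1  # discard the right half
    return -1, steps


data = list(range(1024))
index, steps = binary_search(data, 777)
```

For 1024 elements the loop never runs more than 11 times, in line with the logarithmic bound.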
Callouts#
Collapsible Section#
Click to see the full Master Theorem

For $T(n) = aT(n/b) + f(n)$ where $a \geq 1$, $b > 1$:

$$
T(n) = \begin{cases}
O(n^{\log_b a}) & \text{if } f(n) = O(n^{\log_b a - \varepsilon}) \\
O(n^{\log_b a} \log n) & \text{if } f(n) = \Theta(n^{\log_b a}) \\
O(f(n)) & \text{if } f(n) = \Omega(n^{\log_b a + \varepsilon})
\end{cases}
$$
Keyboard Shortcuts#
Press Ctrl + C to copy, Ctrl + F to search.
Highlighted Text#
Use <mark> for important highlighted terms inline.
If it renders correctly here — calculus, matrices, ML formulas, piecewise functions, charts, callouts — it will render correctly in any post you write.