Posted on November 30, 2024
One day, I came up with an idea of training a neural network that derives a latent space parametrized as a hypersphere to facilitate non-biased representation learning. To achieve a non-biased representation, I had to sample points uniformly on a hypersphere, but I had no idea how to do it. I just naively thought that sampling points from a multivariate normal distribution and then simply normalizing them would be a simple solution, as the normal distribution is isotropic and rotationally symmetric, but I was not sure if it really makes sense mathematically.
After a few days of thinking, I finally came up with an idea from the form of the Gaussian function. Specifically, the probability density function (PDF) of a normally distributed random variable \(x\) is given by:
$$ p(x) = \frac{1}{\sigma\sqrt{2\pi}} \mathrm{exp}\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), $$where \(\mu\) is the mean and \(\sigma\) is the standard deviation. For simplicity, let's just consider the standard normal distribution with a zero mean and a unit variance (i.e. \(\mu=0\) and \(\sigma=1\)):
$$ p(x) = \frac{1}{\sqrt{2\pi}} \mathrm{exp}\left(-\frac{x^2}{2}\right). $$So, I came up with two ideas: (i) the PDF of a multi-dimensional vector is a product of the PDFs of each dimension, and (ii) the product of exponential functions has an exponent that is a sum of each exponent. And then, I just got an intuition that the sum of squared terms in the exponent can be somehow related to the radius of the hypersphere.
Let's consider a vector \(\mathbf{x} = [x_1, x_2, \dots, x_n]\). Then the PDF of a vector \(\mathbf{x}\) is given by:
$$ \begin{aligned} p(\mathbf{x}) &= p(x_1) p(x_2) \cdots p(x_n) \\ &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} \mathrm{exp}\left(-\frac{x_i^2}{2}\right) \\ &= \frac{1}{(2\pi)^{n/2}} \mathrm{exp}\left(-\frac{x^2_1}{2}\right) \mathrm{exp}\left(-\frac{x^2_2}{2}\right) \cdots \mathrm{exp}\left(-\frac{x^2_n}{2}\right) \\ &= \frac{1}{(2\pi)^{n/2}} \mathrm{exp}\left(-\frac{x^2_1 + x^2_2 + \cdots + x^2_n}{2}\right) \\ &= \frac{1}{(2\pi)^{n/2}} \mathrm{exp}\left(-\frac{\|\mathbf{x}\|^2}{2}\right), \end{aligned} $$where \(\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots x_n^2}\) is the Euclidean norm of the vector \(\mathbf{x}\). Now, we can see that \(p(\mathbf{x})\) only depends on the Euclidean norm of the vector \(\mathbf{x}\) but not on the direction of it. This means that points sampled with a multivariate normal distribution with a fixed Euclidean norm result in a uniform distribution on a hypersphere.
Let's go back to the first idea of sampling points from a multivariate normal distribution and then normalizing them. Let \(\mathbf{y} = \frac{\mathbf{x}}{\|\mathbf{x}\|}\) be the normalized vector of \(\mathbf{x}\). While the magnitude \(\|\mathbf{x}\|\) is random once sampled from the normal distribution, the new vector \(\mathbf{y}\) is no longer dependent on the vector magnitude, and we know that normalization maintains the direction of the vector. Therefore, we can say that the vector \(\mathbf{y}\) is uniformly distributed on a unit hypersphere.
I strongly believe that the proof can be more rigorous by deriving \(p(\mathbf{y})\) itself, but I think this is enough to understand the idea that we can achieve uniform sampling on a unit hypersphere by sampling-and-normalization. Now, my next question is how to sample points on a hypersphere with a maximum coverage of the surface. Specifically, given \(N\) points on a hypersphere, how can we sample a new point that maximizes the minimum distance to the existing points? Would there be a closed-form solution for this prolblem? This will be the next step to be solved for my new research topic about a latent space with a hypersphere parametrization. But anyway, it was personally a great experience to mathematically prove the abstract idea that I had in my mind. I hope this post helps you as well, and I'll try this to my new research project, so stay tuned for the results!