The classical likelihood ratio test (LRT), based on the asymptotic chi-squared distribution of the log likelihood ratio statistic, is one of the fundamental tools of statistical inference. A recent universal LRT approach based on sample splitting provides valid hypothesis tests and confidence sets in any setting for which we can compute the split likelihood ratio statistic (or, more generally, an upper bound on the null maximum likelihood). The universal LRT is valid in finite samples and without regularity conditions. This test empowers statisticians to construct tests in settings for which no valid hypothesis test previously existed. For the simple but fundamental case of testing the population mean of d-dimensional Gaussian data with identity covariance matrix, the classical LRT itself applies. Thus, this setting serves as a perfect test bed to compare the classical LRT against the universal LRT. This work presents the first in-depth exploration of the size and power of several universal LRT variants and of the relationships among them. We show that a repeated subsampling approach is the best choice in terms of size and power. For large numbers of subsamples, the repeated subsampling confidence set is approximately spherical. We observe reasonable performance even in a high-dimensional setting, where the expected squared radius of the best universal LRT’s confidence set is approximately 3/2 times the squared radius of the classical LRT’s spherical confidence set. We illustrate the benefits of the universal LRT by testing a non-convex doughnut-shaped null hypothesis, where a universal inference procedure can have higher power than a standard approach.