Sample size `n`

How many values should you generate within a simulation? Let’s explore.

If I draw 10 data points from a normal distribution with a mean of 0 and a standard deviation of 1 (i.e N(0,1)), after setting the seed to 10 (for no specific reason), here is the distribution of the values I get:

1 repetition simulating N(0,1) with n = 10:

If I repeat this simulation 24 times, here are the distributions of the 10 values pseudo-randomly sampled from N(0,1):

24 repetitions simulating N(0,1) with n = 10:

Note that because we are drawing from N(0,1), we expect the mean of the values drawn (mean(x), blue lines) to be very close to 0, i.e. the mean of the normal distribution we sample from (red dashed lines).

How are the means and standard deviations of the 24 repetitions distributed?

Distributions of the means and SDs from 24 repetitions simulating N(0,1) with n = 10:

Now, let’s do the same with a sample size n of 1000.

24 repetitions simulating the same distribution, i.e. N(0,1), with n = 1000:

Distributions of the means and SDs from 24 repetitions simulating N(0,1) with n = 1000:

Conclusion

The sample size within a simulation affects the precision with which the parameters of that distribution can be estimated.

What should determine the sample size within your simulation?

Choose a sample size that is relevant to the context of the simulation, e.g. the sample size you will be able to reach in your study or the minimum sample size that would allow you to detect the smallest effect of interest (as determined by a power analysis, which we will cover in a moment).

Sample size n

Conclusion

Sample size `n`