Sampling is a necessary step for any computational experiment, and the way sampling is carried out will influence the experiment's results. This is the case, for instance, for sensitivity analysis (i.e., the analysis of model output sensitivity to the values of the input variables). The popular method of Sobol' (Sobol', 2001) relies on tailor-made sampling techniques that have been perfected through time (e.g., Joe and Kuo, 2008; Saltelli et al., 2010). Likewise, the method of Morris (Morris, 1991), less computationally demanding than Sobol's (Herman et al., 2013) and used for screening (i.e., understanding which inputs most influence the outputs), relies on specific sampling techniques (Morris, 1991; Campolongo et al., 2007). But what makes a good sample, and how can we understand the strengths and weaknesses of the sampling techniques (and also of the associated sensitivity techniques we are using) through quick visualization of some associated metrics?

This section will first look at what makes a good sample, using examples from a sampling technique called Latin hypercube sampling. Then it will show some handy visualization tools for quickly testing and visualizing a sample.

Intuitively, the first criterion for a good sample is how well it covers the space from which to sample. The difficulty, though, is how we define "how well" in practice, and the implications that has. A quick and popular way to generate a sample that covers the space fairly well is Latin hypercube sampling (LHS; McKay et al., 1979). This algorithm relies on the following steps for drawing N samples from a hypercube-shaped space of dimension p:

1) Divide each dimension of the space into N equiprobable bins. If we want uniform sampling, each bin will have the same length. Number the bins from 1 to N in each dimension.

2) Randomly draw points such that you have exactly one in each bin in each dimension.

For instance, for 6 points in 2 dimensions, this is a possible sample (points are selected randomly in each square labelled A to F):

It is easy to see that, by definition, LHS has good space coverage when projected on each individual axis. But space coverage in multiple dimensions all depends on the luck of the draw. Indeed, this is also a perfectly valid LHS configuration:

In the above configuration, it is easy to see that on top of poor space coverage, correlation between the sampled values along both axes is also a huge issue. For instance, if output values are hugely dependent on the values of input 1, there will be large variations of the output values as values of input 2 change, regardless of the real impact of input 2 on the output. Therefore, there are two kinds of issues to look at. One is correlation between the sampled values of the input variables. We'll look at it first because it is pretty straightforward. Then we'll look at space coverage metrics, which are more numerous, do not look at exactly the same things, and can sometimes be conflicting. In fact, it is illuminating to see that sample quality metrics sometimes trade off with one another, and several authors have turned to multi-objective optimization to come up with Pareto-optimal sample designs (e.g., Cioppa and Lucas, 2007; De Rainville et al., 2012). One can look at authors such as Sheikholeslami and Razavi (2017), who summarize similar sets of indicators. The goal here is not to write a summary of summaries, but rather to give a sense that there is a relationship between which indicators of sampling quality matter, which sampling strategy to use, and what we want to do.

In what follows we note $x_{ik}$ the $k$-th sampled value of input variable $i$, with $1 \le i \le p$ and $1 \le k \le N$. Sample correlation is usually measured through the Pearson statistic. For input variables $i$ and $j$ among the $p$ input variables, we note $x_{ik}$ and $x_{jk}$ the values of these variables in sample $k$, and we have:

$$\rho_{ij} = \frac{\sum_{k=1}^{N} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{N} (x_{ik} - \bar{x}_i)^2} \sqrt{\sum_{k=1}^{N} (x_{jk} - \bar{x}_j)^2}}$$

In the above equation, $\bar{x}_i$ and $\bar{x}_j$ are the average sampled values of inputs $i$ and $j$. Then, the indicator of sample quality looks at the maximal level of correlation across all variables:

$$\rho_{\max} = \max_{1 \le i < j \le p} \left| \rho_{ij} \right|$$

This definition relies on the remark that correlation is symmetric ($\rho_{ij} = \rho_{ji}$), so each pair of variables needs to be considered only once.

There are different measures of space coverage. We are best equipped to visualize space coverage via 1D or 2D projections of a sample. In 1D, a measure of space coverage is obtained by dividing each dimension into N equiprobable bins and counting the fraction of bins that contain at least one point. Since N is the sample size, this measure is maximized when there is exactly one point in each bin: it is a measure that LHS maximizes. Other measures of space coverage consider all dimensions at once. A straightforward measure of space filling is the minimum Euclidean distance between two sampled points in the generated ensemble:

$$D = \min_{1 \le k < l \le N} \sqrt{\sum_{i=1}^{p} \left( x_{ik} - x_{il} \right)^2}$$
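To make these ideas concrete, here is a minimal sketch in Python (using NumPy) of the quantities discussed above: a basic LHS generator, the maximal pairwise Pearson correlation, the 1D bin-coverage fraction, and the minimum pairwise Euclidean distance. The function names (`lhs`, `max_correlation`, `bin_coverage_1d`, `min_distance`) are illustrative, not from any particular library; production code would typically use a dedicated implementation such as `scipy.stats.qmc.LatinHypercube`.

```python
import numpy as np

def lhs(n, p, seed=None):
    """Basic Latin hypercube sample of n points in [0, 1)^p:
    exactly one point per equiprobable bin along each dimension."""
    rng = np.random.default_rng(seed)
    # Random position inside each bin, then shuffle bin order per dimension.
    u = rng.random((n, p))
    bins = np.array([rng.permutation(n) for _ in range(p)]).T  # (n, p)
    return (bins + u) / n

def max_correlation(x):
    """Maximal absolute Pearson correlation over all pairs of variables."""
    rho = np.corrcoef(x, rowvar=False)
    iu = np.triu_indices_from(rho, k=1)  # pairs i < j only (symmetry)
    return np.abs(rho[iu]).max()

def bin_coverage_1d(x):
    """Fraction of the n equiprobable bins occupied, averaged over dimensions.
    Equals 1 for an LHS by construction."""
    n, p = x.shape
    occupied = [np.unique(np.floor(x[:, i] * n).astype(int)).size
                for i in range(p)]
    return np.mean(occupied) / n

def min_distance(x):
    """Minimum Euclidean distance between any two sampled points."""
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(x), k=1)
    return d[iu].min()

sample = lhs(6, 2, seed=42)       # 6 points in 2 dimensions, as in the example
print(max_correlation(sample))
print(bin_coverage_1d(sample))    # 1.0 for any LHS
print(min_distance(sample))
```

Generating many such samples and comparing `max_correlation` and `min_distance` across them illustrates the "luck of the draw" point: all are valid LHS configurations with perfect 1D coverage, yet their correlation and space-filling properties vary widely.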