# Half-tapering strategy for conditional simulation with large datasets

###### Abstract

Gaussian conditional realizations are routinely used for risk assessment and planning in a variety of Earth sciences applications. Assuming a Gaussian random field, conditional realizations can be obtained by first creating unconditional realizations that are then post-conditioned by kriging. Many efficient algorithms are available for the first step, so the bottleneck resides in the second step. Instead of doing the conditional simulations with the desired covariance (F approach) or with a tapered covariance (T approach), we propose to use the taper covariance only in the conditioning step (Half-Taper or HT approach). This enables to speed up the computations and to reduce memory requirements for the conditioning step but also to keep the right short scale variations in the realizations. A criterion based on mean square error of the simulation is derived to help anticipate the similarity of HT to F. Moreover, an index is used to predict the sparsity of the kriging matrix for the conditioning step. Some guides for the choice of the taper function are discussed. The distributions of a series of 1D, 2D and 3D scalar response functions are compared for F, T and HT approaches. The distributions obtained indicate a much better similarity to F with HT than with T.

Keywords: Wendland covariance functions, sparsity index, infill asymptotics, spectral density, covariance tapering, taper function.

## 1 Introduction

Conditional simulations are routinely used in Earth sciences applications to assert the distribution of responses that are non-linearly related to the state variables of the system under study. For the specific case of Gaussian random fields, the general approach is to produce many different realizations of the system, each being consistent with the chosen covariance function and the observed data points, and then to apply a physical forward model to get the response variables corresponding to the simulated system state. Conditioning to observed data is usually done by post-conditioning by kriging although some algorithms like sequential Gaussian simulation (SGS) can perform the conditioning directly as the simulation progresses. However, SGS has to use local neighborhoods, hence introducing discontinuities and only approximating the desired covariance (Emery, 2004, 2008; Paravarzar et al, 2015; Safikhani et al, 2016).

Given the availability of increasingly large datasets, conditional simulation algorithms still face the size problem. When tens of thousands conditioning data are available, it is hardly possible to produce conditional simulations without some form of approximation. At least five different approaches were proposed to alleviate the size problem: a) the classical approach of conditioning locally, see e.g. Chilès and Delfiner (2012), b) the fixed rank kriging (Cressie and Johannesson, 2008), c) tapering of the covariance functions (Furrer et al, 2006), d) using a combination of b) and c) (Gneuss et al, 2013; Sang and Huang, 2012) and e) using covariance functions that possess sparse precision matrix (inverse of the covariance matrix) when defined on a torus (Lindgren et al, 2011). Approaches b), c) and d) were proposed in the context of prediction rather than simulation, but they can obviously be used for the post-conditioning by kriging.

All the above approaches have their caveats. With a), discontinuities are introduced as the local neighborhood changes. Moreover, the simulated covariance is distorted compared to the desired covariance. Using b) amounts to keep only the low frequency components of the covariance function which can produce overly smoothed realizations. Approach c) is dependent on the range of the taper function compared to the effective range of the covariance function. When the range of the taper function is larger than the effective range, covariance distortion is reduced but so is the sparsity of the tapered covariance matrix. On the contrary, when the range of the taper function is chosen small, the resulting sparsity is important but so is the distortion of the covariance function. Approach d) appears more complex as one has to split the prediction step in two parts corresponding to a long range-short range decomposition. The long range component is estimated by fixed-rank kriging and the short range component is estimated by tapering the covariance of the residuals. Finally, although being extremely efficient in some cases, approach e) has limitations, in particular because the Markov property on which it relies has been firmly established only for Matérn covariances.

Many efficient algorithms exist to produce unconditional Gaussian fields. As a well known example, the turning bands method (TBM) (Matheron, 2011; Lantuéjoul, 2002; Emery and Lantuéjoul, 2006; Emery et al, 2015) has the lowest computational complexity with , where is the number of simulated points. Moreover, TBM has no practical memory limitations as points to simulate can be simulated by subsets of any desired size. Alternative methods include spectral methods (Shinozuka and Jan, 1972; Emery et al, 2015) and their more recent FFT implementation on grids, using an approach based on circulant embedding of block Toeplitz covariance matrices (Chan and Wood, 1999). We do not detail any further non conditional simulation methods.

Hence, the bottleneck for large conditional simulation lies in the post-conditioning part. This contribution examines in details a suggestion made by Stein (2013): produce unconditional simulations with efficient algorithm suitable for the desired covariance and use the tapered covariance only for the post-conditioning part. We call this approach half-tapering (HT). We expect this approach to better reproduce the short scale variations than do conditional simulations using the tapered covariance (T) for both parts. Consequently, the forward model uncertainty should be better approximated with the HT approach than with the T approach. Although, for simplicity, only stationary tapers are considered, our theoretical results (propositions 1 to 3) apply as well to non-stationary and/or non-isotropic tapers constructed following Bolin and Wallin (2016).

We emphasize that we assume that the covariance function is known, i.e. it has been estimated or chosen by the user. Therefore, we do not consider at all the problem of using a tapering approach for the estimation of the covariance function. In our conditional simulation setting, the user expects an accurate reproduction of the output statistics for the known covariance. There is a plethora of possibilities for those outputs, and we will consider typical examples in 1D, 2D and 3D.

After reviewing some results on the tapered covariance approach for prediction in Section 2, we adapt and extend to the context of conditional simulation the measure of similarity between tapered vs full covariance (F) predictions proposed by Stein (1993, 1999) and Furrer et al (2006). Then, in Section 3, we derive formulas to predict sparsity of the covariance matrix based on the tapered covariance. In Section 4, different non-linear forward models applied on 1D, 2D and 3D simulations are used to assess the similarity of HT and T response distributions with those obtained using the desired covariance (F). We finish with some elements of discussion in Section 5.

## 2 Methodology

### 2.1 Set-up and notations

Let us consider a zero mean Gaussian random field with stationary covariance function

Assuming is known and given sample values , the Best Linear Unbiased Predictor at a point , also referred to as kriging in the geostatistics literature, is:

where is the matrix with element for , is the vector of covariances and denotes the transpose of matrix .

We will further denote a tapering function, i.e. a correlation function with compact support. The taper is identically equal to zero outside a particular range , typically the ball of radius in . Methods to construct compactly supported positive definite functions with chosen order of differentiability have been proposed as early as Bohman (1960) and Matheron (1965). Other constructions were proposed later in a different context (Wendland, 1995; Wu, 1995). Gneiting (2002) recognized that Wu’s construction is equivalent to Matheron’s clavier sphérique for which specific members are the spherical, cubic and penta covariance models, see Table 2. Other tapers are provided by the Wendland functions (Wendland, 1995) and the Bohman construction (Bohman, 1960).

The tapered covariance function is then the product of the original covariance function by the taper

The resulting tapered covariance matrix, , defined by is thus the Hadamard product matrix , where and is the matrix with elements for . The tapered covariance function is definite positive, since the product of positive definite functions is positive definite by Schur’s product theorem. By construction, the matrix has a high proportion of zero elements when is small compared to pairwise distances between data locations. The taper function must be tailored to the particular desired covariance function, in particular in terms of range and regularity at the origin. We will return to this issue later.

### 2.2 MSE criterion

The MSE at point , also known in geostatistics (Chilès and Delfiner, 2012) as the kriging variance, is

(1) |

where denotes the simple kriging variance obtained with and is the variance of . In Eq. (1), the variance is computed for the underlying model , hence the notation . When the prediction is done with a plug-in covariance function whereas the true covariance is , the kriging weights are obtained by solving:

(2) |

where and are respectively the right member and the kriging matrix computed with . Hence, one obtains:

(4) | |||||

Since is the kriging variance, i.e. the variance of the Best Linear Unbiased Predictor, one has necessarily , which we state and prove formally below.

###### Proposition 1

In the setting above, one has for all .

Proof From the definitions of and one has,

(6) |

The last expression is always nonnegative since is a covariance matrix, hence positive semi-definite.

It is worth recalling that both MSE can be computed without actually knowing data values. Only the point locations and the covariance function are needed. The two MSEs will be different unless . The (theoretical) mean square error of prediction (MSE) has been used in Furrer et al (2006) based on Stein (1993) and Stein (1999) as a measure of discrepancy between the original (supposed true) covariance function and the covariance function used to perform the prediction.

### 2.3 MSE criterion for simulation

The MSE criterion for prediction of Eq. (1) and (4) can be adapted for simulation. For this, it is useful to adopt the decomposition corresponding to post-conditioning by kriging:

(7) |

where is the conditional simulation, is the unconditional simulation independent on , is the kriging with data, and is the kriging with simulated values at data points. The prediction error of the conditional simulation is then written:

(8) |

The right hand expression combines two independent kriging errors, one for the field itself, the other one for the simulated field. Hence when both errors are obtained using covariance , one obtains the classical result (Chilès and Delfiner, 2012)

(9) |

When using a conditional simulation entirely computed with the covariance , whereas the true covariance function is , one has:

(10) |

where is the kriging variance computed with covariance . When unconditionally simulating with covariance and then post-conditioning with covariance , one obtains:

(11) |

Hence, both for kriging and conditional simulations, good approximations will be obtained by finding adequate covariance functions such that is minimized. Regarding the MSE criterion for simulations, one usually has , i.e. . So the HT approach normally improves over the T method in terms of . On a total of more than a million simulations with varying covariance functions, covariance tapers, range parameters and kriging geometry, we only observed a handful of cases where . In all these cases, the differences between both values were extremely small.

We observe that in Eq. 7 the term is the same for T and HT but the term is simulated with the true covariance for HT and with the plug-in covariance for T. Hence, HT should present local variability more similar to F than the T approach. Stein (2013) noted that the conditional variance of obtained by HT corresponds to the prediction error variance obtained with the tapered covariance.

Since the taper function is a covariance function normalized such that , the tapered covariance verifies . We therefore anticipate that the kriging variance can only increase when less spatial correlation is present. We establish the following result (see Appendix for a proof):

###### Proposition 2

In the setting above, one has for all .

### 2.4 Spectral densities of tapers and covariances

From Bochner’s theorem (Bochner, 1933; Chilès and Delfiner, 2012), a stationary covariance function in has the spectral representation

(12) |

where is a positive bounded symmetric spectral measure on and denotes the inner product between vectors and . When is square integrable, the spectral measure can be written as a spectral density, . Moreover, when is absolutely integrable, the spectral density can be obtained by inversion of Eq. 12 (Lantuéjoul, 2002):

(13) |

When is isotropic, one can write , where denotes the Euclidean norm and is a continuous mapping with . The previous equation simplifies to a 1D integral transform (Sneddon, 1951; Lantuéjoul, 2002):

(14) |

where is the Bessel function of the first kind of order .

Tables 1 and 2 present some taper and covariance functions along with their spectral densities in obtained using Eq. 14. For simplification, all functions are expressed as isotropic correlation functions with normalized distance , and we denote the norm of the frequency vector. With a slight abuse of notations, we shall write in the isotropic case for . Results can be extended to the anisotropic case following Marcotte (2015, 2016). For the Cauchy covariance Eq. 14 could be solved only for as in Marcotte (2015). However, following an observation by Lim and Teo (2008), the spectral density expression remains valid for as can be checked by evaluating directly Eq. 12. The smoothness of the covariance functions is , where is the order of the lowest odd monomial in . The decay rate of the spectral densities at large frequencies is either exponential (Gaussian and Cauchy) or is determined by the largest negative exponent on . For the Cauchy spectral density, we use the fact that decay rate of Bessel functions () is proportional to for all (Lim and Teo, 2008).

Name | Smoothness | Decay rate | ||
---|---|---|---|---|

Spherical | 0 | |||

Cubic | 2 | |||

Penta | 4 | |||

Bonham | 2 | |||

Wendland | 0 | |||

Wendland | 2 | |||

Wendland | 4 |

Name | Smoothness | Decay rate | ||
---|---|---|---|---|

Cauchy | with | |||

Matern | with | |||

Exponential | 0 | |||

Gaussian |

### 2.5 Tail condition and convergence of HT for conditional simulations

Let us consider , a zero mean second-order stationary Gaussian random field on with covariance function , and let be another covariance function on . Let us further denote their spectral densities and , respectively. The tail condition is (Stein, 1993, 1999; Furrer et al, 2006)

(15) |

When the tail condition is fulfilled, asymptotic equivalence of the mean squared prediction error for the two covariance functions can be established under infill asymptotics (i.e. fixed domain, increasing number of observations, ) (Stein, 1993, 1999; Furrer et al, 2006):

(16) |

where it is recalled that in our context is the tapered covariance. Eq. (16) indicates that under infill asymptotics, the interpolation with or are asymptotically equivalent and that the ratio of the prediction variance tends to a non null, finite constant. In Furrer et al (2006) simulations show that the ratio stabilizes only slowly, especially for covariances with a more regular behavior at the origin. Note that the tail condition is a sufficient condition, but that it is not a necessary one. Establishing necessary conditions for Eq. (16) is still an open problem. Combining Eq. (16) with Eqs. (9) to (11) enables to state the following results.

###### Proposition 3

In the setting above,

(17) |

and

(18) |

Proposition 18 shows that the HT approach is asymptotically equivalent to F in terms of for simulation, but that the T approach is not.

The tail condition implies a similar decay rate of the spectral densities of taper, , and the covariance function, . The decay rate indicated in Tables 1 and 2 of covariance functions in enables to identify covariance-taper combinations that meet the tail condition, a sufficient condition to guarantee convergence of ratio with increasing . Note the direct connection of decay rate and smoothness in these Tables. As an example the exponential covariance could be tapered with a spherical or a Wendland taper as they all have the same decay rate. For the specific case of the Matérn covariance, Furrer et al (2006) and Zhang and Du (2008) proved that the taper function can present a faster decay rate than the Matérn covariance. This result has been extended to other covariances by Stein (2013). Note also that the tail condition is not fulfilled by any of the tapers considered for the Cauchy and the Gaussian covariances as they both show an exponential rate of decay of their associated spectral density. To the best of our knowledge, finding covariance functions defined on compact support with exponential rate of decay of their spectral density is still an open problem.

Figure 1 illustrates the prediction MSE ratios obtained as a function of the taper range, , for a fixed sampling density. A random stratified sample of size 400 is taken over the unit square and the MSE is computed at the central point. This is repeated 50 times and the average MSE is retained to compute the MSE ratio. Performances of the tapers are measured by how short the range is for a given MSE ratio. Equivalently it can be measured by how close to 1 is the MSE ratio for a given taper range. When are spherical, exponential and cubic covariances, the best taper functions share the same mean squared differentiability and decay rate as . When is a Matérn covariance with , the smoothness is , which corresponds to mean-squared continuity but not mean-squared differentiability. The corresponding decay rate is , which is not achieved by any of the taper in Table 1. As expected, the cubic and Wendland tapers show better MSEs than the spherical taper.

## 3 Degree of sparsity

To help choosing the range of the taper function, it is useful to approximate the sparsity of the covariance matrix for the post-conditioning. For this, we assume the data points are randomly located within the field. Then, the distance between a pair of (distinct) points is also random. This distance is the one used to compute off-diagonal terms in the covariance matrix. The distance for the diagonal of the covariance matrix is obviously zero.

The cumulative distribution of the distance between two points randomly and uniformly located within an hypersphere with unit radius in has been studied in Deltheil (1926). It is given by:

(19) |

where and stands for the incomplete beta function.

The sparsity index of the tapered covariance matrix for points randomly located over the disk or sphere with unit radius can be assessed by taking:

(20) |

where is the finite range of the tapering function and is the number of samples within the disk/sphere of unit radius. The term takes into account the diagonal of the covariance matrix that reduces the overall sparsity. For large this term becomes negligible and one has .

Cumulative distribution of distance for points randomly located over rectangles and cube can be found in the literature (Philip, 1991). However, the relations are rather lengthy and cumbersome to use. Moreover, the distribution of data points is rarely truly random in practice. As we only need an approximation of the sparsity, we instead assume that the sparsity for a square or a cube can be well approximated by a disk or a sphere with the same area/volume and uniform random location within. Figure 2 confirms the validity of this assumption by comparing the theoretical sparsity index obtained with Eq. 20 and the sparsity obtained experimentally after covariance tapering with a finite range of for a simulation with 2000 points randomly located within a square of area and a cube of volume . The approximation with the disk/sphere for the square/cube is deemed sufficiently precise for our purpose. Discrepancies appear non-negligible only at small sparsity values where tapering is anyway not useful. The figure also shows that larger sparsity is expected for the 3D case than for the 2D case. However, even in the 2D case, sparsity can be obtained for .

Hence, for a rectangular area of size and a taper function with finite range , the sparsity can be approximated by computing first and then evaluating Eq. 20. Similarly, in 3D for a cube of edges one computes and then evaluates Eq. 20. As examples, suppose the taper range is selected as 0.6 times the effective range of the desired covariance and that the effective range represents of the side length of the square field under study. One computes , and . In 3D, one computes , and . Hence, in 3D when , the covariance matrix required for conditioning by kriging using all data at once will have approximately non-zero entries. This is manageable as opposed to the prohibitive non-zeros entries for the full covariance matrix.

Figure 2-right compares the experimental sparsity of the covariance matrix obtained for samples taken following four sampling designs: i. regular grid, ii. random stratified (one random sample per cell), iii. purely random and iv. following a log-Gaussian Cox process as in Bolin and Wallin (2016). For the Cox process, locations are drawn from a Poisson process of intensity where is a zero mean unit variance Gaussian field with exponential covariance and practical range 0.3 the square or cube side length. Figure 2 shows that the sparsity is robust to the exact sampling strategy as long as homogeneous coverage is ensured. However, less sparsity is obtained (especially in 2D) with the clustered sampling generated by the Cox process. In this case, it might be useful to adapt the taper range to the local sampling density as suggested by Bolin and Wallin (2016).

## 4 Results

The aim of this Section is to illustrate the ability of the HT approach to approximate accurately highly non linear characteristics of the simulated random fields, as compared to the much more computer intensive F approach. On four selected situations in 1D, 2D and 3D, we compare the conditional simulations obtained using three different approaches: i. the unconditional simulation and the conditioning by kriging are both done with the desired covariance (F), ii. the unconditional simulation and conditioning by kriging are both done with the tapered covariance (T), and iii. the unconditional simulation is done with the desired covariance and the conditioning by kriging is done with the tapered covariance (HT). Without loss of generality, small scale illustrative examples are used throughout to limit the computational burden of some response functions.

### 4.1 1D example

Two different forward models are computed on each of the realizations for the three approaches. The first forward model is the maximum absolute difference between consecutive points along the profile (discrete estimate of the slope) The second forward model is the total length of the 1D profile as in Chilès and Delfiner (2012). The distributions of the responses are then compared using boxplots and two-sample Kolmogorov-Smirnov goodness of fit tests are computed. The p-values of the test statistic measure the similarity of the distributions.

The profiles are simulated at 100 regularly spaced points. The desired covariance, , is exponential with effective range equal to 1/3 of the simulated length. Five hundred parent unconditional realizations are simulated. In each one a different sample is obtained by taking point 1, point 100 and eight points randomly picked among points 2 to 99. Points 1 and 100 are forced in the sample to avoid being in extrapolation mode. Each sample is then used to create a single conditional realization, so all conditional realizations are independent. All the simulations are done by Cholesky decomposition. As a consequence of the discussion in Section 2, the taper covariance function is set to be spherical. Its range is varied between 0.25 and 3 times the effective range.

Figure 3 presents the boxplots for the three approaches when the response function is the maximum absolute difference between consecutive points. HT outperforms T as confirmed by the KS tests which indicate HT vs F distributions are not significantly different (at ) for taper ranges above 0.75 times the effective range while they are significantly different in all cases for the T vs F approaches (see Fig. 4).

Figure 5 presents the boxplots for the three approaches for the response length of the profile. Clearly, the HT length distributions are more similar to the distributions obtained with F than the distributions with the T approach. This is again confirmed by the two-sample Kolmogorov-Smirnov (KS) test p-values (Fig. 6) which are larger for the HT approach than for the T approach. In the latter case, significant differences (i.e. p-value) are obtained even for a taper range that is three times the effective range of the desired covariance. On the contrary, the HT approach indicates non-significant differences for a taper range representing 0.75 times the effective range.

### 4.2 2D examples

In petroleum and environmental applications it is important that the simulated fields represent adequately the connectivity of facies. Moreover, flow path and corresponding transit time between two given sites are relevant for example to environmental contaminant transport problems or for assessment of potential localization of injection wells in reservoir engineering. It is thus of primary importance that tapering strategies reproduce accurately connectivity and flow characteristics. In these experiments, is Matérn with and effective range equal to half the simulated length, and is Wendland, as shown in Section 2.5 and illustrated in Figure 1. Forty samples of size 100 are taken as conditioning data from as many different unconditional realizations. For each sample, a series of 40 conditional realizations are produced for each approach F, T and HT.

#### 4.2.1 Facies connectivity

We first consider a two facies case: one permeable and one impervious. Renard and Allard (2013) provide a review of the different metrics, both global and local, that can be used to measure connectivity. Among the global metrics, a useful one is the proportion of connected pairs among all pairs within the permeable facies (Hovadik and Larue, 2007). More precisely, one computes:

(21) |

where is the proportion of permeable facies, is the number of clusters of permeable facies, is the size of permeable cluster and is the total number of permeable cells. For a given proportion , the permeable phase is defined as the set of points such that , where is the cumulative distribution function of a Gaussian random variable. Specifically, a threshold is applied on the Gaussian realizations so as to get for each ensemble of realizations of F, T and HT. The connectivity measure is then computed on these 2D truncated Gaussian random fields. For illustration, Fig. 7 presents one randomly chosen conditional realization of the field by the F, T and HT methods for two different taper ranges. The same conditioning data is used in each case. Note that the four corners of the 2D field are always included in the conditioning samples again to avoid too strong extrapolation.

Figure 8 shows the boxplots of the connectivity measure for different values of the ratio taper range/effective range. Results show clearly that the T approach dramatically underestimates the connectivity measure, whereas it is reasonably well reproduced with the HT approach, even for relatively high sparsity.

#### 4.2.2 Transit time

For the same set of simulations, a second response function is obtained by computing fastest transit time between the upper left cell and the bottom right cell of each realization. The exponential of the Gaussian fields are interpreted as speed. They are converted to slowness by taking the inverse of the speed. Then, the fastest path between the upper left point and the bottom right point of the field is determined by applying Dyjkstra algorithm (Dyjkstra, 1959). Since transit times are rather slow to compute, we limit ourselves to 10 different conditional realizations for each of the 40 samples. Fig. 9 shows examples of realizations obtained by the methods F, T and HT for two different taper ranges. The fastest path is also indicated. The distributions of the responses are described by boxplots for different taper ranges (see Fig. 10). Again, the HT approach shows an excellent reproduction of the transit time, whereas the T approach requires a sparsity less than 77%.

### 4.3 A 3D example

We now explore the connectivity in 3D. It is known (see e.g. (Renard and Allard, 2013)) that in 3D percolation holds at lower proportions than in 2D. We thus generate 3D realizations on a grid of truncated Gaussian random fields with . The desired covariance, , is exponential with effective range . The taper is the spherical covariance. Forty samples of size 100, taken from as many different reference fields, are used to provide each 40 realizations. Figure 11 shows the distributions of the connectivity measure for conditional simulations obtained from methods F, T and HT. Results indicate a clear discrepancy between connectivity distributions obtained by F and T methods. On the contrary, distributions obtained with HT are similar to those with F, even for a strong tapering inducing over sparsity.

## 5 Discussion and Conclusion

We explored theoretically and on simulations a Half-Tapering approach in which non conditional simulations are done with the desired covariance function using efficient methods such as turning bands or circulant embedding methods. The conditioning step is then performed using a tapered covariance function . The HT method, already suggested in Stein (2013) enables fast conditional simulation free of neighborhood discontinuities, even with very large datasets.

The criterion enables to evaluate the quality of the approximation obtained with the HT approach compared to the F approach. We have extended to the simulation case the asymptotic results that were established in Furrer et al (2006) and extended in Stein (2013) for Best Linear Unbiased Prediction, thereby offering solid theoretical grounds to this approach. It was proven that, contrary to the T approach, the HT method is asymptotically equivalent to the F method. Expression to approximate the sparsity of the covariance matrix corresponding to a given taper range were derived. Hence, both the quality of the approximation and the corresponding sparsity can be assessed beforehand to find the best compromise for the taper range to use in the HT approach.

The proposed HT approach keeps the efficiency and low computational complexity of most efficient unconditional simulation algorithms and benefit of the recent advances on tapering functions for the post-conditioning by kriging. As an example, the computational complexity of the turning bands unconditional simulation is only where is the number of points to simulate. For the post-conditioning, the memory requirement is inversely proportional to the matrix sparsity. Moreover, best implementations of sparse Cholesky decomposition used to solve the kriging system have computational complexity proportional to the number of non-zero elements instead of for full matrices, being the number of conditioning data (see Davis (2006), Chen et al. (2008(@)).

The impact of approximating the desired covariance by the full tapering (T) and our proposed half-tapering (HT) approach has been assessed in both 1D, 2D and 3D cases with four different responses functions. In all cases considered, HT appeared more similar to F than T for both the response distributions and the ratio.

The examples show that for the same similarity with F, simulations with HT allow shorter taper range and larger sparsity than T. In mining applications, the deposit is usually 3D. It is common that a few ten thousand data are available. Moreover, the effective range of the desired covariance can be quite smaller than the size of the field. These are all favorable conditions for HT. It enables to post-condition the simulation quickly at once, without introducing undesirable discontinuities like in the traditional approach based on local neighborhoods.

All the examples indicated a direct relation between the and the similarity of T or HT responses with F responses. The ratios for T were substantially larger than for HT. Increasing the taper range normally reduces both ratios. One noticeable exception is observed in Fig. 1 where the combination cubic covariance-penta taper shows a slight MSE ratio increase with an increase of the taper range. However, this strange behavior was not observed in the cases of Wendland and cubic tapers. Hence, our results support the use of taper functions having the same differentiability at origin and spectrum decay rate as the desired covariance.

For all forward models considered, a ratio below 1.1 for HT was shown sufficient to provide similar responses to F on the non linear responses. On the contrary, for T, such a threshold on could not be established (see Figs 3, 5 and 11).

This work could be generalized in several directions that we now briefly mention. In the few cases where tapering alone does not suffice to bring enough sparsity, a possible extension to our approach would be to complete the tapering by using also fixed rank kriging, similarly as done in Sang and Huang (2012). The hybrid approach applied for the post-conditioning by kriging would enable to extend the computational advantages of HT to most situations.

Also, in this work, we have only considered the case of evenly spaced conditioning points. Bolin and Lindgren (2013) showed that tapering can perform poorly for kriging when data points are very irregularly located. The reason is that performances of the tapering approach is not only a function of the ratio of ranges of the taper and the desired covariance but also of the average spacing of observations. The optimal taper range achieves a balance between computational cost and accuracy. In regions with sparse observations the taper range could be increased to obtain lower ratios without penalizing significantly the sparsity. In regions with high density of data, the computation burden could be reduced. In Bolin and Wallin (2016), a spatially adaptive covariance tapering strategy where the taper range is allowed to vary in space is proposed to improve the performance of tapering. This strategy could also be implemented for conditional simulations when the conditioning data are very irregularly located. As an example, with 3D borehole data where sampling density is larger along borehole direction it might be required to use anisotropic tapers with shorter range parallel to the borehole axis.

An interesting alternative (Bevilacqua et al, 2016) to the tapering of the desired covariance would be to use directly the taper function as the covariance in the post-conditioning by kriging. Properties of this approach remain to be established. However, it seems quite clear that the differentiability and spectrum decay rate of the taper and the tapered covariance must be the same. In a few simulation tests, not shown here for conciseness, we obtained very similar results with the tapered covariance and the taper function alone.

Using a tapering approach when is a Cauchy or a Gaussian covariance function remains an open problem. For these covariance functions, the spectral density behave either as or as as . The tail condition, which is a sufficient condition seems to be overly demanding. New theoretical results for finding necessary conditions for tapering, or at least less demanding sufficient conditions must be established.

Multivariate extension of this work is relatively straightforward. It suffices to do the covariance tapering with a positive semi-definite multivariate model defined on compact support. The linear model of coregionalization (Chilès and Delfiner, 2012; Wackernagel, 2003) is an easy way to obtain such multivariate taper. Moreover, Porcu et al (2013) defined a class of multivariate radial covariances based on constructions involving Askey and Buhmann functions. They provided sufficient conditions for the positive definiteness of the multivariate models in this class.

## Appendix: Proof of Proposition 2

We first establish two lemmas:

###### Lemma 1 (Adapted from (Lu and Shiou, 2002))

Consider the symmetric block matrix

where is diagonal and and are symmetric non singular matrices such that and are non singular. Then,

Proof This lemma is a direct consequence of Theorem 2 in Lu and Shiou (2002).

###### Lemma 2

Let be sample points in finite domain and let be a covariance function on with . Let be a zero mean Gaussian random field with covariance function on . Let further be the matrix with elements , for and be the vector with elements , for and . Then, the matrix

is positive semi-definite for any .

Proof In order to show this, we need to show that for any vector , it holds that

(22) |

Let us denote . Then, since and since , Eq. (22) is equivalent to

Using and , we thus get very easily that

which finishes the proof.

We are now ready to provide the proof of Proposition 2. We must show that for all . As usual, we drop the dependency on for sake of conciseness. Since and , we need to prove that:

(23) |

Since and , Eq. (23) is equivalent to

(24) |

To show that this expression is always nonnegative, we will show that the matrix with elements , for is positive definite (p.d.) except for the trivial case , corresponding to a taper with infinite range, where and Eq. 23 equals zero. Introducing the diagonal matrix , this matrix can also be written

(25) |

Since is invertible, it is p.d. if and only if is p.d. Using Lemma 1, its inverse is

(26) |

Using Lemma 2, one has that is p.d. Hence, using Schur’s product theorem, is p.d. and so is its inverse. As sums and products of p.d. matrices are p.d., we can conclude that in Eq. (26) is also p.d., which completes the proof.

### acknowledgements

We are indebted to one anonymous reviewer for his attentive and detailed review and for his numerous constructive comments. We thank Pr. Emilio Porcu from Universidad Técnica Federico Santa María in Valparaiso (Chile) for fruitful discussions and for providing us working material on generalized Wendland covariance functions and their use under fixed domain asymptotics. This research was financed in part by National Science Research Council of Canada (grant RGPIN105603-05)

## References

- Bevilacqua et al (2016) Bevilacqua M, Faouzi T, Furrer R, Porcu E (2016) Estimation and prediction using generalized wendland covariance functions under fixed domain asymptotics. ArXiv 1607.06921v1:1–36, URL https://arxiv.org/abs/1607.06921
- Bochner (1933) Bochner S (1933) Monotone funktionen, stieltjessche integrale und harmonische analyse. Mathematische Annalen 108:378–410
- Bohman (1960) Bohman H (1960) Approximate Fourier analysis of distribution functions. Arkiv for Matematik 4:99–157
- Bolin and Lindgren (2013) Bolin D, Lindgren F (2013) A comparison between Markov approximations and other methods for large spatial data sets. Computational Statistics & Data Analysis 61:7–21
- Bolin and Wallin (2016) Bolin D, Wallin J (2016) Spatially adaptive covariance tapering. Spatial Statistics Article in Press
- Chan and Wood (1999) Chan G, Wood ATA (1999) Simulation of stationary Gaussian vector fields. Statistics and Computing 9(4):265–268
- Chen et al. (2008(@) Chen, Y Davis,TA, Hager, WW and Rajamanickam, S (2008) Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate, ACM Trans. Math. Softw., 35 (3).
- Chilès and Delfiner (2012) Chilès J, Delfiner P (2012) Geostatistics: Modeling Spatial Uncertainty, 2nd edition. John Wiley & Sons
- Cressie and Johannesson (2008) Cressie N, Johannesson G (2008) Fixed rank kriging for very large spatial data sets. J of the Royal Statist Society, Series B 70:209–226
- Davis (2006) Davis, TA (2006) Direct Methods for Sparse Linear Systems, SIAM Series Fundamentals of Algorithms.
- Deltheil (1926) Deltheil R (1926) Probabilités géométriques. Traité du calcul des probabilités et de ses applications, Tome II, Fascicule 5. Gauthier-Villars, Paris.
- Dyjkstra (1959) Dyjkstra E (1959) A note on two problems in connexion with graphs. Numerische Mathematik 1:269–271
- Emery (2004) Emery X (2004) Testing the correctness of the sequential algorithm for simulating Gaussian random fields. Stochastic Environmental Research and Risk Assessment 18:401–413
- Emery (2008) Emery X (2008) Statistical tests for validating geostatistical simulation algorithms. Computers & Geosciences 34(11):1610–1620
- Emery and Lantuéjoul (2006) Emery X, Lantuéjoul C (2006) TBSIM: A computer program for conditional simulation of three-dimensional Gaussian random fields via the turning bands method. Computers & Geosciences 32(10):1615–1628
- Emery et al (2015) Emery X, Arroyo D, Porcu E (2015) An improved spectral turning-bands algorithm for simulating stationary vector gaussian random fields. Stochastic Environmental Research and Risk Assessment On-line:1–11, DOI 10.1007/s00477-015-1151-0
- Furrer et al (2006) Furrer R, Genton MG, Nychka D (2006) Covariance tapering for interpolation of large spatial datasets. J Computational and Graphical Statistics 15(2):502–523
- Gneiting (2002) Gneiting T (2002) Compactly supported correlation functions. Journal of Multivariate Analysis 83(2):493?–508
- Gneuss et al (2013) Gneuss P, Schmid W, Schwarze R (2013) Efficient approximation of the spatial covariance function for large datasets - analysis of atmospheric CO concentrations. In: Discussion Paper series recap15
- Hovadik and Larue (2007) Hovadik J, Larue D (2007) Static characterizations of reservoirs: refining the concepts of connectivity and continuity. Petro Geosci 13:195–211
- Lantuéjoul (2002) Lantuéjoul C (2002) Geostatistical Simulation. Springer, Berlin, Heidelberg
- Lim and Teo (2008) Lim T, Teo P (2008) Gaussian fields and Gaussian sheets with generalized Cauchy covariance structure, arXiv:0807.0022v1, http://arxiv.org/abs/0807.0022v1
- Lindgren et al (2011) Lindgren F, Rue H, Lindstrom J (2011) An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. J R Statist Soc B 73:423–298
- Lu and Shiou (2002) Lu TT, Shiou SH (2002) Inversers of 2 x 2 block matrices. Computers and Mathematics with Applications 43:119–129
- Marcotte (2015) Marcotte D (2015) TASC3D: a program to test the admissibility in 3D of non-linear models of coregionalization. Computers & Geosciences 83:168 – 175
- Marcotte (2016) Marcotte D (2016) Spatial turning bands simulation of anisotropic non linear models of coregionalization with symmetric cross-covariances. Computers & Geosciences 89:232–238
- Matheron (1965) Matheron G (1965) Les variables régionalisées et leur estimation. PhD thesis, Faculté des Sciences, Université de Paris
- Matheron (2011) Matheron G (2011) The theory of regionalized variables and its applications. École nationale supérieure des mines 5:211 p.
- Paravarzar et al (2015) Paravarzar S, Emery X, Madani N (2015) Comparing sequential Gaussian and turning bands algorithms for cosimulating grades in multi-element deposits. Comptes Rendus Geoscience 347:84–93
- Philip (1991) Philip J (1991) The probability distribution of the distance between two random points in a box, https://people.kth.se/ johanph/habc.pdf
- Porcu et al (2013) Porcu E, Daley DJ, Buhmann M, Bevilacqua M (2013) Radial basis functions for multivariate geostatistics. Stochastic Environmental Research and Risk Assessment 27(4):909–922
- Renard and Allard (2013) Renard P, Allard D (2013) Connectivity metrics for subsurface flow and transport. Advances in Water Resources, 55, 168–196
- Safikhani et al (2016) Safikhani M, Asghari O, Emery X (2016) Assessing the accuracy of sequential Gaussian simulation through statistical testing. Stochastic Environmental Research and Risk Assessment DOI 10.1007/s00477-016-1255-1, On-line first
- Sang and Huang (2012) Sang H, Huang J (2012) A full scale approximation of covariance functions for large spatial data sets. J R Statist Soc B 74:111–132
- Shinozuka and Jan (1972) Shinozuka M, Jan CM (1972) Digital simulation of random processes and its applications. Journal of Sound and Vibration 25:111–128
- Sneddon (1951) Sneddon I (1951) Fourier Transforms. McGraw-Hill
- Stein (1993) Stein M (1993) A simple condition for asymptotic optimality of linear predictions of random fields. Statistics and Probability Letters 17:399–404
- Stein (1999) Stein M (1999) Statistical Interpolation of Spatial Data: Some Theory for Kriging
- Stein (2013) Stein M (2013) Statistical properties of covariance tapers. J Comput Graph Statist 22:866–885
- Wackernagel (2003) Wackernagel H (2003) Multivariate geostatistics : an introduction with applications, 3rd edn. Springer, Berlin
- Wendland (1995) Wendland H (1995) Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Advances in Computational Mathematics 4(1):389–396
- Wu (1995) Wu Z (1995) Compactly supported positive definite radial functions. Advances in Computational Mathematics 4(1):283–292
- Zhang and Du (2008) Zhang H, Du J (2008) Covariance tapering in spatial statistics. In: Mateu J, (eds) EP (eds) Positive definite functions: From Schoenberg to space-time challenges, Universidad Jaume I., Castellon (Spain).