Choice of Mathematical Model
When Walls does a least-squares adjustment, it assumes by default that the error variances of the observed XYZ components of a traverse are each proportional to the traverse's total length. (For data format details see Variance Assignments.) No doubt we can come up with a more interesting approximation of variance -- maybe one that's more realistic. If we wanted to extract as much information as possible from our survey data, we could try constructing a detailed statistical model, where instrument standard deviations and other effects, like target positioning error, are specified as parameters. It so happens that the adjustment routine used by Walls was originally designed to support such a model, wherein a traverse misclosure, for example, can be dealt with mathematically as if it were more an effect of random compass errors than of random taping errors. (For details, see the article cited in History and Acknowledgements, where matrix equivalents of the Statistical Formulas used by Walls are described.)
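As a rough illustration of what length-proportional weighting means, here is a minimal Python sketch. The function names and the scale factor k are illustrative only, not part of Walls; k cancels out of the adjustment, which is why only relative variances matter:

```python
# Sketch of the default weighting: each XYZ misclosure component of a
# traverse is assumed to have variance proportional to the traverse's
# total length.  The scale factor k is arbitrary and cancels out of the
# adjustment (see the UVE discussion) -- only relative weights matter.

def component_variance(traverse_length, k=1.0):
    """Relative variance of each XYZ component of a traverse."""
    return k * traverse_length

def weight(traverse_length, k=1.0):
    """Least-squares weight: the reciprocal of the relative variance."""
    return 1.0 / component_variance(traverse_length, k)

# A 100 m traverse gets half the weight of a 50 m traverse:
assert weight(100.0) == 0.5 * weight(50.0)
```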
Unfortunately, there are several drawbacks to this approach. Since the detailed model requires that a vector's error components be treated as correlated, the numerical operations involve the manipulation of 3x3 covariance matrices in place of scalars. The math consumes more memory and is significantly slower; a float operation in the Walls review dialog, for example, might take ten seconds instead of one. (This assumes that the same numerical algorithm is used. The direct sparse matrix method used by Walls is the fastest way I know of to obtain both the least-squares estimates and the additional statistics we need for data screening.)
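To see why correlated components force matrix arithmetic, consider how one shot's instrument errors map into XYZ. The sketch below (Python with NumPy; the instrument standard deviations are invented for illustration) propagates distance, azimuth, and inclination sigmas through the Jacobian of the polar-to-Cartesian conversion. The resulting 3x3 covariance generally has nonzero off-diagonal terms, so the XYZ errors cannot be treated as three independent scalars:

```python
import numpy as np

# Propagate made-up instrument sigmas (distance, azimuth, inclination)
# into a full 3x3 XYZ covariance for a single shot, via the Jacobian of
#   x = d cos(inc) sin(az),  y = d cos(inc) cos(az),  z = d sin(inc).

def shot_covariance(d, az_deg, inc_deg,
                    sd=0.05,                  # distance sigma (length units)
                    sa=np.radians(0.5),       # azimuth sigma (radians)
                    si=np.radians(0.5)):      # inclination sigma (radians)
    az, inc = np.radians(az_deg), np.radians(inc_deg)
    J = np.array([
        [np.cos(inc)*np.sin(az),  d*np.cos(inc)*np.cos(az), -d*np.sin(inc)*np.sin(az)],
        [np.cos(inc)*np.cos(az), -d*np.cos(inc)*np.sin(az), -d*np.sin(inc)*np.cos(az)],
        [np.sin(inc),             0.0,                        d*np.cos(inc)],
    ])
    S = np.diag([sd**2, sa**2, si**2])   # independent instrument errors
    return J @ S @ J.T                   # correlated XYZ errors

C = shot_covariance(10.0, 45.0, 30.0)
# The off-diagonal terms of C are generally nonzero.
```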
A more serious drawback is the increased complexity of the program and its documentation. Would users appreciate a dialog full of statistical parameters when it's unclear what the settings should be? The reliability of Suunto compass measurements surely drops off significantly with "steep" shots, but by how much? How large is the effect of target positioning error? Should our data format support a variance override consisting of six parameters? (This would allow a vector to substitute for a traverse or subnetwork, as it can now, without affecting adjustment results.) And finally, should we continue to treat the horizontal and vertical dimensions of traverses separately, as we now do during float operations? The correlated model by itself doesn't justify this separation, but the practice of ranking traverses by horizontal and vertical consistency has been so effective in finding blunders that Walls users have come to depend on it. The fact remains that some common types of blunders (bad azimuths and reversed signs on inclinations) destroy just one of the two kinds of consistency while leaving the other unaffected.
Even if cave surveyors are interested in these issues, and their surveys are good enough to be modeled this way, it's doubtful that introducing such complexity into Walls would be worth the effort. The end result would be data screening statistics much harder to get a feel for, due to their dependence on subjective, or even arbitrary, parameter choices. Although UVEs could be obtained, they would be useless for objective comparisons between data sets unless we could all agree on a standard set of parameters and their values. Contrast this with inverse length-proportional weighting. While it amounts to a rather crude approximation of variance, it is supremely simple because it has no parameters -- that is, none that try to describe the errors in specific instrument readings. Instead of being sensitive to such assumptions, the UVEs can be regarded as variance scaling factors estimated from the data. Interpreted this way, they can serve as objective measures of consistency. (See Relative Variance vs True Variance under Variance Assignments.)
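The idea of a UVE as a variance scale factor estimated from the data can be sketched as follows. This is the generic a-posteriori variance-of-unit-weight computation, not necessarily the exact formula Walls uses (see Statistical Formulas for that), and the residuals and lengths below are made up:

```python
# Generic variance-of-unit-weight estimate: weighted sum of squared
# residuals divided by the degrees of freedom, with relative variances
# taken as traverse lengths (inverse length-proportional weighting).

def unit_variance_estimate(residuals, lengths, dof):
    return sum(r * r / L for r, L in zip(residuals, lengths)) / dof

r = [0.3, -0.5, 0.2]        # misclosure residuals (made up)
L = [30.0, 50.0, 20.0]      # traverse lengths = relative variances
u = unit_variance_estimate(r, L, dof=2)

# Multiplying all relative variances by a constant merely rescales the
# estimate, so it works as a consistency measure without any claim about
# the true size of instrument errors.
assert abs(unit_variance_estimate(r, [2 * x for x in L], 2) - 0.5 * u) < 1e-12
```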
To reduce the cost of least-squares computation, some cave mapping programs have adopted a watered-down version of the detailed (correlated) variance model. This approach uses estimated measurement standard deviations to compute just the diagonal elements of a vector's covariance matrix, effectively assigning zero to the correlation terms. What makes this unattractive, in my view, is that for computational expediency alone we are giving up much of what's theoretically nice about the complete model. For one thing, assigning different variances to the two horizontal components while ignoring correlations causes adjustment results (e.g., the distances between points) to depend on which coordinate frame of reference we've chosen. Also, parameter choices are not made any simpler. If we're going to go this far, then why compromise on the math?
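The frame-dependence claim is easy to demonstrate numerically. In the Python sketch below (all numbers invented), two observations of the same point are fused by weighted least squares. With full covariance matrices the estimate is equivariant under a rotation of the coordinate frame; keeping only the diagonal elements makes the answer depend on which frame was chosen:

```python
import numpy as np

def fuse(p1, C1, p2, C2, diagonal_only=False):
    """WLS estimate of a point observed twice with covariances C1, C2.
    If diagonal_only, discard the correlation terms first (the shortcut
    criticized above)."""
    if diagonal_only:
        C1, C2 = np.diag(np.diag(C1)), np.diag(np.diag(C2))
    W1, W2 = np.linalg.inv(C1), np.linalg.inv(C2)
    return np.linalg.solve(W1 + W2, W1 @ p1 + W2 @ p2)

# Two noisy 2D observations of one point (made-up numbers), each with an
# anisotropic covariance (e.g., tight in range, loose in bearing).
p1, p2 = np.array([10.0, 0.1]), np.array([9.8, -0.2])
C1 = np.array([[0.01, 0.004], [0.004, 0.09]])
C2 = np.array([[0.05, -0.01], [-0.01, 0.02]])

th = np.radians(30.0)   # rotate the whole frame by 30 degrees
R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])

# Full model: adjusting in the rotated frame gives the rotated answer.
a = R @ fuse(p1, C1, p2, C2)
b = fuse(R @ p1, R @ C1 @ R.T, R @ p2, R @ C2 @ R.T)
assert np.allclose(a, b)

# Diagonal-only model: the two disagree -- results depend on the frame.
c = R @ fuse(p1, C1, p2, C2, diagonal_only=True)
d = fuse(R @ p1, R @ C1 @ R.T, R @ p2, R @ C2 @ R.T, diagonal_only=True)
assert not np.allclose(c, d)
```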
The more realistic our mathematical model, the more sensitive our statistics will be to departures from the model -- that is, to blunders. But how many additional blunders would stand out if we used a different formula for assigning variances to vectors? A little experimentation with Walls should convince you there wouldn't be many. In the Suunto and tape tree survey example, if you select any one of the 226 single-vector traverses and perturb an angle measurement by four degrees, the program will rank the traverse as the worst of the entire survey. It will also highlight the changed measurement and suggest a correction of about four degrees. In most cases, even a 2-degree perturbation would be noticeable. (Alternatively, you can examine the effect of changing a distance by a couple of feet.) It's true that in this example the tight control provided by numerous loops and short traverses is responsible for this sensitivity. Similar results would have been achieved with almost any variance function (e.g., weighting all vectors equally). When so few traverses have more than one vector, location estimates won't vary much either -- perhaps a few inches one way or another.
Yet, when traverses are longer, it makes even less theoretical difference which formula we use to weight individual vectors. Unless our model is so sophisticated that it treats instrument calibration error as a random variable, the variance of a traverse is simply the sum of the variances, or covariance matrices, of its component vectors. (The assumption here is that the "random" measurement errors in different survey shots are reasonably small and independent of each other.) Consequently, as vector counts increase, the relative variances of traverses tend to resemble their relative lengths, provided the distribution of shot lengths doesn't vary too much between traverses. Also, as directions change within a traverse, any asymmetry in the directional components of variance tends to be washed out.
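The additivity argument can be checked with a toy example (shot lengths invented): summing per-shot variances makes the two traverses' relative variance equal their relative length under length-proportional weighting, and equal their relative shot count under equal weighting -- which amounts to the same thing when shot-length distributions are similar:

```python
# With independent shots, a traverse's variance is the sum of its shots'
# variances, so almost any per-shot formula yields traverse variances
# roughly proportional to traverse length.

def traverse_variance(shot_lengths, per_shot_variance):
    return sum(per_shot_variance(L) for L in shot_lengths)

t1 = [5.0, 7.0, 6.0, 8.0]                       # 26 m, 4 shots
t2 = [6.0, 5.0, 8.0, 7.0, 6.0, 7.0, 5.0, 8.0]   # 52 m, 8 shots

# Length-proportional per-shot variance: ratio is exactly the length ratio.
v1 = traverse_variance(t1, lambda L: 0.001 * L)
v2 = traverse_variance(t2, lambda L: 0.001 * L)
assert abs(v2 / v1 - 2.0) < 1e-9

# Equal variance per shot: ratio is the shot-count ratio, which tracks the
# length ratio when shot-length distributions are similar.
u1 = traverse_variance(t1, lambda L: 0.004)
u2 = traverse_variance(t2, lambda L: 0.004)
assert abs(u2 / u1 - 2.0) < 1e-9
```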
It is often the case, unfortunately, that the longest traverses end up having the worst statistics, or F-ratios. While this might suggest to some that our model needs refinement, there's a much simpler explanation: such traverses really are bad. There are at least two good reasons to expect this. First, data screening is less effective at isolating blunders to specific vectors in long traverses, so serious mistakes will go unnoticed. Second, surveys of long routes are often carried out by single teams and are more likely to be affected by systematic error. (See Error Propagation in Long Traverses.) When our goal is data screening, we want to use a statistical model to expose such problems by way of contrast, not to simulate them. At some later stage of data processing, when we want up-to-date location estimates, it's reasonable to assign a very large variance to a traverse we consider suspect. In Walls this is easily done with a variance override. Short of throwing the traverse out completely (or floating it), I know of no good way to handle such an assignment automatically. In fact, most of the large Walls databases I'm aware of have several traverses that have been permanently floated.
The correlated component model still has some advantages worth pointing out. Perhaps the most important is flexibility. Transit surveys (i.e., turned angles) could be integrated with compass and tape data while giving proper statistical weight to the angle measurements. It would be possible to handle triangulations and missing measurements in a more satisfying way. The breakdown of the UVE into components other than horizontal and vertical might provide interesting information about specific kinds of measurements. As already mentioned, instrument calibration error could be treated as a random variable and taken into account when deriving covariance matrices for traverses. In fact, instrument corrections could in some cases be treated as unknowns and estimated. (This can now be done iteratively with successive compilations.) Because much of the code has already been written, a future version of Walls might offer such a model as an option.
However, it's unlikely that a cave survey would benefit in any practical sense from such a detailed model of an individual vector's variance, where the aim is presumably to squeeze a little more information from the data set. As suggested above, the improvement in results owing to the mere presence of loops, or redundancy, in our cave surveys is likely to far outweigh any benefit that could be derived from refining the statistical model. Implementing alternative models in Walls, as attractive as they might be theoretically, would surely break the most important rule of software design: keep it as simple as practical considerations allow.