Variance Assignments
A variance assignment is an optional parenthetical expression following the last numeric item on compass and tape (CT) and rectangular coordinate (RECT) data lines. (See Vector Type.) It can also appear after the coordinates in a #FIX directive. Its purpose is to override the default variance (inverse weight) that the program assigns to that vector during adjustment and error analysis.
By default, the program converts a CT vector to its east-north-up components and assigns to each component a variance of <total length in feet>/100 ft², the same default used by the mainframe program Ellipse, the precursor of Walls. This is equivalent to a standard deviation of 0.3 m for a 30-meter shot. The variance of a traverse is simply the sum of the variances of its component vectors regardless of vector type.
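As a quick sanity check on this default (a Python sketch with our own hypothetical function names, not anything from Walls itself), the per-component variance and its implied standard deviation can be computed like this:

```python
import math

FT_PER_M = 3.28084  # feet per meter

def default_ct_variance_ft2(length_m: float) -> float:
    """Default per-component variance for a CT vector:
    <total length in feet> / 100, in square feet."""
    return (length_m * FT_PER_M) / 100.0

# Check against the text: a 30-meter shot should imply a
# per-component standard deviation of roughly 0.3 m.
var_ft2 = default_ct_variance_ft2(30.0)
stdev_m = math.sqrt(var_ft2) / FT_PER_M
```

Running this gives a standard deviation of about 0.30 m for the 30-meter shot, matching the figure quoted above.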
Unlike CT vector components, the components of RECT vectors and #Fix points are assigned default variances of zero. This means they won't be changed by a least squares adjustment. Most likely you'll want to override the default as described below, particularly for #Fix points representing GPS locations.
Our experience in working with several large project data sets over the years suggests that the default variance assignment results in horizontal and vertical unit variance estimates (UVEs) in the range 0.2 to 2 for typical cave surveys in which all evident blunders have been corrected or discarded. You might experience a different, hopefully narrower range, but adopting the default makes interpreting the statistics easier and allows us to objectively compare different surveys and loop systems. In fact, if surveyors were willing to annotate their maps with a consistency rating like "UveH=1.20(14), UveV=0.55(15)" alongside the usual instrument details, we would have enough information in some cases to obtain rough "confidence regions" for location estimates. The numbers in parentheses are the horizontal and vertical loop counts, which measure the significance of the UVEs and allow them to be properly combined with data from other surveys.
For a brief explanation of why we chose this simple length-proportional model for variance as opposed to one that incorporates specified standard deviations of actual measurements, see Choice of Mathematical Model. For the mathematical details of how these variances are used in Walls, see Statistical Formulas.
Variance Overrides

The variance override, if present, follows the last coordinate or measurement of a line defining a vector. It is a parenthetical expression, with no embedded tabs or spaces, that has the following form:

(h,v)
Parts h and v are each either an asterisk (*), a question mark (?), a nonnegative number, or a number prefixed with the letter R. The meanings of these different forms are explained in the sections below. The h part determines the variance Walls will assign to each of the two horizontal error components. The v part determines the vertical component's variance.
Alternative forms are (h,) and (,v) which specify that only one of the two kinds of component variance (horizontal or vertical) is being overridden. In other words, to retain the default variance ordinarily given to a component type, we would simply include the comma separator without an associated assignment. Another alternative form is (h), which means that expression h applies to the vertical component as well. For example, (*) is the same as (*,*).
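The accepted forms could be sketched with a small parser like the following (a hypothetical Python helper of our own, not Walls' actual implementation):

```python
def parse_override(expr: str):
    """Parse a variance-override expression like '(h,v)', '(h,)',
    '(,v)', or '(h)' into an (h_part, v_part) pair of strings,
    where None means 'keep the default variance'."""
    inner = expr.strip()[1:-1]          # drop the parentheses
    if ',' in inner:
        h, v = inner.split(',', 1)
        return (h or None, v or None)
    return (inner, inner)               # (h) applies to both components

# (*) is shorthand for (*,*):
assert parse_override('(*)') == ('*', '*')
# Only the horizontal or only the vertical part may be overridden:
assert parse_override('(0,)') == ('0', None)
assert parse_override('(,5)') == (None, '5')
assert parse_override('(R2,?)') == ('R2', '?')
```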
Expressing Variance as Length

When a non-prefixed number is used for v or h, it will be interpreted as a length override. In deriving a component's variance, Walls will treat the vector as if it were an ordinary compass-and-tape vector of the specified length. The distance units specified in the #Units directive apply here just as they do to the rest of the data line (or you can use an F or M suffix override).
Constraining Vectors

The larger the variance, the less weight the vector will be given when it is averaged with the entire data set. A special case worth noting is when zero variance is assigned. The override "(0)" ensures that all components of the vector will be held fixed, or constrained, in the adjustment operation. Constrained vectors that are also traverses in loop systems will be identified as such by Walls in the Geometry page of the Review dialog, where the expression "<FIX>" replaces statistics that are not applicable in this case. If for some reason you inadvertently assemble a loop consisting entirely of constrained vectors, Walls will abort compilation and display an error message. (Walls is designed to do this even if the unadjusted loop closure is exactly zero.)
Since the error properties of vertical shot measurements (i.e., +90 or –90 inclinations) are different from those of normal shots, the program offers a convenient way to control whether or not the vertical and/or horizontal components of such shots are, by default, to be constrained. These options can be set for specific surveys or for entire branches of the project tree. A discussion of the rationale behind constraining either type of component can be found in the description of the Compile Options Page of the Properties dialog.
Floating Vectors and Traverses

While numbers representing variances can be assigned, just as likely you'll be using a special override, an asterisk or question mark, to assign infinite variances to the horizontal and/or vertical components of specific vectors and traverses -- namely those shown to be bad or highly questionable by the data screening process. This operation, called floating, causes Walls to give zero weight to the corresponding measurements in the least-squares adjustment, effectively discarding them. If a question mark character (?) is used in place of h or v in the override expression, then just the vector is floated. If an asterisk (*) is used, then all vectors in the containing traverse are floated. During an adjustment, the entire discrepancy between a traverse and the remaining network (i.e., the best correction) is absorbed by the floated vectors. If the traverse has multiple floated vectors then the correction is proportioned out between them according to vector length.
Multiple members of a traverse chain can also be floated this way -- an operation that will distribute the best correction across several traverses. To do this it's necessary to find a vector in each of the selected chain members and apply an asterisk override to the appropriate component. As when floating normal traverses, suitable traverse chain vectors are most easily found and linked to from the Review dialogs. In fact, floating traverse chains interactively rather than with variance overrides in data is usually done as a first step. See Floating Traverse Chains.
Accommodating blunders is not the only possible justification for floating vector components. Suppose we want to define a vector that expresses only our knowledge that two stations A1 and B1 are at the same elevation, perhaps at a lake's surface. We could do this with the following CT data line:
A1 B1 0 0 0 (?,0)
Only the zero inclination measurement is relevant here since we are constraining the vertical component and floating the horizontal components. We are assuming that other parts of the survey are sufficient for computing the actual horizontal separation between A1 and B1. An analogous situation is a cave-to-surface radio location where only the horizontal, not vertical, separation is being measured. In this case we would float the vertical component but not the horizontal components. Also, since radio locations are subject to error, we probably wouldn't want to constrain the horizontal components to be zero. The vector definition for a radio location might be as above, but with a variance override something like (R2,?), where R2 specifies a 2-meter horizontal RMS error. (RMS errors are described below.)
Floating a vector is not quite the same thing as assigning a very large numerical value to the variance. While the final location estimates would be essentially the same, the unit variance estimates (UVEs) would be slightly larger due to the fact that floating reduces the loop count. (The loop count appears as a divisor in the formula for UVE.) With regard to the statistics, floating both horizontal and vertical components of a traverse is exactly the same as detaching it by renaming an end station. In the former case, however, the loop system's apparent geometry doesn't change; floated traverses are still adjusted to achieve consistency when the map is drawn.
In the case of blunders or bad closures it's usually preferable to float traverses interactively via the Geometry page of the review dialog rather than by "hard coding" them into the data files as described here. This way older data can easily be reevaluated as the survey is extended, and you are less likely to overlook unresolved problems. The advantage of hard coding is that you won't be facing the same distracting statistics each time you recompile your survey data. (Floated traverses are no longer "bad" and will appear near the bottom of the sorted list in the Geometry page.) Sometimes data screening can do no more than narrow a blunder's location to an obviously bad traverse, where the only reasonable course of action is to permanently float it, resurvey it, or throw it out completely.
Finally, if you acquire the habit of floating vectors in your data files, you may occasionally see a Walls compilation abort with the message, "Too many vectors are floated". This means there exist traverse end points for which there is insufficient (unfloated) data to obtain estimates for both their horizontal and vertical locations. Interactively this can't happen since you are prevented from floating traverses that have been turned into bridges. You can, however, float multiple traverse chain members -- an operation that would produce this error message in earlier versions of Walls.
Expressing Variance as RMS Error

Variance overrides are frequently used with #Fix directives, especially those defining GPS locations. For these cases it's convenient to use R-prefixed numbers for h and v. The number portion should be your estimate for the Root-Mean-Squared (RMS) error, which is the square root of the average squared position error in feet or meters. For the horizontal part of the override, RMS error can be interpreted as the radius of a 63% confidence region. For the vertical part, RMS error is equivalent to the standard deviation, which defines a 68% confidence interval.
For example, a variance override of (R4.5,R10) specifies that the probability is 0.63 that the 2-dimensional, horizontal error vector has length less than 4.5. It also specifies that the probability is 0.68 that the magnitude of the elevation error is less than 10. The default length units (feet or meters) are the same as for the coordinates themselves, but you can override the default with an F or M suffix. To force feet regardless of the default, for example, you could specify (R4.5F,R10F).
While RMS error has perhaps the simplest mathematical definition, there are other popular statistics for estimating accuracy. Converting one of those to a nearly equivalent RMS error is usually just a matter of multiplying by a constant. For instructions on how to do this, see GPS Position Accuracy Table. Although we're calling them variance assignments, RMS errors are in the same category as standard deviation. Walls will convert them to variances in a compilation.
Note that if you make an assignment like (R10), which is the same as (R10,R10), you won't be giving all three error components identical variances. The vertical component in this case would be given variance 10² = 100, while each horizontal component would be given half that variance, or 50. (You can confirm this by using the table: (0.71 x 10)² = 50). It so happens that this relationship between horizontal and vertical GPS accuracy is often observed in receivers capable of providing GPS-based elevations.
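This split of the horizontal RMS error between the two horizontal components can be checked with a short sketch (hypothetical Python of our own, not Walls code):

```python
def rms_override_to_variances(r_h: float, r_v: float):
    """Convert an (Rh,Rv) override into the three component variances,
    following the rule above: the vertical RMS error equals the vertical
    standard deviation, while the squared horizontal RMS error is split
    evenly between the two horizontal components."""
    var_v = r_v ** 2
    var_h_each = r_h ** 2 / 2.0   # half the squared horizontal RMS
    return (var_h_each, var_h_each, var_v)

# (R10), i.e. (R10,R10): vertical variance 100, each horizontal 50.
variances = rms_override_to_variances(10.0, 10.0)
```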
Given that many receivers display an Estimated Position Error (EPE), a statistic that takes into account satellite arrangement and other factors, why not support EPE values in variance overrides? The reason we don't is that manufacturers of consumer devices haven't documented how EPEs are calculated, or even what they mean exactly. Garmin International, for example, provides us with only this definition of EPE: "A measurement of horizontal position error in feet or meters based upon a variety of factors including DOP and satellite signal quality." Evidently this isn't meant to imply that EPE is an estimate of the expected error in a true statistical sense -- EPE would represent a 54% confidence region in that case. The EPEs of recent Garmin models appear to be more conservative than that, representing perhaps a 95% confidence region. In fact the statistical meaning of EPE hasn't been consistent across GPS receiver brands and models.
A note of caution regarding #Fix point variance overrides: Using unscaled RMS error estimates is recommended only if you're willing to accept that the default assignment for CT vectors is a reasonable approximation. That means you think UVEs of value one are at least achievable with your CT surveys. Otherwise, you may want to scale RMS error so that it reflects relative variance as described below.
Relative Variance vs True Variance

The default variance assignment, <length in feet>/100 ft², was chosen simply to provide a reasonable magnitude for the UVE -- a range that includes the value one -- so that the latter can serve as a convenient measure of consistency. If we were to replace the denominator 100 with 1000, the adjustment results (station locations) would remain exactly the same as before, but the UVEs displayed in Walls dialogs would be ten times larger. In other words, it is only the relative sizes of variances assigned to vectors that will affect our final estimates and, for that matter, the F-ratios computed for traverses.
If you find that your surveying produces UVEs that are consistently smaller than one, then it is indeed the case that the default variances overestimate the true error variances of your observations. But this in itself doesn't undermine the basic assumption of our simplified model, which is only that variances are proportional to traverse length. In fact, the UVE itself can be regarded as an unbiased estimate of a variance scaling factor which we are in effect treating as an unknown model parameter. (See Statistical Formulas.) Thus, an estimate for a vector component's true error variance is simply UVE x <length>/100 ft². If we were to actually assign these scaled variances, a recompilation by Walls would produce a new UVE close to 1.0. But there would be little advantage to doing so, as our final UVE would then cease to have any meaning.
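The scaling described here amounts to a one-line calculation (a sketch with names of our choosing):

```python
FT_PER_M = 3.28084  # feet per meter

def true_variance_estimate_ft2(length_m: float, uve: float) -> float:
    """Estimated true error variance of a vector component:
    UVE x <length in feet> / 100, in square feet."""
    return uve * (length_m * FT_PER_M) / 100.0

# If surveys consistently yield a UVE of 0.5, the estimated true
# variance is half the default assignment:
scaled = true_variance_estimate_ft2(30.0, 0.5)
default = true_variance_estimate_ft2(30.0, 1.0)
```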
Assigning Relative Variances to #FIX Stations (Updated)

The default variance assigned to a #fix station is zero, which completely constrains the station to the coordinates you specify. Unless the station is isolated, however, this is rarely justified, especially with regard to the elevation measurement. A surveyed traverse between two GPS locations, underground or not, might in fact be a better estimate of relative location than what a consumer-grade GPS receiver could provide. If we expect to have multiple fixed stations, say GPS locations connected together by compass-and-tape (CT) surveys, then we'll need to provide Walls with estimates of their accuracy. This is a technicality we can't avoid if we want Walls to make good use of all our data when computing final station coordinates.
As described above, our methodology involves a default variance assignment that remains fixed for almost all of our CT surveys (whether or not it's realistic), and a resulting Walls-calculated variance scaling factor (UVE) that's likely to deviate from one. Realizing that what affects least-squares adjustment results are the relative weights assigned to vectors, we try to insure that all assigned variances reflect the ratios of the true variances of our vector observations -- at least as far as they're known. The numbers we place in parentheses, therefore, are not necessarily what we believe to be actual error variances, but instead represent relative variances. This applies not only to the occasional normal survey shot whose default variance we override, but also to the hidden vectors connecting #fix stations to the coordinate system's origin.
Suppose, then, that we already have an accuracy estimate for a #fix station -- perhaps an RMS error estimate, or the radius of a 95% confidence region. What should we enter as the variance override? Well, if we were consistently obtaining an overall UVE of 2.0 with our CT surveys, and we believed it was representative of the quality obtainable in a particular project, then we might conclude that vector measurement error is behaving as if variances were twice that of the Walls default assignment. The default assignment would be underestimating the true variances of our CT vectors. Therefore, to preserve ratios, we need to similarly underestimate the true variances of any #fix or RECT vectors that could become part of a loop system.
Since the accuracy of a GPS location will normally be expressed as an RMS error estimate, we want to scale it so that the variance Walls computes from it has the correct ratio to the true variance. Since RMS error is proportional to the standard deviation, we simply divide it by the square root of the UVE (horizontal or vertical as appropriate). This way we'll be underestimating (or overestimating) the variance by the proper amount. If we were instead starting with a 95% confidence region or similar statistic, we would use the GPS Position Accuracy Table to convert it to an RMS error before doing the division.
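The scaling step just described can be sketched as follows (hypothetical Python, names are ours):

```python
import math

def scale_rms_for_override(rms: float, uve: float) -> float:
    """Scale a GPS RMS error estimate so that the variance Walls derives
    from it keeps the correct ratio to the default CT variances: divide
    the RMS error by the square root of the appropriate UVE."""
    return rms / math.sqrt(uve)

# With UveH = 2.0, a 6 m horizontal RMS error would be entered as
# roughly (R4.24,...) rather than (R6,...):
scaled_rms = scale_rms_for_override(6.0, 2.0)
```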
There is an alternative to scaling the GPS variance assignments to make them compatible, in a relative sense, with those of the CT default assignments. That is to scale the latter using UVH and UVV parameter values -- values that would produce expected UVEs close to one assuming no blunders. We would have to avoid applying this parameter to the GPS fixes, where we would be using the unscaled RMS values as variance assignments. With this methodology, our goal in a large ongoing project would be to maintain overall UVEs close to one. My preference, however, is to stay with the default assignments for CT data, making the UVEs useful for comparing surveys in different projects. With experience it's possible to acquire a sense of how large they should be.
Comparing GPS Locations with CT Traverses

Using the same principle we could also express relative variance as a length value. You'll not need to do this, but we'll illustrate the method because it reveals an interesting relationship between GPS locations and CT traverses. Given that the default assignment for a CT vector is <total length in feet>/100 ft² for each vector component, we can use the table to derive the following approximate formula for the metric "length" value, h, in a horizontal component variance override:
h = 3.28 x 100 x STDEV² / UveH = 328 x (0.71 x RMS)² / UveH = 164 x RMS² / UveH (metric units)
(The multiplier 0.71 approximates 1 / Sqrt(2) and is applicable only in the 2-dimensional case. The length formula for the vertical component would be v = 328 x RMS² / UveV.) Now suppose we are combining a horizontal GPS position with CT surveys whose UveH of 2.0 we have accepted as realistic. If the GPS position has an estimated RMS error of 6 meters, we would use the following length override:
(h,?) = (164 x 6² / 2.0,?) = (2952,?)
So assuming the GPS fix is part of a loop system -- in which case it must be connected by a CT traverse to at least one other #Fix point -- its displacement from the zero datum will be given the same weight in the adjustment as would be given (by default) to a 3 km-long CT traverse!
This might suggest that a 3 km traverse is as reliable as a typical GPS measurement taken at its end; however, length values this large can't be interpreted that way. As traverse lengths increase beyond a few hundred meters, the imperfect calibration of magnetic compasses against true north becomes a significant source of error -- at some point more significant than random measurement error. This problem, however, has implications for the weighting of long CT traverses, not #fix points. The topic Error Propagation in Long Traverses suggests a couple of strategies for dealing with it.
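The worked example above reduces to a one-line formula (our own hypothetical helper):

```python
def horizontal_length_override_m(rms_m: float, uve_h: float) -> float:
    """Approximate metric 'length' value h for a horizontal variance
    override, per the formula above: h = 164 x RMS^2 / UveH."""
    return 164.0 * rms_m ** 2 / uve_h

# The worked example: a 6 m RMS horizontal error with UveH = 2.0
# yields a length override of 2952 m -- a 3 km "virtual traverse".
h = horizontal_length_override_m(6.0, 2.0)
```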