Model evaluation

11.1 Background

Model evaluation is defined as the systematic gathering and promulgation of information about models in order to determine model limitations and domains of applicability. Model evaluation should be viewed as a process rather than a specific result.

The intent of this chapter is not to furnish a compendium of evaluation results but rather to describe the evaluation process itself. Such a description can provide useful insights into the benefits and shortcomings of model evaluation. The fundamental steps involved in the evaluation process will be described and only a select number of examples will be used to further illustrate the process.

At first glance, the process of model evaluation might appear to be very straightforward. When examined more closely, however, a number of factors are seen to complicate and frustrate the process. To be credible, the evaluations must be based on statistically comprehensive comparisons with benchmark data. Such benchmark data can include field measurements, outputs from other models or closed-form analytical solutions. Each of these data sources has limitations. Field measurements are quite limited with regard to temporal, spatial and spectral coverage. Furthermore, these data sets are sometimes classified on the basis of national security concerns. Outputs from other models are useful only when those models themselves have been properly evaluated. Analytical solutions can provide very accurate, albeit idealistic, comparisons for a relatively small number of useful problems.
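As one illustration of what a statistically based comparison can involve, the sketch below computes simple difference statistics (bias, scatter and RMS error) between a model-predicted transmission-loss curve and a benchmark curve sampled at the same ranges. The function name, sample values and choice of metrics are illustrative assumptions, not a prescribed evaluation standard.

# Illustrative sketch only: difference statistics between a model's
# transmission-loss (TL) prediction and a benchmark TL curve.
import numpy as np

def tl_difference_stats(tl_model_db, tl_benchmark_db):
    """Return bias, scatter and RMS of the TL difference (model minus benchmark, dB).

    Both inputs are TL values (dB) sampled at the same ranges.
    """
    diff = np.asarray(tl_model_db, dtype=float) - np.asarray(tl_benchmark_db, dtype=float)
    return {
        "bias_dB": float(diff.mean()),                 # systematic offset
        "std_dB": float(diff.std(ddof=1)),             # scatter about the bias
        "rms_dB": float(np.sqrt(np.mean(diff ** 2))),  # overall error
    }

# Hypothetical model and benchmark TL values at five common ranges
model_tl = [62.1, 68.4, 73.0, 77.8, 81.5]
benchmark_tl = [61.0, 67.9, 74.2, 77.0, 82.3]
print(tl_difference_stats(model_tl, benchmark_tl))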

Model development is an evolving process. Any particular version of a model quickly becomes outdated and the attendant evaluation results become obsolete. Therefore, comparisons of model accuracy are valid only for the particular model version tested. To ignore the progress in model development gained through previous evaluation experience is to do those models an injustice. Accordingly, only the most recent and authoritative evaluation results should be considered when assessing models.

Rivalries among different governmental, academic and industrial laboratories frequently add a political dimension to the model evaluation process.

Many of the well-known models have gained their notoriety, in part, through the zealous advocacy of their developers and sponsors.

Clearly what is needed then is an autonomous clearinghouse to provide for both the unbiased evaluation of models and the timely promulgation of the results. An intermediate step in this direction was taken by the naval sonar modeling community through the establishment of configuration management procedures. These procedures provided a disciplined method for controlling model changes and for distributing the models (together with selected test cases) to qualified users. Configuration management comprises four major activities: (1) configuration identification and use of a product baseline; (2) configuration change control; (3) status accounting and documentation of all product changes; and (4) reviews, audits and inspections to promote access to information and decision-making throughout the software life cycle.
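As a purely hypothetical sketch of how these four activities might be captured in software (the class and field names below are illustrative and are not drawn from any actual Navy configuration-management system), consider:

# Hypothetical sketch of a configuration-management record; field names are
# illustrative only and do not reflect any actual Navy or OAML schema.
from dataclasses import dataclass, field

@dataclass
class ModelConfiguration:
    name: str                                             # (1) configuration identification
    baseline_version: str                                 #     product baseline
    change_requests: list = field(default_factory=list)   # (2) change control
    change_log: list = field(default_factory=list)        # (3) status accounting
    review_records: list = field(default_factory=list)    # (4) reviews, audits, inspections

    def apply_change(self, request: str, approved_by: str) -> None:
        """Record an approved change against the current baseline."""
        self.change_requests.append(request)
        self.change_log.append((request, approved_by))

cfg = ModelConfiguration(name="ExamplePropagationModel", baseline_version="1.0")
cfg.apply_change("Correct bottom-loss table interpolation", approved_by="review board")
print(cfg.change_log)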

The US Chief of Naval Operations established the oceanographic and atmospheric master library (OAML) in 1984. The OAML is chartered to provide fleet users with standard models and databases while ensuring consistent, commonly based environmental service products (Willis, 1992).

11.2 Past evaluation efforts

Comprehensive model evaluation efforts in the past have been very limited. Notable efforts have been sponsored by the US Navy, but these efforts have focused largely on propagation models. None of these efforts is presently active and their results are therefore only of limited academic interest. All of these efforts were successful in addressing the immediate concerns at the time, but were unsuccessful in the longer term due to the evolutionary nature of model developments and the lack of widely accepted evaluation criteria. It is instructive, nonetheless, to review these past efforts.

The model evaluation program (MEP) was administered by the (now defunct) acoustic environmental support detachment (AESD) under Office of Naval Research (ONR) sponsorship. Emphasis was placed on model-to-model comparisons and the evaluation results were not widely disseminated.

A methodology for comparing acoustic-propagation models against both other models and measured data was developed by the panel on sonar system models (POSSM). POSSM was administered by the Naval Underwater Systems Center (NUSC) and was sponsored by the US Naval Sea Systems Command (Lauer and Sussman, 1976, 1979). Some of these results will be discussed later in this chapter. Among the many observations made by POSSM was the lack of documentation standards for acoustic models (Lauer, 1979).

In an effort to promote standardization, the US Navy established the acoustic model evaluation committee (AMEC) in 1987. The specific charter of AMEC was to develop a management structure and administrative procedures for the evaluation of acoustic models of propagation, noise and reverberation, although initial attention was restricted to propagation models. Specific evaluation factors included model accuracy, running time, core storage, ease of effecting slight program alterations and available ancillary information. The activities of AMEC culminated in evaluation reports for two propagation models: FACT and RAYMODE. As a result of AMEC's activities, RAYMODE was designated the (interim) Navy standard model for predicting acoustic propagation in the ocean. Subsequently, AMEC was disbanded.

More recent publications have proposed various theoretical and practical procedures for model evaluation (McGirr, 1979, 1980; Hawker, 1980; Pedersen and McGirr, 1982). Additional evaluation studies have been described for both propagation models (Hanna, 1976; Eller and Venne, 1981; Hanna and Rost, 1981) and reverberation models (Eller et al., 1982). Spofford (1973b), Davis et al. (1982) and Chin-Bing et al. (1993b) reported practical test cases suitable for all types of propagation models. In May 1994, the ONR conducted a reverberation workshop in Gulfport, Mississippi (USA). The purpose of this workshop was to assess the fidelity of low-frequency (30-400 Hz) scattering and reverberation models. Six test cases comprising either field-measurement data sets or analytic problems with known reference solutions were used to establish the accuracy and state of current reverberation model development. Uniform plot standards facilitated intercomparison of test-case results. The results of this workshop are to be published in a proceedings volume.

11.3 Analytical benchmark solutions

In the absence of comprehensive experimental data, underwater acousticians have explored the use of analytical benchmark solutions to assess the quality of numerical models (Felsen, 1986; Jensen, 1986; Jensen and Ferla, 1988; Robertson, 1989). Such benchmarks emphasize idealized but exactly solvable problems. In the case of propagation models, two benchmark problems have been investigated in detail: (1) upslope propagation in a wedge-shaped channel and (2) propagation in a plane-parallel waveguide with range-dependent sound-speed profile (Felsen, 1987). These two problems are illustrated in Figures 11.1 and 11.2, respectively (Jensen and Ferla, 1988, 1990).

Parameters common to all three cases:
Wedge angle: θ0 = arctan 0.05 ≈ 2.86°
Frequency: f = 25 Hz
Isovelocity sound speed in the water column: c1 = 1,500 m/s
Source depth: 100 m
Source range from the wedge apex: 4 km
Water depth at the source position: 200 m
Pressure-release surface

Case 1: Pressure-release bottom. This problem should be done for a line source parallel to the apex, i.e. 2D geometry.
Case 2: Penetrable bottom with zero loss. Sound speed in the bottom: c2 = 1,700 m/s; density ratio: ρ2/ρ1 = 1.5; bottom attenuation: 0 dB/λ. This problem should be done for a point source in cylindrical geometry.
Case 3: Penetrable lossy bottom. As in Case 2 except with a bottom loss of 0.5 dB/λ.

Receiver depths: Case 1: 30 m; Cases 2 and 3: 30 m and 150 m.

Figure 11.1 Analytical benchmark problem: wedge geometry for cases 1-3 (Jensen and Ferla, 1988).

A special series of papers coordinated by the Acoustical Society of America described the results of a concerted effort to apply standard benchmark problems (with closed-form solutions) to the evaluation of propagation models. The propagation models selected for evaluation were based on different formulations of the wave equation (Buckingham and Tolstoy, 1990; Collins, 1990b; Felsen, 1990; Jensen and Ferla, 1990; Stephen, 1990; Thomson, 1990; Thomson et al., 1990; Westwood, 1990). The results of this evaluation provided several important insights (Jensen and Ferla, 1990):

1 One-way (versus two-way) wave equation solutions do not provide accurate results for propagation over sloping bottoms.

2 The COUPLE model provides a full-spectrum, two-way solution of the elliptic wave equation based on stepwise-coupled normal modes. This code is ideally suited for providing benchmark results in general range-dependent ocean environments. However, the solution technique is computationally intensive and is therefore impractical at higher frequencies.

3 IFDPE provides a limited-spectrum, one-way solution of the parabolic approximation to the full wave equation. The implicit finite-difference solution technique is computationally efficient, and accurate one-way results are provided for energy propagating within ±40° of the horizontal axis.

4 PAREQ provides a narrow-angle, one-way solution of the parabolic approximation to the full wave equation. The split-step Fourier solution technique is computationally efficient, and accurate one-way results are provided for energy propagating within ±20° of the horizontal axis.
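Returning to the wedge benchmark of Figure 11.1, the specified geometry is easy to check: the water depth at the source position is simply the source range from the apex multiplied by the wedge slope. A minimal sketch, assuming only the parameter values listed in the figure (variable names are illustrative):

# Parameters of the Figure 11.1 wedge benchmark (cases 1-3) and two simple
# consistency checks; variable names are illustrative, not reference code.
import math

wedge_slope = 0.05                       # tan(theta_0), so theta_0 = arctan 0.05 ~ 2.86 deg
frequency_hz = 25.0                      # Hz
sound_speed_water = 1500.0               # m/s, isovelocity water column (c1)
source_depth_m = 100.0                   # m
source_range_from_apex_m = 4000.0        # m (4 km)

# Check 1: water depth at the source position follows from the wedge geometry.
water_depth_at_source_m = source_range_from_apex_m * wedge_slope
assert abs(water_depth_at_source_m - 200.0) < 1e-9   # matches the 200 m in Figure 11.1

# Check 2: the acoustic wavelength, which sets the scale of the problem at 25 Hz.
wavelength_m = sound_speed_water / frequency_hz       # 60 m
print(math.degrees(math.atan(wedge_slope)), water_depth_at_source_m, wavelength_m)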

The term 'pathological' is sometimes used in reference to those test cases that prove to be particularly troublesome to models. In the evaluation of propagation models, for example, ocean environments exhibiting double sound-speed channels near the surface make especially good pathological test cases. Ainslie and Harrison (1990) proposed using simple analytic algorithms as diagnostic tools for identifying model pathologies. They developed simple analytic expressions for computing the intensity contributions from standard propagation paths.
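Their specific expressions are not reproduced here. Purely as an illustration of how a simple analytic algorithm can act as a diagnostic, the sketch below flags model outputs that depart grossly from a crude baseline of spherical spreading plus a constant absorption term; the function names, parameter values and tolerance are hypothetical.

# Illustrative diagnostic sketch only -- not Ainslie and Harrison's expressions.
# A crude analytic baseline is used to flag suspicious model outputs.
import math

def baseline_tl_db(range_m, alpha_db_per_km=0.003):
    """Spherical-spreading transmission loss plus linear absorption (dB)."""
    return 20.0 * math.log10(range_m) + alpha_db_per_km * range_m / 1000.0

def flag_pathologies(ranges_m, model_tl_db, tolerance_db=20.0):
    """Return ranges where the model departs from the crude baseline by more
    than the stated tolerance -- candidates for detailed inspection."""
    flagged = []
    for r, tl in zip(ranges_m, model_tl_db):
        if abs(tl - baseline_tl_db(r)) > tolerance_db:
            flagged.append(r)
    return flagged

# Hypothetical model output at four ranges (m); the 10 km value looks suspicious
ranges = [1000.0, 5000.0, 10000.0, 20000.0]
model_tl = [61.0, 74.5, 130.0, 86.0]
print(flag_pathologies(ranges, model_tl))   # -> [10000.0]

In practice such a baseline only localizes suspect behaviour; diagnosis of the underlying pathology still requires detailed comparison against benchmark solutions of the kind discussed above.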

