[Excerpt from An Object-Oriented Approach to Data Exchange Applications: Development of a Class Library for the Spatial Data Transfer Standard, a master's thesis by Phyllis Altheide, University of Missouri at Rolla, December 1992. Full text available at ftp sdts.er.usgs.gov, directory pub/sdts/thesis.]

III. THE SPATIAL DATA TRANSFER STANDARD

A. PROBLEM DOMAIN: DATA TRANSFER STANDARDS

The problem domain of data transfer standards was chosen to test the applicability of object-oriented techniques. It is really a category of problem domains, with each specific standard posing its own specific problem domain.

There are two types of data transfer standards: general purpose and application-specific. General purpose standards work in a variety of cases and provide a means of transfer without defining the content of a transfer. They were necessitated by the wide variety of computers, media, and software that came into use.

General purpose standards were not always adequate to handle data exchange within an application area. It is not good enough to be able to move an engineering drawing as a graphic from one system to another. There is no agreement on the content of the drawing, and there is a vast amount of critical auxiliary information (or metadata) that does not get transferred: bill of materials, stress factors, equations of surfaces, etc. Neither is it sufficient to transfer a word processing document as simple text. The effect of formatting is preserved (indents are now spaces), but all of the behind-the-scenes formatting codes are lost: indents, tabs, underlines, bold, page breaks, page numbering, etc.

To deal with these problems, application-specific standards have been developed for many areas: word processing, engineering drawing [12], architectural drawing, geography, cartography [14], etc. An application-specific standard has a narrower focus, intending to deal with problems specific to an application area.
These standards define a means of transfer and also define the contents (required and optional) of the transfer.

The problem domain of data transfer standards is interesting in that it exists because of automation. The majority of object-oriented work has been on problems with real-world foundations: home heating control, traffic management, optical experiments [2], manufacturing simulations, tactical systems [18], and games [3], to name a few. In the realm of automation, work has been done on scientific visualization [18], graphical user interfaces [1,18], a branch path analyzer [18], and data structures [1,3]. The problem of data transfer exists solely within the realm of computer automation, with no real-world equivalent. The question is, "What do object-oriented techniques have to offer to this application domain?" To explore this, an application-specific standard for geographic and cartographic applications, the Spatial Data Transfer Standard (SDTS), was chosen. Before describing this standard in detail, a general description of a data transfer standard is given.

B. LEVELS WITHIN A DATA TRANSFER STANDARD

Every data transfer standard covers three levels of abstraction: conceptual, logical, and format. Some standards documents are organized along these lines explicitly, while in others the organization is only implied. All three levels are present, though. The conceptual level, or philosophical background, is the highest level of abstraction. The logical level, or structures, brings organization to the concepts. The format level (also called the physical or implementation level) is the lowest level of abstraction and deals with files of bytes. Any standard progresses from a discussion of ideas, through organizing structures, and finally to file formats.

The following describes each of the three levels in more detail, illustrated with the example of a word processing standard (because it is a common ground of understanding across many disciplines).

1. Conceptual Level.
Every field of endeavor has theoretical and philosophical concepts that form the foundation of its existence. Vocabulary differences often exist within a discipline, where different terms share similar meanings and(or) the same term has taken on many meanings. A standard must define and carve out the portion of the foundation with which it will deal, and reduce ambiguous vocabulary. The purpose of the conceptual level is to delimit the scope of the standard and define the problem domain vocabulary.

As an example of delimiting scope, consider a hypothetical standard for the exchange of word processing documents. A document is a body of text, with pages and paragraphs. Pages have page numbers. Paragraphs have words, which have font, bold, and underline properties. Not within the scope are the concepts of figures, book chapters, section headings, tables of contents, and many more things. The terms used here all represent ideas. Nothing is stated about the representation (logical structures) or the physical encoding.

2. Logical Level. Concepts must be given form to progress towards the end goal of achieving data exchange. A standard must define organizational structures that embody the concepts. Often there are elemental structures and aggregate structures that represent allowed groupings. There are required elements and optional elements, and possibly even conditional elements. The purpose of the logical level is to define the structures, aggregations, and rules of usage.

Continuing with the hypothetical document exchange standard, consider the structures for the concept of the paragraph. A paragraph is an aggregate of word elements and format elements. A word element starts with a code of "W", followed by two integer digits conveying the length of the word, followed by the text of the word. As an example of a usage rule, a paragraph may not end with a format element. The logical structures are discussed without regard for records and files, the forte of the format level.

3. Format Level. The logical structures must be mapped into records and files to achieve the final result of enabling transfer. A standard must eventually address the file level. After all, it is the files that get physically transferred, with the concepts and structures of the standard giving meaning to the exchange. The purpose of the format level is to describe how the organizational structures will be packaged into files, and to define any additional file descriptive information.

For the document exchange standard, the file format might be that the first six bytes of a file contain a time and date stamp. The word and format elements must start on a 4-byte address boundary, with unused bytes filled with nulls. Paragraphs are separated by the ASCII unit terminator.

Depending on the complexity of the logical level, some standards do not make a great distinction between the logical and format levels. They may be so intertwined that they are treated as one level, although both are present.

All data exchange standards address the conceptual, logical, and format levels of abstraction (or data modeling levels). The predictable resemblance stops there. The specifics at each level may be vastly different (with standards for the same application area more similar than others). Some standards even have multiple format levels (for example, Computer Graphics Metafile [6]). The Spatial Data Transfer Standard is described specifically with regard to these three levels, emphasizing the portions needed to comprehend the object-oriented work.

C. THE SPATIAL DATA TRANSFER STANDARD

The Spatial Data Transfer Standard (SDTS) is an application-specific exchange standard for cartography and geography. The many separate developments of digital representations for maps have created an exchange problem due to vast differences in data models. The growing use and popularity of digital maps, as opposed to paper-based or hard copy maps, created the need for a standard.
The SDTS defines a general and flexible spatial data model to accommodate a wide range of models. Only the aspects of the SDTS needed as background information are described.

1. Conceptual Level. With regard to spatial properties, the SDTS supports both raster and vector-based spatial data models. Within raster, both continuous tone and classified are supported. A continuous tone raster has a photographic look, like a satellite image or aerial photograph. A classified raster has a property assigned to each pixel, thus classifying its contents. For example, in a soils classification map, each pixel is assigned a value representing the primary soil type within its parcel of land. A vector dataset is best described as line work. Within vector, topological, non-topological, geometrical, and combined topological and geometrical models are supported. The concept of topology is that relationships between spatial elements are represented explicitly, which allows graph-theoretical algorithmic processing. A node appears at all line intersections; adjacent polygons share edges (not duplicated within the dataset); and a surface is completely exhausted by non-overlapping polygons. The geometrical concept is that lines have their endpoints and shape defined by coordinate pairs.

Spatial object terminology varies widely in the realm of digital map representation, with practically every model having its own unique vocabulary. To overcome this barrier to exchange, the SDTS contains a set of spatial object terms and definitions. Spatial objects are defined with the properties of geometry and topology, for 0-, 1-, and 2-dimensions, and as elemental or aggregates [21]. Following are a few of the 1-dimensional spatial object definitions from the standard. (These objects are used in the implementation work discussed later.)

Topological: Link - A topological connection between two nodes. A link may be directed by ordering its nodes.

Geometrical: String - An ordered sequence of points representing a connected nonbranching sequence of line segments.

Both (topological and geometrical): Chain - A directed nonbranching sequence of nonintersecting line segments and(or) arcs, bounded by nodes, not necessarily distinct.

The unique terms of a model would be equated to these standard terms to overcome the vocabulary barrier.

The spatial objects are used to create digital representations of real-world features: a chain to represent a stream; a polygon to represent a lake; a point to represent a spring; etc. The shape and location of a feature are portrayed by spatial objects, but the description of a feature is encoded with attributes. Attributes describe the feature that the spatial objects represent.

Feature types and attribute terms are another major barrier to data exchange: there is little agreement about terms and definitions. (This is mainly because the intended use of a map determines what features and attributes are of interest: an urban planner would want commercial, industrial, and residential zones; a hiker would want trails with slope; a tourist bureau would want major roads and attractions; ad infinitum.) As an attempt to overcome this barrier, the standard contains a partial list of feature and attribute terms with definitions [21]. Data producers are encouraged to equate their terms to these standard terms to facilitate data exchange. Realizing that other, non-standard terms will be used, the standard requires that user-supplied definitions be placed in a data dictionary.

The concepts of spatial objects and attributes form the core of a digital map. However, there is other data about this data that may be required for a particular map application. This auxiliary data is often called metadata. The SDTS includes many kinds of metadata: data quality, coordinate reference system, symbolization, dataset description, security, etc.
All of these represent important concepts of digital cartography, but will not be discussed further here as they are not within the scope of this object-oriented project.

The concepts discussed above define the scope of the SDTS as a data exchange standard for cartography and geography. All of these ideas have been discussed without regard for representation in a dataset. Next, the structures used to embody these concepts will be covered.

2. Logical Level.

a. Module. The "module" is the primary organizational structure at the SDTS logical level. A module is a grouping of information sharing a common conceptual background. The standard defines 34 module types that cover the categories of spatial objects, attributes, reference systems, data quality, graphic representation, and other global information (i.e., metadata). The mapping of concepts to modules is not one-to-one. In some cases many similar conceptual items are accommodated by a single module: all one-dimensional spatial objects are handled by the Line Module. In other cases a single concept requires several modules: the description of reference systems requires the module types of Internal Spatial Reference, External Spatial Reference, and Registration.

All module types are defined by the standard, except the attribute modules. The specification for these must be completed by the user. The attribute modules are based on the relational model. The description of the attribute modules is encoded as schema information: the transfer contains modules which describe the user-specified modules. The other modules, which are specified by the standard, do not need this included description: the standard is the source of the description. The use of standard-defined modules to describe user-defined modules supports the concept of a self-contained data transfer.

The module structure is an aggregate of other structures, which is best described as a hierarchy of is-composed-of relationships.
A module is composed of module records, which are composed of module fields, which are composed of module subfields. At the highest level, a transfer is composed of modules.

b. Module Field. Module field types are not unique to a module; certain types are used by many modules. To avoid the problem of identifying the module type from the field composition of a module record, a set of fields was designated as primary fields. A specific primary field type can only occur in a single type of module. The secondary field types can be used by many modules.

A foreign id module field is used by many modules and is an important mechanism of the logical level. Each module record instance is uniquely identified by the contents of a pair of subfields: module name and record id. The foreign id field allows one module record to reference another module record (or a set, if wildcards are allowed). Foreign ids are used by spatial object records to reference attributes, to express topological relationships, and to express is-composed-of relationships, to name an important few.

c. Module Record. The module records within a module do not all have to be of the same layout. Fields may be excluded or even repeated. Given that field types are not unique to a module type and record layouts are variable, field composition alone is not enough information to determine the module type of a module record. One of the module fields of each type of module is designated as the primary field. The primary field unambiguously determines the module type to which the containing module record belongs. All the other fields are designated as secondary fields. A module record will always contain, at a minimum, its primary field.

d. Subfield. The data to be transferred resides in the subfields, with the higher level structures providing grouping, ordering, and semantic context. The fact that the subfield belongs to a Line Module record gives it context.
The fact that it occurs in a certain group and in a specific place in an order completes its context. As important as the actual data value is, without its context it is meaningless.

With subfields holding data, the issues of allowable data types and domains become important. The standard specifies a data type, or a choice of data types, for each subfield. For example, spatial addresses (coordinates) can be integers, reals, or binary values. Record ids must be non-zero, positive integers. The Reference System Name subfield is restricted to a set of enumerated values.

e. Rules of Usage. Besides defining the structures, the SDTS also states rules for their usage. Modules are required, optional, or conditional. The Identification module is required for all transfers. The Spatial Domain module is optional. The Definition module is conditional on the use of non-standard feature and attribute terms. Additionally, modules are repeatable, i.e., a transfer may contain many modules of the same type.

Module fields are also required, optional, or conditional, and repeatable or not. The rules are nested in context: a required module field does not make the containing module type required. The primary field of each module is required and not repeatable. The secondary fields vary. For the Line module, the Attribute field is optional, and the Polygon Left field is conditional on whether or not topology is being encoded. The Attribute field is repeatable within a single module record, with no required order. The Spatial Address field is repeatable, and the order represents the construction of a line in terms of vertices.

Subfields are required, optional, or conditional, and generally not repeatable within a containing field (two exceptions: raster cell values and point labels). The rules on subfields must be taken in the context of the containing module field: a required subfield does not make the field required.
Conditional subfields are conditional on the values of other subfields in the same module record. There are no cases of cross-record conditions.

3. Format Level. The logical constructs of the SDTS are transformed into file constructs defining standard files. The operating system level files are what actually get transferred. Instead of defining a unique and custom-tailored format level, the SDTS has taken a novel approach. As an application-specific standard, it uses a more general purpose standard to provide the format level. The SDTS uses the "Data Descriptive File for Information Interchange" standard, referred to as ISO 8211 [7]. The SDTS used an existing standard so as not to reinvent the wheel, and to place the transfer in a broader community of interchange. Specifically, ISO 8211 is appealing because its mechanism encodes both data and description, which fits nicely with the SDTS design principle of promoting self-contained transfers.

The SDTS and ISO 8211 are two independent standards. The SDTS chose to use ISO 8211 to define its format level, but was not required to do so. ISO 8211 works for many types of data transfer, not just SDTS transfers. ISO 8211 defines a general mechanism for the transfer of data and its description; the transfer content and semantics are entirely user-defined. The SDTS, as a user of ISO 8211, defines the content and meaning of a transfer while using the ISO 8211 mechanism to actually accomplish the transfer, i.e., to create files.

Without explaining the ISO 8211 standard in detail, SDTS constructs are related to ISO 8211 constructs. (With regard to ISO 8211, the conceptual level is used. As a data exchange standard, ISO 8211 also has three model levels.) The SDTS equates a module field structure to an ISO 8211 field. An ISO 8211 file may contain part of, one, or many modules. An ISO 8211 record may contain part of, one, or many module records.
The user of SDTS has many options at the format level on how to package information into files. Of course, module records must be recoverable from ISO 8211 files, and modules that are packed into the same file should share some meaningful connection (as determined by user judgement).

The organization and method of the SDTS standards document would allow for the addition of alternate format levels, or the replacement of the existing one, with little effect on the logical level and no effect on the conceptual level.
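To make the logical-level structures of section C.2 more concrete, the following is a minimal Python sketch of the is-composed-of hierarchy (transfer, module, module record, module field, subfield), of module type identification through the primary field, and of foreign id resolution. It is an illustration only, not the class library developed in the thesis. Everything named here is an assumption made for the example: the class names, the primary field names "Line" and "Attribute Primary", the module names "LN01" and "ATTR", the attribute value, and the choice to store the module name and record id subfields in the primary field.

```python
from dataclasses import dataclass, field as dc_field

# Assumed lookup table: each primary field type occurs in exactly one module type.
PRIMARY_FIELD_TO_MODULE_TYPE = {
    "Line": "Line Module",
    "Attribute Primary": "Attribute Module",
}


@dataclass
class ModuleField:
    """A module field: a named group of subfields (subfield name -> value)."""
    name: str
    subfields: dict


@dataclass
class ModuleRecord:
    """A module record: an ordered list of fields; layouts may vary per record."""
    fields: list

    def primary_field(self):
        # The primary field is the only field that pins down the module type.
        for f in self.fields:
            if f.name in PRIMARY_FIELD_TO_MODULE_TYPE:
                return f
        raise ValueError("module record has no primary field")

    def module_type(self):
        return PRIMARY_FIELD_TO_MODULE_TYPE[self.primary_field().name]

    def key(self):
        # Assumption: the identifying subfield pair lives in the primary field.
        sub = self.primary_field().subfields
        return (sub["module name"], sub["record id"])


@dataclass
class Module:
    name: str
    records: list = dc_field(default_factory=list)


@dataclass
class Transfer:
    modules: list = dc_field(default_factory=list)

    def resolve(self, foreign_id):
        """Follow a foreign id field to the module record it references."""
        index = {r.key(): r for m in self.modules for r in m.records}
        sub = foreign_id.subfields
        return index[(sub["module name"], sub["record id"])]


# A hypothetical attribute record describing a stream.
attr_record = ModuleRecord([
    ModuleField("Attribute Primary",
                {"module name": "ATTR", "record id": 7, "stream name": "Mill Creek"}),
])

# A hypothetical Line Module record: primary field, one foreign id to the
# attribute record, and repeated Spatial Address fields in vertex order.
line_record = ModuleRecord([
    ModuleField("Line", {"module name": "LN01", "record id": 1}),
    ModuleField("Attribute", {"module name": "ATTR", "record id": 7}),
    ModuleField("Spatial Address", {"x": 0, "y": 0}),
    ModuleField("Spatial Address", {"x": 10, "y": 5}),
])

transfer = Transfer([Module("LN01", [line_record]), Module("ATTR", [attr_record])])
referenced = transfer.resolve(line_record.fields[1])
```

In this sketch the repeated Spatial Address fields show that field composition alone does not fix a record layout, while the single primary field is enough to recover the module type and the record's identity. Wildcard foreign ids, conditional subfields, and the ISO 8211 packaging are deliberately left out.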