4.2 Types of data

Data acquisition and use

Acquisition of process data is the most basic activity of process monitoring. It is also the first stage in the complex operation of knowledge-based supervision. The amount and quality of the input data are decisive for the possibility of obtaining useful knowledge and making correct decisions. Therefore, data acquisition should normally be followed by a pre-processing stage, oriented towards generating the right amount of good-quality data from the raw input. Typical operations include filtering, selection, fusion, correction, etc.
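As an illustration of one such pre-processing operation, the sketch below shows a simple median filter that suppresses isolated spikes in a sequence of samples. The function name `median_filter` and the default window size are assumptions made for this example only; the text does not prescribe any particular filtering method.

```python
from statistics import median
from typing import List, Sequence


def median_filter(samples: Sequence[float], window: int = 3) -> List[float]:
    """Replace each sample by the median of its neighbourhood.

    The window is shortened at both ends of the sequence, so the output
    has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return [
        median(samples[max(0, i - half): i + half + 1])
        for i in range(len(samples))
    ]
```

For instance, a single outlier in an otherwise constant signal is removed, while the constant part is left unchanged.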

Note that the data coming from the process to the monitoring and supervision unit may be used in at least three different ways:

after initial pre-processing and possibly further transformation, the data can be used directly by a human operator to supervise the process and make decisions practically in real time,
after initial pre-processing, the data can be further transformed into a form acceptable to a knowledge-based system performing the supervisory function, so that it can be used for immediate knowledge processing, e.g. for inference in a rule-based supervision system,
after initial pre-processing, the data can be stored for further analysis or simply as a trace of the process (a "black box"), so that causes can be traced back in case of damage, etc.

Depending on the intended use, different requirements may be imposed on the pre-processing stage. For example, in the case of direct use by a human operator, a significant reduction of the data size may be necessary to avoid so-called "cognitive overload". Further, a user-friendly, readable and transparent form of displaying information would probably be highly desirable. Numerical data abstraction is another typical operation useful for gaining comprehensibility: it is known that people best differentiate from 2 to 9 levels of a signal, and seven levels are frequently used (for example, NB - negative big, NM - negative medium, NS - negative small, Z - around zero, PS - positive small, PM - positive medium, PB - positive big).
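The seven-level abstraction mentioned above can be sketched as a mapping from a numeric value to a linguistic label. This is a minimal sketch under stated assumptions: the signal is taken to be normalised to the range [-1, 1], and the threshold values are hypothetical, since the text does not define where the boundaries between levels lie.

```python
from bisect import bisect_right

# Seven linguistic labels from the text, ordered from most negative to most positive.
LABELS = ["NB", "NM", "NS", "Z", "PS", "PM", "PB"]

# Hypothetical boundaries between neighbouring levels for a signal normalised
# to [-1, 1]; these values are assumptions chosen for illustration only.
THRESHOLDS = [-0.75, -0.45, -0.15, 0.15, 0.45, 0.75]


def abstract_level(value: float) -> str:
    """Map a normalised numeric value to one of the seven qualitative labels."""
    # bisect_right counts how many thresholds lie at or below the value,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(THRESHOLDS, value)]
```

With these assumed thresholds, `abstract_level(0.0)` yields `"Z"` and `abstract_level(-0.9)` yields `"NB"`. In a real application the boundaries would have to be chosen for the particular variable and its operating range.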

Types of data

Before we define basic problems concerning information acquisition and the resulting quality of the data, let us briefly summarise the types of data which can be considered as input for the supervisory system:

Quantitative data: these are data obtained by counting or measurement. The following subtypes are possible:

real numeric data: the measured values of certain system variables important from the user's point of view; this is the most typical kind of input. These are most frequently measured data represented by real numbers (and therefore always subject to measurement error, i.e. imprecise).
integer numeric data: represented by integer numbers, which can be "totally precise", obtained by counting objects, elements, repetitions of events, etc.
binary data: these are values of two-level signals, usually denoted by 0 and 1; most typical in digital circuits; in systems having a logical interpretation, these values are also denoted as {true, false} or {T, F}.

Qualitative data (also termed descriptive), i.e. data that are impossible or impractical to measure or count directly (e.g. due to their complexity) and that refer to properties characterised by linguistic expressions. Here one can further distinguish:

ordered qualitative data: this kind of data represents certain variable values in a "rough", qualitative way; typically, the sensor output can take several symbolically encoded values, usually ordered linearly and separated from each other. If this data is a direct result of measurement, this is usually achieved by a specific construction of the sensor, in which subsensors responsible for detecting certain levels produce the combined output signal. This kind of data can come, for example, from a multiple-level relay or from another sensing instrument divided into separate zones (temperature, pressure, flow, etc.).
symbolic data: this kind of data takes the form of linguistic labels intended to encode certain characteristic features of the process components. The most typical case consists in encoding values of certain attributes (e.g. colour=red). This kind of data can come from sensors designed to activate when a certain attribute value occurs, or be the result of pre-processing data coming from more complex sensors, including image sensors. Another source can simply be a human operator.

Specific, application-oriented data: these are specialised forms of data, usually requiring domain-specific tools and processing methods. Here one can distinguish:

image: this is a specific, very complex kind of numerical data that comes from video (usually CCD) cameras. It deserves special treatment since both pre-processing and further analysis require quite specific tools.
sound: in fact this is a continuous real signal; however, since specific methods are normally required, it again deserves separate treatment.
knowledge as data: in certain specific cases the input "data" may take the form of knowledge, e.g. facts, relations among elements, logical formulae or semantic graphs. This may be the case for more complex supervisory systems monitoring complex systems that are equipped with their own knowledge bases (e.g. applied for control) and that generate and use symbolically represented knowledge themselves.

Despite the general characterisation of possible types of data presented above, for simplicity it is often the case that the term qualitative data refers to data ordered in a certain sense (here: ordered qualitative data), while all other, unordered data are referred to as symbolic. For simplicity, this convention will mostly be kept throughout this book.
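The practical difference between ordered qualitative data and symbolic data can be sketched in code: for the former an order comparison is meaningful, for the latter only equality of labels can be tested. The class names `Level` and `Colour` and their members are hypothetical and serve only as an illustration.

```python
from enum import Enum, IntEnum


class Level(IntEnum):
    """Ordered qualitative values, e.g. zones of a hypothetical multi-level sensor."""
    LOW = 0
    NORMAL = 1
    HIGH = 2


class Colour(Enum):
    """Symbolic values: labels without any order defined on them."""
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def is_above(reading: Level, limit: Level) -> bool:
    """Order comparison is meaningful for ordered qualitative data."""
    return reading > limit


def same_colour(a: Colour, b: Colour) -> bool:
    """For symbolic data only the identity of labels can be checked."""
    return a is b
```

Here `is_above(Level.HIGH, Level.NORMAL)` is true, whereas an expression such as `Colour.RED < Colour.GREEN` raises a `TypeError`, because a plain `Enum` defines no ordering.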

Note that for quantitative data it is normally possible to apply typical arithmetical operations, such as addition, subtraction, etc. Further, the idea of distance is well defined, i.e. for any two data items a nonnegative real number characterising the objective difference between these elements can be calculated.
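As a simple illustration of such a distance, the sketch below computes the Euclidean distance between two data items given as sequences of numbers. The Euclidean metric is only one possible choice and is an assumption of this example; the text does not single out any particular distance function.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the Euclidean distance between two numeric data items."""
    if len(a) != len(b):
        raise ValueError("both data items must have the same number of components")
    # Sum of squared component-wise differences, followed by a square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

The result is always nonnegative, equals zero for identical items, and does not depend on the order of the arguments.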

Moreover, note that practically any type of data presented above (perhaps apart from the last, specialised ones) can be used to form more complex structures, such as:

vectors: one-dimensional arrays of determined length; an element is referenced through its position,
lists: linear sequences of elements of unlimited length; due to the recursive structure of a list, only its first element (the so-called head of the list) can be referenced directly, while the other elements are reached recursively by removing subsequent heads, each time leaving the rest of the list (its tail),
arrays (tables, matrices): tabular representations with two or more dimensions,
records and frames: practically any structures composed of fields connected into a more complex form; a field usually has at least a name, a type and a value, but may also have particular properties (e.g. a default value) or specialised procedures assigned to it.
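The recursive access to list elements and the structure of a frame field described above can be sketched as follows. All names (`ConsList`, `from_sequence`, `nth`, `Field`) are hypothetical and introduced only for this illustration; the recursive list is modelled as nested (head, tail) pairs rather than as a built-in Python list, which allows direct indexing.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

# A list in the recursive sense: either empty (None) or a (head, tail) pair.
ConsList = Optional[Tuple[Any, "ConsList"]]


def from_sequence(items) -> ConsList:
    """Build a recursive list from an ordinary Python sequence."""
    result: ConsList = None
    for item in reversed(items):
        result = (item, result)
    return result


def nth(lst: ConsList, index: int) -> Any:
    """Reach the element at `index` by repeatedly removing the head."""
    if lst is None:
        raise IndexError("list exhausted")
    head, tail = lst
    return head if index == 0 else nth(tail, index - 1)


@dataclass
class Field:
    """One field of a frame: a name, a type and a value, plus an optional default."""
    name: str
    dtype: type
    value: Any = None
    default: Any = None

    def get(self) -> Any:
        """Return the stored value, falling back to the default if none is set."""
        return self.value if self.value is not None else self.default
```

For example, `nth(from_sequence([10, 20, 30]), 2)` removes two heads before returning `30`, and `Field("temperature", float, default=20.0).get()` returns the default value because no value has been assigned.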

Depending on the type of input data, specific problems may occur, and each type of data requires specific pre-processing. It is important to realise that data acquired during process supervision are not free of errors, and to distinguish and understand the problems involved. Only then can the detection and identification of problems concerning specific data be undertaken. Finally, once specific problems have been identified, decisions concerning possible corrections, updating and use of the data can be made.

In order to be more precise when speaking about data, data quality and, finally, knowledge generation and representation, some model of data representation should be adopted to make the discussion well-founded. In what follows, a simple and commonly used approach based on attributes and relational databases is recalled, and approaches based on logic, graphs, fuzzy sets, etc. are outlined.

As numeric, symbolic and qualitative data seem to be the most common types of input data, particular attention will be paid to building a common representation model. Based on this model, specific problems of data acquisition and pre-processing will be discussed.