4.2 Types of data
Acquisition of process data is the most basic activity of process monitoring. It is also the first stage in the complex operation of knowledge-based supervision. The amount and quality of the input data are decisive for whether useful knowledge can be obtained and correct decisions made. Data acquisition should therefore normally be complemented by a pre-processing stage aimed at producing the right amount of good-quality data from the raw input. Typical operations include filtering, selection, fusion and correction.
Note that the data passed from the process to the monitoring and supervision
unit may be used in at least three different ways, i.e.:
Depending on the intended use, different requirements may be imposed
on the pre-processing stage. For example, when the data are used directly by a
human operator, a significant reduction of the data volume may be necessary
to avoid so-called "cognitive overload". Further, a user-friendly,
readable and transparent form of information display
is usually highly desirable. Numerical data abstraction is another
typical operation that aids comprehensibility: it is known that
people best differentiate from 2 to 9 levels of a signal, and seven levels
are frequently used (for example, NB - negative big, NM -
negative medium, NS - negative small, Z - around zero, PS
- positive small, PM - positive medium, PB - positive big).
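The seven-level abstraction mentioned above can be illustrated with a minimal Python sketch. The function name `abstract_value`, the normalisation constant `scale` and the evenly spaced breakpoints are assumptions made only for illustration; in practice the thresholds would be chosen for the particular signal.

```python
from bisect import bisect_right

# Seven ordered qualitative levels, as listed in the text.
LEVELS = ["NB", "NM", "NS", "Z", "PS", "PM", "PB"]

# Six breakpoints separate the seven intervals.
# These values are illustrative assumptions, not taken from the text.
BREAKPOINTS = [-0.75, -0.45, -0.15, 0.15, 0.45, 0.75]


def abstract_value(x, scale=1.0):
    """Map a numeric value to one of the seven qualitative levels.

    `scale` is a hypothetical normalisation constant: roughly the
    magnitude at which the signal is regarded as 'big'.
    """
    normalised = x / scale
    # bisect_right counts how many breakpoints are <= normalised,
    # which is exactly the index of the interval the value falls in.
    return LEVELS[bisect_right(BREAKPOINTS, normalised)]


if __name__ == "__main__":
    for reading in (-1.0, -0.5, 0.0, 0.3, 1.0):
        print(reading, "->", abstract_value(reading))
```

Because the levels are kept in an ordered list, the result remains ordered qualitative data: two readings can still be compared by level even though the exact numerical values have been discarded.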
Before we define the basic problems concerning information acquisition and
the resulting quality of the data, let us briefly summarise the types of
data which can be considered as input for the supervisory system:
Despite the general characterisation of possible types of data presented
above, for simplicity the term qualitative data often
refers to data that are ordered in a certain sense (here: ordered qualitative
data), while all other, unordered data are referred to as symbolic.
This convention will mostly be kept throughout this book.
Note that for quantitative data it is normally possible to apply the usual arithmetic operations, such as addition and subtraction. Further, the notion of distance is well defined, i.e. for any two data items a nonnegative real number characterising the objective difference between them can be calculated.
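As a small illustration of the distance property, the sketch below computes the Euclidean distance between two quantitative data items given as sequences of numbers. The Euclidean metric is only one possible choice, assumed here for the example; the function name `euclidean_distance` is likewise hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Return the Euclidean distance between two numeric data items.

    Both items are sequences of numbers of equal length. The result is
    always a nonnegative real number, and it is zero when a equals b.
    """
    if len(a) != len(b):
        raise ValueError("data items must have the same number of components")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Two measurements with two components each.
    print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

For single numbers the same idea reduces to the absolute difference `abs(x - y)`.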
Moreover, note that practically any type of data presented above (perhaps apart from the last, special ones) can be used to form more complex structures, such as: vectors (one-dimensional arrays of fixed length, in which an element is referenced by its position); lists (linear sequences of elements of unlimited length; owing to the recursive structure of a list, only its first element, the so-called head, can be referenced directly, while the other elements are reached recursively by removing successive heads, each time leaving the rest of the list, its tail); arrays (tables, matrices), i.e. tabular representations with two or more dimensions; and other record-like structures or frames (practically any structures composed of fields connected into a more complex form; a field usually has at least a name, a type and a value, but may also have particular properties, e.g. a default value, or specialised procedures assigned to it).
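The four kinds of structure can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `cons`, `nth` and `Slot`, and all the sample values, are hypothetical, and the recursive list is modelled as nested (head, tail) pairs rather than with Python's built-in list type.

```python
from dataclasses import dataclass
from typing import Any

# Vector: fixed length, an element is referenced by its position.
vector = (20.5, 1.2, 0.8)
second_component = vector[1]


# List in the recursive sense: a pair (head, tail); None is the empty list.
def cons(head, tail):
    return (head, tail)


def nth(lst, n):
    """Reach the n-th element by removing n successive heads."""
    while n > 0:
        _, lst = lst  # drop the head, keep the tail
        n -= 1
    head, _ = lst
    return head


readings = cons(3, cons(5, cons(8, None)))

# Array: two-dimensional tabular form, referenced by row and column.
table = [[1, 2, 3],
         [4, 5, 6]]


# Frame field (slot): name, type and value, plus an optional default.
@dataclass
class Slot:
    name: str
    dtype: type
    value: Any = None
    default: Any = None

    def get(self):
        """Return the stored value, falling back to the default."""
        return self.value if self.value is not None else self.default


# A frame is modelled here as a dictionary of named slots.
tank = {
    "level": Slot("level", float, value=1.7),
    "temperature": Slot("temperature", float, default=20.0),
}
```

Note how the third element of `readings` is only reachable through `nth(readings, 2)`, which walks past two heads, whereas `vector[2]` and `table[1][2]` are addressed directly by position.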
Depending on the type of input knowledge, specific problems may occur, and each type of data requires specific pre-processing. It is important to realise that data acquired during process supervision are not free of errors, and to distinguish and understand the problems involved. Only then can the detection and identification of problems concerning specific data be undertaken. Finally, once specific problems have been identified, further decisions concerning possible correction, updating and use of the data can be made.
To be more precise when speaking about data, data quality and, finally, knowledge generation and representation, some model of data representation should be adopted so that the discussion is well founded. In what follows, a simple and commonly used approach based on attributes and relational databases is recalled, and logic, graphs, fuzzy sets, etc. are outlined.
As numeric, symbolic and qualitative data seem to be the most common
types of input data, particular attention will be paid to building a common
representation model. Based on this model, specific problems of data acquisition
and pre-processing will be discussed.