The Five Data Type Groups
Any data storage and processing regime must begin with a set of data types, in order to validate data. The HDB has five groups of data types, which are represented by a set of C++ classes. There is a base class, hdbDatatype, to cover all data types, then an hdbDatatype derivative for each group as follows:-
1) hdbCpptype. | C++ fundamental data types |
2) hdbHzotype. | HadronZoo in-built data types (e.g. hzEmaddr, hzUrl, hzIpaddr) |
3) hdbRgxtype. | Application specific data types validated by a regular expression |
4) hdbEnum. | Application specific data enumeration (validation list) |
5) hdbClass. | Application specific data class |
The phrase 'represented by' is used instead of 'implemented by' for a reason. Of the hdbDatatype base class and its derivatives, only hdbClass is functionally substantive. The rest are largely inert. Although some HadronZoo in-built data types require significant implementation, none of this is provided by hdbHzotype, and the classes that do provide it are not in any way 'aware' of hdbHzotype! Instead, the principal objective of the hdbDatatype base class and its derivatives is to facilitate and control database configuration. Under the HDB regime, data type names must be unique, not just within their group but across all data types. The arrangement of classes allows data types to be treated as a single polymorphic group, which in turn means data type names can be mapped to data types by a single map.
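The single-map arrangement can be sketched as follows. This is a minimal illustration, not the HadronZoo source: DatatypeSketch, EnumSketch and ClassSketch are hypothetical stand-ins for hdbDatatype, hdbEnum and hdbClass, and the TypeRegistry class with its Register and Lookup methods is an assumption made for the example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the hdbDatatype base class: it holds only the name.
class DatatypeSketch {
public:
    explicit DatatypeSketch(std::string name) : m_name(std::move(name)) {}
    virtual ~DatatypeSketch() = default;
    const std::string& Name() const { return m_name; }
    virtual const char* Group() const = 0;   // which of the five groups
private:
    std::string m_name;
};

// Hypothetical stand-ins for two of the five derivatives.
class EnumSketch : public DatatypeSketch {
public:
    using DatatypeSketch::DatatypeSketch;
    const char* Group() const override { return "enum"; }
};

class ClassSketch : public DatatypeSketch {
public:
    using DatatypeSketch::DatatypeSketch;
    const char* Group() const override { return "class"; }
};

// One map covers every group, so a name can be used only once overall.
class TypeRegistry {
public:
    // Returns false if the name is already taken by a data type of any group.
    bool Register(std::unique_ptr<DatatypeSketch> type) {
        assert(type != nullptr);
        std::string key = type->Name();
        return m_types.emplace(std::move(key), std::move(type)).second;
    }

    // Returns nullptr if no data type of that name has been registered.
    const DatatypeSketch* Lookup(const std::string& name) const {
        auto it = m_types.find(name);
        return it == m_types.end() ? nullptr : it->second.get();
    }
private:
    std::map<std::string, std::unique_ptr<DatatypeSketch>> m_types;
};
```

Because the map is keyed on the name alone, an attempt to register a class under a name already held by an enum is rejected, which is the cross-group uniqueness rule described above.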
The hdbBasetype Enum
hdbDatatype imbues derivatives with the data type name and an hdbBasetype enum value, which either states the exact data type or indicates the data type group. The hdbBasetype enum, defined in hzDatabase.h, reflects the five data type groups but pinpoints the type within each group (where possible). In the actual hdbBasetype enum definition, values are given a prefix of 'BASETYPE_'. For the purpose of this document, this prefix is removed. The data types are thus as follows:-
Group 1: | Fundamental C++ types (fixed size) |
DOUBLE | 64-bit floating point value. |
INT64 | 64-bit Signed integer. |
INT32 | 32-bit Signed integer. |
INT16 | 16-bit Signed integer. |
BYTE | 8-bit Signed integer. |
UINT64 | 64-bit Unsigned integer. |
UINT32 | 32-bit Unsigned integer. |
UINT16 | 16-bit Unsigned integer. |
UBYTE | 8-bit Unsigned integer. |
BOOL | Either TRUE or FALSE, cannot be empty or have multiple values. |
Group 2: | HadronZoo Defined types (fixed size). |
TBOOL | (Tri-boolean) Either TRUE, FALSE or UNKNOWN (empty). Cannot have multiple values. |
DOMAIN | Internet Domain. |
EMADDR | Email Address. |
URL | Uniform Resource Locator. |
PHONE | Telephone number (including international dialing code, 64 bits total). |
IPADDR | IP Address. |
TIME | Number of seconds since midnight (4 bytes). |
SDATE | Number of days since Jan 1st year 0000. |
XDATE | Full date & time, implemented as two 32-bit unsigned numbers: the number of hours since 0000/01/01-00:00:00, and the number of microseconds elapsed in the current hour. |
STRING | Any string, treated as a single value. |
TEXT | Any string, treated as a series of words and indexable. Cannot have multiple values. |
TXTDOC | Document from which text can be directly extracted. Single value only. Values stored in a binary datum repository (persistent media). |
TXTSRC | Encoded document from which text can be extracted (if a suitable decoder is provided). Single value only. Values stored as binary datum. |
BINARY | Binary large object, e.g. image. Assumed to be un-indexable. Values stored as binary datum. |
Group 3: | Application defined data enumerations. |
ENUM | Data enumeration. |
Group 4: | Application defined special text types. |
APPDEF | A string conforming to a regular expression specified in the application configs. |
Group 5: | Application defined data class. |
CLASS | Instance of a predefined (sub)class. |
OBJECT | Address of an instance of a predefined class. |
The hdbBasetype enum directs datum validation, processing and storage. Interpretation depends on the group. Groups 1 and 2 state the exact data type and no further qualification is required. These data types always have the same meaning, so data are always processed the same way. Groups 3, 4 and 5 are application specific and only state that the data type is a data enum (confined by a set of values), an APPDEF (string confined by a regular expression), or a data class. The hdbBasetype does not state which data enum, APPDEF or class. In order to process data, the exact data enum, APPDEF or data class must be separately stated.
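The distinction between exact types (groups 1 and 2) and group indicators (groups 3, 4 and 5) can be sketched as follows. The enumerator names follow this document (the 'BASETYPE_' prefix plus the names listed above), but the ordering and numeric values are assumptions, and the functions GroupOf and NeedsQualifier are hypothetical helpers rather than part of hzDatabase.h.

```cpp
#include <cassert>

// Sketch only: the order and values of these enumerators are assumed.
enum BasetypeSketch {
    // Group 1: fundamental C++ types
    BASETYPE_DOUBLE, BASETYPE_INT64, BASETYPE_INT32, BASETYPE_INT16,
    BASETYPE_BYTE, BASETYPE_UINT64, BASETYPE_UINT32, BASETYPE_UINT16,
    BASETYPE_UBYTE, BASETYPE_BOOL,
    // Group 2: HadronZoo defined types
    BASETYPE_TBOOL, BASETYPE_DOMAIN, BASETYPE_EMADDR, BASETYPE_URL,
    BASETYPE_PHONE, BASETYPE_IPADDR, BASETYPE_TIME, BASETYPE_SDATE,
    BASETYPE_XDATE, BASETYPE_STRING, BASETYPE_TEXT, BASETYPE_TXTDOC,
    BASETYPE_TXTSRC, BASETYPE_BINARY,
    // Group 3: application defined data enumerations
    BASETYPE_ENUM,
    // Group 4: application defined special text types
    BASETYPE_APPDEF,
    // Group 5: application defined data classes
    BASETYPE_CLASS, BASETYPE_OBJECT
};

// Returns the group number (1 to 5) of a base type, relying on the
// assumed declaration order above.
int GroupOf(BasetypeSketch bt) {
    assert(bt >= BASETYPE_DOUBLE && bt <= BASETYPE_OBJECT);
    if (bt <= BASETYPE_BOOL)   return 1;
    if (bt <= BASETYPE_BINARY) return 2;
    if (bt == BASETYPE_ENUM)   return 3;
    if (bt == BASETYPE_APPDEF) return 4;
    return 5;
}

// Groups 1 and 2 are exact types. Groups 3 to 5 only name the kind of
// type, so the specific enum, APPDEF or class must be supplied separately.
bool NeedsQualifier(BasetypeSketch bt) {
    return GroupOf(bt) >= 3;
}
```

Under these assumptions, NeedsQualifier(BASETYPE_XDATE) is false because XDATE always means the same thing, while NeedsQualifier(BASETYPE_ENUM) is true because the base type alone does not say which enumeration applies.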
HDB Data Restraints
Under the HDB regime, data object members are subject to population restraints based on their data type. BOOL (true/false) and TBOOL (true/false/unknown) members may not have multiple values: an attribute that a data object either does or does not have cannot be both true and false at the same time. TEXT, TXTDOC and TXTSRC members are likewise restricted, as it does not make sense to have multiple descriptions of the same thing.
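A check for these population restraints might look like the sketch below. It assumes base types are identified by the names used in this document (without the 'BASETYPE_' prefix); the functions IsSingleValued and PopulationAllowed, and the maxValues parameter, are hypothetical and are not taken from the HDB source.

```cpp
#include <cassert>
#include <set>
#include <string>

// Returns true if members of this base type are limited to a single value.
// The list is taken from the restrictions stated in this document.
bool IsSingleValued(const std::string& basetype) {
    static const std::set<std::string> single = {
        "BOOL", "TBOOL", "TEXT", "TXTDOC", "TXTSRC"
    };
    return single.count(basetype) != 0;
}

// Hypothetical config-time check: maxValues is the number of values a
// member declaration asks to hold.
bool PopulationAllowed(const std::string& basetype, unsigned maxValues) {
    assert(maxValues >= 1);
    return maxValues == 1 || !IsSingleValued(basetype);
}
```

With this sketch, a TBOOL member declared with two values is rejected, while an EMADDR member declared with five values is accepted.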
All data class members are ultimately atomic, i.e. of a group 1-4 data type. Objects can have subclass members, but either the subclass consists entirely of atomic members or some subclass further down the line consists entirely of atomic members. This is ensured by the config read process, which will only permit a member to be of a given data type if that data type has already been defined.
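The define-before-use rule can be sketched as below. This is an illustration under stated assumptions, not the actual config reader: ConfigSketch and DefineClass are hypothetical names, members are given as (member name, data type name) pairs, and only three atomic types are pre-registered to keep the example short.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

class ConfigSketch {
public:
    // A few atomic types are assumed to be known from the outset.
    ConfigSketch() : m_defined{"INT32", "STRING", "EMADDR"} {}

    // Defines a class only if every member's data type already exists.
    // The class name is added last, so a class can never contain itself,
    // directly or through a chain of subclasses.
    bool DefineClass(const std::string& name,
                     const std::vector<std::pair<std::string, std::string>>& members) {
        assert(!name.empty());
        if (m_defined.count(name) != 0)
            return false;                       // name already in use
        for (const auto& member : members) {
            if (m_defined.count(member.second) == 0)
                return false;                   // member type not yet defined
        }
        m_defined.insert(name);
        return true;
    }
private:
    std::set<std::string> m_defined;            // all data type names so far
};
```

Since a class can only refer to types defined before it, every chain of subclass members must end in a class whose members are all atomic. A self-referencing definition, or one that names a class not yet defined, is simply rejected.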