HDB: HadronZoo Database.
Data Class: A data class is purely a data structure definition. It sets out the structure or form that all objects of the class must take. The data structure is defined as a set of members, each of which has a data type and may hold a value, or an array of values, of that data type. Note that data classes are themselves data types, so a member of one data class can have another data class as its data type. By this means, data classes support hierarchy.
Data Class Member: A data class member holds one or more values of a predefined data type.
Data Class Object: An instance of a data class. Note that in this document, data class objects are usually referred to as "data objects".
Data Object Member: A member of an individual data object.
Atomic Data: Atomic data values, i.e. data of an atomic data type, are indivisible. Either they are fundamentally indivisible, or they are treated as such for the purpose of internal data processing and storage.
Composite Data: Data classes are composite data types because they consist of members. The term 'composite data' simply means one or more data objects. For consistency, the terminology applies even where a data class only has one member.
Host/Guest Class: As data classes are data types and class members can be of any predefined data type, members of an object of a (host) data class can contain objects of another (guest) data class. Indeed this is the means by which data classes are hierarchical. Note that because members can only be of a predefined data type it is not possible to define a data class with a member whose data type is itself.
Simple/Complex: A simple data class is one in which all members are atomic, and which therefore has no guest classes. The term 'simple' is preferred over 'atomic' because an "atomic data class" would be very confusing! A complex data class is, obviously, one with one or more guest class members.
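The relationship between data classes, members, guest classes and the simple/complex distinction can be sketched in C++ as below. This is only an illustration under stated assumptions: the names Atomic, Member, DataClass and makeClass are hypothetical and are not taken from the hdb classes, and only three atomic types are shown.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the hdb class API.
enum class Atomic { Int32, Double, String };

struct DataClass;   // a data class is itself a data type

// A member has a name and a data type, which is either atomic or a guest
// class, and may hold a single value or an array of values.
struct Member {
    std::string name;
    Atomic atomic;                            // meaningful only when guest is null
    std::shared_ptr<const DataClass> guest;   // non-null: member holds guest objects
    bool isArray;
};

struct DataClass {
    std::string name;
    std::vector<Member> members;

    // Simple: every member is atomic. Complex: at least one guest class member.
    bool isSimple() const {
        for (const Member& m : members)
            if (m.guest) return false;
        return true;
    }
};

// A class is fixed at the moment it is built, and its members can only refer
// to classes that already exist, so a class cannot name itself as a member type.
std::shared_ptr<const DataClass> makeClass(std::string name, std::vector<Member> members)
{
    auto c = std::make_shared<DataClass>();
    c->name = std::move(name);
    c->members = std::move(members);
    return c;
}
```

For example, an Address class built from two string members would be simple, while a Customer class with an array member of type Address would be a complex host class with Address as its guest.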
Repository: An unordered persistent data store which issues an id on receipt of a datum, and retrieves the datum on receipt of its id.
Data Object Repository: An unordered persistent data store, serving as a mass container of data objects of a given predefined class.
Binary Repository: An unordered persistent data store, serving as a mass container of binary data.
Index: An ordered data store, persistent or otherwise.
Ordered/Unordered: Within the context of data stores, ordered means that entries are ordered by value. Unordered, however, does not mean that entries are in random order; it means they are held in order of arrival (i.e. in chronological order).
RAM Primacy (Method): A high performance data storage method in which the data is memory resident but backed to persistent media. During operation, new and changed data are written as deltas to a delta file. On startup, the last known data state is reconstructed from the stored deltas. The memory resident data is (usually) arranged in a form suitable for direct operation within the program, so the RAM is considered to be the primary data store. The persistent media (delta file) is thus secondary. It holds exactly the same data, but as deltas in order of occurrence, a form unlikely to be suitable for direct operation. Because of its high memory consumption, RAM Primacy is usually reserved for high value data that is expected to be accessed with high frequency.
RAM Primacy (Device): Any repository, index, or other program entity that exploits the RAM Primacy method is a RAM Primacy device.
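A minimal sketch of a RAM Primacy device follows, assuming a std::map as the memory resident primary store and a stream standing in for the delta file. The class name RamPrimacyStore and the one-line delta format ("S id value" to set, "D id" to delete) are invented for illustration; the format is not HDN, and values are assumed to be single-line text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical RAM Primacy device. The map is the primary store; the output
// stream stands in for the delta file (secondary store).
class RamPrimacyStore {
public:
    explicit RamPrimacyStore(std::ostream& deltaFile) : m_delta(deltaFile) {}

    // Startup: rebuild the last known state by applying stored deltas in order.
    void load(std::istream& stored) {
        std::string line;
        while (std::getline(stored, line)) {
            std::istringstream in(line);
            char op = 0;
            std::uint32_t id = 0;
            if (!(in >> op >> id)) continue;          // skip malformed lines
            if (op == 'D') { m_ram.erase(id); continue; }
            std::string value;
            std::getline(in >> std::ws, value);       // rest of line is the value
            m_ram[id] = value;
        }
    }

    // New or changed datum: append a delta, then update the RAM copy.
    void put(std::uint32_t id, const std::string& value) {
        m_delta << "S " << id << ' ' << value << '\n';
        m_ram[id] = value;
    }

    // Deletion is also recorded as a delta.
    void erase(std::uint32_t id) {
        m_delta << "D " << id << '\n';
        m_ram.erase(id);
    }

    // Reads are served from RAM only.
    const std::string* get(std::uint32_t id) const {
        auto it = m_ram.find(id);
        return it == m_ram.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return m_ram.size(); }

private:
    std::map<std::uint32_t, std::string> m_ram;
    std::ostream& m_delta;
};
```

In this sketch every read is served from the map, and the delta stream is only ever appended to during operation and read once at startup.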
Delta Notation (HDN): HadronZoo Delta Notation is formal shorthand for writing out data state deltas to delta files. Initially HDN described new data objects, changes in data objects, and data object deletions, but nothing beyond data objects. It was realized that deltas could be sent to other physical servers, for additional backup and to enable server redundancy strategies, but only if all data resources in applications were covered. Accordingly, delta notation was extended to cover all forms of data resources found in applications.
Binary Data/Datum: The term 'binary data' is generally understood as non-text data, usually encoded in some way, e.g. Microsoft Word documents, images and video clips. The term often implies that binary data are opaque from the perspective of the program in hand, i.e. they are simply atomic values. In the HadronZoo realm, owing to the preponderance of RAM Primacy, the term has an additional meaning. Particular data sources (e.g. fields in a webapp form) are deemed binary in order to store their data in a binary repository. This is usually because the data source does not warrant RAM Primacy.
Data Resource: Generic term for any data storage entity (repository, index or file), with a specified purpose within the applicable data object model.
AANI: Acronym for the HadronZoo design directive "Always append, never insert", meaning all writes to persistent media must be to the file end. The directive is not universal, applying only to data containers during 'normal operation'. More specifically, it applies to the INSERT and UPDATE operations of data containers (both unordered and ordered). Any disorder arising in the data files of such containers is rectified by periodic rationalization.
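The AANI directive and the role of rationalization can be sketched as follows. A std::vector of records stands in for the data file, and the names Record, appendRecord, latestValue and rationalize are hypothetical. Rationalization is assumed here to mean rewriting the file with one record per id, in id order; the actual HadronZoo procedure may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One entry in the (simulated) data file.
struct Record {
    std::uint32_t id;
    std::string value;
};

// INSERT and UPDATE are both appends to the file end. An update does not
// touch the earlier record; it simply supersedes it.
void appendRecord(std::vector<Record>& file, std::uint32_t id, std::string value)
{
    file.push_back(Record{id, std::move(value)});
}

// The current value for an id is the most recently appended one.
const std::string* latestValue(const std::vector<Record>& file, std::uint32_t id)
{
    for (auto it = file.rbegin(); it != file.rend(); ++it)
        if (it->id == id) return &it->value;
    return nullptr;
}

// Periodic rationalization (assumed form): produce a fresh file holding only
// the latest record for each id, ordered by id.
std::vector<Record> rationalize(const std::vector<Record>& file)
{
    std::map<std::uint32_t, std::string> latest;
    for (const Record& r : file)
        latest[r.id] = r.value;          // later records overwrite earlier ones

    std::vector<Record> out;
    out.reserve(latest.size());
    for (const auto& kv : latest)
        out.push_back(Record{kv.first, kv.second});
    return out;
}
```

Between rationalizations the file grows with every update and its entries are in arrival order only, which is the disorder that rationalization removes.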
Datacron: In AANI compliant data containers, data state changes appear in the data files in order of occurrence. This, in conjunction with the very common practice of recording the date and time alongside such events, meant that the data files formed a chronological record of all data state changes - since creation or, at least, since the last rationalization. This observation gave rise to the term 'datacron' to describe AANI compliant data containers. Strictly speaking, the now firmly established term is a misnomer, as there is no requirement within AANI to record the time and date. The single most important datacron in the HadronZoo repertoire is the delta file.
Serial Datacron: The Serial Datacron is the single most important RAM Primacy device in the HadronZoo repertoire. The RAM component can be an idset, or a collection held by any of the collection class templates. The persistent media component is a delta file.
ISAM: ISAM (Indexed Sequential Access Method) is a method of indexation characterized as a tree with two distinct node types: Index and data. The top level of the tree may only contain data nodes, which contain key/element pairs. The lower levels may only contain index nodes, whose entries point to the nodes in the level above. ISAM can be implemented in RAM, or in persistent media as a composite datacron index.
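As an illustration of the two node types, the sketch below builds a two-level in-RAM arrangement: data nodes holding sorted key/element pairs, and a single index node whose entries point to the data nodes. All names are hypothetical, and it is an assumption of this sketch that each index entry records the lowest key of the data node it points to.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Pair = std::pair<int, std::string>;   // key/element pair
using DataNode = std::vector<Pair>;         // whole pairs only, sorted by key

// Index entry: lowest key of a data node, plus that node's position.
struct IndexEntry {
    int lowKey;
    std::size_t node;
};

struct Isam {
    std::vector<DataNode> data;
    std::vector<IndexEntry> index;          // sorted by lowKey
};

// Build from pairs already sorted by key, nodeSize pairs per data node.
Isam isamBuild(const std::vector<Pair>& sorted, std::size_t nodeSize)
{
    assert(nodeSize > 0);
    Isam t;
    for (std::size_t i = 0; i < sorted.size(); i += nodeSize) {
        std::size_t end = std::min(i + nodeSize, sorted.size());
        t.index.push_back(IndexEntry{sorted[i].first, t.data.size()});
        t.data.emplace_back(sorted.begin() + i, sorted.begin() + end);
    }
    return t;
}

// Lookup: pick the data node via the index, then search within that node.
const std::string* isamFind(const Isam& t, int key)
{
    // First index entry with lowKey > key; the one before it covers the key.
    auto ie = std::upper_bound(t.index.begin(), t.index.end(), key,
        [](int k, const IndexEntry& e) { return k < e.lowKey; });
    if (ie == t.index.begin()) return nullptr;   // key below every node

    const DataNode& node = t.data[(ie - 1)->node];
    auto it = std::lower_bound(node.begin(), node.end(), key,
        [](const Pair& p, int k) { return p.first < k; });
    return (it != node.end() && it->first == key) ? &it->second : nullptr;
}
```

A larger tree would add further index levels in the same way, each entry pointing to an index node rather than a data node.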
Indexed Chain: Indexed chain is an ISAM variant, adapted for variable length elements. The index level nodes are similar to those of standard ISAM, but the data level is structured as an ordered concatenation of elements. This concatenation is held in a chain of virtual blocks, which elements can span.
Node/Block: Within the context of ISAM and indexed chain but also more widely, logical or physical data store units are described as nodes if they must contain an integer number of whole entries, and blocks if they are not subject to this constraint.
Idset: Set of unique integers, encoded for compactness. As a result of the encoding, the integers always appear in ascending order. Idsets are used in indexes to store data object ids, hence the name.
Native Data Class: Certain data model and application specific entities (e.g. Repositories, Dissemino form definitions and form handlers), either can or must be tied to a data class. Where this is the case, the data class is said to be the "native data class" of the entity.
Alien Data Class: From the perspective of an entity that has a native data class, an alien data class is any data class other than the native data class and its guest classes.
hdb/hds mnemonic: The HDB classes form a distinct group within the HadronZoo C++ class library, so are assigned names prefixed with 'hdb' for "HadronZoo Database", rather than the default HadronZoo mnemonic of 'hz'. The other distinct group is the Dissemino classes, which are prefixed with 'hds' for "HadronZoo Dissemino".
Program: HadronZoo defines a program as a "distinct and whole executable entity, with a singular space confined by the machine boundary".
Service: Services accept client connections, and process and respond to client requests. In strict terms, a service is the functionality available at a given URL and port. However, HTTP/S websites can be regarded as a single service if, as is usually the case, the functionality available at both ports is the same. Note that a service or set of services can be provided by a single distinct and whole server program, OR the service(s) may be the result of a number of programs working together, possibly across a multitude of machines.
Microservice (SRM): Microservices, as considered by HadronZoo, are completely self-contained server programs which perform a simple primary function and do little or nothing else. Although the function could be anything (e.g. generate a unique id), most HadronZoo microservices make a single data object repository available, hence the term 'single repository microservice' or SRM.
Application: Applications are usually described as "a program with a user interface" or similar. In HadronZoo parlance, however, the term 'application' means a program adhering to a particular configuration.
Webapp/Website: A website is an HTTP and/or HTTPS service, and so is a webapp (web application). The difference (in HadronZoo terms) is that a website consists of one or more webapps. Thus, a website can be composite whereas a webapp is always non-composite. This is true even where a webapp is implemented across multiple machines, as it is the result of the Dissemino method applied to a single config.
LAMP Method: Acronym for Linux, Apache, MySQL and PHP/Python/Perl.
Sister (entity): Identical entity manifest on another machine. Applies to applications and data resources, within the context of the delta server.
Control Panel: The control panel method is where HTTP server programs build their HTML responses by aggregating hard coded print commands to a chain buffer. The method was used extensively prior to the development of the Dissemino method, and is still used to produce stats reports. The method is so-called because it is prone to producing pages with "all the grace of home router control panels". Pages produced by the method are known as control panels.
Serialized Integer: Serialized integers are extensively used by HadronZoo as a space saving device. Four regimes are offered in the HadronZoo library: 32-bit and 64-bit, signed and unsigned. In serial form, 32-bit integers consume between 1 and 5 bytes, while 64-bit integers consume between 1 and 9 bytes. The obvious anticipation is that most values will consume fewer than the 4 or 8 bytes that would otherwise be required. In all regimes, if the top bit of the first byte is 0, the series is a single byte with 7 data bits (range 0-127). In the 32-bit unsigned regime, if the top bit of the first byte is 1, the next two bits act as controls, so the control code is 00, 01, 10 or 11. Codes 00, 01 and 10 correspond to a 2, 3 or 4 byte series, having 13, 21 and 29 data bits respectively. 13 bits have a range of 0-8,191, but as a 2-byte series is never used to express values of less than 128, values are interpreted as 128-8,319. Likewise, a 3-byte series has a range of 8,320 to 2,105,471 and a 4-byte series has a range of 2,105,472 to 538,976,383. Code 11 indicates that a full 32-bit value is provided in the next 4 bytes. In the 64-bit regimes there are three control bits for range, and in the signed regimes an extra control bit is used to indicate negative numbers.
Length Indicator: A length indicator is a serialized integer, used to indicate the length (size) of an entity. By convention, length indicators directly precede the entity in question. Also by convention, the stated length does not include that of the length indicator itself. As entity length should not be unduly large and cannot be negative, length indicators use the 32-bit unsigned serialized integer regime.
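The 32-bit unsigned regime, and its use as a length indicator, can be sketched as below. The function names are hypothetical and are not the HadronZoo library's. Two details are assumptions of this sketch: the first byte is taken to hold the top five data bits, with the remaining bytes following most significant first, and the 4-byte value after code 11 is taken to be stored most significant byte first. The range thresholds are derived from the stated data-bit counts (7, 13, 21 and 29).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Lowest value expressed by a 2, 3, 4 and 5 byte series respectively.
constexpr std::uint32_t kBase2 = 1u << 7;               // 128
constexpr std::uint32_t kBase3 = kBase2 + (1u << 13);   // 8,320
constexpr std::uint32_t kBase4 = kBase3 + (1u << 21);   // 2,105,472
constexpr std::uint32_t kBase5 = kBase4 + (1u << 29);   // 538,976,384

// Append the serial form of v to out (1 to 5 bytes).
void serialPut(std::vector<std::uint8_t>& out, std::uint32_t v)
{
    if (v < kBase2) { out.push_back(static_cast<std::uint8_t>(v)); return; }

    std::uint32_t code, w;
    int tail;                                  // bytes after the first
    if      (v < kBase3) { code = 0; w = v - kBase2; tail = 1; }
    else if (v < kBase4) { code = 1; w = v - kBase3; tail = 2; }
    else if (v < kBase5) { code = 2; w = v - kBase4; tail = 3; }
    else                 { code = 3; w = v;          tail = 4; }

    // First byte: top bit 1, two control bits, then 5 data bits (unused for code 11).
    std::uint32_t first = 0x80u | (code << 5);
    if (code != 3) first |= (w >> (8 * tail)) & 0x1Fu;
    out.push_back(static_cast<std::uint8_t>(first));
    for (int i = tail - 1; i >= 0; --i)
        out.push_back(static_cast<std::uint8_t>((w >> (8 * i)) & 0xFFu));
}

// Decode one series from in. Returns bytes consumed, or 0 if input is truncated.
std::size_t serialGet(const std::uint8_t* in, std::size_t len, std::uint32_t& v)
{
    if (len == 0) return 0;
    const std::uint8_t first = in[0];
    if ((first & 0x80u) == 0) { v = first; return 1; }

    const unsigned code = (first >> 5) & 0x03u;
    const std::size_t tail = code + 1;
    if (len < tail + 1) return 0;

    std::uint32_t w = (code == 3) ? 0u : (first & 0x1Fu);
    for (std::size_t i = 1; i <= tail; ++i) w = (w << 8) | in[i];

    static const std::uint32_t base[4] = { kBase2, kBase3, kBase4, 0u };
    v = w + base[code];
    return tail + 1;
}

// Length indicator: the serialized length directly precedes the entity and
// does not count its own bytes.
void putEntity(std::vector<std::uint8_t>& out, const std::string& entity)
{
    serialPut(out, static_cast<std::uint32_t>(entity.size()));
    out.insert(out.end(), entity.begin(), entity.end());
}

// Read one length-prefixed entity starting at pos and advance pos past it.
bool getEntity(const std::vector<std::uint8_t>& in, std::size_t& pos, std::string& entity)
{
    if (pos > in.size()) return false;
    std::uint32_t len = 0;
    const std::size_t used = serialGet(in.data() + pos, in.size() - pos, len);
    if (used == 0 || in.size() - pos - used < len) return false;
    entity.assign(in.begin() + pos + used, in.begin() + pos + used + len);
    pos += used + len;
    return true;
}
```

Under these assumptions a 5-character entity costs one byte of length indicator, and a 200-character entity costs two, since 200 falls in the 128-8,319 range.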
JSON: JavaScript Object Notation. Widely used to represent hierarchical data objects. Discussed in article 5.4 "Data Encoding".
EDO: Encoded Data Object. Discussed in article 5.4 "Data Encoding".
Cstr: Null terminated character string.
XML-esce: The HadronZoo term for "near or similar to" XML (Extensible Markup Language). HadronZoo uses XML-esce, rather than formal XML, for configuration. The tag rules are exactly the same but there are no DTD (Document Type Definition) files involved. All the tags used are expected by the reader program.
VTTO: Acronym for Volume/Traffic Tradeoff, pronounced as 'VETO'. VTTO metrics are important in RAM Primacy cost-benefit analysis, and in the related matter of system server topology. VTTO evaluation considers the relationship between a body of data and the user action (traffic) that accesses and operates upon it. The tradeoff is between the costs and benefits of RAM Primacy, i.e. memory consumption vs performance.
GDPR: General Data Protection Regulation.