JavaScript Object Notation (JSON)
The HDB does not use JSON to represent or transport hierarchical data objects. However, due to the preponderance of JSON, the single data object container hdbObject has methods for JSON import and export.
As is widely documented, JSON is built on two structures as follows:-
Object: An unordered collection of name/value pairs. A JSON object begins and ends with opening and closing curly braces. The entries are of the form name:value, separated by commas. The name refers to a data class member, while the value must evaluate to a legal value of the member's data type.
Array: An ordered list of values. An array begins and ends with opening and closing square brackets containing comma-separated values.
The values in question must be one of the following:-
1) A string (a sequence of 0 or more Unicode characters) enclosed in double quotes
2) A number (decimal or standard form)
3) True/False
4) Null
5) An object (using {})
6) An array (using [])
From (5) and (6), it is clear that these structures can be nested, thereby enabling hierarchy. Although JSON has only bool, number and string as atomic data types, JSON objects can be imported into an HDB data repository, providing:-
1) The names in the JSON match the member names of the applicable data class.
2) The JSON data types are compatible.
3) The format of the datum is as expected by the target member.
4) If the JSON object contains an array, the target member must be configured to accept multiple values. Note that under the HDB, not all HadronZoo data types can be multiple.
5) If the JSON value is an object, the target member must be of a subclass in the HDB data object.
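By way of a purely hypothetical illustration (the member names below are invented for this example and are not drawn from any real HDB application), consider a data class with single-value string members fname and lname, a member email configured to accept multiple values, and a member addr whose data type is a subclass. The following JSON object would satisfy the above conditions:-

    {
        "fname": "Jane",
        "lname": "Doe",
        "email": ["jane@example.com", "j.doe@example.org"],
        "addr": {"town": "Reading", "postcode": "RG1 1AA"}
    }

The email array is acceptable because the target member accepts multiple values (condition 4), and the nested addr object is acceptable because the target member is of a subclass (condition 5).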
HadronZoo Delta Notation (HDN), and the Delta Server
As stated in the glossary of terms, HDN is formal shorthand for writing out data state deltas to delta files. It was originally devised purely for HDB data object repositories: HDN described new data objects, changes to data objects, and data object deletions, but nothing outside the realm of data objects. It was then realized that real-time online backup could be achieved by mirroring the deltas, as they arose, to another physical machine. This led to the development of the Delta Server, initially as a backup regime. It was further realized that mirroring deltas could enable server redundancy strategies, but only if the deltas covered every data resource type used in applications. HDN was accordingly extended.
There is a naming convention which relates to where deltas are in the system. File Deltas are those in repository delta files. Origin Deltas are en route from a server program to the local delta server. Transmission Deltas are en route from one delta server to another. Notification Deltas are en route from the local delta server to local server programs (having arisen on another physical machine in the cluster). In earlier implementations it was the practice to omit identifiers that could be implied from the whereabouts of a delta, which is what gave rise to the convention. This practice has since been abandoned, so in the modern HDB all deltas are fully qualified. The convention remains, however, as it is useful in descriptions.
Data Resource Types and Consequent Delta Formats
Deltas are required to uniquely identify the data resource and fully describe the data state change. Accordingly, the format of deltas depends on the nature of the resource and the possible changes. The various types of data resource are set out below. Note that the deltas in all cases begin with the following three ids:-
1) The server id: This is the id of the server upon which the data state change arose. The id is that within the cluster (always A, B, C or D). This is needed in order to terminate delta circulation.
2) The app id: This is a non-recurring sequence number, assigned by delta server negotiation. The app, as named, will always have this id.
3) The resource id: The data resource id.
Note that for brevity the server id is denoted 's', the app id 'a', and the resource id either 'f', 'd' or 'r' (see below). Note also that the periods in the delta forms are for illustration only; they do not exist in real deltas.
App Specific Files: These are fixed path files that are considered to be part of the application, so are assigned delta ids. Their content is treated as application data, so any change will give rise to a delta. The available operations are APPEND (+) and REPLACE (=). There is no DELETE as the files are assumed to be omnipresent. Deltas are of the form s.a.f[+=]value, where value is the new or additional content.
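For example, supposing (purely hypothetically) server A, app id 2 and an app specific file with resource id 7, and using the period-separated form for illustration only, an APPEND of the text "extra line" would be written A.2.7+extra line, while a REPLACE of the entire file content with "new content" would be written A.2.7=new content. The numeric app and resource ids here are assumptions made for the sake of the example.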
App Specific Dirs: These are either single directories with an absolute fixed path, or sets of directories with relative paths that depend on a data parameter, e.g. user id. Either way they are considered part of the application, and so each directory or set of directories is assigned a delta id. Applications can add or delete subdirectories and files, and append to or replace files. The filenames and file content are regarded collectively as application data. The available operations are APPEND (+), REPLACE (=) and DELETE (-). The deltas are of the form s.a.d.fname.op.value, where fname is either the filename or the path of the file relative to the directory. There is no need for a CREATE operator as the fname will be created if it does not currently exist.
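Continuing the hypothetical illustration (still in the period-separated form, used for illustration only), with server A, app id 2 and a directory resource id 3, replacing the content of a file img/readme.txt within the directory would be written A.2.3.img/readme.txt=new content, appending to it A.2.3.img/readme.txt+more content, and deleting it A.2.3.img/readme.txt- (the absence of a value in the DELETE case being an assumption of this illustration).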
Binary Datum Repositories: These are clearly part of the application and are accordingly assigned delta ids. However, binary datum repositories do not themselves write deltas. As a binary datum is only ever the value of a data object member, the task of writing deltas falls to the applicable data object repository.
Data object repositories: These are by far the most important data resource for obvious reasons! Data object repositories offer four operations namely INSERT, UPDATE, DELETE and FETCH, of which the first three will change the data state and thus give rise to a delta. In addition to the server, app and repository ids, the deltas must describe a complete new object in the case of INSERT, and what has been set, added to or removed in each changed member in the case of UPDATE. In the case of DELETE, the delta only needs to state the object id.
In an INSERT the resulting delta has the form s.a.r.objectId[member assignments]. Member assignments are of the form mbrId=value. Multiple member assignments are produced where the member has multiple values. Where the member data type is a (sub)class, the assigned value is the subclass object id, whose issuance would itself have resulted in a delta. Delta notation itself is not hierarchical.
In an UPDATE the resulting delta has the form s.a.r.objectId[member adjustments]. Member adjustments are of the form mbrId=value where the member is single-value, but where members support multiple values, the adjustments are of the form mbrId+value or mbrId-value. In a DELETE the resulting delta has the form s.a.r.objectId=0.
Values in a delta are always atomic. Where the member data type is binary, the value stored in the repository will be the address of the binary datum in the applicable binary datum repository. In the delta however, the value will be the binary datum itself. Deltas usually occupy a single line but where strings span multiple lines, HadronZoo Multi-line Format is used.
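To continue the hypothetical illustration (again in the period-separated form, and with the separator between the bracketed assignments assumed purely for readability), suppose server A, app id 2, a repository with resource id 5, and a data class whose member 1 is a single-value string and whose member 2 accepts multiple string values. An INSERT creating object 9 might then be written A.2.5.9[1=Jane 2=jane@example.com 2=j.doe@example.org], a subsequent UPDATE adding and removing values of member 2 as A.2.5.9[2+another@example.net 2-jane@example.com], and a DELETE of the object as A.2.5.9=0.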
Deltas in Serial Datacrons
HDB Repositories and app specific file and directory resources are formally registered to the ADP, with at least the intention of subjecting them to the regime imposed by the Delta Server. HDB Repositories, however, have serial datacron components which are not registered to the ADP but which, by definition, write delta files.
EDO (Encoded Data Object)
The EDO format was developed as an experiment aimed at improving the space efficiency of RAM Primacy data object repositories. Hitherto, data object repository caches, themselves an experiment, were arranged as a two-dimensional matrix, i.e. a table. The data objects were of fixed size and were held in blocks of RAM in multiples of 64. This multiple was chosen so that litmus bits, needed for things like disambiguating zero, could be represented by 64-bit integers. Hierarchy was supported by having multiple caches: one for the native data class and another for each different subclass. Where applicable, members with multiple values were similarly supported by two additional caches, one for all 32-bit values and another for all 64-bit values. The arrangement offered excellent performance and facilitated in-situ data operations: the standard practice of retrieving a whole object in order to access a single member could be bypassed. The concern was wastage due to gaps, in the form of whole object deletions and blank members.
Eliminating this space wastage has an inevitable space cost. If the gaps are squeezed out, the object and member ids can no longer serve as offsets into the array, so the positions of objects within the block, and of members within data objects, must be otherwise indicated. Any gapless, variable-length manifestation of a data object, i.e. the EDO, would need to explicitly state the object id and state which members are present. In addition, any variable-length entity must either state its length or have a starting and/or terminating sequence.
An EDO begins with a serialized integer stating the length of the remainder, or tail, of the EDO. The EDO tail begins with a serialized integer stating the object id, which depending on the implementation can be absolute or relative (to the starting object id of an EDO block). The object id is followed by a series of member litmus bits - one byte for every set of 8 litmus bits or part thereof. All members have one litmus bit to state if the member is populated. For BOOL members, this litmus bit is the value. TBOOL members have two litmus bits: the first states if a value is present and the second is the value. Other members have an additional litmus bit if they are allowed multiple values, which in practice is rare except for members of a subclass data type. The final part of the EDO is a concatenation of the member values that the litmus bits state are present, in order of ascending member id - with arrays preceded by a serialized integer stating the number of values present. The values themselves can be encoded as serialized integers, although currently only signed and unsigned 32 and 64-bit integers use this option.
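As a minimal sketch of the layout just described (and emphatically not HDB source code), the C++ fragment below encodes a single EDO under several simplifying assumptions: the serialized integer is rendered as a 7-bits-per-byte varint, which is an assumption rather than the documented HDB encoding; the example class has only single-value unsigned integer members; a zero value is treated here as "not populated", sidestepping the zero-disambiguation issue; and the bit order within the litmus bytes is likewise assumed.

    // Sketch only: produces [tail length][object id][litmus bytes][populated values].
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Illustrative serialized integer: 7 bits per byte, high bit = continuation.
    static void putSerialInt(std::vector<uint8_t>& out, uint64_t v)
    {
        do {
            uint8_t b = v & 0x7f;
            v >>= 7;
            if (v) b |= 0x80;
            out.push_back(b);
        } while (v);
    }

    // Encode one object: objId plus member values indexed by member id
    // (a value of 0 is taken to mean the member is not populated).
    static std::vector<uint8_t> encodeEdo(uint64_t objId, const std::vector<uint32_t>& members)
    {
        std::vector<uint8_t> tail;

        putSerialInt(tail, objId);                      // object id (absolute form)

        // One litmus bit per member, one byte per set of 8 bits or part thereof.
        size_t nBytes = (members.size() + 7) / 8;
        size_t litmusPos = tail.size();
        tail.insert(tail.end(), nBytes, 0);
        for (size_t m = 0; m < members.size(); m++)
            if (members[m])
                tail[litmusPos + m / 8] |= (1 << (m % 8));

        // Concatenation of the populated values, in ascending member id order.
        for (uint32_t v : members)
            if (v)
                putSerialInt(tail, v);

        // The EDO proper: length of the tail, then the tail itself.
        std::vector<uint8_t> edo;
        putSerialInt(edo, tail.size());
        edo.insert(edo.end(), tail.begin(), tail.end());
        return edo;
    }

    int main()
    {
        // Hypothetical object 9 with members 0 and 2 populated, member 1 blank.
        std::vector<uint8_t> edo = encodeEdo(9, {42, 0, 70000});
        for (uint8_t b : edo)
            printf("%02x ", b);
        printf("\n");
        return 0;
    }

The resulting byte stream follows the order set out above, with the gaps for unpopulated members squeezed out at the price of the length, object id and litmus overhead.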
EDO structure depends on the approach taken to subclass objects. Where subclass objects are stored as an integral part of the host class object, subclass EDOs are nested within the host class EDO. Where subclass objects are held separately, only the object ids appear in the host class EDO. Note also that BINARY and TXTDOC datum are stored in a separate binary datum repository, with only the datum id appearing in the EDO. The object and datum ids are, of course, stored as serialized integers.
Strings and string-like values are represented within EDOs as string ids, with the actual string values held in string repositories (see article "String Repositories"). EDOs cannot be used outside the context of RAM Primacy because of this dependency.