Outline Description
The HadronZoo Download is free open source software, written in C++ for the Linux operating system. It comprises the HadronZoo C++ class library and the following programs:-
- HadronZoo::RepoServer: Provides a data repository as a microservice.
- HadronZoo::DeltaServer: Mirrors deltas (data state changes) to other servers to provide real-time backup, and acts as the authoritative issuer of database resource ids.
- HadronZoo::Dissemino: Dissemino Web Engine (DWE). A fully featured, XML-configured web engine for the creation and hosting of webapps (websites).
- HadronZoo::Epistula: Mail server with webmail.
- HadronZoo::Codeproc: A pseudo-compiler which constructs a web manual from a body of code and the comments found within it.
Library Description
The library is a foundation for high performance server and data processing programs. With the exception of third party libraries which provide functionality specific to particular disciplines, and on the proviso that developers concur with the methods herein, the library can be taken as a comprehensive foundation for such programs.
There is a strong real time theme, with the binary chop algorithm placed center stage. With it, 1,000 ordered objects can be searched in 10 steps, a million in 20, and a billion in just 30. The steps are simple comparisons, usually of strings. With the objects in question memory resident, comparisons are fast. A string comparison takes ~100 nanoseconds, so 30 of them find the needle in a billion-strong haystack in ~3 microseconds. With other data types, searches can be even faster. In view of this performance, it makes sense to maximize the extent to which searchable data is memory resident. The more searchable data there is in memory, the more the binary chop can be applied. The limitation is RAM or, more precisely, the volume of data the RAM can reliably accommodate without going into swap, given the other demands placed upon it.
Space optimization is the only possible mitigator. It is why HadronZoo has its own string class, and why this and other classes use special allocation regimes. Instead of issuing 64-bit pointers, these regimes issue 32-bit addresses that translate to 64-bit pointers, effectively halving the pointer size. Space optimization is also why HadronZoo has its own collection class templates, and why these are based on ISAM instead of the more pointer-hungry binary trees.
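The step counts quoted for the binary chop can be checked with a short sketch. The code below is not taken from the HadronZoo library: binaryChop, worstCaseSteps and findString are hypothetical names, and the standard library containers stand in for the HadronZoo collection classes. The chop is written against an index range and a comparison callback, so a billion-element population can be simulated without allocating it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Outcome of a chop: index of the first element not less than the target,
// and the number of comparisons (steps) it took to get there.
struct ChopResult {
    std::uint64_t pos;
    unsigned steps;
};

// Binary chop over the index range [0, count). 'elementIsLess(i)' must return
// true when element i orders before the target.
template <typename Less>
ChopResult binaryChop(std::uint64_t count, Less elementIsLess)
{
    std::uint64_t lo = 0;
    std::uint64_t hi = count;
    unsigned steps = 0;
    while (lo < hi) {
        std::uint64_t mid = lo + (hi - lo) / 2;
        ++steps;
        if (elementIsLess(mid))
            lo = mid + 1;   // target lies above mid
        else
            hi = mid;       // target is at mid or below it
    }
    ChopResult r;
    r.pos = lo;
    r.steps = steps;
    return r;
}

// Steps taken on the longest search path for a population of n, simulated by
// treating element i as the integer i and searching for the lowest value.
unsigned worstCaseSteps(std::uint64_t n)
{
    std::uint64_t target = 0;
    return binaryChop(n, [target](std::uint64_t i) { return i < target; }).steps;
}

// The same chop applied to memory resident strings. Returns the index of the
// target, or -1 if it is absent. The vector must already be in order.
int findString(const std::vector<std::string>& sorted, const std::string& target)
{
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    ChopResult r = binaryChop(sorted.size(),
        [&](std::uint64_t i) { return sorted[i] < target; });
    if (r.pos < sorted.size() && sorted[r.pos] == target)
        return static_cast<int>(r.pos);
    return -1;
}
```

Calling worstCaseSteps with 1,000, 1,000,000 and 1,000,000,000 gives 10, 20 and 30 respectively, which at ~100 nanoseconds per string comparison is where the ~3 microsecond figure comes from.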
Database Operations
Memory resident data can be backed up in real time by streaming out deltas to buffered files. On startup, the deltas are read back in to restore the data state. This is known as RAM Primacy because the RAM is the primary store. It holds all the data in the required order, and in a form suitable for direct operation within the program. The file is secondary. It holds all the data, but only as deltas in order of occurrence. RAM Primacy is very fast: FETCH by virtue of the binary chop; INSERT and UPDATE by virtue of file buffering. Both are 'microsecond operations', meaning they are complete (from the perspective of the calling program) within a low single digit number of microseconds.
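The RAM Primacy cycle (apply the change in memory, stream the delta to a buffered file, replay the deltas on startup) can be sketched as follows. This is an illustration under stated assumptions, not the HDB implementation: DeltaBackedMap is a hypothetical class, the tab-separated delta format is invented for the example, and std::map stands in for the HadronZoo collection classes.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <string>

// Hypothetical RAM-primary key/value store. The map is the primary store;
// the file holds only deltas, in order of occurrence.
class DeltaBackedMap {
public:
    // Replay any existing deltas to restore the data state, then open the
    // same file for buffered appends.
    bool open(const std::string& path)
    {
        m_data.clear();
        std::ifstream in(path.c_str());
        std::string line;
        while (std::getline(in, line)) {
            if (line.size() < 3 || line[1] != '\t')
                continue;                       // skip malformed lines
            std::size_t tab = line.find('\t', 2);
            if (line[0] == 'S' && tab != std::string::npos)
                m_data[line.substr(2, tab - 2)] = line.substr(tab + 1);
            else if (line[0] == 'D')
                m_data.erase(line.substr(2));
        }
        in.close();
        m_log.open(path.c_str(), std::ios::app);
        return m_log.is_open();
    }

    // INSERT or UPDATE: change the RAM copy, then append a "set" delta.
    // Assumed format limits: no tab or newline in keys, no newline in values.
    void put(const std::string& key, const std::string& value)
    {
        assert(key.find_first_of("\t\n") == std::string::npos);
        assert(value.find('\n') == std::string::npos);
        m_data[key] = value;
        m_log << "S\t" << key << '\t' << value << '\n';
    }

    // DELETE: remove from RAM and append a "delete" delta if the key existed.
    void erase(const std::string& key)
    {
        if (m_data.erase(key) > 0)
            m_log << "D\t" << key << '\n';
    }

    // FETCH is served entirely from RAM. Returns nullptr if the key is absent.
    const std::string* fetch(const std::string& key) const
    {
        std::map<std::string, std::string>::const_iterator it = m_data.find(key);
        return it == m_data.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, std::string> m_data;
    std::ofstream m_log;    // buffered; flushed when the object is destroyed
};
```

From the caller's perspective put returns as soon as the delta is in the stream buffer. Note that nothing here forces the buffer to disk at any particular moment; how and when that happens is outside the scope of the sketch.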
Because the fastest writes are to the ends of buffered files, HadronZoo ensures all writes are made this way. At no point are blocks in data files, or in any other type of file, inserted or overwritten. This is known as the AANI rule - Always Append, Never Insert.
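One way to picture the AANI rule is a data file in which an update never touches the old bytes: the new version is appended and an in-memory index is repointed at it. The sketch below assumes that reading; AppendOnlyStore is a hypothetical class rather than an HDB repository, and for brevity it always starts a new file and keeps its index only in memory.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <string>

// Hypothetical append-only value store, for illustration only.
class AppendOnlyStore {
public:
    AppendOnlyStore() { m_end = 0; }

    // Start a new, empty data file. A fuller version would instead rebuild
    // the index from an existing file.
    bool open(const std::string& path)
    {
        m_path = path;
        m_index.clear();
        m_end = 0;
        m_out.open(path.c_str(), std::ios::binary | std::ios::trunc);
        return m_out.is_open();
    }

    // INSERT and UPDATE are the same operation: append the value to the end
    // of the file, then point the index entry at the new bytes.
    bool put(const std::string& key, const std::string& value)
    {
        assert(m_out.is_open());
        m_out.write(value.data(), static_cast<std::streamsize>(value.size()));
        if (!m_out)
            return false;
        Extent e;
        e.offset = m_end;
        e.length = value.size();
        m_index[key] = e;
        m_end += value.size();
        return true;
    }

    // Read the current version of a value back from the file.
    bool get(const std::string& key, std::string& value)
    {
        std::map<std::string, Extent>::const_iterator it = m_index.find(key);
        if (it == m_index.end())
            return false;
        m_out.flush();      // make buffered appends visible to the reader
        std::ifstream in(m_path.c_str(), std::ios::binary);
        in.seekg(static_cast<std::streamoff>(it->second.offset));
        value.resize(it->second.length);
        if (it->second.length > 0)
            in.read(&value[0], static_cast<std::streamsize>(it->second.length));
        return static_cast<bool>(in);
    }

    // Total bytes appended so far, superseded versions included.
    std::uint64_t fileSize() const { return m_end; }

private:
    struct Extent {
        std::uint64_t offset;
        std::size_t length;
    };
    std::string m_path;
    std::ofstream m_out;
    std::map<std::string, Extent> m_index;
    std::uint64_t m_end;
};
```

After put("a", "one") followed by put("a", "two"), get returns "two" while the file still holds both versions. The cost of never overwriting is that superseded data stays in the file.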
HadronZoo programs (programs based on the HadronZoo library) will generally use the HDB (HadronZoo Database). This is a non-SQL, hierarchical, share-nothing database regime. It is not compulsory, nor does it preclude other databases. However, the HDB has AANI compliant repositories and indexes that are aided by RAM Primacy, and is fully integrated into the Delta Server regime. If it makes sense to base a program on the HadronZoo library, it will make sense to use the HDB.
The HDB is embodied by the HDB class group, which is part of the library. The HDB is internal to programs that use it, with data model entities represented by HDB class instances that exist within the program space. This arrangement enables direct in-program operation, which is one reason why the HDB is non-SQL. Note that repositories can be declared as external, in which case they are made available as microservices. However, they are still internally instantiated, since they could not be operated upon otherwise.
There are two ways of instantiating HDB data model entities: in the program code, and by XML tags in the program configs. HadronZoo configs are XML throughout and are standardized. There are HDB tags which map 1:1 to data model entities, and the library has the necessary config read functions to process them.
Server Classes
The library provides a general epoll-based server class which handles both TCP and UDP client connections simultaneously. There is an emphasis on serving HTTP. The library includes the Dissemino class family, which imbues programs with an HTTP interface. Dissemino webapps, as such interfaces are known, can range from simple control panels requiring little or no configuration, to sophisticated websites with extensive configuration.
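To show how a single epoll instance can watch a TCP listener and a UDP socket at the same time, here is a minimal sketch using the Linux system calls directly. It is not the HadronZoo server class: bindLoopback, watch, serveOnce and the Served struct are hypothetical names, and the handling is reduced to accepting a connection or reading one datagram.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>
#include <string>

// Bind a socket of the given type (SOCK_STREAM or SOCK_DGRAM) to 127.0.0.1
// on an ephemeral port. Returns the descriptor or -1; 'port' receives the
// port the kernel chose. TCP sockets are also put into the listening state.
int bindLoopback(int type, std::uint16_t& port)
{
    int fd = socket(AF_INET, type, 0);
    if (fd < 0)
        return -1;
    sockaddr_in addr = sockaddr_in();
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    socklen_t len = sizeof addr;
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), len) < 0 ||
        (type == SOCK_STREAM && listen(fd, 16) < 0) ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
        close(fd);
        return -1;
    }
    port = ntohs(addr.sin_port);
    return fd;
}

// Register a descriptor with the epoll instance for read readiness.
bool watch(int epfd, int fd)
{
    assert(epfd >= 0 && fd >= 0);
    epoll_event ev = epoll_event();
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Tally of what has been serviced so far.
struct Served {
    int tcpAccepted;
    int udpDatagrams;
    std::string lastUdp;
    Served() { tcpAccepted = 0; udpDatagrams = 0; }
};

// Wait once (up to timeoutMs) and service whichever sockets are ready.
bool serveOnce(int epfd, int tcpFd, int udpFd, int timeoutMs, Served& out)
{
    epoll_event events[8];
    int n = epoll_wait(epfd, events, 8, timeoutMs);
    if (n < 0)
        return false;
    for (int i = 0; i < n; ++i) {
        int fd = events[i].data.fd;
        if (fd == tcpFd) {
            // New TCP client: accept it. A real server would register the
            // client descriptor with epoll instead of closing it.
            int client = accept(tcpFd, nullptr, nullptr);
            if (client >= 0) {
                ++out.tcpAccepted;
                close(client);
            }
        } else if (fd == udpFd) {
            // UDP has no connection to accept: read the datagram directly.
            char buf[1500];
            ssize_t got = recvfrom(udpFd, buf, sizeof buf, 0, nullptr, nullptr);
            if (got >= 0) {
                ++out.udpDatagrams;
                out.lastUdp.assign(buf, static_cast<std::size_t>(got));
            }
        }
    }
    return true;
}
```

A server loop would create the epoll instance with epoll_create1, call watch on both sockets, and then call serveOnce repeatedly. The point of the design is that one thread and one wait call cover both protocols.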
Dissemino webapp configs use a mix of HDB, Dissemino and HTML5 tags. The Dissemino tags define web pages and event actions. The HTML5 tags are encapsulated within the XML; in other words, they are as per the HTML5 standard, but follow XML rules. Tags that can be left open in HTML must be closed in the configs. For example, a line break written as <br> in HTML must be written as <br/>.
Anyone familiar with PHP and Apache will know that on page request, Apache scans page content for PHP scripts, executes them and substitutes each script body with the script output (if any) in the final HTML. The Dissemino method performs similar substitutions, except that PHP scripts are replaced by Dissemino tags which direct calls to built-in C++ functions. The set of tags and built-in C++ functions provided by the Dissemino classes will suffice in the general case. Should additional functionality be needed, tags and C++ functions are easy to add. The Dissemino method, classes and tags are discussed from the C++ perspective in this manual (see Ch 6 "Dissemino Classes"), and from the web development perspective in the Dissemino manual.
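The substitution idea can be illustrated with a toy expander that replaces marker tags in page content with the output of registered C++ functions. Everything in it is an assumption made for the example: the "<demo:NAME/>" marker syntax is invented and is not a Dissemino tag, and expandTags and TagFn are hypothetical names. The real tag set and its processing are described in the manuals referred to above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// A registered function produces the text that replaces its tag.
typedef std::function<std::string()> TagFn;

// Replace every "<demo:NAME/>" marker in 'page' with the output of the
// function registered under NAME. Markers with no registered function, and
// unterminated markers, are copied through unchanged.
std::string expandTags(const std::string& page,
                       const std::map<std::string, TagFn>& registry)
{
    const std::string open = "<demo:";
    const std::string close = "/>";
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        std::size_t start = page.find(open, pos);
        if (start == std::string::npos)
            break;
        std::size_t end = page.find(close, start + open.size());
        if (end == std::string::npos)
            break;
        // Copy the literal text that precedes the marker.
        out.append(page, pos, start - pos);
        std::string name = page.substr(start + open.size(),
                                       end - start - open.size());
        std::map<std::string, TagFn>::const_iterator it = registry.find(name);
        if (it != registry.end())
            out += it->second();                                  // substitute
        else
            out.append(page, start, end + close.size() - start);  // keep as-is
        pos = end + close.size();
    }
    out.append(page, pos, std::string::npos);   // trailing literal text
    return out;
}
```

With a function registered under "user" that returns "alice", the input "<p>Hello <demo:user/>!</p>" expands to "<p>Hello alice!</p>". Adding functionality amounts to registering another name and function pair.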
Design and Development Rules and Guidelines
Functional demarcation can be facilitated by external programs or scripts (the small program approach), or by internal functions that exist within the same program space (the large program approach). HadronZoo strongly prefers the latter. The delta server, upon which all HadronZoo systems depend, is external because it runs on each machine as a singleton process. Common data sets are routinely made available as external, single repository microservices (SRMs). These exceptions aside, everything is internal. As a consequence, HadronZoo systems tend to comprise a very small number of large programs, usually just one. Is the large program approach better or worse than the small program approach? Neither. It depends on the objectives, which for HadronZoo are performance, easier logging, and fewer system integration issues.
For security reasons, HadronZoo has a policy of not running software without both the source code and a clear understanding of it. It is acknowledged that the source code is not always available; however, this should only apply in the case of third party libraries that provide specialist functionality. In terms of code clarity, it is recognized that parts of the C++ language have onerous syntax and may not be universally understood. Where possible, these are avoided. There are no alternatives to class and function templates, vararg macros and functions, or function pointers, so these are used. Initialization lists, however, are never used as they are cumbersome and unnecessary. HadronZoo class instances are always created blank and then initialized for use in a distinct, separate step. Many of the more complex aspects of modern C++ (C++11 onwards) are also avoided.
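The "created blank, then initialized in a separate step" convention might look like the following. ServiceEndpoint is a hypothetical class written for this illustration and is not part of the library. Its constructor assigns defaults in the body instead of using an initialization list, and a separate init call makes the instance usable.

```cpp
#include <cassert>
#include <string>

// Hypothetical class following the two-step convention.
class ServiceEndpoint {
public:
    // Step 1: construct blank. Members are assigned in the body; there is
    // no initialization list.
    ServiceEndpoint()
    {
        m_port = 0;
        m_ready = false;
    }

    // Step 2: initialize for use. Returns false and leaves the instance
    // blank if the arguments are not acceptable.
    bool init(const std::string& host, unsigned port)
    {
        if (host.empty() || port == 0 || port > 65535)
            return false;
        m_host = host;
        m_port = port;
        m_ready = true;
        return true;
    }

    bool ready() const { return m_ready; }

    // Only meaningful once init has succeeded.
    const std::string& host() const
    {
        assert(m_ready);
        return m_host;
    }

    unsigned port() const
    {
        assert(m_ready);
        return m_port;
    }

private:
    std::string m_host;
    unsigned m_port;
    bool m_ready;
};
```

A caller declares the instance, then calls init and checks the result before use. One consequence of the convention is that initialization failures are reported through a return value, since a constructor cannot return one.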
Method Prescription Dichotomy
It is not the intention to prescribe. The library and supplied programs were written in accordance with HadronZoo design philosophy, rules and guidelines, but there is nothing that obligates developers to do likewise. Provided licensing terms are adhered to, developers are free to use the library as they wish. The HadronZoo download is free open source code, published in the hope that developers will find it useful, howsoever it is used.
That said, it is not the intention to avoid prescription at any cost. Extensive use of class specific memory allocation regimes, particularly that of hzString, the HadronZoo string class, makes prescription inevitable. Many classes in the library have hzString members and these cannot simply be replaced with STL strings. Whilst very similar, the functionality is not the same and the interface is not entirely compatible. Thousands of lines of code would need to change, which makes no sense. Developers are free to import HadronZoo classes into programs and adapt them as needed, but they may find the process less than straightforward.
About HadronZoo
HadronZoo is small, consisting only of the founder and a few ad-hoc collaborators. Although collaborators were crucial to the software development effort, the applied collaborative model has historically differed from the norm. Instead of the founder acting as team leader or as a member of a team of equals, the collaborators used the library in solutions they provided to their clients, and the founder implemented the changes they requested. Not every change request furthered the aims of the library, but the arrangement was the source of much impetus and direction. It also enabled library classes to be deployed and tested in live environments, well in advance of what would otherwise have been possible. As a result of this arrangement, the founder is the sole author of the current library and the other programs in the download. However, with the software now more mature, the collaborative model is no longer appropriate. Going forward, collaborators will be permanent, will assume a more custodial role and, over time, will be listed as code contributors.