Outline Description

The HadronZoo Download is free open source software, written in C++ for the Linux operating system. It comprises the HadronZoo C++ class library and the following programs:-

- HadronZoo::RepoServer Provides a data repository as a microservice.
- HadronZoo::DeltaServer Mirrors deltas (data state changes) to other servers to provide real-time backup, and acts as authoritative issuer of database resource ids.
- HadronZoo::Dissemino Dissemino Web Engine (DWE). Fully featured XML configured web engine for the creation and hosting of webapps (websites).
- HadronZoo::Epistula Mail server with webmail.
- HadronZoo::Codeproc A pseudo-compiler which constructs a web manual from a body of code and the comments found within it.

Library Description

The library is a foundation for high performance server and data processing programs. With the exception of third party libraries which provide functionality specific to particular disciplines, the library can be taken as a comprehensive framework for such programs. The firm intention is that in the general case, programs based on the library will be free of external dependencies.

All libraries impose methods upon developers, but it is a question of scope. A third party library might have some horrible math behind it, but it can be treated as a black box. The interface may not be to the developer's liking, but it will only be part of the code. Foundation libraries are a different matter. The methods are imposed across the entire code base, in accordance with an overarching thesis. A foundation library is thus an all or nothing proposition. Developers must both like the methods and buy the thesis, or use a different foundation. The HadronZoo class library is no exception.

HadronZoo is particularly fond of large, memory resident collections of ordered objects, and of using the binary chop algorithm to search them. With this method, 1,000 ordered objects are searched within 10 comparisons, a million within 20, and a billion within 30. The comparisons are usually of strings, which take ~100 nanoseconds each. Thus the 'needle' in a billion strong haystack is found in ~3 microseconds. With integer comparisons, searches are more than an order of magnitude faster. The obvious downside is that data volumes are limited by available RAM (or more precisely, by the RAM that can be reliably hogged without the risk of swapping). However, deployment opportunities are widespread. The 'searchable' component of many data sets will fit within the available RAM. A key part of the HadronZoo thesis is that these opportunities are to be taken seriously. Not only are existing opportunities to be used, but steps should be taken to create them. That last point means wholesale space optimization, which has significant implications.
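The comparison counts quoted above follow from halving the search range on every probe: a collection of N ordered items needs at most floor(log2 N) + 1 comparisons. The following is a minimal sketch of a binary chop that counts its own comparisons. It uses std::vector and std::string purely for illustration (the library has its own string and collection classes), and the names ChopResult, binaryChop and makeKeys are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ChopResult {
    long     position;      // index of the match, or -1 if the needle is absent
    unsigned comparisons;   // number of string comparisons performed
};

// Binary chop over a sorted vector. Each pass performs one three-way string
// comparison and discards half of the remaining range [lo, hi).
ChopResult binaryChop(const std::vector<std::string>& sorted, const std::string& needle)
{
    ChopResult res;
    res.position = -1;
    res.comparisons = 0;

    std::size_t lo = 0;
    std::size_t hi = sorted.size();

    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        int cmp = needle.compare(sorted[mid]);
        res.comparisons++;

        if (cmp == 0) {
            res.position = static_cast<long>(mid);
            break;
        }
        if (cmp < 0)
            hi = mid;           // needle sorts before the probe: keep lower half
        else
            lo = mid + 1;       // needle sorts after the probe: keep upper half
    }
    return res;
}

// Build 'count' zero-padded keys so that lexical order matches numeric order.
std::vector<std::string> makeKeys(std::size_t count)
{
    assert(count <= 1000000000u);   // padding below allows at most 9 digits

    std::vector<std::string> keys;
    keys.reserve(count);
    for (std::size_t i = 0; i < count; i++) {
        std::string digits = std::to_string(i);
        keys.push_back("key" + std::string(9 - digits.size(), '0') + digits);
    }
    return keys;
}
```

Calling binaryChop on the result of makeKeys(1000) never reports more than 10 comparisons, whichever key is sought, and a million keys never need more than 20.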

Space optimization is why HadronZoo has its own string class, and why this and other classes use special allocation regimes. Programs will often need to address more than 4 billion bytes, but it is rare to have anything approaching half that number of objects. So instead of issuing 64-bit pointers, the allocation regimes issue 32-bit addresses that translate to 64-bit pointers, effectively halving the pointer size. Space optimization is also why HadronZoo has its own collection class templates, and why these are based on ISAM instead of the more pointer hungry binary trees. Developers are expected to embrace all of this. It is not difficult to see why some of them will find this a hard carrot to crunch.
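To make the 32-bit address idea concrete, here is a minimal sketch of a fixed-size slot pool that issues 32-bit addresses and translates them to real pointers on demand. It is not the library's allocation regime: the class name SlotPool, the 16/16 bit split between block and slot, and the reservation of address 0 as 'null' are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint32_t Addr32;           // 32-bit 'address' issued in place of a pointer

const Addr32 SLOTS_PER_BLOCK = 65536;   // low 16 bits of an address select the slot

// Hypothetical pool of equal-sized slots, allocated block by block.
class SlotPool {
    std::vector<char*> m_blocks;        // high 16 bits of an address select the block
    std::size_t        m_slotSize;
    Addr32             m_nextAddr;

    SlotPool(const SlotPool&) = delete;
    SlotPool& operator=(const SlotPool&) = delete;

public:
    SlotPool()
    {
        m_slotSize = 0;
        m_nextAddr = 1;                 // address 0 is reserved as 'null'
    }

    ~SlotPool()
    {
        for (std::size_t i = 0; i < m_blocks.size(); i++)
            delete[] m_blocks[i];
    }

    void Init(std::size_t slotSize)
    {
        assert(slotSize > 0 && m_blocks.empty());
        m_slotSize = slotSize;
    }

    // Issue the next unused address, growing the pool by one block when needed.
    Addr32 Alloc()
    {
        assert(m_slotSize > 0);
        if (m_nextAddr == UINT32_MAX)
            return 0;                   // address space exhausted
        Addr32 addr = m_nextAddr++;
        if ((addr >> 16) >= m_blocks.size())
            m_blocks.push_back(new char[SLOTS_PER_BLOCK * m_slotSize]);
        return addr;
    }

    // Translate a 32-bit address into a full-width pointer.
    void* Translate(Addr32 addr) const
    {
        if (addr == 0 || addr >= m_nextAddr)
            return nullptr;
        return m_blocks[addr >> 16] + (addr & 0xFFFF) * m_slotSize;
    }
};

// A list node linked by 32-bit address: 8 bytes in total, where a node
// holding a raw 64-bit pointer would typically occupy 16.
struct Node {
    Addr32        next;
    std::uint32_t value;
};
```

The saving appears in any structure that stores links: the Node above is half the size it would be with a raw pointer, at the cost of one translation per dereference.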

Database Operations

Memory resident data can be backed up in real time by streaming out deltas to buffered files. On startup the deltas are read back in to restore the data state. This is known as RAM Primacy because the RAM is the primary store. It holds all the data in the required order, and in a form suitable for direct operation within the program. The file is secondary. It holds all the data but only as deltas in order of occurrence. RAM Primacy is very fast: FETCH by virtue of the binary chop; INSERT and UPDATE by virtue of file buffering. Both are 'microsecond operations', meaning they complete within a low single digit number of microseconds.

Because the fastest writes are to the ends of buffered files, HadronZoo ensures all writes are made this way. At no point are blocks in data files, or in any other type of file, inserted or overwritten. This is known as the AANI rule - Always Append, Never Insert.
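RAM Primacy and the AANI rule can be sketched together in a few lines. The example below is an illustration under stated assumptions, not the HDB: std::map stands in for the library's own ordered collections, the delta format (one "key, tab, value" line per change) is invented for the sketch, and the class name RamPrimaryStore is hypothetical. Every INSERT or UPDATE changes the in-memory map and appends one delta to a buffered file; on startup the deltas are replayed in order of occurrence, so a later delta for the same key supersedes an earlier one without the file ever being overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <map>
#include <string>

// Hypothetical key/value store in which RAM is primary and the file is a delta log.
class RamPrimaryStore {
    std::map<std::string, std::string> m_data;   // primary store, held in key order
    std::ofstream                      m_log;    // secondary store, append only

public:
    RamPrimaryStore() {}

    // Replay any existing deltas to restore the data state, then open the
    // same file for appending. A missing file simply means an empty store.
    bool Init(const std::string& path)
    {
        std::ifstream in(path.c_str());
        std::string line;
        while (std::getline(in, line)) {
            std::string::size_type tab = line.find('\t');
            if (tab == std::string::npos)
                continue;                        // skip malformed lines
            m_data[line.substr(0, tab)] = line.substr(tab + 1);
        }
        in.close();

        m_log.open(path.c_str(), std::ios::out | std::ios::app);
        return m_log.is_open();
    }

    // INSERT or UPDATE: change the RAM copy and append one delta to the file.
    bool Put(const std::string& key, const std::string& value)
    {
        assert(key.find_first_of("\t\n") == std::string::npos);
        assert(value.find('\n') == std::string::npos);

        m_data[key] = value;
        m_log << key << '\t' << value << '\n';   // buffered; never seeks backwards
        return m_log.good();
    }

    // FETCH: served entirely from RAM.
    const std::string* Fetch(const std::string& key) const
    {
        std::map<std::string, std::string>::const_iterator it = m_data.find(key);
        return it == m_data.end() ? nullptr : &it->second;
    }

    std::size_t Count() const { return m_data.size(); }
};
```

Because the file only ever grows, a real implementation would also need some way of bounding its size; the sketch does not attempt this.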

HadronZoo programs (programs based on the HadronZoo library) will generally use the HDB (HadronZoo Database). This is a non-SQL, hierarchical, share-nothing database regime. It is neither compulsory nor preclusive of other databases. However, HDB repositories and indexes are AANI compliant and fully integrated into the Delta Server regime. If basing a program on the HadronZoo library makes sense, it will make sense to use the HDB.

The HDB is implemented by the HDB class group, which is part of the library. The HDB is internal to programs that use it, with data model entities represented by HDB class instances that exist within the program space. This arrangement enables direct in-program operation, which is one reason why the HDB is non-SQL. Note that repositories can be declared as external, in which case they are made available as microservices. However, they are still internally instantiated, since they could not be operated upon otherwise.

There are two ways of instantiating HDB data model entities: in the program code, and by XML tags in the program configs. HadronZoo configs are XML throughout and are standardized. There are HDB tags which map 1:1 to data model entities, and the library has the necessary config read functions to process them.

Server Classes

The library provides a general epoll based server class which handles both TCP and UDP client connections simultaneously. There is an emphasis on serving HTTP. The library includes the Dissemino class family, which imbues programs with an HTTP interface. Dissemino webapps, as such interfaces are known, can range from simple control panels requiring little or no configuration, to sophisticated websites with extensive configuration.
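The way a single epoll instance can watch a TCP listening socket and a UDP socket at the same time can be sketched with plain Linux system calls. This is not the library's server class: the function names bindLoopback, watch and pollOnce and the LoopStats structure are hypothetical, and for brevity the sketch closes each accepted TCP client immediately, where a real server would register it with epoll and service it.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstring>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Bind a socket of the given type (SOCK_STREAM or SOCK_DGRAM) to 127.0.0.1 on
// a port chosen by the OS. Returns the descriptor, or -1 on failure.
int bindLoopback(int type, unsigned short& port)
{
    int fd = socket(AF_INET, type, 0);
    if (fd < 0)
        return -1;

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;

    socklen_t len = sizeof addr;
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0
        || getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0
        || (type == SOCK_STREAM && listen(fd, 16) < 0)) {
        close(fd);
        return -1;
    }
    port = ntohs(addr.sin_port);
    return fd;
}

// Register a descriptor with the epoll instance for read readiness.
bool watch(int epfd, int fd)
{
    epoll_event ev;
    std::memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

struct LoopStats {
    int tcpAccepted;    // TCP connections accepted so far
    int udpDatagrams;   // UDP datagrams received so far
};

// One pass of the event loop: wait for readiness on either socket, then
// accept a pending TCP client or read a pending UDP datagram.
bool pollOnce(int epfd, int tcpFd, int udpFd, int timeoutMs, LoopStats& stats)
{
    epoll_event events[8];
    int n = epoll_wait(epfd, events, 8, timeoutMs);
    if (n < 0)
        return false;

    for (int i = 0; i < n; i++) {
        int fd = events[i].data.fd;
        if (fd == tcpFd) {
            int client = accept(tcpFd, nullptr, nullptr);
            if (client >= 0) {
                stats.tcpAccepted++;
                close(client);          // a real server would watch() the client
            }
        } else if (fd == udpFd) {
            char buf[1500];
            if (recvfrom(udpFd, buf, sizeof buf, 0, nullptr, nullptr) >= 0)
                stats.udpDatagrams++;
        }
    }
    return true;
}
```

The point of the design is that one thread blocked in a single epoll_wait call services both transports, with the descriptor stored in each event identifying which socket became ready.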

Dissemino webapp configs use a mix of HDB, Dissemino and HTML5 tags. The Dissemino tags define web pages and event actions. The HTML5 tags are encapsulated within the XML; in other words, they are as per the HTML5 standard but follow XML rules. Tags that can be left open in HTML must be closed in the configs.

Anyone familiar with PHP and Apache will know that on page request, Apache scans page content for PHP scripts, executes them, and substitutes each script body with the script output (if any) in the final HTML. The Dissemino method performs similar substitutions, except that PHP scripts are replaced by Dissemino tags which direct calls to built-in C++ functions. The set of tags and built-in C++ functions provided by the Dissemino classes will suffice in the general case. Should additional functionality be needed, tags and C++ functions are easy to add. The Dissemino method, classes and tags are discussed from the C++ perspective in this manual (see Ch 6 "Dissemino Classes"), and from the web development perspective in the Dissemino manual.
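The substitution principle can be sketched as a table that maps tag names to C++ functions. The sketch below is only an illustration of the idea: the `<demo:NAME/>` tag form is invented and is not Dissemino syntax, and the names TagHandler, tagSiteName, tagItemCount and renderPage are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// A built-in function that produces the text substituted for a tag.
typedef std::string (*TagHandler)();

std::string tagSiteName()  { return "Example Site"; }
std::string tagItemCount() { return "42"; }

// Replace every <demo:NAME/> in the template with the output of the handler
// registered under NAME. Tags with no registered handler are left untouched.
std::string renderPage(const std::string& tmpl, const std::map<std::string, TagHandler>& handlers)
{
    const std::string opener = "<demo:";
    const std::string closer = "/>";

    std::string out;
    std::string::size_type pos = 0;

    for (;;) {
        std::string::size_type start = tmpl.find(opener, pos);
        if (start == std::string::npos)
            break;
        std::string::size_type end = tmpl.find(closer, start + opener.size());
        if (end == std::string::npos)
            break;

        std::string name = tmpl.substr(start + opener.size(), end - start - opener.size());
        std::map<std::string, TagHandler>::const_iterator it = handlers.find(name);

        out.append(tmpl, pos, start - pos);                         // literal text before the tag
        if (it != handlers.end())
            out += it->second();                                    // handler output replaces the tag
        else
            out.append(tmpl, start, end + closer.size() - start);   // unknown tag kept as-is
        pos = end + closer.size();
    }

    out.append(tmpl, pos, std::string::npos);                       // remaining literal text
    return out;
}
```

Adding functionality in this model means writing one more function and registering it under a new tag name, which mirrors the claim that tags and C++ functions are easy to add.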

Design and Development Rules and Guidelines

Functional demarcation can be facilitated by external programs or scripts (the small program approach), or by internal functions that exist within the same program space (the large program approach). HadronZoo strongly prefers the latter. The Delta Server, upon which all HadronZoo systems depend, is external as it is run on each machine as a singleton process. Common data sets are routinely made available as external, single repository microservices (SRMs). These exceptions aside, everything is internal. As a consequence, HadronZoo systems tend to comprise a very small number of large programs, usually just one. Is the large program approach better or worse than the small program approach? Neither. It depends on the objectives, which for HadronZoo are performance, easier logging, and fewer system integration issues.

For security reasons, HadronZoo has a policy of not running software without both the source code and a clear understanding of it. It is acknowledged that source code is not always available; however, this should only apply in the case of third party libraries that provide specialist functionality. In terms of code clarity, it is recognized that parts of the C++ language have onerous syntax and may not be universally understood. Where possible, these are avoided. There are no alternatives to class and function templates, vararg macros and functions, or function pointers, so these are used. Initialization lists, however, are never used as they are cumbersome and unnecessary. HadronZoo class instances are always created blank and then initialized for use in a distinct, separate step. Many of the more complex aspects of modern C++ (C++11 onwards) are also avoided.
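The 'create blank, then initialize' convention might look like the following. The Endpoint class and its members are hypothetical and exist only to show the pattern: the constructor assigns neutral values in its body rather than in an initialization list, and a separate Init call makes the instance usable.

```cpp
#include <cassert>
#include <string>

// Hypothetical class following the two-step convention.
class Endpoint {
    std::string m_host;
    unsigned    m_port;
    bool        m_ready;

public:
    // Step 1: create blank. Members are assigned in the constructor body,
    // with no initialization list.
    Endpoint()
    {
        m_port = 0;
        m_ready = false;
    }

    // Step 2: initialize for use. Returns false and leaves the instance
    // unchanged if the arguments are invalid or Init has already succeeded.
    bool Init(const std::string& host, unsigned port)
    {
        if (m_ready)
            return false;
        if (host.empty() || port == 0 || port > 65535)
            return false;

        m_host = host;
        m_port = port;
        m_ready = true;
        return true;
    }

    bool               IsReady() const { return m_ready; }
    const std::string& Host() const    { return m_host; }
    unsigned           Port() const    { return m_port; }
};
```

One side effect of the separate step, whatever the original motivation, is that initialization can report failure through an ordinary return value.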

Method Prescription Dichotomy

It is not the intention to prescribe. The library and supplied programs were written in accordance with HadronZoo design philosophy, rules and guidelines, but there is nothing that obligates developers to do likewise. Provided the licensing terms are adhered to, developers are free to use the library as they wish. The HadronZoo download is free open source code, published in the hope that developers will find it useful, howsoever it is used.

That said, it is not the intention to avoid prescription at any cost. Extensive use of class specific memory allocation regimes, particularly that of hzString, the HadronZoo string class, makes prescription inevitable. Many classes in the library have hzString members and these cannot simply be replaced with STL strings. Whilst very similar, the functionality is not the same and the interface is not entirely compatible. Thousands of lines of code would need to change, which makes no sense. Developers are free to import HadronZoo classes into programs and adapt them as needed, but they may find the process less than straightforward.

About HadronZoo

HadronZoo is small, consisting only of the founder and a few ad-hoc collaborators. Although collaborators were crucial to the software development effort, the applied collaborative model has historically differed from the norm. Instead of the founder acting as team leader or as a member of a team of equals, the collaborators used the library in solutions they provided to their clients, and the founder implemented the changes they requested. Not every change request furthered the aims of the library, but the arrangement was the source of much impetus and direction. It also enabled library classes to be deployed and tested in live environments, well in advance of what would otherwise have been possible. As a result of this arrangement, the founder is the sole author of the current library and the other programs in the download. However, with the software now more mature, the collaborative model is no longer appropriate. Going forward, collaborators will be permanent, will assume a more custodial role and, over time, will be listed as code contributors.