HadronZoo: Bespoke Software Developers
Dissemino Web Engine Manual

Dissemino Web Engine: What it is and what it does

The Dissemino Web Engine or DWE (formal name HadronZoo::Dissemino) is an open source HTTP/S server for Linux. It is C++ throughout and is based on the HadronZoo C++ Class Library, which includes the in-built HadronZoo database (HDB). The intention, as far as is practical, is for webapps (web applications) to be 'C++ powered' and free of external dependencies. Server-side scripts in PHP, Python, Perl or any other scripting language are effectively banned, as there is no formal method of calling them. Instead, webapp functionality is provided by a set of standard, in-built C++ functions. Although extensive and sufficient for most purposes, this set is not comprehensive. Where it does not suffice, it will be necessary to write additional functions. The DWE is ultimately aimed at C++ programmers. It isn't intended to be everyone's cup of tea.

That said, C++ developers should not expect to do much C++ development, and web developers without C++ skills can still use the DWE. Webapp configs are XML throughout and use a mix of tags: HDB tags define data classes that capture real-life objects, declare repositories for such objects and direct data operations; Dissemino tags define webapp pages and other resources. The HTML for these resources follows the HTML5 standard, but as it is encapsulated within XML, tag names must match on case and all tags must be closed. Values of data object members, and system values such as the current time and date, are availed by means of simple percent entity notation (see the article "Percent Entities"). Percent entities may appear anywhere within the value part of HTML tags for the purpose of display, or within blocks of HDB tags that direct data operations.
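As an illustration, a page resource in a webapp config might look something like the sketch below. This is a hypothetical fragment: the tag names and the exact percent entity syntax shown here are assumptions for illustration only, not taken from the actual Dissemino config schema.

```xml
<!-- Hypothetical sketch: tag names and percent entity syntax are illustrative only -->
<xpage url="/account" title="My Account">
    <html>
        <body>
            <!-- Percent entities in the value part of HTML, resolved at request time -->
            <h1>Welcome back, %m:member.name;</h1>
            <p>The time is %s:datetime;</p>
        </body>
    </html>
</xpage>
```

Note that because the HTML is encapsulated within XML, every tag in the fragment is closed, as the text above requires.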

The HadronZoo library has two class families of interest here: the HDB classes, which imbue programs with the HadronZoo database, and the Dissemino classes, which imbue programs with an HTTP interface and the capability to host a webapp. The HDB tags in webapp configs each create or manipulate an HDB class instance, such as a data class, a repository or a single data object. The Dissemino tags each create or manipulate a Dissemino class instance, such as a webpage, a server-side include block or a cookie. The set of HDB and Dissemino tags and their associated C++ classes and functions in the library; the means by which the functions are triggered; the encapsulation of HTML5 within the XML configs; and the use of percent entities to look up and display values; together amount to what is known as the Dissemino method (DM). The method design arose from a study of the LAMP method (Linux, Apache, MySQL and PHP/Python/Perl). In effect, the method is LAMP, but with everything except the Apache trigger mechanism and the Linux OS directly replaced with in-built C++ functions.

Note that a Dissemino webapp has essentially the same definition as a website, i.e. it is a collection of web pages and related content identified by a common domain name. The two terms can be interchangeable, but there are subtle differences. While a website would usually provide all the content available at the domain, a webapp might not do so. Websites can be constructed as one or more webapps; for example, one webapp might provide passive information pages while another provides the functionality of a members-only area.

Why the HDB? Why No SQL?

In terms of what is apparent to visitors, websites predominantly trade in whole 'real life' objects. Forms are designed specifically to create or edit objects of a given class, one at a time, and queries are likewise limited to objects of a given class. Typically a query will either find a single object or find nothing; or it will result in a menu of objects if more than one is found, the object itself if only one is found, or a 'No records found' message. It makes sense to have repositories that can store and retrieve real-life objects whole. This dictates a hierarchical database, but once in place, such queries can be implemented as a single key lookup on a single repository. You don't need SQL for that.
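To make the point concrete, a single key lookup under the DM might be directed by something like the following. This is a hypothetical sketch only; the tag and attribute names are assumptions, not the actual HDB tag set.

```xml
<!-- Hypothetical sketch: tag and attribute names are illustrative only -->
<!-- Fetch the whole Member object whose email matches the submitted form value -->
<fetch repos="members" key="%e:email;" result="theMember"/>
```

One key, one repository, one whole object retrieved; no query language is involved.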

Before the DM and HDB, experimental webapps stored real-life objects as C++ class instances in memory-resident collections. This hard coded the data model, so it was not possible to configure the webapps by means of a config file. Single key inserts and lookups, however, were microsecond operations. In outline, the design remit of the DM and HDB was to fix the config problem whilst retaining as much of the performance as possible. The HDB section of the HadronZoo Library Manual explains the HDB rationale in more detail, but it ought to be clear why HadronZoo was reluctant to go down the SQL path.

Not all queries are single key lookups, of course. There are common features such as automated guidance in shopping carts, where users are informed that "People who bought this item also bought ...". Then there are back end reports that mine the database for market trends. In the experimental webapps, these more complex queries required specific coding in C++, which arguably makes the case for SQL. However, even with the 'full flexibility' of SQL, one still has to think through the process. Experience with queries that exploited the much greater flexibility of C++ suggested there were opportunities for standardization - which is the approach now taken.

The design was tested by the Dissemino Observations, a wide-ranging study of websites selected for their notable data functionality. The aim was not to establish how the websites actually worked, but how the same effect could be achieved using the DM and HDB. The observations identified numerous features that were missing or poorly handled by the DM, but no examples of data functionality that exceeded HDB capabilities, or that ultimately could not be standardized.

Using the HDB

With the HDB, developers first define data classes (specify data structures) for the real-life objects, then use the data classes to create repositories that will hold the objects. Where a data class is hierarchical, i.e. has members that are of another data (sub)class, there is a choice. Subclass objects can be held in the parent class repository along with the parent class objects, or in their own repository if they are useful in their own right. Either way, in INSERT and FETCH operations on the parent class repository, there are no explicit joins, as objects are assembled and disassembled automatically. With a standard SQL relational database, subclass tables would be needed in all cases, as would joins. As a development aid, data class definitions automatically generate default forms, form validation JavaScript, and form handlers. The appearance will be very basic and will likely need modifying, but it should not be necessary to alter the executive commands which make the forms operate.
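By way of illustration, a hierarchical data class and its repository might be declared along the following lines. Again, this is a hypothetical sketch: the element and attribute names are assumptions for illustration, not the actual HDB tag set.

```xml
<!-- Hypothetical sketch: element and attribute names are illustrative only -->
<class name="Address">
    <member name="street" type="string"/>
    <member name="city"   type="string"/>
</class>

<class name="Member">
    <member name="email"   type="email"/>
    <member name="name"    type="string"/>
    <!-- Hierarchical member: an object of the Address subclass -->
    <member name="address" type="class:Address"/>
</class>

<!-- Repository holding whole Member objects, keyed on email -->
<repository name="members" class="Member" key="email"/>
```

An INSERT on such a repository would store the Address along with the rest of the Member automatically; no join is declared anywhere.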

Microservices

Supplied in the HadronZoo download along with HadronZoo::Dissemino (the web engine) is HadronZoo::RepoServer, each instance of which avails a single HDB repository as a completely independent, omnipresent microservice. The repository microservices are external to the web engine, and replace repositories the web engine would otherwise have to host internally. Although microservice use adds steps to data operations and increases latency, there are circumstances where these costs are outweighed. By having common data sets, such as verified email addresses, available as a microservice, wasteful duplication is avoided. And without repository microservices, large distributed and/or resilient systems could not be built.

Repository microservices are easy to create, and easy to direct webapps towards. HadronZoo has standardized its config regime so that the RepoServer program uses the exact same HDB tags to define the repository data class as the web engine does. In webapp configs, repository declarations indicate microservice use by supplying the microservice IP address and port number as attributes. Otherwise, by default, the repository will be created as an internal entity. To mitigate the extra latency, it is recommended that the machine hosting the microservice is placed in the same data center as that hosting the web engine.
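The distinction between an internal repository and one backed by a microservice might then amount to no more than a pair of extra attributes, along these lines. As before, this is a hypothetical sketch; the element and attribute names, and the address shown, are assumptions for illustration.

```xml
<!-- Hypothetical sketch: names and values are illustrative only -->

<!-- Default: repository created as an internal entity -->
<repository name="members" class="Member" key="email"/>

<!-- Microservice use: same declaration, plus the RepoServer address and port -->
<repository name="members" class="Member" key="email" ip="10.0.0.12" port="9201"/>
```

Because RepoServer consumes the same data class tags as the web engine, the class definition itself would not change between the two cases.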

Dissemino Performance

The web engine, as part of the HadronZoo download, is free open source software. Because of this, due consideration was given to performance on entry-level servers. To this end, the number of threads used to serve HTTP is configurable. The default is 1, which is known as slow mode. In slow mode there will still be one or more background threads, but all client connections and requests are completely handled by the main thread. In fast mode, where the number of threads is greater than 1, the main thread accepts HTTP connections and receives requests, but passes them via a lock-free queue to another thread, which processes the request and sends out the response.

A key performance metric is the number of requests per second. Bench tests on fairly typical business webapps showed sustained throughput of some 2,000 requests per second with the web engine in slow mode. Fast mode with 8 threads (one per core) pushed this to 5,000. This is really the limit for an 8-core, 3 GHz server, since further threads had no discernible effect. These are ballpark numbers, but pretty good ballpark numbers. Different requests vary in the time they take, but in predictable ways. The time taken to receive and respond to a request depends on the volume of data transferred, while the time taken to process the request depends on what data operations are required. Small fixed content pages are the fastest, usually completing within 200 microseconds, as only the HTTP request header is uploaded (~1KB), only the page header and content are downloaded (say ~5KB), and the only lookup is in the map of URLs to pages, which is memory resident. Larger fixed content pages, however, are not slow unless the pipe is. In terms of process time, each additional KB of data only adds around 3 microseconds.

Most requests for active resources will complete within 400 microseconds, or can readily be made to by tuning the applicable HDB repositories. The only real outliers are where free text indexation is applied to large document uploads; the process takes some 10 microseconds per word. Free text searches also exceed the 400 microsecond ballpark, particularly if they contain dozens of terms.

Dissemino in the 'Market'

Dissemino won't be everyone's cup of tea, and was never intended to be. Not everyone will like the approach to data. Our only concern is that it is a new product and to date has few websites to its name. All the sites we have developed have had a strong utilitarian theme. They are not feature rich, so the list of features Dissemino has had to support so far has been limited. Dissemino fully supports HTML5 and places no restrictions on the use of CSS or JavaScript. There is no reason to think there will be issues with feature-rich sites, but we look forward to seeing examples up and running. Potential Dissemino developers will also be concerned about the overall take-up of Dissemino. It does not have to be particularly popular, but it must have good odds of achieving a userbase large enough for developers to be reasonably easy to come by. So whose cup of tea is it?

It will appeal, of course, to those who like the approach to data, the thinking behind it, and the layout of the configs. If you like the thinking behind the software, you are likely to find it easy to learn - and this is critical to take-up rates. Development costs are mostly a matter of site complexity, and it is not exactly easy to get a complex website going using more established tools and methods. It has not escaped our attention that there are sizable teams on long-term contracts working on a single website, using such technology as node.js. There has to be a message in that somewhere!

More tangibly, Dissemino is pretty technically ambitious for the money. The software itself is free, but servers have running costs. It stands to reason that the more efficient the software is at handling requests, the more traffic each physical server can handle, and this lowers hosting costs. For most commercial operators, however, a busy site both justifies and facilitates a large budget, so why rock the boat with technology yet to establish a track record when you don't have to? There is a simple answer to that question - there is no such thing as too much server capacity. Or are people suggesting that there is?