Dissemino Web Engine: What it is and what it does
The Dissemino Web Engine (HadronZoo::Dissemino) is an open source Linux/C++ web server, developed by HadronZoo for the purpose of hosting websites, or webapps to use the preferred
term, that work the way HadronZoo wanted them to work. It is based on the HadronZoo class library, uses the non-SQL HadronZoo database (HDB), and rigidly adheres to the HadronZoo
doctrine of implementing all functionality in C++. Server-side scripts in PHP, Python, Perl or any other scripting language are effectively banned; there is no formal means of
calling them. Two reasons for this approach are, as the HadronZoo homepage suggests, performance and security. Scripts are much slower than C++ function calls, and HadronZoo has a
policy of not running server programs for which it does not have full control of the source code. It is also a matter of taste. As a real time 'bare metal' programming enterprise,
HadronZoo regards C++ as completely intuitive. It has the opposite perception of scripting languages.
This contrarian stance does not mean webapps are coded in C++. Webapp configs are XML throughout, and have a structure web developers will generally be familiar with. A mix of tags
is used: HDB tags define data classes that capture real life objects, declare repositories for said objects, and direct data operations; Dissemino tags specify the webapp pages and
other resources. The HTML for these resources is as per the HTML5 standard, but as it is encapsulated within XML, tag names must match on case and all tags must be closed. Values of
data object members, and other variables such as the current time and date, are availed by means of a simple percent entity notation (see article 3.1: "Percent Entities"). Percent
entities may appear anywhere within the value part of HTML tags for the purpose of display, or within blocks of HDB tags that direct data operations.
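To make the shape of this concrete, here is a sketch of what such a config might look like. Every tag, attribute and percent entity name below is invented for illustration; the actual HDB and Dissemino tag set, and the exact percent entity notation, are those defined in the Dissemino documentation.

```xml
<!-- Hypothetical sketch: names are illustrative, not the actual tag set -->

<!-- HDB tags: define a data class and declare a repository for it -->
<hdbClass name="Member">
    <member name="username" type="string"/>
    <member name="joined"   type="date"/>
</hdbClass>
<hdbRepos name="members" class="Member"/>

<!-- Dissemino tags: a webapp page; HTML5 encapsulated in XML, so tags
     are case matched and always closed -->
<xpage path="/profile" title="Member Profile">
    <h1>Profile for %e:Member.username;</h1>
    <p>Member since %e:Member.joined; (page served %e:sysDate;)</p>
</xpage>
```

The percent entities (here written as `%e:...;`) would be resolved at request time, pulling the values from the applicable data object or system variable.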
The HadronZoo library has two class families of interest here: the HDB classes, which imbue programs with the HadronZoo database, and the Dissemino classes, which imbue programs
with an HTTP interface. The HDB tags in webapp configs each create or manipulate an HDB class instance, such as a data class, a repository or a single data object. The Dissemino
tags each create or manipulate a Dissemino class instance, such as a webpage, a server-side include block, or a cookie. The set of HDB and Dissemino tags and their associated C++
classes and functions in the library; the means by which these functions are triggered; the encapsulation of HTML5 within the XML configs; and the use of percent entities to look
up and display values; together amount to what is known as the Dissemino method. The method design arose from a study of the LAMP method (Linux, Apache, MySQL and
PHP/Python/Perl). In effect, the method is LAMP with everything except the Linux OS replaced.
In the general case, a webapp will have real life objects that can be defined using the standard set of data types, and will do little more than store and retrieve such objects. If
so, C++ skills are not required. However, where it is necessary to extend the set of Dissemino tags to cope with specialist data peculiar to particular disciplines, proficiency in
C++ will be required to write the C++ classes and functions such tags will trigger.
Note that a Dissemino webapp has essentially the same definition as a website, i.e. it is a collection of web pages and related content, identified by a common domain name. The two
terms can be interchangeable, but there are subtle differences. While a website would usually provide all the content available at the domain, a webapp might not do so. Websites
can be constructed as one or more webapps, for example, with one webapp providing passive information pages while another provides the functionality of a members-only area.
Why the HDB? Why No SQL?
The HDB design was informed by a study of numerous live websites with notable data functionality. The Dissemino Observations, as the study was known, found that of all the apparent
functionality observed in the websites, none of it needed SQL. What the observations instead concluded was that websites traded in whole real life objects, and needed repositories
that would store and retrieve them whole. Matters, at least with websites as accessed by users, are more straightforward this way. With queries constrained by search forms, most if
not all queries will apply to a single repository, with the result being a list of zero or more data objects from the same repository. In the usual case, in response to hitting the
search button, users are presented with a menu of objects if more than one is found, the object itself if only one is found, or a '0 records found' message.
With the HDB, developers first define data classes (specify data structures) for the real life objects, then use the data classes to create repositories that will hold the objects.
Where a data class is hierarchical, i.e. has members that are of another data (sub)class, there is a choice. Subclass objects can be held in the parent class repository, along with
the parent class objects, or in their own repository if they are useful in their own right. Either way, in INSERT and FETCH operations on the parent class repository, there are no
explicit joins, as objects are assembled and disassembled automatically. With a standard SQL relational database, subclass tables would be needed in all cases, as would joins.
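The choice described above might be sketched as follows. As before, the tag and attribute names are invented for illustration, not the actual HDB tag set.

```xml
<!-- Hypothetical sketch: a hierarchical data class -->
<hdbClass name="Address">
    <member name="street" type="string"/>
    <member name="city"   type="string"/>
</hdbClass>

<hdbClass name="Customer">
    <member name="name"    type="string"/>
    <member name="invoice" type="Address"/>  <!-- subclass member -->
</hdbClass>

<!-- Choice 1: Address objects held within the parent class repository -->
<hdbRepos name="customers" class="Customer"/>

<!-- Choice 2: addresses useful in their own right, so a repository of
     their own, in addition to the parent class repository -->
<hdbRepos name="addresses" class="Address"/>
```

In either case, a FETCH on `customers` would deliver each Customer complete with its Address, with no explicit join in the config.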
As a development aid, data class definitions automatically generate default forms, form validation JavaScript, and form handlers. The appearence will be basic, but it should not be
necessary to alter the executive commands which make the forms operate.
This is not to say that there won't be difficult exceptions, back end reports being a case in point. Instead of humans looking up and modifying data objects specific to themselves,
or a bot pretending to do the same, the objective is usually to mine the entire database for market trends. There are no constraining forms, so multi-repository lookups and joins
are all but inevitable. Cross-repository joins are never implicit with the HDB, so they require explicit loops. There is a case for SQL here, but not a compelling one. With or
without the flexibility of SQL, it pays to consider the mining and report build process; once that is done, explicit loops are not so strange and, of course, are very easy to
implement in C++.
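Those explicit loops are ordinary C++. The sketch below makes the point with plain standard library containers standing in for HDB repositories; the structs and the loop shape are illustrative only, not the actual HDB classes or API.

```cpp
#include <string>
#include <vector>

// Illustration only: two "repositories" as plain vectors. A back end report
// joining them uses an explicit nested loop, since cross-repository joins
// are never implicit with the HDB.
struct Customer { int id; std::string region; };
struct Order    { int customerId; double value; };

// Total order value for a region: outer loop over one repository, inner
// loop performing the join against the other.
double regionTotal(const std::vector<Customer>& customers,
                   const std::vector<Order>& orders,
                   const std::string& region)
{
    double total = 0.0;
    for (const Customer& c : customers)     // outer loop: first repository
    {
        if (c.region != region)
            continue;
        for (const Order& o : orders)       // inner loop: the explicit join
            if (o.customerId == c.id)
                total += o.value;
    }
    return total;
}
```

A real report would of course fetch from HDB repositories rather than vectors, but the control flow is no more exotic than this.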
Aside from the merits of a hierarchical data model, HadronZoo did not want the database to be a black box. With its prior experience of proprietary database design, taking full
control of the database was a natural step. With the HDB being part of the HadronZoo library, the HDB classes and functions are intrinsic aspects of the web engine ...
Microservices
Supplied in the HadronZoo download, along with HadronZoo::Dissemino (the web engine), is HadronZoo::RepoServer, each instance of which avails a single HDB repository as a
completely independent, omnipresent microservice. The repository microservices are external to the web engine, and replace repositories the web engine would otherwise have to host
internally. Although microservice use adds steps to data operations and increases latency, there are circumstances where these costs are outweighed. By having common data sets,
such as verified email addresses, available as a microservice, wasteful duplication is avoided. And without repository microservices, large distributed and/or resilient systems
could not be built.
Repository microservices are easy to create, and easy to direct webapps towards. HadronZoo has standardized its config regime, so the RepoServer program uses the exact same HDB
tags to define the repository data class as the web engine does. In webapp configs, repository declarations indicate microservice use by supplying the microservice IP address and
port number as attributes; otherwise, by default, the repository will be created as an internal entity. In order to mitigate the extra latency, it is recommended that the machine
hosting the microservice is placed in the same data center as that hosting the web engine.
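As a sketch of the two declaration forms (tag and attribute names invented for illustration, not the actual HDB tag set):

```xml
<!-- Hypothetical sketch. Supplying an address and port marks the repository
     as a RepoServer microservice; omitting them, as in the second
     declaration, creates the repository as an internal entity. -->
<hdbRepos name="verifiedEmails" class="EmailAddr" ip="10.0.0.5" port="9801"/>
<hdbRepos name="members" class="Member"/>
```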
Dissemino Performance
The web engine, as part of the HadronZoo download, is free open source software. Because of this, due consideration was given to performance on entry level servers. To this end,
the number of threads used to serve HTTP is configurable. The default is 1, which is known as slow mode. In slow mode there will still be one or more background threads, but all
client connections and requests are completely handled by the main thread. In fast mode, where the number of threads is greater than 1, the main thread accepts HTTP connections
and receives requests, but passes them via a lock free queue to another thread to process and send out the response.
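The fast mode division of labour can be sketched as follows. This is a generic illustration, not Dissemino's implementation: a mutex guarded queue stands in for the lock free queue, and integers stand in for HTTP requests.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Stand-in for the lock free queue: a simple mutex guarded queue that can
// be closed once the main thread has no more requests to hand over.
class RequestQueue {
public:
    void push(int req) {
        { std::lock_guard<std::mutex> lk(m); q.push(req); }
        cv.notify_one();
    }
    bool pop(int& req) {                    // false once closed and drained
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this]{ return !q.empty() || closed; });
        if (q.empty()) return false;
        req = q.front(); q.pop();
        return true;
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m); closed = true; }
        cv.notify_all();
    }
private:
    std::mutex m;
    std::condition_variable cv;
    std::queue<int> q;
    bool closed = false;
};

// Main thread accepts "requests" and enqueues them; worker threads pop,
// process, and would send out the response. Returns the number handled.
int runFastMode(int nWorkers, int nRequests) {
    RequestQueue queue;
    std::atomic<int> handled{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < nWorkers; ++i)
        workers.emplace_back([&]{           // worker: process and respond
            int req;
            while (queue.pop(req))
                ++handled;
        });
    for (int r = 0; r < nRequests; ++r)     // main thread: accept and pass on
        queue.push(r);
    queue.close();
    for (auto& w : workers) w.join();
    return handled.load();
}
```

With `nWorkers` set to 1 and the pops done inline on the main thread, the same shape degenerates into slow mode.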
A key performance metric is the number of requests per second. Bench tests on fairly typical business webapps showed sustained throughput of some 2000 requests per second with the
web engine in slow mode. Fast mode with 8 threads (one per core) pushed this to 5000. This is really the limit for an 8-core, 3 GHz server, since further threads had no discernible
effect. These are ballpark numbers, but pretty good ballpark numbers. Different requests vary in the time they take, but in predictable ways. The time taken to receive and respond
to a request depends on the volume of data transferred, while the time taken to process the request depends on what data operations are required. Small fixed content pages are the
fastest, usually completing within 200 microseconds, as only the HTTP request header is uploaded (~1KB), only the page header and content is downloaded (say ~5KB), and the only
lookup is in the map of URLs to pages, which is memory resident. Larger fixed content pages, however, are not slow unless the pipe is. In terms of process time, each additional KB
of data only adds around 3 microseconds.
Most requests for active resources will complete within 400 microseconds, or can readily be made to by tuning the applicable HDB repositories. The only real outliers are where free
text indexation is applied to large document uploads. The process takes some 10 microseconds per word. Free text searches also exceed the 400 microsecond ballpark, particularly if
they contain dozens of terms.
Dissemino in the 'Market'
Dissemino won't be everyone's cup of tea, and being so was never the intention. Not everyone will like the approach to data. Our only concern is that it is a new product and to
date has few websites to its name. All the sites we have developed have had a strong utilitarian theme. They are not feature rich, so the list of features Dissemino has had to
support so far has been limited. Dissemino fully supports HTML5 and places no restrictions on the use of CSS or JavaScript. There is no reason to think there will be issues with
feature rich sites, but we look forward to seeing examples up and running. Potential Dissemino developers will also be concerned about the overall take up of Dissemino. It does not
have to be particularly popular, but it must have good odds of achieving a userbase large enough for developers to be reasonably easy to come by. So whose cup of tea is it?
It will appeal, of course, to those who like the approach to data, the thinking behind it, and the layout of the configs. If you like the thinking behind the software you are
likely to find it easy to learn, and this is critical to take up rates. Development costs are mostly a matter of site complexity, and it is not exactly easy to get a complex
website going using more established tools and methods. It has not escaped our attention that there are sizable teams on long term contracts working on a single website, using
such technology as node.js. There has to be a message in that somewhere!
More tangibly, Dissemino is pretty technically ambitious for the money. The software itself is free, but servers have running costs. It stands to reason that the more efficient
the software is at handling requests, the more traffic each physical server can handle, and this lowers hosting costs. For most commercial operators, however, a busy site both
justifies and facilitates a large budget, so why rock the boat with technology yet to establish a track record when you don't have to? There is a simple answer to that question:
there is no such thing as too much server capacity. Or are people suggesting that there is?