Dissemino Web Engine: What it is and what it does
The Dissemino Web Engine (HadronZoo::Dissemino) is an open source Linux/C++ web server, developed by HadronZoo for the purpose of hosting websites, or webapps to use the preferred
term, that work the way HadronZoo wanted them to work. It is based on the HadronZoo class library, uses the non-SQL HadronZoo database (HDB), and rigidly adheres to the HadronZoo
doctrine of implementing all functionality in C++. Server-side scripts in PHP, Python, Perl or any other scripting language are effectively banned; there is no formal means of
calling them. Two reasons for this approach are, as the HadronZoo homepage suggests, performance and security. Scripts are much slower than C++ function calls, and HadronZoo has a
policy of not running server programs for which it does not have full control of the source code. It is also a matter of taste. As a real time 'bare metal' programming enterprise,
HadronZoo regards C++ as completely intuitive. It has the opposite perception of scripting languages.
This contrarian stance does not mean webapps are coded in C++. Webapp configs are XML throughout, and have a structure web developers will generally be familiar with. A mix of tags
is used: HDB tags define data classes that capture real life objects, declare repositories for said objects, and direct data operations; Dissemino tags specify the webapp pages and
other resources. The HTML for these resources is as per the HTML5 standard, but as it is encapsulated within XML, tag names must match on case and all tags must be closed. Values of
data object members, and other variables such as the current time and date, are availed by means of a simple percent entity notation (see article 3.1: "Percent Entities"). Percent
entities may appear anywhere within the value part of HTML tags for the purpose of display, or within blocks of HDB tags that direct data operations.
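To make the shape of this concrete, here is a sketch of what such a config might look like. Every tag, attribute and percent entity name below is invented for illustration; the actual HDB and Dissemino tag set, and the exact percent entity notation, are those defined in the Dissemino documentation.

```xml
<!-- Hypothetical sketch: names are illustrative, not the actual tag set -->

<!-- HDB tags: define a data class and declare a repository for it -->
<hdbClass name="Member">
    <member name="username" type="string"/>
    <member name="joined"   type="date"/>
</hdbClass>
<hdbRepos name="members" class="Member"/>

<!-- Dissemino tags: a webapp page; HTML5 encapsulated in XML, so tags
     are case matched and always closed -->
<xpage path="/profile" title="Member Profile">
    <h1>Profile for %e:Member.username;</h1>
    <p>Member since %e:Member.joined; (page served %e:sysDate;)</p>
</xpage>
```

The percent entities (here written as `%e:...;`) would be resolved at request time, pulling the values from the applicable data object or system variable.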
The HadronZoo library has two class families of interest here: the HDB classes, which imbue programs with the HadronZoo database, and the Dissemino classes, which imbue programs
with an HTTP interface. The HDB tags in webapp configs each create or manipulate an HDB class instance, such as a data class, a repository or a single data object. The Dissemino
tags each create or manipulate a Dissemino class instance, such as a webpage, a server-side include block, or a cookie. The set of HDB and Dissemino tags and their associated C++
classes and functions in the library; the means by which these functions are triggered; the encapsulation of HTML5 within the XML configs; and the use of percent entities to look
up and display values; together amount to what is known as the Dissemino method. The method design arose from a study of the LAMP method (Linux, Apache, MySQL and
PHP/Python/Perl). In effect, the method is LAMP with everything except the Linux OS replaced.
In the general case, a webapp will have real life objects that can be defined using the standard set of data types, and will do little more than store and retrieve such objects. If
so, C++ skills are not required. However, where it is necessary to extend the set of Dissemino tags to cope with specialist data peculiar to particular disciplines, proficiency in
C++ will be required to write the C++ classes and functions such tags will trigger.
Note that a Dissemino webapp has essentially the same definition as a website, i.e. it is a collection of web pages and related content, identified by a common domain name. The two
terms can be interchangeable, but there are subtle differences. While a website would usually provide all the content available at the domain, a webapp might not do so. Websites
can be constructed as one or more webapps, for example, with one webapp providing passive information pages while another provides the functionality of a members-only area.
Why the HDB? Why No SQL?
The HDB design was informed by a study of numerous live websites with notable data functionality. The Dissemino Observations, as the study was known, found that of all the apparent
functionality observed in the websites, none of it needed SQL. What the observations instead concluded was that websites traded in whole real life objects, and needed repositories
that would store and retrieve them whole. Matters, at least with websites as accessed by users, are more straightforward this way. With queries constrained by search forms, most if
not all queries will apply to a single repository, with the result being a list of zero or more data objects from the same repository. In the usual case, in response to hitting the
search button, users are presented with a menu of objects if more than one is found, the object itself if only one is found, or a '0 records found' message.
With the HDB, developers first define data classes (specify data structures) for the real life objects, then use the data classes to create repositories that will hold the objects.
Where a data class is hierarchical, i.e. has members that are of another data (sub)class, there is a choice. Subclass objects can be held in the parent class repository, along with
the parent class objects, or in their own repository if they are useful in their own right. Either way, in INSERT and FETCH operations on the parent class repository, there are no
explicit joins, as objects are assembled and disassembled automatically. With a standard SQL relational database, subclass tables would be needed in all cases, as would joins.
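The choice described above might be sketched as follows. As before, the tag and attribute names are invented for illustration, not the actual HDB tag set.

```xml
<!-- Hypothetical sketch: a hierarchical data class -->
<hdbClass name="Address">
    <member name="street" type="string"/>
    <member name="city"   type="string"/>
</hdbClass>

<hdbClass name="Customer">
    <member name="name"    type="string"/>
    <member name="invoice" type="Address"/>  <!-- subclass member -->
</hdbClass>

<!-- Choice 1: Address objects held within the parent class repository -->
<hdbRepos name="customers" class="Customer"/>

<!-- Choice 2: addresses useful in their own right, so a repository of
     their own, in addition to the parent class repository -->
<hdbRepos name="addresses" class="Address"/>
```

In either case, a FETCH on `customers` would deliver each Customer complete with its Address, with no explicit join in the config.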
As a development aid, data class definitions automatically generate default forms, form validation JavaScript, and form handlers. The appearence will be basic, but it should not be
necessary to alter the executive commands which make the forms operate.
This is not to say that there won't be difficult exceptions, back end reports being a case in point. Instead of humans looking up and modifying data objects specific to themselves,
or a bot pretending to do the same, the objective is usually to mine the entire database for market trends. There are no constraining forms, so multi-repository lookups and joins
are all but inevitable. Cross-repository joins are never implicit with the HDB, so they require explicit loops. There is a case for SQL here, but not a compelling one. With or
without the flexibility of SQL, it pays to consider the mining and report build process; once that is done, explicit loops are not so strange and, of course, are very easy to
implement in C++.
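Those explicit loops are ordinary C++. The sketch below makes the point with plain standard library containers standing in for HDB repositories; the structs and the loop shape are illustrative only, not the actual HDB classes or API.

```cpp
#include <string>
#include <vector>

// Illustration only: two "repositories" as plain vectors. A back end report
// joining them uses an explicit nested loop, since cross-repository joins
// are never implicit with the HDB.
struct Customer { int id; std::string region; };
struct Order    { int customerId; double value; };

// Total order value for a region: outer loop over one repository, inner
// loop performing the join against the other.
double regionTotal(const std::vector<Customer>& customers,
                   const std::vector<Order>& orders,
                   const std::string& region)
{
    double total = 0.0;
    for (const Customer& c : customers)     // outer loop: first repository
    {
        if (c.region != region)
            continue;
        for (const Order& o : orders)       // inner loop: the explicit join
            if (o.customerId == c.id)
                total += o.value;
    }
    return total;
}
```

A real report would of course fetch from HDB repositories rather than vectors, but the control flow is no more exotic than this.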
Aside from the merits of a hierarchical data model, HadronZoo did not want the database to be a black box. With its prior experience of proprietary database design, taking full
control of the database was a natural step. With the HDB being part of the HadronZoo library, the HDB classes and functions are intrinsic aspects of the web engine ...
Microservices
Supplied in the HadronZoo download, along with HadronZoo::Dissemino (the web engine), is HadronZoo::RepoServer, each instance of which avails a single HDB repository as a
completely independent, omnipresent microservice. The repository microservices are external to the web engine, and replace repositories the web engine would otherwise have to host
internally. Although microservice use adds steps to data operations and increases latency, there are circumstances where these costs are outweighed. By having common data sets,
such as verified email addresses, available as a microservice, wasteful duplication is avoided. And without repository microservices, large distributed and/or resilient systems
could not be built.
Repository microservices are easy to create, and easy to direct webapps towards. HadronZoo has standardized its config regime, so the RepoServer program uses the exact same HDB
tags to define the repository data class as the web engine does. In webapp configs, repository declarations indicate microservice use by supplying the microservice IP address and
port number as attributes; otherwise, by default, the repository will be created as an internal entity. In order to mitigate the extra latency, it is recommended that the machine
hosting the microservice is placed in the same data center as that hosting the web engine.
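As a sketch of the two declaration forms (tag and attribute names invented for illustration, not the actual HDB tag set):

```xml
<!-- Hypothetical sketch. Supplying an address and port marks the repository
     as a RepoServer microservice; omitting them, as in the second
     declaration, creates the repository as an internal entity. -->
<hdbRepos name="verifiedEmails" class="EmailAddr" ip="10.0.0.5" port="9801"/>
<hdbRepos name="members" class="Member"/>
```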
Dissemino Performance
The web engine, as part of the HadronZoo download, is free open source software. Because of this, due consideration was given to performance on entry level servers. To this end,
the number of threads used to serve HTTP is configurable. The default is 1, which is known as slow mode. In slow mode there will still be one or more background threads, but all
client connections and requests are completely handled by the main thread. In fast mode, where the number of threads is greater than 1, the main thread accepts HTTP connections
and receives requests, but passes them via a lock free queue to another thread to process and send out the response.
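The fast mode division of labour can be sketched as follows. This is a generic illustration, not Dissemino's implementation: a mutex guarded queue stands in for the lock free queue, and integers stand in for HTTP requests.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Stand-in for the lock free queue: a simple mutex guarded queue that can
// be closed once the main thread has no more requests to hand over.
class RequestQueue {
public:
    void push(int req) {
        { std::lock_guard<std::mutex> lk(m); q.push(req); }
        cv.notify_one();
    }
    bool pop(int& req) {                    // false once closed and drained
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this]{ return !q.empty() || closed; });
        if (q.empty()) return false;
        req = q.front(); q.pop();
        return true;
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m); closed = true; }
        cv.notify_all();
    }
private:
    std::mutex m;
    std::condition_variable cv;
    std::queue<int> q;
    bool closed = false;
};

// Main thread accepts "requests" and enqueues them; worker threads pop,
// process, and would send out the response. Returns the number handled.
int runFastMode(int nWorkers, int nRequests) {
    RequestQueue queue;
    std::atomic<int> handled{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < nWorkers; ++i)
        workers.emplace_back([&]{           // worker: process and respond
            int req;
            while (queue.pop(req))
                ++handled;
        });
    for (int r = 0; r < nRequests; ++r)     // main thread: accept and pass on
        queue.push(r);
    queue.close();
    for (auto& w : workers) w.join();
    return handled.load();
}
```

With `nWorkers` set to 1 and the pops done inline on the main thread, the same shape degenerates into slow mode.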
A key performance metric is the number of requests per second. Bench tests on fairly typical business webapps showed sustained throughput of some 2000 requests per second with the
web engine in slow mode. Fast mode with 8 threads (one per core) pushed this to 5000. This is really the limit for an 8-core, 3 GHz server, since further threads had no discernible
effect. These are ballpark numbers, but pretty good ballpark numbers. Different requests vary in the time they take, but in predictable ways. The time taken to receive and respond
to a request depends on the volume of data transferred, while the time taken to process the request depends on what data operations are required. Small fixed content pages are the
fastest, usually completing within 200 microseconds, as only the HTTP request header is uploaded (~1KB), only the page header and content is downloaded (say ~5KB), and the only
lookup is in the map of URLs to pages, which is memory resident. Larger fixed content pages, however, are not slow unless the pipe is. In terms of process time, each additional KB
of data only adds around 3 microseconds.
Most requests for active resources will complete within 400 microseconds, or can readily be made to by tuning the applicable HDB repositories. The only real outliers are where free
text indexation is applied to large document uploads. The process takes some 10 microseconds per word. Free text searches also exceed the 400 microsecond ballpark, particularly if
they contain dozens of terms.
Dissemino in the 'Market'
Dissemino won't be everyone's cup of tea, and being so was never the intention. Not everyone will like the approach to data. Our only concern is that it is a new product and to
date has few websites to its name. All the sites we have developed have had a strong utilitarian theme. They are not feature rich, so the list of features Dissemino has had to
support so far has been limited. Dissemino fully supports HTML5 and places no restrictions on the use of CSS or JavaScript. There is no reason to think there will be issues with
feature rich sites, but we look forward to seeing examples up and running. Potential Dissemino developers will also be concerned about the overall take up of Dissemino. It does not
have to be particularly popular, but it must have good odds of achieving a userbase large enough for developers to be reasonably easy to come by. So whose cup of tea is it?
It will appeal, of course, to those who like the approach to data, the thinking behind it, and the layout of the configs. If you like the thinking behind the software you are
likely to find it easy to learn, and this is critical to take up rates. Development costs are mostly a matter of site complexity, and it is not exactly easy to get a complex
website going using more established tools and methods. It has not escaped our attention that there are sizable teams on long term contracts working on a single website, using
such technology as node.js. There has to be a message in that somewhere!
More tangibly, Dissemino is pretty technically ambitious for the money. The software itself is free, but servers have running costs. It stands to reason that the more efficient
the software is at handling requests, the more traffic each physical server can handle, and this lowers hosting costs. For most commercial operators, however, a busy site both
justifies and facilitates a large budget, so why rock the boat with technology yet to establish a track record when you don't have to? There is a simple answer to that question:
there is no such thing as too much server capacity. Or are people suggesting that there is?