2: EPISTULA DATA OBJECT MODEL
A summary of what Epistula stores, how and in respect of what, is as follows:-
Epistula stores the following data in respect of each incoming and outgoing message:-
1) A single copy of the whole form message, as a binary object. The means of storage is a hdbBinCron (a binary datum repository, described in the HadronZoo Library Overview). The hdbBinCron issues 32-bit datum ids. These ids are known as the internal Epistula message ids.
2) The short form of the message is stored in a hdbIsamfile known as the 'Core ISAM', which is grouped by user, then by correspondent address, and held in reverse chronological order within each group. The short form is inserted once for the sender, if the sender is local, and once more for each local recipient.
3) The formal message id is stored in another hdbIsamfile. This is encoded as a direct mapping of formal message ids to internal message ids.
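The three per-message stores above can be pictured with a small in-memory sketch. This is an illustration only and does not use the hdbBinCron or hdbIsamfile interfaces: the class EpistulaStoreSketch, its methods and its fields are all hypothetical names. It assumes ids are issued sequentially from 1, and that the sender's entry is filed under the first recipient as correspondent (the text above does not say which correspondent applies when there are several recipients).

```python
import bisect


class EpistulaStoreSketch:
    """Toy in-memory stand-in for the three per-message stores (hypothetical)."""

    def __init__(self, local_addresses):
        self.local_addresses = dict(local_addresses)  # address -> user account
        self.blobs = {}        # internal id -> whole form message (binary)
        self.core = []         # sorted keys: (user, correspondent, -time, id)
        self.formal_ids = {}   # formal message id -> internal id
        self.next_id = 1       # assumption: ids are issued sequentially

    def store(self, raw, formal_id, when, sender, recipients):
        if not recipients:
            raise ValueError("a message needs at least one recipient")
        msg_id = self.next_id
        if msg_id > 0xFFFFFFFF:
            raise OverflowError("32-bit id space exhausted")
        self.next_id += 1

        # 1) One copy of the whole form message.
        self.blobs[msg_id] = raw
        # 3) Direct mapping of formal message id to internal id.
        self.formal_ids[formal_id] = msg_id
        # 2) One short form entry for a local sender ...
        if sender in self.local_addresses:
            # Assumption: file the sender's copy under the first recipient.
            self._insert(self.local_addresses[sender], recipients[0], when, msg_id)
        # ... and one more for each local recipient.
        for rcpt in recipients:
            if rcpt in self.local_addresses:
                self._insert(self.local_addresses[rcpt], sender, when, msg_id)
        return msg_id

    def _insert(self, user, correspondent, when, msg_id):
        # Negating the time makes ascending key order newest-first in a group.
        bisect.insort(self.core, (user, correspondent, -when, msg_id))

    def mailbox(self, user):
        # The 'virtual' mailbox: the section of the core keyed by this user.
        return [key for key in self.core if key[0] == user]
```

Because the key begins with the user and then the correspondent, all entries for one user are contiguous, and within each correspondent the newest message sorts first.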
Epistula also stores data in respect of the whole body of incoming and outgoing messages:-
- A 1:many mapping of all users to all correspondents. In effect, the complete set of automatic folders.
- A 1:many mapping of users to memory resident short form messages, known as the 'Global Inbox Repository'. See note (x1)
Messages are said to be 'delivered' to a mailbox. The delivery is largely virtual. As the Core ISAM is grouped by user, the short form messages in the mailbox are exactly those in the section of the Core ISAM given to the user. Other aspects of the user account do actually hold data, as follows:-
- An export of the user defined folders?
- POP3 Repository. This is simply a file of 32-bit message ids that have already been collected by a POP3 client.
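A file of 32-bit message ids can be read and appended to with very little code. The following is a minimal sketch, not the Epistula implementation: the function names are hypothetical, and the little-endian unsigned layout ("<I") is an assumption, as the byte order of the real file is not stated here.

```python
import struct

ID_FORMAT = "<I"   # assumption: unsigned 32-bit, little-endian
ID_SIZE = struct.calcsize(ID_FORMAT)


def append_collected(path, msg_id):
    """Record that a POP3 client has collected the message with this id."""
    with open(path, "ab") as f:
        f.write(struct.pack(ID_FORMAT, msg_id))


def load_collected(path):
    """Return the set of ids already collected (empty if no file yet)."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return set()
    usable = len(data) - len(data) % ID_SIZE   # ignore a torn final record
    return {rec[0] for rec in struct.iter_unpack(ID_FORMAT, data[:usable])}


def pending(mailbox_ids, collected):
    """Ids in the mailbox that the POP3 client has not yet collected."""
    return [i for i in mailbox_ids if i not in collected]
```

Each collected message costs four bytes in the file, so even a very busy account keeps this repository small.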
The administrative entities are as follows:-
1) Local domains. These are domains for which the physical server is listed as mail server (by its IP address) in the DNS records.
2) User accounts. For authentication, users, or applications which act as users, are required to have accounts. Each account can be associated with multiple local addresses but must be associated with at least one.
3) Local addresses. Incoming messages are not accepted unless at least one of the stated recipient addresses is a local address. Nor is message origination allowed unless the stated sender is a local address. The list is a single entity. It is not a list per local domain, even though each address listed must belong to a local domain.
4) Groups/Departments. As noted earlier, these consist of a user account (the group/department administrator), a single email address and a list of users (members).
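The acceptance rules for local addresses can be expressed as three small checks. This is a sketch under stated assumptions: the function names are hypothetical, addresses are compared as exact strings (no case folding), and the domain is taken to be whatever follows the last '@'.

```python
def accept_incoming(recipients, local_addresses):
    """Accept an incoming message only if at least one recipient is local."""
    return any(r in local_addresses for r in recipients)


def allow_origination(sender, local_addresses):
    """Allow a message to originate only if the stated sender is local."""
    return sender in local_addresses


def check_address_list(local_addresses, local_domains):
    """Return the listed addresses whose domain is NOT a local domain.

    The address list is a single set, not one list per domain, but every
    entry must still belong to some local domain. An empty result means
    the list is consistent.
    """
    return sorted(
        a for a in local_addresses
        if a.rpartition("@")[2] not in local_domains
    )
```

For example, with the single local domain example.com and the local addresses alice@example.com and sales@example.com, a message addressed to bob@remote.example and sales@example.com is accepted, while one addressed only to bob@remote.example is not.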
The webmail folder navtree is a JavaScript array, exported from a hdsTree instance (Dissemino tree, described in the Library manual). Dissemino web application navtrees are usually linked to a user session, and are memory resident only for the session duration. In Epistula the hdsTree is omnipresent in order to speed up logins. Folder populations will broadly match the number of correspondents the applicable user has. Research suggests a 'stable average' of around 10,000 per user. With 250 users this will be 2.5 million folders.
2: SERVER CAPACITY and PEAK LOADS
Although email systems can be attacked, email is not prone to going 'viral' in the way that websites are. A viral website can certainly lead to a message influx, but the surge will not be comparable. Going viral is mainly a social media phenomenon. People will look, and people will message their friends, but unless they have something to add, you probably won't get a message. So it is sensible in most cases, particularly for larger organizations, to think in terms of so many messages, or so many gigabytes, per day, week or month.
Smaller organizations will see greater statistical variance, but even an entry level server will usually have at least a terabyte of disk space. Messages, particularly the HTML part, can be verbose, and then there are the attachments. Average message size can be 100KB and in some cases 200KB. You should have time to work out which metrics apply in your own case but in general, a terabyte will be good for some 5 or 6 million messages, about 100 user years. With any reasonable retention/deletion policy, you can expect a big improvement on this number.
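The capacity estimate can be reproduced with a few lines of arithmetic, which also makes it easy to substitute your own figures. The sketch below is illustrative only. It assumes a decimal terabyte (10^12 bytes), around 240 messages per user per working day (the figure implied by 60,000 messages per day for 250 users) and 250 working days per year. The last figure is an assumption, not one stated in the text.

```python
TERABYTE = 10**12   # assumption: decimal terabyte


def messages_per_disk(disk_bytes, avg_message_bytes):
    """How many messages of the given average size fit on the disk."""
    return disk_bytes // avg_message_bytes


def user_years(total_messages, msgs_per_user_day, working_days_per_year):
    """How many years of one user's traffic the message total represents."""
    return total_messages / (msgs_per_user_day * working_days_per_year)


# An average of 200KB per message gives 5 million messages per terabyte.
capacity = messages_per_disk(TERABYTE, 200_000)

# 6 million messages at 240 per user per day, 250 days a year.
years = user_years(6_000_000, 240, 250)
```

With these inputs, capacity comes to 5,000,000 messages and years to 100.0, in line with the figures quoted above. An average of 100KB would double the message count.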
2: Peak Loads
Using the average user metric, with 250 users we have ~60,000 messages per day and, if most of this occurs during office hours, a daytime average of ~8,000 per hour. If peak loads are, say, four to five times the daytime average, total messages in and out can reach 10 per second. Maybe we should call it 20 just in case. This covers the data ingress, but on top of this, messages are being read. The worst scenario is where messages are 'flipped through' rather than ignored. Organizations like their staff to be punctual, so let's assume they are a big success with that and everyone is at their desk on the dot at 9am. As some messages will have come in overnight, they all have lots of reading to do. Many will quickly run out, but during the short time they are all flipping through their messages, the data egress rate can easily reach another 30 messages per second.
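The ingress estimate above is simple arithmetic and can be parameterised for other user counts. The following sketch uses a hypothetical function name and assumes an 8 hour office day, which is roughly what the step from ~60,000 per day to ~8,000 per hour implies.

```python
def peak_rate_per_second(users, msgs_per_user_day, office_hours, peak_factor):
    """Estimated peak message rate (in plus out) in messages per second."""
    daily = users * msgs_per_user_day        # e.g. 250 * 240 = 60,000
    hourly = daily / office_hours            # daytime average per hour
    return hourly * peak_factor / 3600       # peak hour, spread over seconds


# 250 users at 240 messages per day, with peaks of 4x and 5x the daytime average.
low = peak_rate_per_second(250, 240, 8, 4)
high = peak_rate_per_second(250, 240, 8, 5)
```

With these inputs the result lies between roughly 8 and 10.5 messages per second, which is where the working figure of 10 per second (rounded up to 20 for safety) comes from.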
An obvious point to note is that in general, there will not be many users and so not many user addresses. There will be even fewer local domains (often just one), nor will there be many groups or departments.