Formats of Domain Names, Email Addresses and URLs
In HadronZoo parlance, domain names, email addresses and URLs are collectively termed 'Internet Address Types', since they all address internet-based resources. Each has a distinct form, so each is treated as a distinct, fundamental data type. They are respectively given hzBasetype denotations of BASETYPE_DOMAIN, BASETYPE_EMADDR and BASETYPE_URL, and are mediated by the C++ classes hzDomain, hzEmaddr and hzUrl, which are known as the internet address classes.
Domain names consist of one or more labels that are concatenated and delimited by periods. Each label is a series of alphanumeric characters that may contain hyphens (-), provided these are not consecutive and do not appear at the beginning or the end of the label. Labels are limited to 63 bytes including hyphens; domain names are limited to 253 bytes in total, including the periods. Domain names are hierarchical, with the right-most label being the top-level domain and each label to the left being a sub-domain of the label to its right. Although upper case characters are allowed, domain name lookup is case insensitive, so a lot of software converts upper to lower case on input. Users may enter domain names in upper case, but on retrieval only lower case letters appear. HadronZoo adopts this approach.
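The label and length rules above can be sketched as a small validation routine. This is only an illustration under stated assumptions: isValidLabel and normalizeDomain are hypothetical names, they are not part of hzDomain, and the HadronZoo library may check its input differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a HadronZoo function): true if 'label' obeys the
// label rules described in the text.
bool isValidLabel(const std::string& label)
{
    if (label.empty() || label.size() > 63)
        return false;                       // labels are 1 to 63 bytes
    if (label.front() == '-' || label.back() == '-')
        return false;                       // no leading or trailing hyphen

    for (std::size_t i = 0; i < label.size(); i++)
    {
        const char c = label[i];
        const bool alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                        || (c >= '0' && c <= '9');
        if (c == '-')
        {
            if (i > 0 && label[i - 1] == '-')
                return false;               // no consecutive hyphens
        }
        else if (!alnum)
            return false;                   // only letters, digits and hyphens
    }
    return true;
}

// Hypothetical helper: validates 'input' and, on success, writes the lower
// case form to 'output'. Returns false and leaves 'output' alone otherwise.
bool normalizeDomain(const std::string& input, std::string& output)
{
    if (input.empty() || input.size() > 253)
        return false;                       // 253 bytes total, periods included

    std::string result;
    std::size_t start = 0;
    for (;;)
    {
        const std::size_t dot = input.find('.', start);
        const std::size_t len =
            (dot == std::string::npos) ? std::string::npos : dot - start;
        const std::string label = input.substr(start, len);
        if (!isValidLabel(label))
            return false;                   // also rejects empty labels ("a..b")

        for (char c : label)
            result += (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;

        if (dot == std::string::npos)
            break;
        result += '.';
        start = dot + 1;
    }

    assert(result.size() == input.size());  // lower-casing never changes length
    output = result;
    return true;
}
```

Under these assumptions, a call such as normalizeDomain("WWW.HadronZoo.COM", out) succeeds and leaves the lower case form in out, while inputs with a leading hyphen, consecutive hyphens or an empty label are rejected.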
Email addresses have the form local-part@domain_name, where the local-part is a series of between 1 and 63 characters that may include a-z, A-Z, 0-9, the symbols !, #, $, %, &, ', *, +, -, /, =, ?, ^, _, `, {, |, }, ~ and the period, with the proviso that a period must not be the first or last character or be next to another period. Technically the local-part is case sensitive; however, many organizations do not treat it as such. The HadronZoo::Epistula mail server does not allow upper case letters in email addresses, and hzEmaddr converts upper to lower case on assignment.
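As a rough sketch of the local-part rules just described, the following checks the permitted character set and the period placement, then lower-cases the whole address. The function names are hypothetical and are not taken from hzEmaddr; the domain part is only checked for being non-empty here, and the 63-character limit is the one stated in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if 'c' is a letter, digit or one of the symbols
// listed in the text (the period is handled separately by the caller).
bool isLocalPartChar(char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
        return true;
    static const std::string symbols = "!#$%&'*+-/=?^_`{|}~";
    return symbols.find(c) != std::string::npos;
}

// Hypothetical helper: applies the length, character and period rules.
bool isValidLocalPart(const std::string& local)
{
    if (local.empty() || local.size() > 63)
        return false;
    if (local.front() == '.' || local.back() == '.')
        return false;                       // no leading or trailing period

    for (std::size_t i = 0; i < local.size(); i++)
    {
        if (local[i] == '.')
        {
            if (i > 0 && local[i - 1] == '.')
                return false;               // no adjacent periods
        }
        else if (!isLocalPartChar(local[i]))
            return false;
    }
    return true;
}

// Hypothetical helper: splits at the single '@', checks the local-part and
// writes the lower case address to 'output'. Domain validation is omitted.
bool normalizeEmail(const std::string& input, std::string& output)
{
    const std::size_t at = input.find('@');
    if (at == std::string::npos || input.find('@', at + 1) != std::string::npos)
        return false;                       // exactly one '@' is required

    const std::string local  = input.substr(0, at);
    const std::string domain = input.substr(at + 1);
    if (!isValidLocalPart(local) || domain.empty())
        return false;

    std::string result;
    for (char c : input)
        result += (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;

    assert(result.size() == input.size());  // lower-casing never changes length
    output = result;
    return true;
}
```

A real implementation would also validate the domain part against the domain name rules; that step is left out to keep the sketch short.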
URLs comprise a scheme, e.g. "http://", a domain name, an optional port specifier, e.g. ":8080", and a directory (path), which can take the total length up to 2,048 bytes.
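The URL layout described above can be illustrated with a minimal splitter. UrlParts and splitUrl are hypothetical names introduced for this sketch; hzUrl is not claimed to work this way, and the sketch ignores parts of general URL syntax (user info, query strings, fragments) that the text does not mention.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical structure holding the components named in the text.
struct UrlParts
{
    std::string scheme;     // e.g. "http", stored without the "://"
    std::string domain;     // e.g. "www.hadronzoo.com"
    int         port = 0;   // 0 means no port was specified
    std::string path;       // empty if the URL has no path
};

// Hypothetical helper: splits 'url' into scheme, domain, port and path.
// Returns false if the URL exceeds 2,048 bytes or is malformed.
bool splitUrl(const std::string& url, UrlParts& parts)
{
    if (url.size() > 2048)
        return false;

    const std::size_t schemeEnd = url.find("://");
    if (schemeEnd == std::string::npos || schemeEnd == 0)
        return false;                       // a scheme is required

    const std::size_t hostStart = schemeEnd + 3;
    assert(hostStart <= url.size());        // "://" was found, so this holds

    // Everything up to the first '/' after the scheme is domain[:port]
    const std::size_t pathStart = url.find('/', hostStart);
    const std::size_t hostLen =
        (pathStart == std::string::npos) ? std::string::npos : pathStart - hostStart;
    const std::string authority = url.substr(hostStart, hostLen);

    const std::size_t colon = authority.find(':');
    const std::string domain = authority.substr(0, colon);
    if (domain.empty())
        return false;

    int port = 0;
    if (colon != std::string::npos)
    {
        const std::string digits = authority.substr(colon + 1);
        if (digits.empty() || digits.size() > 5)
            return false;
        for (char c : digits)
        {
            if (c < '0' || c > '9')
                return false;
            port = port * 10 + (c - '0');
        }
        if (port == 0 || port > 65535)
            return false;
    }

    parts.scheme = url.substr(0, schemeEnd);
    parts.domain = domain;
    parts.port   = port;
    parts.path   = (pathStart == std::string::npos) ? std::string() : url.substr(pathStart);
    return true;
}
```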
Comparison Operators
The internet address classes have more complex comparison operators than hzString. Given the hierarchical structure of domain names and the label ordering, it does not make sense to compare domain names from left to right as though they were strings. Nor does it make sense to compare whole labels from right to left. The important part of 'www.hadronzoo.com' is 'hadronzoo', not the top-level domain of 'com' or the sub-domain of 'www'. So domain names are compared first on the first sub-domain (the label immediately to the left of the top-level domain), then on the top-level domain, then on the remaining labels from right to left. Similarly, email addresses are compared first on domain, then on local-part. URLs are compared on domain, then on path.
Note that top-level domains containing an extra period, such as .co.uk, are known exceptions: the whole suffix is treated as the top-level domain.
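The comparison order for domain names can be sketched by building a key of labels in the order they are compared, then comparing the keys element by element. This is an assumption-laden illustration: comparisonKey and compareDomains are hypothetical names, the list of two-label top-level domains is a tiny hard-coded sample, and the actual hzDomain operators may be implemented quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a domain name into its labels, left to right.
std::vector<std::string> splitLabels(const std::string& domain)
{
    std::vector<std::string> labels;
    std::size_t start = 0;
    for (;;)
    {
        const std::size_t dot = domain.find('.', start);
        if (dot == std::string::npos)
        {
            labels.push_back(domain.substr(start));
            break;
        }
        labels.push_back(domain.substr(start, dot - start));
        start = dot + 1;
    }
    return labels;
}

// Hypothetical helper: orders the labels as they are compared, i.e.
// [first sub-domain, top-level domain, remaining labels right to left].
std::vector<std::string> comparisonKey(const std::string& domain)
{
    std::vector<std::string> labels = splitLabels(domain);
    assert(!labels.empty());                // splitLabels always yields one label

    std::string tld = labels.back();
    labels.pop_back();

    // Sample list of two-label top-level domains; an assumption for this sketch
    if (!labels.empty())
    {
        const std::string candidate = labels.back() + "." + tld;
        if (candidate == "co.uk" || candidate == "org.uk" || candidate == "com.au")
        {
            tld = candidate;
            labels.pop_back();
        }
    }

    std::vector<std::string> key;
    if (labels.empty())
        key.push_back("");                  // no sub-domain, e.g. "localhost"
    else
    {
        key.push_back(labels.back());       // first sub-domain, e.g. "hadronzoo"
        labels.pop_back();
    }
    key.push_back(tld);
    while (!labels.empty())                 // remaining labels, right to left
    {
        key.push_back(labels.back());
        labels.pop_back();
    }
    return key;
}

// Returns a negative value, zero or a positive value, in the manner of strcmp.
int compareDomains(const std::string& a, const std::string& b)
{
    const std::vector<std::string> ka = comparisonKey(a);
    const std::vector<std::string> kb = comparisonKey(b);
    if (ka < kb) return -1;
    if (kb < ka) return 1;
    return 0;
}
```

With this ordering 'www.apple.com' sorts before 'ftp.zebra.com', because 'apple' precedes 'zebra', whereas a plain left-to-right string comparison would place it after. The inputs are assumed to be already validated and lower-cased.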
Allocation Regime Factors
The internet address data types are special cases of string. The internet address classes can make their value available as a string (Cstr or hzString) and can be set from a string, provided the value is in the correct format. An obvious difference, however, is that internet address values are whole and immutable. Once a value is set, it does not make sense to modify it in any way. Consequently hzDomain, hzEmaddr and hzUrl have no operators that would alter the value in situ. Arguably a URL could be extended, but currently hzUrl does not allow this. Another difference is that the internet address data types have definite length limits. In theory, hzDomain and hzEmaddr could exploit their limited alphabets to shorten the value, but this is not done currently. In any case, domain names are usually short so that humans can remember them, and email addresses are likewise. In addition, hzEmaddr replaces the domain name component with the 32-bit address allocated by the domain allocation regime, thereby shortening the email address further. Only hzUrl, which also uses the 32-bit domain address, can reach lengths that would challenge an allocation regime in any way, and it piggybacks on hzString. In common with hzString, the internet address classes support conversion to Cstr, but not directly: the conversion functions assemble the Cstr in the threadwise scratch pad.
Domain and Email Address Registry
The HadronZoo library provides a domain registry and an email address registry for the same reason it provides a string registry: to avoid repetition in RAM Primacy stores. Domain names and email addresses, however, are regarded as having particular significance to webapps. They are real-life, verifiable entities, pivotal to security among much else. Because of this, the two registries have an expanded role, in that they note whether the domain or address has been validated.
Note also that, because hzEmaddr and hzUrl use the address of the domain within the domain string space to represent the domain part, placing an email address or a URL into a registry also ensures that the domain is registered.
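The relationship between the two registries can be sketched with a pair of toy classes. Everything here is hypothetical: DomainRegistry, EmailRegistry and their members are names invented for illustration, standard containers stand in for the library's own string space, and the 32-bit identifier is simply a 1-based index. The sketch only aims to show a domain being stored once, carrying a validated flag, and being registered as a side effect of registering an email address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical domain registry: one stored copy per distinct domain, each
// with a 32-bit identifier (0 is reserved as "no domain") and a validated flag.
class DomainRegistry
{
    struct Entry { std::string name; bool validated; };
    std::vector<Entry> m_entries;
    std::unordered_map<std::string, std::uint32_t> m_ids;

public:
    // Returns the existing identifier, or allocates a new one
    std::uint32_t insert(const std::string& domain)
    {
        const auto it = m_ids.find(domain);
        if (it != m_ids.end())
            return it->second;
        m_entries.push_back(Entry{domain, false});
        const std::uint32_t id = static_cast<std::uint32_t>(m_entries.size());
        m_ids.emplace(domain, id);
        return id;
    }

    bool contains(const std::string& domain) const { return m_ids.count(domain) != 0; }
    std::size_t size() const { return m_entries.size(); }

    // at() throws std::out_of_range for an unknown identifier
    const std::string& name(std::uint32_t id) const { return m_entries.at(id - 1).name; }
    bool isValidated(std::uint32_t id) const { return m_entries.at(id - 1).validated; }
    void markValidated(std::uint32_t id) { m_entries.at(id - 1).validated = true; }
};

// Hypothetical email registry: stores (domain id, local-part) rather than the
// full address text, so registering an address also registers its domain.
class EmailRegistry
{
    DomainRegistry& m_domains;
    std::map<std::pair<std::uint32_t, std::string>, bool> m_records; // value: validated

public:
    explicit EmailRegistry(DomainRegistry& domains) : m_domains(domains) {}

    // Assumes 'address' is already lower case and otherwise well formed
    bool insert(const std::string& address)
    {
        const std::size_t at = address.find('@');
        if (at == std::string::npos || at == 0 || at + 1 == address.size())
            return false;
        const std::uint32_t domainId = m_domains.insert(address.substr(at + 1));
        assert(domainId != 0);              // 0 is never allocated
        m_records.emplace(std::make_pair(domainId, address.substr(0, at)), false);
        return true;
    }

    std::size_t size() const { return m_records.size(); }
};
```

In this sketch, inserting 'alice@example.com' and 'bob@example.com' yields two email records but only one domain entry, which starts out unvalidated until markValidated is called.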