Finds all links on a page that lie within a set of acceptable domains and match any supplied criteria. The supplied vector of link URLs is cleared and then populated with the matching links. If no domains or criteria are supplied, every link in the page is included. Note that the links in a page are established by the Load() function; this function merely filters them and does not read the page content.
| Return Type | Function name | Arguments |
|---|---|---|
| uint32_t | hzDocHtml::ExtractLinksBasic | (hzVect<hzUrl>&, hzSet<hzString>&, hzString&) |
Declared in file: hzDocument.h
Defined in file : hzDocHtml.cpp
Function body:
uint32_t hzDocHtml::ExtractLinksBasic (hzVect<hzUrl>& links, hzSet<hzString>& domains, hzString& form)
{
    // Find all links on a page lying within a set of acceptable domains and matching any supplied criteria. These are aggregated to the supplied vector of link
    // URLs. If no domains or criteria are supplied, all the links in the page will be aggregated.
    //
    // Note the links in a page are established in the Load() function. This function merely filters them. It does not read the page content.
    //
    // Arguments: 1) links:   The vector that receives the URLs (links) found in the document
    //            2) domains: The set of domains that links must belong to in order to be included
    //            3) form:    The search criteria, if any
    //
    // Returns: Number of links that meet the supplied criteria
    hzUrl link ; // URL of link
    uint32_t nIndex ; // Links iterator
    links.Clear() ;
    for (nIndex = 0; nIndex < m_vecLinks.Count() ; nIndex++)
    {
        link = m_vecLinks[nIndex] ;
        // Ignore empty links (should not be any)
        if (!link)
            continue ;
        // Ignore links to domains not on the list of acceptable domains (usually the website domain only)
        if (domains.Count())
        {
            if (!domains.Exists(link.Domain()))
                continue ;
        }
        // Now apply criteria
        if (form)
        {
            if (!FormCheckCstr(*link, *form))
                continue ;
        }
        links.Add(link) ;
    }
    return links.Count() ;
}
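
The filtering order used by the function body (skip empty links, then apply the domain set if it is non-empty, then apply the criteria if one is given) can be sketched with standard library types. This is an illustrative sketch only, not part of the library: `Link`, `Criteria` and `FilterLinks` are hypothetical names, `Link` stands in for hzUrl, and a caller-supplied predicate stands in for FormCheckCstr(), whose matching rules are not shown in this section.

```cpp
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for hzUrl: the full URL text plus its domain part.
struct Link
{
    std::string url ;
    std::string domain ;
} ;

// Hypothetical stand-in for the criteria test. The real function calls FormCheckCstr();
// here any callable taking the URL text is accepted. An empty Criteria means "no criteria".
using Criteria = std::function<bool(const std::string&)> ;

// Filters 'all' into 'out' in the same order of tests as ExtractLinksBasic.
// Returns the number of links that passed every test.
uint32_t FilterLinks (const std::vector<Link>& all, const std::set<std::string>& domains, const Criteria& criteria, std::vector<Link>& out)
{
    out.clear() ;   // results replace the previous contents, as links.Clear() does

    for (const Link& link : all)
    {
        // Ignore empty links
        if (link.url.empty())
            continue ;

        // Apply the domain set only if it is non-empty
        if (!domains.empty() && domains.count(link.domain) == 0)
            continue ;

        // Apply the criteria only if one was supplied
        if (criteria && !criteria(link.url))
            continue ;

        out.push_back(link) ;
    }

    return static_cast<uint32_t>(out.size()) ;
}
```

Passing an empty domain set and an empty `Criteria` returns every non-empty link, which matches the documented behaviour when no domains or criteria are supplied.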