This class retrieves an HTML document from a web server. The document can be cleaned to remove whitespace, NUL and escape characters, and JavaScript and style section definitions.
The class can parse the HTML document and return a hierarchy of tag objects.
It can also traverse the parsed document hierarchy to extract the keywords it contains, along with the density value of each keyword.
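
The parse, traverse, and density steps can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the class's own code: the names `Node`, `TreeBuilder`, `collect_words`, and `keyword_density` are hypothetical, fetching the document from a server is left out, and density is assumed here to mean a word's occurrences divided by the total number of words found.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """Hypothetical tag object: a tag name, its text chunks, and child nodes."""

    def __init__(self, tag):
        self.tag = tag
        self.text = []
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a hierarchy of Node objects from an HTML string."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text.append(data)


def collect_words(node, words):
    """Depth-first traversal that skips script and style sections."""
    if node.tag in ("script", "style"):
        return
    for chunk in node.text:
        words.extend(re.findall(r"[a-z0-9]+", chunk.lower()))
    for child in node.children:
        collect_words(child, words)


def keyword_density(html):
    """Return {word: occurrences / total words} (assumed density definition)."""
    builder = TreeBuilder()
    builder.feed(html.replace("\x00", ""))  # drop NUL characters before parsing
    builder.close()
    words = []
    collect_words(builder.root, words)
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}
```

Calling `keyword_density(page_source)` on a page's markup returns a dictionary that can be sorted by value to list the most prominent keywords first. A real implementation would likely also filter stop words and handle multi-word phrases, which this sketch does not attempt.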