A penetration tester is in the reconnaissance phase of an engagement targeting a large, multi-page web application. The tester hypothesizes that developers may have left sensitive information, such as internal hostnames or API endpoints, within HTML comments or disabled form fields. Which technique is the most effective and scalable for discovering this type of information across the entire application?
Use an automated script or tool to crawl the site and parse the raw HTML of every page.
Review historical snapshots of the site using an internet archive service.
Perform a reverse DNS lookup on the web server's IP address.
Open each page in a browser and manually inspect the source code.
For a large, multi-page application, manually browsing each page is inefficient and error-prone, since browsers do not render comments or some disabled elements, so the tester would have to view the source of every page by hand. A specialized program or script can automate the retrieval (crawling) and parsing of the raw HTML source of every page, systematically and efficiently identifying comments, hidden fields, and other metadata. While internet archive snapshots might reveal past information, they do not reflect the current live application. Reverse DNS records provide infrastructure information, not page-level content.