Web Content Extractor Documentation

Starting URLs

Starting URLs are the URLs of the pages from which the program starts crawling and extracting data. There are four ways to specify starting URLs:

  1. You can paste the URLs from the clipboard (Ctrl+V).
  2. You can click the "Edit..." button and then type the URLs (one URL per line) or paste them (Ctrl+V).
  3. You can generate URLs automatically. Click the "Edit..." button and then click the "Generate..." button to open the URL Generator window.
  4. You can click the "Browse..." button and navigate to the target page in the built-in browser.
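
To picture what automatic generation amounts to, the following Python sketch expands a URL template over a range of page numbers and prints one URL per line, which is the format the URL list expects. It is an illustration only and does not describe how the URL Generator window works internally; the function name generate_urls, the {n} placeholder and the example.com address are assumptions made for this example.

```python
def generate_urls(template: str, start: int, stop: int, step: int = 1) -> list[str]:
    """Return one URL per number in the inclusive range start..stop.

    The template must contain the placeholder {n} (a convention chosen
    for this sketch, not a feature of the program).
    """
    return [template.format(n=n) for n in range(start, stop + 1, step)]


# Hypothetical paginated catalog; replace with a real address.
urls = generate_urls("https://example.com/catalog?page={n}", 1, 3)

# One URL per line, ready to paste into the starting URL list.
print("\n".join(urls))
```

The printed list can be copied and pasted with Ctrl+V as described in the first two options.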

If the starting page contains a search form that you must fill out to request the data, enable the "Fill search form" option, enter the page URL and click the "Edit User Actions..." button. This button opens the User Input Values window, where you specify the search form data.
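
Search form data is, in essence, a set of field names and values that is sent along with the request. The Python sketch below shows this idea for a form submitted with the GET method, where the values end up in the query string of the URL. It is a simplified illustration under that assumption, not the program's own mechanism; the function name build_search_url, the field names q and city, and the example.com address are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(page_url: str, form_values: dict[str, str]) -> str:
    """Append URL-encoded form values to the page URL as a query string.

    Assumes the form uses GET and that page_url has no query string yet.
    """
    return f"{page_url}?{urlencode(form_values)}"


# Hypothetical form fields and values, as they might be entered
# for a search form.
search_url = build_search_url(
    "https://example.com/search",
    {"q": "used cars", "city": "Boston"},
)
print(search_url)
```

Forms submitted with POST send the same name and value pairs in the request body instead of the URL.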

If you need to log in to your account with a username and password to access the data, enable the "Fill login form" option, enter the page URL and click the "Edit User Actions..." button. This button opens the User Input Values window, where you specify the login form data.

If the URL of the starting page is dynamic, enable the "Navigate to the starting page manually." option. When you start the project for the first time, the program lets you open any page in the built-in browser; it then detects that page's address and starts crawling from it.