Newprosoft

web scraping software

Visual Web Spider

Web spider, website crawler, URL extractor

Visual Web Spider is a multithreaded web crawler, website downloader and website indexer. It lets you crawl websites and automatically save web pages, images and PDF files to your hard disk. It can extract the text between specific HTML tags and save it to a local database. Visual Web Spider can index pages that contain specific keywords and phrases, and it can help you find broken links on your website. The program's friendly, wizard-driven interface lets you customize the crawler step by step. No special knowledge or skills are required to get started.

Let’s say, for example, you are a shareware developer interested in internet marketing. You need to find and save all webpages at www.marketingexperiments.com that contain such keywords as “Google Adwords” and “PPC marketing”. Or you need to crawl all pages of the website and download document files (pdf, doc, xls) or audio files (mp3, wma) or video files (mpeg, avi) to your computer's hard drive. Or you may want to collect website links to build your own specialized web directory. You can configure Visual Web Spider to automatically do this for you.

To index relevant web pages, just follow this simple sequence of steps. After you open the wizard, enter the starting web page URL, or let the program generate URLs from specific keywords or phrases. Then set the crawling rules and depth according to your search strategy. Finally, specify the data you want to index and your project filename. That’s pretty much it. Clicking ‘Start’ sets the crawler to work. Crawling is fast, thanks to multithreading with up to 20 simultaneous threads.
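
To give a feel for what such a crawl involves, here is a minimal Python sketch of a depth-limited, multithreaded crawl that keeps only pages containing given keywords. It is an illustration under stated assumptions, not Visual Web Spider's own code: the names `crawl`, `extract_links` and `SITE` are hypothetical, the only crawling rule shown is "stay on the starting host", and pages come from an in-memory dictionary so the example runs without a network connection.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs (without #fragments) linked from a page."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(start_url, fetch, max_depth=2, keywords=(), threads=20):
    """Breadth-first crawl of one host.

    `fetch` is any callable that returns a page's HTML, or None on failure.
    Returns the URLs of pages containing at least one keyword.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    frontier = [start_url]
    matches = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _depth in range(max_depth + 1):
            # Fetch every page of the current level in parallel.
            pages = list(pool.map(fetch, frontier))
            next_frontier = []
            for url, html in zip(frontier, pages):
                if html is None:
                    continue
                text = html.lower()
                if not keywords or any(k.lower() in text for k in keywords):
                    matches.append(url)
                for link in extract_links(url, html):
                    # Crawling rule (assumed): stay on the starting host.
                    if link not in seen and urlparse(link).netloc == host:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
            if not frontier:
                break
    return matches


# A tiny fake website standing in for real HTTP requests.
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": 'Google AdWords tips <a href="/c">C</a>',
    "http://example.test/b": 'Nothing relevant <a href="/">Home</a>',
    "http://example.test/c": "PPC marketing guide",
}

found = crawl("http://example.test/", SITE.get, max_depth=2,
              keywords=["Google AdWords", "PPC marketing"])
print(found)
```

In a real crawl, `fetch` would download the page over HTTP and return None when the request fails. Passing it in as a parameter keeps the crawling logic separate from the network code.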

Another nice touch is that Visual Web Spider can extract the text between specific HTML tags, such as page title (TITLE tag), page text (BODY tag), HTML code (HTML tag), header text (H1-H6 tags), bold text (B tags), anchor text (A tags), alt text (IMG tag, ALT attribute), keywords and description (META tags), and others. The program can also list each page's size and last-modified date.
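
As a rough illustration of tag-based extraction, the Python sketch below pulls the text found inside chosen tags, plus IMG alt text and META keywords/description, using only the standard library. The class name `TagTextExtractor` and its attributes are hypothetical and say nothing about how Visual Web Spider works internally.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collects text inside the requested tags, IMG alt text and META content."""

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.found = {tag: [] for tag in self.tags}  # tag -> list of texts
        self.alt_texts = []
        self.meta = {}
        self._open = []  # (tag, index into self.found[tag]) for open tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            if attrs.get("alt"):
                self.alt_texts.append(attrs["alt"])
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""
        elif tag in self.tags:
            self.found[tag].append("")
            self._open.append((tag, len(self.found[tag]) - 1))

    def handle_data(self, data):
        # Text belongs to every tracked tag that is currently open,
        # so <h1>Hello <b>world</b></h1> yields "Hello world" for h1.
        for tag, index in self._open:
            self.found[tag][index] += data

    def handle_endtag(self, tag):
        # Close the innermost open occurrence of this tag, if any.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i]
                break


page = (
    '<html><head><title>Shop</title>'
    '<meta name="description" content="Best deals"></head>'
    '<body><h1>Hello <b>world</b></h1>'
    '<img src="logo.png" alt="Logo"><a href="/x">Link text</a></body></html>'
)

extractor = TagTextExtractor(["title", "h1", "b", "a"])
extractor.feed(page)
print(extractor.found["h1"])   # ['Hello world']
print(extractor.alt_texts)     # ['Logo']
print(extractor.meta)          # {'description': 'Best deals'}
```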

Once the data has been extracted, Visual Web Spider can export it to any of the following formats: Microsoft Access, Excel (CSV), TXT, HTML, and MySQL script. This variety of export formats lets you process and analyze the data in whichever format is most convenient for you.
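
To show what two of these formats look like in practice, here is a small Python sketch that writes the same records as CSV and as a MySQL INSERT script. The table name `pages`, the column names and the helper functions are assumptions made for illustration; the exact layout of the files Visual Web Spider produces may differ.

```python
import csv
import io


def to_csv(rows, fields):
    """Render a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def sql_quote(value):
    """Quote a Python value as a MySQL literal (strings, numbers, None)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes first, then double any single quotes.
    escaped = str(value).replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


def to_mysql_script(rows, table, fields):
    """Render a list of dicts as one INSERT statement per row."""
    columns = ", ".join(f"`{field}`" for field in fields)
    lines = []
    for row in rows:
        values = ", ".join(sql_quote(row.get(field)) for field in fields)
        lines.append(f"INSERT INTO `{table}` ({columns}) VALUES ({values});")
    return "\n".join(lines) + "\n"


# Hypothetical extracted records.
rows = [{"url": "http://example.test/a", "title": "Bob's page", "size": 1024}]
fields = ["url", "title", "size"]

print(to_csv(rows, fields))
print(to_mysql_script(rows, "pages", fields))
```

The CSV file opens directly in Excel, while the SQL script can be loaded into an existing MySQL table with matching columns.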

Many search engines use web robots to gather web pages for indexing. They utilize this technology to update their content on a regular basis. Researchers use it to widen their perspective of the internet and its vast collection of data. Now you can have your own web crawler and use it to be more productive in your home or office.

Features

  • A personal, customizable web crawler with crawling rules, multithreaded technology (up to 20 threads) and support for the robots exclusion protocol/standard (robots.txt file and robots META tags);
  • Extract text between specific HTML tags;
  • Export the extracted data into Microsoft Access database, TEXT file, Excel file (CSV), HTML file, MySQL script file;
  • Start crawling from a list of URLs specified by the user;
  • Start crawling using keywords and phrases;
  • Store web pages and media files on your local disk;
  • Resolve URL of redirected links, get real/final URL;
  • Detect broken links;
  • Filter the extracted data;
  • Command line options;
  • Generate and export map of the visited links;
  • Very simple to use, quick learning curve and right to the point.
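
For readers curious about what support for the robots exclusion protocol means in practice, the following Python sketch checks URLs against a robots.txt file with the standard library's `urllib.robotparser`. The user-agent string `MyCrawler`, the helper `build_robots_checker` and the sample rules are assumptions for illustration only. Robots META tags are not covered here.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent="MyCrawler"):
    """Return a function that tells whether a URL may be crawled.

    `robots_txt` is the text of a site's robots.txt file; a real crawler
    would download it from the site's root before visiting other pages.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


# Sample rules: every crawler is asked to stay out of /private/.
robots_txt = "User-agent: *\nDisallow: /private/\n"
allowed = build_robots_checker(robots_txt)

print(allowed("http://example.test/index.html"))         # True
print(allowed("http://example.test/private/data.html"))  # False
```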

Screenshots

Screenshot Visual Web Spider