Purpose of the program
SiteAnalyzer is designed to analyze a website and identify technical errors (broken links, duplicate pages, incorrect server responses), as well as errors and omissions in SEO optimization (empty meta tags, missing or excessive h1 headers, page content analysis, internal linking quality, and a variety of other SEO parameters).
- Scanning of all pages of the site, as well as images, scripts and documents
- Obtaining server response codes for each page (200, 301, 302, 404, 500, 503, etc.)
- Determining the presence and content of Title, Keywords, Description, H1-H6
- Finding and displaying duplicate pages, meta tags and headers
- Determining whether the rel="canonical" attribute is present for each page of the site
- Following the directives of the "robots.txt" file or the "robots" meta tag
- Respecting rel="nofollow" when crawling the pages of the site
- Link analysis – finding internal and external links to pages (within the site)
- Determining the number of redirects from a page
- Determining the nesting level of pages relative to the home page
- Generating the "sitemap.xml" sitemap (with optional splitting into several files)
- URL filtering by any parameter
- Exporting reports to CSV and Excel (full report in Excel format)
Differences from similar tools
- Low demands on computer resources, low RAM consumption
- Data is stored in a fast and reliable local database
- Scanning of websites of any size, thanks to the low demands on computer resources
- Portable format (works without installation on a PC or directly from removable media)
Getting started
After the crawler has visited all the pages of the site, a report becomes available as a table that displays the collected data, grouped into thematic tabs.
All analyzed projects are displayed in the left part of the program and are automatically saved in the program database together with the collected data. To delete unneeded sites, use the context menu of the project list.
The "Settings" section of the main menu is intended for fine-tuning how the program works with external sites and contains 5 tabs:
- Main settings
- Scanning Rules
The main settings section is used to specify the user-defined directives applied when scanning the site.
Description of the parameters:
- Number of threads
- The higher the number of threads, the more URLs can be processed per unit of time. Keep in mind that more threads also consume more PC resources. It is recommended to set the number of threads in the range of 5-10.
- Scan Time
- Sets the time limit for scanning a site, measured in hours.
- Maximum depth
- This parameter specifies how deep the site is scanned. The home page has a nesting level of 0. For example, if you want to crawl pages such as "somedomain.ru/catalog.html" and "somedomain.ru/catalog/tovar.html", you need to set the maximum depth to 2.
- Delay between requests
- Sets a pause between the crawler's requests to the pages of the site. This is necessary for sites on "weak" hosting that cannot withstand heavy load and frequent requests.
- Query Timeout
- Sets how long the program waits for the site to respond to a request. If some pages of the site respond slowly (take a long time to load), the site scan can take quite a long time. Such pages can be cut off by specifying a timeout, after which the scanner moves on to the remaining pages of the site and does not delay the overall progress.
- Maximum crawled pages
- Limits the maximum number of pages crawled. It is useful if, for example, you need to scan only the first X pages of a site (images, CSS styles, scripts and other file types are not counted).
- In this section, you can select the types of data that the crawler will take into account when crawling pages (images, videos, styles, scripts), or exclude unnecessary information from parsing.
- These settings control exclusions when crawling the site: the "robots.txt" file, "nofollow" links, and the "meta name='robots'" directives in the page code.
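The "Maximum depth" and "Maximum crawled pages" limits described above can be illustrated with a small breadth-first walk. This is only a sketch and not SiteAnalyzer's actual implementation: the `crawl` function, the `site` dictionary (a stand-in for real page fetching), and the assumption that the nesting level equals the number of link hops from the home page are all hypothetical.

```python
from collections import deque


def crawl(links, start, max_depth, max_pages):
    """Breadth-first walk over `links`, honoring depth and page limits.

    `links` maps each URL to the URLs it links to (hypothetical stand-in
    for fetching and parsing pages). Returns a dict of URL -> nesting
    level, where the start page has level 0.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depth[url] >= max_depth:
            continue  # pages at the depth limit are kept, but their links are not followed
        for target in links.get(url, []):
            if target in depth:
                continue  # already discovered
            if len(depth) >= max_pages:
                return depth  # page limit reached
            depth[target] = depth[url] + 1
            queue.append(target)
    return depth


# Hypothetical link structure of a small site.
site = {
    "somedomain.ru/": ["somedomain.ru/catalog.html"],
    "somedomain.ru/catalog.html": ["somedomain.ru/catalog/tovar.html"],
    "somedomain.ru/catalog/tovar.html": ["somedomain.ru/catalog/tovar/reviews.html"],
}
levels = crawl(site, "somedomain.ru/", max_depth=2, max_pages=100)
```

With a maximum depth of 2, the walk reaches "catalog.html" (level 1) and "catalog/tovar.html" (level 2) but does not follow links any further.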
In the User-Agent section, you can specify which user agent the program presents when accessing external sites during scanning. By default, a custom user agent is set; if necessary, you can select one of the standard agents commonly found on the Internet. These include the search engine bots YandexBot and GoogleBot, the browsers Microsoft Edge, Chrome, Firefox and IE8, mobile devices such as iPhone and Android, and many others.
If you need to work through a proxy, this section lets you specify the settings of the proxy server through which the program will access external resources.
This section is used to exclude certain pages and sections of the site from crawling.
Using regular expressions, you can specify which sections of the site should not be crawled and, accordingly, should not be added to the program database. This list is a local exclusion list that applies for the duration of the site scan (by comparison, the "global" list is the "robots.txt" file in the root of the site).
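To show what presenting a user agent means at the HTTP level, here is a minimal sketch using Python's standard library. The `build_request` helper and the "ExampleCrawler/1.0" string are hypothetical; they are not SiteAnalyzer's own code or its default user agent. The request is only constructed, not sent.

```python
import urllib.request


def build_request(url, user_agent):
    """Create a request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical user agent string; a real crawler would use its configured value.
request = build_request("http://somedomain.ru/", "ExampleCrawler/1.0")

# urllib stores header names in capitalized form ("User-agent").
agent_sent = request.get_header("User-agent")
```

A server that serves different content to different clients will see only this header value, which is why switching the agent can change what the crawler receives.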
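The relationship between the local regular-expression list and the "global" robots.txt rules can be sketched as follows. The patterns, the robots.txt content, the `should_crawl` function and the "ExampleCrawler" agent name are hypothetical examples, and the order of the checks is an assumption rather than documented program behavior.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical local exclusion list (regular expressions).
LOCAL_EXCLUSIONS = [re.compile(p) for p in (r"/cart/", r"\?sort=")]

# Hypothetical robots.txt content, parsed from memory instead of the network.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /admin/",
])


def should_crawl(url, user_agent="ExampleCrawler"):
    """Return True only if neither the local list nor robots.txt excludes the URL."""
    if any(pattern.search(url) for pattern in LOCAL_EXCLUSIONS):
        return False  # excluded by the local regular-expression list
    return robots.can_fetch(user_agent, url)  # excluded or allowed by robots.txt
```

For example, `should_crawl("http://somedomain.ru/cart/checkout.html")` is rejected by the local list, while `should_crawl("http://somedomain.ru/admin/users.html")` is rejected by the robots.txt rules.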
Working with the program
After the scan is completed, the information in the "Master data" block becomes available to the user. Each tab contains the data that corresponds to its name (for example, the "Title" tab contains the contents of the page title tag <title></title>, the "Images" tab contains a list of all images on the site, and so on). Using this data, you can analyze the content of the site and find "broken" links or incorrectly filled meta tags.
If necessary (for example, after making changes to the site), you can use the context menu to rescan individual URLs so that the changes are reflected in the program.
Using the same menu, you can display duplicate pages by the corresponding parameters (duplicate title, description, keywords, h1, h2, page content).
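Finding duplicates by a given parameter amounts to grouping pages by that field and keeping the groups with more than one page. The sketch below assumes a hypothetical `pages` dictionary of crawl results and compares values case-insensitively after trimming whitespace; how SiteAnalyzer itself normalizes values is not stated in this guide.

```python
from collections import defaultdict


def find_duplicates(pages, field):
    """Group URLs by the value of `field` and return groups with 2+ URLs."""
    groups = defaultdict(list)
    for url, data in pages.items():
        value = data.get(field, "").strip().lower()  # assumed normalization
        if value:  # empty values are a separate problem, not duplicates
            groups[value].append(url)
    return {value: urls for value, urls in groups.items() if len(urls) > 1}


# Hypothetical crawl results.
pages = {
    "/": {"title": "Shop", "h1": "Welcome"},
    "/catalog.html": {"title": "Catalog", "h1": "Catalog"},
    "/catalog/index.html": {"title": "catalog ", "h1": "All goods"},
    "/about.html": {"title": "", "h1": "About"},
}
duplicate_titles = find_duplicates(pages, "title")
```

Here the two catalog pages share the same title after normalization, so they form the only duplicate group.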
Project list context menu
- In the list of projects, bulk scanning is available: select the desired sites and click the "Rescan" button. All selected sites are then queued and scanned one by one in standard mode.
- For convenience, bulk deletion of selected sites is also available via the "Delete" button.
- In addition to adding sites one at a time, you can add sites to the project list in bulk using a special form, after which you can scan whichever projects you are interested in.
The sitemap is generated from the crawled pages of the site. Pages of the "text/html" type are added to it.
For large sites, from 50 000 pages, "sitemap.xml" can be split automatically into several files (in this case the main file contains links to the additional files, which contain the direct links to the pages of the site). This is due to search engine requirements for processing large sitemap files.
If necessary, the number of pages per "sitemap.xml" file can be changed from the default of 50 000 to the desired value in the main settings of the program.
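The splitting scheme described above (a main index file that links to the additional files) can be sketched as follows. The function name and the file names "sitemap1.xml", "sitemap2.xml" and so on are hypothetical; the guide does not state how SiteAnalyzer names the additional files.

```python
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemaps(urls, base_url, max_urls=50000):
    """Return a dict of file name -> XML text, splitting at `max_urls` per file."""

    def urlset(chunk):
        items = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in chunk)
        return f'<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="{NS}">{items}</urlset>'

    chunks = [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]
    if len(chunks) <= 1:
        return {"sitemap.xml": urlset(urls)}  # small site: a single file

    files, entries = {}, []
    for number, chunk in enumerate(chunks, start=1):
        name = f"sitemap{number}.xml"  # hypothetical naming scheme
        files[name] = urlset(chunk)
        entries.append(f"<sitemap><loc>{escape(base_url + name)}</loc></sitemap>")
    # The main file only links to the additional files.
    files["sitemap.xml"] = (
        f'<?xml version="1.0" encoding="UTF-8"?>'
        f'<sitemapindex xmlns="{NS}">{"".join(entries)}</sitemapindex>'
    )
    return files


# Small limit used here only to make the split visible.
demo_urls = [f"http://somedomain.ru/page{i}.html" for i in range(5)]
files = build_sitemaps(demo_urls, "http://somedomain.ru/", max_urls=2)
```

With five URLs and a limit of two per file, the result is three additional files plus the main "sitemap.xml" index.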
For more flexible analysis of the collected data, you can export it to CSV (the currently active tab is exported) or generate a full report in Microsoft Excel with all the tabs in one file.
When exporting data to Excel, a dialog is displayed in which the user can select the columns of interest and then generate a report with the required data.
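The idea of exporting only the selected columns can be illustrated with CSV, which Python's standard library supports directly (writing Excel files would require a third-party library). The `export_csv` function, the column names and the sample rows are hypothetical.

```python
import csv
import io


def export_csv(rows, columns):
    """Write only the selected `columns` of each row and return the CSV text."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any fields that were not selected.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows of the active tab.
rows = [
    {"URL": "http://somedomain.ru/", "Status": 200, "Title": "Home"},
    {"URL": "http://somedomain.ru/old.html", "Status": 404, "Title": ""},
]
report = export_csv(rows, ["URL", "Status"])
```

The "Title" column is present in the data but is left out of the report because it was not selected.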
The program lets you choose the preferred interface language.
Main supported languages: English, German, Italian, Spanish, French, Russian... The program is currently translated into more than fifteen (15) of the most popular languages.
If you want to translate the program into your own language, it is enough to translate any "*.lng" file into that language and send the translated file to "firstname.lastname@example.org" (comments in the message should be written in Russian or English). Your translation will be included in the next release of the program.
More detailed instructions on how to translate the program are included in the distribution (file "lcids.txt").
P.S. If you have any comments on the quality of the translation, send your comments and corrections to "email@example.com".
The main menu item "Compress Database" packs the database: it cleans out the data of previously deleted projects and reorders the remaining data (analogous to defragmenting a disk on a personal computer).
This procedure is effective when, for example, a large project containing a large number of records has been deleted from the program. In general, it is recommended to compress the database periodically to get rid of redundant data and reduce its size.
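As an analogy only, the effect of compressing a database after a large deletion can be demonstrated with SQLite's VACUUM command. This guide does not say which database engine SiteAnalyzer uses, so the use of SQLite, the `compress_database` function, and the table layout are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def compress_database(path):
    """Run VACUUM on an SQLite file and return (size_before, size_after) in bytes."""
    before = os.path.getsize(path)
    # isolation_level=None enables autocommit; VACUUM cannot run inside a transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(path)


# Build a hypothetical project database, then delete a large project from it.
path = os.path.join(tempfile.mkdtemp(), "projects.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE pages (project TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [("big-site", f"/page{i}.html") for i in range(20000)],
)
conn.commit()
conn.execute("DELETE FROM pages WHERE project = 'big-site'")
conn.commit()
conn.close()

before, after = compress_database(path)
```

Deleting the rows alone leaves the file at its previous size because the freed pages stay inside it; the file shrinks only after the compression step.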
The answers to the remaining questions can be found in the FAQ section.