
SiteAnalyzer Documentation

Detailed description of the SiteAnalyzer program

Purpose of the program

SiteAnalyzer is designed to analyze websites and identify technical errors (broken links, duplicate pages, incorrect server responses) as well as errors and omissions in SEO optimization (empty meta tags, missing or excessive H1 headers, page content issues, internal linking quality, and many other SEO parameters). In total, more than 60 parameters are analyzed.

SiteAnalyzer, site auditor

Key features

  • Scanning all pages of the site, as well as images, scripts and documents
  • Getting server response codes for each page of the site (200, 301, 302, 404, 500, 503, etc.)
  • Determining the presence and content of Title, Keywords, Description, H1-H6
  • Finding and displaying "duplicate" pages, meta tags and headings
  • Determining the presence of the rel="canonical" attribute for each page of the site
  • Following the directives of the "robots.txt" file, the "robots" meta tag, or X-Robots-Tag
  • Taking "noindex" and "nofollow" into account when crawling the pages of the site
  • Link analysis: determining internal and external links for any page of the site
  • Calculating internal PageRank for each page of the site
  • Determining the number of redirects from a page
  • Scanning arbitrary external URLs and Sitemap.xml files
  • Sitemap "sitemap.xml" generation (with the option of splitting it into several files)
  • Filtering data by any parameter (flexible configuration of filters of any complexity)
  • Exporting reports to CSV, Excel and PDF

Differences from analogues

  • Low demands on computer resources, low RAM consumption
  • Scanning websites of any size thanks to the low resource requirements
  • Portable format (works without installation on a PC or directly from removable devices)

Documentation Index

Beginning of work

When the program is launched, the address bar becomes available for entering the URL of the site to be analyzed. You can enter any page of the site: following the links of the starting page, the crawler will traverse the entire site, including the main page, provided that all links are plain HTML and do not use JavaScript.

After clicking the "Start" button, the crawler begins traversing all pages of the site via internal links (it does not visit external resources, nor does it follow links generated by JavaScript).

After the crawler has visited all pages of the site, a report becomes available in the form of a table that displays the collected data, grouped into thematic tabs.

All analyzed projects are displayed in the left part of the program window and are automatically saved in the program database together with the collected data. To delete unnecessary sites, use the context menu of the project list.

Note:

  • clicking the "Pause" button suspends the scan of the current project; the scan progress is saved to the database, which allows you, for example, to close the program and continue scanning the project from the stop point after restarting it
  • the "Stop" button aborts the scan of the current project with no possibility of resuming it

Program settings

The "Settings" section of the main menu is intended for fine-tuning how the program works with external sites and contains 7 tabs:

SiteAnalyzer, program settings

Main settings

The main settings section is used to specify the user-defined directives applied when scanning a site.

Description of the parameters:

  • Number of threads
    • The higher the number of threads, the more URLs can be processed per unit of time. Keep in mind that a larger number of threads also consumes more PC resources. It is recommended to keep the number of threads in the range of 5-10.
  • Scan Time
    • Sets the time limit for scanning a site, measured in hours.
  • Maximum depth
    • This parameter specifies how deep the site is scanned. The home page has a nesting level of 0. For example, if you want to crawl pages such as "somedomain.ru/catalog.html" and "somedomain.ru/catalog/tovar.html", you need to set the maximum depth to 2 (see the sketch after this list).
  • Delay between requests
    • Sets a pause between the crawler's requests to the pages of the site. This is useful for sites on "weak" hosting that cannot withstand heavy loads and frequent requests.
  • Query Timeout
    • Sets how long the program waits for a site to respond to a request. If some pages of the site respond slowly (take a long time to load), the scan can take quite a while. Such pages can be cut off by specifying a timeout value after which the scanner moves on to the remaining pages of the site, so they do not delay the overall progress.
  • Maximum crawled pages
    • Limits the maximum number of pages crawled. It is useful if, for example, you only need to scan the first X pages of a site (images, CSS styles, scripts and other file types are not counted toward this limit).
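To make these options more concrete, below is a minimal sketch of how such settings typically interact in a crawler. It is illustrative only, not SiteAnalyzer's actual code; the variable names, default values and the use of the Python requests library are assumptions made for the example.

```python
# Illustrative sketch only: shows how "Maximum depth", "Delay between requests",
# "Query Timeout" and "Maximum crawled pages" typically interact in a crawler.
# Multi-threading is omitted for simplicity; names and defaults are assumptions.
import time
from urllib.parse import urlparse

import requests

MAX_DEPTH = 2          # "Maximum depth": the home page is level 0
MAX_PAGES = 1000       # "Maximum crawled pages"
DELAY_SECONDS = 0.5    # "Delay between requests"
TIMEOUT_SECONDS = 10   # "Query Timeout"


def url_depth(url: str) -> int:
    """Nesting level: '/' is 0, '/catalog.html' is 1, '/catalog/tovar.html' is 2."""
    path = urlparse(url).path.strip("/")
    return len(path.split("/")) if path else 0


def crawl(start_url: str) -> None:
    queue, seen, crawled = [start_url], {start_url}, 0
    while queue and crawled < MAX_PAGES:
        url = queue.pop(0)
        if url_depth(url) > MAX_DEPTH:
            continue                      # pages deeper than the limit are skipped
        try:
            response = requests.get(url, timeout=TIMEOUT_SECONDS)
        except requests.RequestException:
            continue                      # slow or failing pages do not stall the crawl
        crawled += 1
        # ...parse internal links from response.text; every link not yet in `seen`
        # would be added to `seen` and `queue` here...
        time.sleep(DELAY_SECONDS)         # pause to avoid overloading "weak" hosting
```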

Scanning Rules

SiteAnalyzer, content types

Content types

  • In this section, you can select the types of data the parser should take into account when crawling pages (images, videos, styles, scripts) or exclude unnecessary content from parsing.

Scanning Rules

  • These settings determine which content is excluded when crawling the site: via the "robots.txt" file, "nofollow" links, and the "meta name='robots'" directives in the page code (see the sketch below).
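As an illustration of what following these directives means, here is a minimal sketch using Python's standard urllib.robotparser module. It is not SiteAnalyzer's code; the site URL and user-agent string are hypothetical.

```python
# Illustration of what "respecting robots.txt" means for a crawler.
# This is not SiteAnalyzer code; it uses Python's standard urllib.robotparser.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")  # hypothetical site
robots.read()

# A crawler would skip any URL that the rules disallow for its user agent:
if robots.can_fetch("SiteAnalyzer", "https://example.com/private/page.html"):
    print("URL may be crawled")
else:
    print("URL is disallowed by robots.txt and should be skipped")

# In the same spirit, links marked rel="nofollow" and pages carrying
# <meta name="robots" content="noindex, nofollow"> would be excluded
# when the corresponding options are enabled.
```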

SEO

SiteAnalyzer, SEO settings

This section is used to specify the main SEO parameters to be analyzed. They are checked for correctness when pages are parsed, and the resulting statistics are displayed on the SEO Statistics tab in the right part of the main program window.

Yandex XML

These settings let you choose the service used to check page indexing in the Yandex search engine. There are two options: the Yandex XML service or the Majento.ru service.

SiteAnalyzer, Yandex XML options

When choosing the Yandex XML service, keep in mind the possible hourly or daily restrictions applied when checking page indexing, which depend on the limits of your Yandex account. Situations often arise where your account's limits are not enough to check all pages at once and you have to wait for the next hour.

When using the Majento.ru service, hourly and daily restrictions are practically absent, since your limit merges into the shared pool of limits, which is not small in itself and has a significantly larger hourly limit than any individual user account on "Yandex XML".

SiteAnalyzer, check indexing pages in Yandex

User-Agent

In the User-Agent section, you can specify which user agent the program presents when accessing external sites during scanning. A custom user agent is set by default, but if necessary you can select one of the standard agents most commonly found on the Internet: the YandexBot and GoogleBot search engine bots, MicrosoftEdge, the Chrome, Firefox and IE8 browsers, as well as mobile devices such as iPhone and Android, and many others.
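For reference, the user agent is simply an HTTP header sent with every request. The sketch below is illustrative only: the header value is a commonly published Googlebot string and may differ from what the program actually sends.

```python
# Illustrative only: how a crawler presents a chosen user agent to a site.
# The header value is a commonly published Googlebot example, not necessarily
# the exact string SiteAnalyzer uses.
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
}
response = requests.get("https://example.com/", headers=headers, timeout=10)
print(response.status_code)
```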

Proxy-server

If you need to work through a proxy, in this section you can add a list of proxy servers through which the program will access external resources. It is also possible to check proxies for availability and to remove inactive proxy servers.

SiteAnalyzer, Proxy settings

Exclude URLs

This section allows you to exclude certain pages and sections of the site from crawling.

Using the wildcard patterns * and ?, you can specify which sections of the site the crawler should skip and, accordingly, which should not be included in the program database. This list acts as a local list of exceptions for the duration of the site scan (its "global" counterpart is the "robots.txt" file in the root of the site).
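As an illustration of how the * and ? patterns typically behave, here is a small sketch using Python's fnmatch module. The program's exact matching rules may differ; the sample patterns and URLs are made up.

```python
# Illustration of how the * and ? wildcards typically behave.
# SiteAnalyzer's exact matching rules may differ; this uses Python's fnmatch.
from fnmatch import fnmatch

exclude_patterns = [
    "*/basket/*",            # skip everything under /basket/
    "*/print/page-?.html",   # skip /print/page-1.html, /print/page-2.html, ...
]

def is_excluded(url: str) -> bool:
    return any(fnmatch(url, pattern) for pattern in exclude_patterns)

print(is_excluded("https://example.com/basket/add/123"))      # True
print(is_excluded("https://example.com/print/page-3.html"))   # True
print(is_excluded("https://example.com/catalog/item.html"))   # False
```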

SiteAnalyzer, Exclude URLs

Include URLs

Similarly, this section allows you to specify the URLs that must be crawled; all other URLs outside these folders will be ignored during the scan. This option also supports the * and ? wildcard patterns.

SiteAnalyzer, Include URLs

PageRank

Using the PageRank parameter, you can analyze the navigation structure of your websites and optimize the internal linking of a web resource so that link weight is passed to the most important pages.

SiteAnalyzer, PageRank settings

The program offers two options for calculating PageRank: the classical algorithm and its more modern counterpart. In general, for analyzing a site's internal linking there is little difference between them, so you can use either algorithm.

A detailed description of the algorithm and the principles of calculating PageRank can be found in this article: calculation of internal PageRank.
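For readers who want a feel for the calculation itself, below is a minimal sketch of the classic PageRank iteration over a tiny, made-up internal link graph. It illustrates the general principle only and is not necessarily identical to the formulas used by the program.

```python
# Minimal sketch of the classic PageRank calculation over internal links.
# Illustrative only; the tiny link graph below is made up.
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue  # dangling pages are ignored in this simplified sketch
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

links = {
    "/": ["/catalog/", "/about/"],
    "/catalog/": ["/", "/catalog/item.html"],
    "/catalog/item.html": ["/catalog/"],
    "/about/": ["/"],
}
for page, value in sorted(pagerank(links).items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.3f}")
```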

Working with the program

After the scan is completed, the information in the "Master data" block becomes available to the user. Each tab contains data grouped according to its name (for example, the "Title" tab contains the contents of the page <title></title> tags, the "Images" tab contains a list of all images on the site, and so on). Using this data, you can analyze the content of the site and find "broken" links or incorrectly filled meta tags.

SiteAnalyzer, site auditor

SiteAnalyzer, find 404 errors

If necessary (for example, after making changes on the site), individual URLs can be rescanned via the context menu so that the changes are reflected in the program.

Using the same menu, you can display duplicate pages grouped by the corresponding parameters (duplicate title, description, keywords, h1, h2 or page content).
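The idea behind duplicate detection can be illustrated with a short sketch: pages are grouped by an identical field value (the title in this example) and every group with more than one URL is reported. The sample data is invented; the program performs this grouping internally.

```python
# Sketch of the idea behind duplicate detection: group pages by an identical
# field value (title here) and report every group with more than one URL.
from collections import defaultdict

pages = [
    ("https://example.com/", "Home"),
    ("https://example.com/news/1", "News"),
    ("https://example.com/news/2", "News"),   # duplicate title
]

groups = defaultdict(list)
for url, title in pages:
    groups[title].append(url)

for title, urls in groups.items():
    if len(urls) > 1:
        print(f'Duplicate title "{title}": {", ".join(urls)}')
```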

SiteAnalyzer, context menu

Data filtering

For more convenient analysis of site statistics, data filtering is available in the program. Filtering is possible in two ways:

  • on any field, using the "quick" filter
  • using a custom filter (with advanced data selection settings)

Quick filter

Used to filter data quickly; the filter is applied simultaneously to all fields of the current tab.

SiteAnalyzer, Quick filtration

Custom filter

Designed for detailed filtering and can contain several conditions at once. For example, suppose that for the "title" meta tag you want to filter pages so that the title does not exceed 70 characters and at the same time contains the text "news". The filter would then look like this:

SiteAnalyzer, Detailed filtering

Example of the data selected by this filter:

SiteAnalyzer, Sample filter example

Thus, by applying a custom filter on any of the tabs, you can obtain data selections of any complexity.
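As a complement, the same condition (title no longer than 70 characters and containing "news") can be expressed in code against an exported CSV report. This is only a sketch: the file name and column names are assumptions, and the actual export layout may differ.

```python
# The same condition as the custom filter above, applied to an exported CSV
# report. The file name and column names ("URL", "Title") are assumptions.
import csv

with open("titles_export.csv", newline="", encoding="utf-8") as report:
    for row in csv.DictReader(report):
        title = row.get("Title", "")
        if len(title) <= 70 and "news" in title.lower():
            print(row.get("URL", ""), "-", title)
```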

Technical statistics

The site's technical statistics tab is located on the Additional Data panel and contains a set of basic technical parameters of the site: statistics on links, meta tags, page response codes, page indexing parameters, content types, etc.

Clicking on one of the parameters automatically filters it in the corresponding tab of the site's master data, while statistics are displayed on the chart at the bottom of the page.

SiteAnalyzer, Technical statistics

SEO-statistics

The SEO Statistics tab is intended for conducting full-fledged site audits: it covers 50+ main SEO parameters and identifies over 60 key internal optimization errors. The errors are divided into groups, which in turn contain sets of analyzed parameters and filters that detect problems on the site.

A detailed description of all the checked parameters is available in this article: SiteAnalyzer 1.8 review.

SiteAnalyzer, SEO-statistics

Any filtering results can be quickly exported to Excel without additional dialogs (the report is saved in the program folder).

Site structure

This functionality builds the structure of the site from the parsed data. The structure is generated based on the nesting of page URLs. Once generated, it can be exported to CSV format (Excel).
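The principle is easy to illustrate: the structure can be derived from the path segments of the page URLs alone. The sketch below is illustrative only and uses made-up URLs; it is not the program's code.

```python
# Sketch of how a site structure can be derived from URL nesting alone.
from urllib.parse import urlparse

def build_structure(urls: list[str]) -> dict:
    tree: dict = {}
    for url in urls:
        node = tree
        for segment in urlparse(url).path.strip("/").split("/"):
            if segment:
                node = node.setdefault(segment, {})
    return tree

def print_tree(node: dict, indent: int = 0) -> None:
    for name, children in sorted(node.items()):
        print("  " * indent + name)
        print_tree(children, indent + 1)

urls = [
    "https://example.com/catalog/phones/iphone.html",
    "https://example.com/catalog/phones/android.html",
    "https://example.com/about/",
]
print_tree(build_structure(urls))
```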

SiteAnalyzer, site structure

Project list context menu

  • In the project list, mass scanning is available: select the desired sites and click the "Rescan" button. The sites are then queued and scanned one by one in standard mode.
  • For convenience, mass removal of selected sites is also available via the "Delete" button.
  • In addition to adding sites one at a time, sites can be added to the project list in bulk using a special form, after which the user can scan the projects of interest.

SiteAnalyzer, group site addition

Sitemap.xml generation

The sitemap is generated from the crawled pages of the site. Only pages of the "text/html" format are included.

You can generate a Sitemap immediately after scanning the site, via the main menu: "Projects -> Generate Sitemap".

SiteAnalyzer, sitemap generation

For large sites of 50,000 pages or more, there is a function that automatically splits "sitemap.xml" into several files (the main file then contains links to additional files, which in turn contain direct links to the pages of the site). This is due to search engine requirements for processing large sitemap files.
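The splitting scheme can be sketched as follows: URLs are written in chunks of up to 50,000 per file, and the main file becomes a sitemap index pointing to the parts. The file names and helper function below are illustrative assumptions, not the program's own implementation.

```python
# Sketch of the splitting scheme described above: URLs are written in chunks
# of up to 50,000 per file, and a main index file points to the parts.
URLS_PER_FILE = 50_000

def write_sitemaps(site: str, urls: list[str]) -> None:
    chunks = [urls[i:i + URLS_PER_FILE] for i in range(0, len(urls), URLS_PER_FILE)]
    for number, chunk in enumerate(chunks, start=1):
        body = "\n".join(f"  <url><loc>{url}</loc></url>" for url in chunk)
        with open(f"sitemap{number}.xml", "w", encoding="utf-8") as part:
            part.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                       '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                       f"{body}\n</urlset>\n")
    index = "\n".join(f"  <sitemap><loc>{site}/sitemap{n}.xml</loc></sitemap>"
                      for n in range(1, len(chunks) + 1))
    with open("sitemap.xml", "w", encoding="utf-8") as main:
        main.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                   '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                   f"{index}\n</sitemapindex>\n")
```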

SiteAnalyzer, sitemap.xml

If necessary, the number of pages per "sitemap.xml" file can be changed from the default value of 50,000 to the desired value in the main program settings.

Scan arbitrary URLs

The "Import URL" menu item is intended for scanning arbitrary lists of URLs, as well as Sitemap.xml files (including index sitemaps), for subsequent analysis.

SiteAnalyzer, Menu - Import URL

Scanning arbitrary URLs is possible in three ways:

  • by pasting a list of URLs from the clipboard
  • by loading *.txt or *.xml files containing URL lists from the hard disk
  • by downloading the Sitemap.xml file directly from the site

SiteAnalyzer, Scanning a list of arbitrary URLs via the Clipboard

SiteAnalyzer, Scanning Sitemap.xml by URL

A peculiarity of this mode is that, when scanning arbitrary URLs, the "project" itself is not saved in the program and its data is not added to the database. The "Site Structure" and "Dashboard" sections are also unavailable.

More information about how the "Import URL" feature works can be found in this article: SiteAnalyzer version 1.9 review.

Dashboard

The Dashboard tab displays a detailed report on the current quality of site optimization. The report is generated from the data of the SEO Statistics tab. In addition, it includes an overall site optimization quality score, calculated on a 100-point scale according to the current degree of optimization. Data from the "Dashboard" tab can be exported as a convenient PDF report.

SiteAnalyzer, Dashboard

Data export

For more flexible analysis of the collected data, it can be exported to CSV format (the currently active tab is exported), or a full-fledged Microsoft Excel report can be generated with all tabs in a single file.

SiteAnalyzer, Excel data export

When exporting data to Excel, a special window appears in which the user can select the columns of interest and then generate a report with the required data.

SiteAnalyzer, Excel report

Multilanguage support

The program lets you choose the preferred language in which to work.

The main supported languages are English, German, Italian, Spanish, French, Russian and others. At the moment, the program has been translated into more than fifteen (15) of the most popular languages.

SiteAnalyzer, Multilanguage support

If you want to translate the program into your own language, simply translate any "*.lng" file into the language of interest and send the translated file to "support@site-analyzer.pro" (comments in the email should be written in Russian or English); your translation will be included in the next release of the program.

More detailed instructions on translating the program are included in the distribution (the "lcids.txt" file).

P.S. If you have any comments on the quality of the translation, send your comments and corrections to "support@site-analyzer.pro".

Compress Database

The "Compress Database" item in the main menu packs the database: it cleans out previously deleted projects and reorders the data (analogous to defragmenting data on a personal computer).

This procedure is effective when, for example, a large project containing many records has been deleted from the program. In general, it is recommended to compress the database periodically to get rid of redundant data and reduce its size.

The answers to the remaining questions can be found in the FAQ section.
