
Is there a way to see which files are crawled?

By @Carlos 2018-12-06 17:56:22.665Z

The crawl counter goes up to "crawl_site #500", but I am only using 10 sites and 0 posts, with nothing in Trash and no tags. Can I see which files are processed?

  1. Leon Stafford @leonstafford 2018-12-06 18:05:08.228Z

    Hi Carlos,

    Good question.

    Yes, in a few ways. Try this latest version of the plugin and check the Logs tab. You can see a few different types of logs there, such as the initial crawl list: these are the URLs the plugin detects from WordPress, such as your Post and Archive URLs (which you said were not many). It will also grab media and certain theme files. You can also see this list under the Crawl tab, where you can filter out any URLs you know are not needed, to save time and get a cleaner export.

    It also shows Discovered URLs: extra URLs linked to from the original crawl list. This helps avoid missing files in your static export.
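
A minimal sketch of the "Discovered URLs" idea, for illustration only: it is not the plugin's actual code (the plugin is written in PHP), and the names `LinkCollector` and `discover_urls` are hypothetical. It assumes discovery means collecting `href` and `src` values from a crawled page, resolving them against the page URL, and keeping same-site URLs that are not already in the initial crawl list.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href/src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def discover_urls(page_url, html, initial_crawl_list):
    """Return same-site URLs linked from the page that are not in the initial list.

    Assumption: "same site" is decided by comparing host names only.
    """
    collector = LinkCollector()
    collector.feed(html)
    site_host = urlparse(page_url).netloc
    known = set(initial_crawl_list)
    discovered = []
    for link in collector.links:
        absolute = urljoin(page_url, link)  # resolve relative links
        if urlparse(absolute).netloc != site_host:
            continue  # skip external links, mailto:, etc.
        if absolute not in known:
            known.add(absolute)  # also prevents duplicates in the result
            discovered.append(absolute)
    return discovered
```

With a page that links to an image missing from the initial list, only that image URL would be reported as discovered; links to other hosts are ignored.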

    It isn't in just yet, but I will soon add "Crawled links" to the Logs tab, so you can see even more information about the export and deployment.

    Here's the latest version of the plugin to try:

    It's not an official release yet, as I'm still working on a few bugs, but it's worth starting to use. Let me know about anything that isn't working.



    1. Leon Stafford @leonstafford 2018-12-06 18:09:17.403Z

      Also, with the new version, try setting the Crawl Increment under the Advanced tab to 100, 1000 or maximum. The numbers will not move as quickly, because this value is the size of the "batch" of requests that get processed each round: 100, 1000 or all of the initial crawl list will be crawled on each request from your browser to the server.

      Go as high as your server can handle without the export failing. You should see much improved export times!
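
To illustrate the batching described above, here is a rough sketch. It is not the plugin's implementation; `batches` and `requests_needed` are hypothetical names, and the only assumption is the one stated in the post: each browser-to-server request crawls one batch of at most "Crawl Increment" URLs.

```python
import math


def batches(crawl_list, increment):
    """Yield successive slices of crawl_list, each at most `increment` URLs long."""
    for start in range(0, len(crawl_list), increment):
        yield crawl_list[start:start + increment]


def requests_needed(total_urls, increment):
    """Number of browser-to-server requests needed to crawl every URL."""
    return math.ceil(total_urls / increment)


# Hypothetical crawl list of 500 URLs.
urls = [f"/page-{i}/" for i in range(500)]
for increment in (100, 1000):
    print(increment, requests_needed(len(urls), increment))
```

A larger increment means fewer round trips, but each request does more work on the server, which is why the limit is whatever the server can handle.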

      1. Leon Stafford @leonstafford 2018-12-06 18:18:25.394Z

        I've just implemented this, thanks for the prompt!

        It isn't in the currently linked build, but when I update that later today, you should get the Crawled Links option in the Logs tab.

    2. In reply to Carlos:
      @Carlos 2018-12-06 18:24:56.171Z

      Thanks for your fast answer, and for your work on this fine plugin.
      In the tab I see a list of a lot of images, but they are currently not used on the website.
      It would be helpful to see how they are chosen (an incoming link, for example). I used SEO FROG and didn't see a link to any of the images.

      Thanks anyway

      1. Leon Stafford @leonstafford 2018-12-06 18:40:48.453Z

        Yes, I'll be making at least some of the methods it uses to detect URLs optional via checkboxes or such. Maybe not for this release, but soon. The more control the user has the better and a checkbox is not hard to implement.

        Previously, the plugin didn't detect anything, but this often meant missing files/broken images on the exported site. It's a balance to get the most complete export for a user's first experience, without including way too many files. Still working on it, but the Excludes option, combined with the Initial crawl list preview, currently allows for good control.
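
As a rough illustration of how an excludes list can trim the initial crawl list, here is a sketch. How the plugin actually matches excludes is not stated in this thread; this example assumes plain substring matching, and `apply_excludes` is a hypothetical name.

```python
def apply_excludes(crawl_list, excludes):
    """Drop every URL that contains any of the exclude strings.

    Assumption: excludes are plain substrings, not wildcards or regexes.
    """
    patterns = [p.strip() for p in excludes if p.strip()]  # ignore blank entries
    return [url for url in crawl_list if not any(p in url for p in patterns)]


crawl_list = [
    "/",
    "/about/",
    "/wp-content/uploads/2018/12/unused-photo.jpg",
    "/wp-content/themes/mytheme/style.css",
]
print(apply_excludes(crawl_list, ["/wp-content/uploads/2018/"]))
```

Here the unused image is dropped while the pages and the theme stylesheet stay in the list.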

        We're also detecting certain plugins that need extra behaviour to ensure the site exports correctly, by including or excluding certain things. For example, with Elementor, if you use the list icons from FontAwesome, the plugin needs to force-include those files, or they go missing from the export because of the way they're referenced.

        A part of me doesn't love "cheating" in the crawl process by getting this internal information from WordPress, but it's a WordPress plugin with a specific aim, so I will use all the tools available to get a fast and accurate export (the continued mission!).