Download an Entire Website

Posted on June 9, 2022 by wilfried

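With the command-line tool wget, this can be done with the following command (-r follows links recursively, -p also fetches the images, CSS and other files each page needs):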

wget -r -p https://www.website.com

Starting a local Webserver for Testing

Posted on December 10, 2020 by wilfried

The easiest way is to open a console window in the directory you want to use as the root directory of your web server and run

python -m http.server

Then, by browsing to http://127.0.0.1:8000, you will be able to view and browse the contents.

To specify a port, just add the port number to the command, for example

python -m http.server 9999

Please note that this web server does not support https, so you have to enter the URL starting with http://, and you might have to dismiss a browser warning about the unsecured site.

Therefore, only use this for local testing!
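The same server can also be started from a script. The following is a minimal sketch, assuming Python 3.7 or later; the helper name serve_directory and its defaults are made up for illustration. Binding to 127.0.0.1 keeps the server reachable from the local machine only, which fits the "local testing" advice.

```python
import functools
import http.server
import threading


def serve_directory(directory=".", port=8000, host="127.0.0.1"):
    """Serve `directory` over plain HTTP in a background thread.

    Hypothetical helper for illustration. Pass port=0 to let the
    operating system pick a free port.
    """
    # SimpleHTTPRequestHandler is the handler behind `python -m http.server`.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


# Usage:
#   server = serve_directory(".", 9999)
#   ... browse to http://127.0.0.1:9999 ...
#   server.shutdown()
#   server.server_close()
```

The server runs in a daemon thread, so it stops when the script exits; call shutdown() and server_close() to stop it earlier.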

View unreachable web pages

Posted on February 10, 2020 by wilfried

There can be several reasons why a webpage is not available. Apart from the obvious server outage, there could be a geo block or a recent deletion of the page.

As long as you have the address of the webpage, there is a good chance of retrieving it.

One possibility is https://cachedview.com/, which lets you view pages from the Google web cache. The cache is usually updated every few days by Google, so as long as an article is a couple of days old, you have a good chance of accessing it there.

Another option is https://web.archive.org/, which archives copies of webpages, including old versions. So if you want to know what was on a certain webpage years ago, archive.org is a good place to start.
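Archived copies on web.archive.org can be addressed directly by putting the original address behind a /web/ prefix. The sketch below builds such addresses; the function name wayback_url is hypothetical, and the redirect behaviour described in the comments is how the site behaves to my knowledge, not something guaranteed here.

```python
def wayback_url(page_url, timestamp=None):
    """Build a web.archive.org address for `page_url`.

    Hypothetical helper for illustration.
    timestamp: optional string of digits such as "2015" or "20150101".
    The archive is expected to redirect to the snapshot closest to it,
    or to the most recent snapshot when no timestamp is given.
    """
    if timestamp is None:
        return "https://web.archive.org/web/" + page_url
    return "https://web.archive.org/web/%s/%s" % (timestamp, page_url)


# Usage:
#   print(wayback_url("http://example.com/", "2015"))
```

Opening the resulting address in a browser shows the archived copy, if one exists.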

Recover webpages from your browser cache

Posted on January 19, 2016 by wilfried

Think of the following situation: The database of your Wiki just crashed, you have a backup from a couple of days ago, but what about the pages that have been edited since then?

There is hope: most likely your browser cached some of the pages when you last viewed them.

Very important!
Don’t try to access the page because this will overwrite the cached contents.

To recover the cached pages follow these steps:

Firefox:

1. Go into offline mode by clicking the hamburger icon, then developer tools, then work offline
2. Open the page about:cache
3. Click on “List Cache Entries”
4. Search for the page you want to recover
5. Click on the link; this leads you to a page entitled “Cache entry information” with the hexdump of the compressed file
6. Click on the link to the page; it opens with the cached contents, including images
7. Save your page

Repeat this from step 2 for every page you are interested in. Go back to online mode (hamburger icon, developer tools and uncheck work offline) when done.

Chrome:

1. Open chrome://cache
2. Search for the page you want to recover
3. Click on the link; this leads you to a page with the hexdump of the compressed file
4. Select all on this page (Ctrl + A) and copy it to the clipboard (Ctrl + C)
5. Go to this page: http://www.sensefulsolutions.com/2012/01/viewing-chrome-cache-easy-way.html
6. Paste the contents into the textbox on the page
7. Click “Go”
8. Click on the link under “Results” to download the cached file. Some browsers block the download of html files because of security concerns; in this case go to downloads and explicitly allow the download of this file

Repeat this for every page you are interested in.

You are welcome.

Avoid distracting advertisements

Posted on October 19, 2015 by wilfried

Avoid distracting advertisements in your browser and in Skype with the following three tips (they can be applied independently):

Install an ad blocker for your web browser. Adblock Plus works very well. Some pages block their content if you have an ad blocker installed. For such cases, Adblock Plus can be quickly deactivated/activated with a middle mouse click on its symbol.
Deactivate Criteo: Criteo uses tracking cookies to present personalized advertisements. This page explains how to block Criteo cookies.
Block advertisements in Skype by adding Skype to the restricted internet sites: WinKey+R > inetcpl.cpl > Security > Restricted Sites > Sites and then add https://apps.skype.com to the list.

You are welcome!

Download an entire Web site for off-line viewing with wget

Posted on January 8, 2013 by wilfried

$ wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains website.org \
--no-parent \
www.website.org/tutorials/html/

This command downloads the Web site www.website.org/tutorials/html/.

The options are:
--recursive: download the entire Web site.
--domains website.org: don’t follow links outside website.org.
--no-parent: don’t follow links outside the directory tutorials/html/.
--page-requisites: get all the elements that compose the page (images, CSS and so on).
--html-extension: save files with the .html extension.
--convert-links: convert links so that they work locally, off-line.
--restrict-file-names=windows: modify filenames so that they will work in Windows as well.
--no-clobber: don’t overwrite any existing files (used in case the download is interrupted and resumed).
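If the download should be started from a script, the same options can be passed to wget as an argument list. This is a minimal sketch, assuming wget is installed and on the PATH; the function names mirror_command and mirror are made up for illustration. Note that every long option starts with two plain ASCII hyphens.

```python
import subprocess


def mirror_command(url, domain):
    """Return the wget argument list for mirroring `url` (hypothetical helper)."""
    return [
        "wget",
        "--recursive",                    # follow links recursively
        "--no-clobber",                   # keep files that already exist
        "--page-requisites",              # fetch images, CSS and so on
        "--html-extension",               # save pages with a .html extension
        "--convert-links",                # rewrite links for off-line use
        "--restrict-file-names=windows",  # filenames that also work on Windows
        "--domains", domain,              # stay inside this domain
        "--no-parent",                    # never ascend to the parent directory
        url,
    ]


def mirror(url, domain):
    """Run wget with the options above; raises CalledProcessError on failure."""
    subprocess.run(mirror_command(url, domain), check=True)


# Usage:
#   mirror("www.website.org/tutorials/html/", "website.org")
```

Passing the arguments as a list avoids shell quoting problems with unusual URLs.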

Codepad

Posted on December 16, 2012 by wilfried

codepad is an online collaboration tool for exchanging code which also integrates various compilers/interpreters. After entering code, codepad will run it and print the output.
New pads can be created without registration. For each pad you get a short URL for sharing the code.

Codepad provides syntax highlighting depending on the language. Supported languages are C, C++, D, Haskell, Lua, OCaml, PHP, Perl, Python, Ruby, Scheme, and Tcl. It can also be used to share plain text.

Getting royalty-free images

Posted on May 29, 2012 by wilfried

Are you in need of free photos to be used in a presentation or publication? The internet is full of pictures of all kinds, but most of them cannot be used without paying a fee or infringing some copyright.

Notable exceptions are wikimedia.org, the multimedia database of Wikipedia, and freedigitalphotos.net, which offers royalty-free low-res images (400 px width).

You can limit your Google image search (which comes with a nice customizable preview) to one of these sites by adding “site:wikimedia.org” to the search field.
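For example, typing sunset site:wikimedia.org into the search field only returns pictures hosted on wikimedia.org. The sketch below builds such a search address; the function name is hypothetical, and the tbm=isch parameter for selecting the image search is an assumption based on how Google search addresses have commonly looked, so it may change.

```python
from urllib.parse import urlencode


def site_image_search_url(terms, site):
    """Build a Google image search address limited to `site`.

    Hypothetical helper for illustration; "tbm=isch" is assumed to
    select the image search.
    """
    query = "%s site:%s" % (terms, site)
    return "https://www.google.com/search?" + urlencode(
        {"tbm": "isch", "q": query}
    )


# Usage:
#   print(site_image_search_url("sunset", "wikimedia.org"))
```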

Free image from FreeDigitalPhotos.net

NES Technik

Tag Archives: web