---
title: "Mirroring or Archiving an Entire Website"
date: 2019-08-02T22:42:16-04:00
draft: false
tags: [ "Archive" ]
medium_enabled: true
---
I have several old WordPress sites lying around that I would like to archive but no longer maintain. Since I don't intend to create any more content on them, I can use a tool like `wget` to scrape each site and produce a somewhat *read-only* copy of it. I say read-only not because the copy can't be edited, but because it's no longer in the original source format of the website.

Several resources tackle this problem:

- [https://stackoverflow.com/questions/538865/how-do-you-archive-an-entire-website-for-offline-viewing#538878](https://stackoverflow.com/questions/538865/how-do-you-archive-an-entire-website-for-offline-viewing#538878)
- [https://letswp.io/download-an-entire-website-wget-windows/](https://web.archive.org/web/20190915143432/https://letswp.io/download-an-entire-website-wget-windows/)

After consulting these resources, I ultimately arrived at the following command:
```bash
# --mirror            recursive download with timestamping (shorthand for -r -N -l inf --no-remove-listing)
# --convert-links     rewrite links in downloaded pages so the local copy browses offline
# --adjust-extension  save pages with the proper extension (e.g. .html)
# --page-requisites   also download the CSS, images, and scripts each page needs
# --no-verbose        keep the log output concise
wget --mirror \
    --convert-links \
    --adjust-extension \
    --page-requisites \
    --no-verbose \
    https://url/of/web/site
```
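`wget` writes the mirror into a directory named after the site's hostname. A quick way to confirm the copy actually browses offline is to serve that directory locally with Python's built-in HTTP server. Here's a minimal sketch; the `site-mirror` directory and its sample page are stand-ins for whatever `wget` actually produced:

```shell
# Stand-in for the directory wget created (normally named after the host):
mkdir -p site-mirror
echo '<h1>archived</h1>' > site-mirror/index.html

# Serve the mirror on localhost and fetch a page to confirm it works
python3 -m http.server 8137 --directory site-mirror &
SERVER_PID=$!
sleep 1
RESPONSE=$(curl -s http://localhost:8137/index.html)
echo "$RESPONSE"
kill "$SERVER_PID"
```

Because `--convert-links` rewrites everything to relative paths, the archived pages should render and navigate correctly from any local server like this one.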
There were other solutions in that Stack Overflow post, but something about the simplicity of `wget` appealed to me.
[Example site I archived with this.](https://sentenceworthy.com)