Website ripper?

zabadoh@lemmy.ml · 1 year ago

zabadoh@lemmy.ml · edit-2 1 year ago

Okay, I found SurfOffline that does the trick without too much hassle, but…

It’s verrrrrrrry slooooooooow.

It uses Internet Explorer as a module and requests every resource individually instead of copying files from IE’s cache, which is weird and slow, especially when hundreds of images are involved.

And SurfOffline doesn’t appear to be supported anymore: the support email’s inbox is full.

edit: Aaaaand SurfOffline doesn’t save to .html files with a directory structure!!! It stores everything in some kind of SQL database, and it only saves to .mht (MHTML web archive) and .chm (compiled HTML Help) files, both of which are legacy Microsoft formats!!!

What it does have is a built-in web server that only works while the program is running.

So what I plan to do is have the program up but doing nothing, while I sic HTTrack on the 127.0.0.1 web address for my ripped website.

HTTrack will hopefully “extract” the website to .html format.
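
For anyone curious what that "re-crawl the local server" step boils down to, here is a rough Python sketch of a same-host crawler. It is not how HTTrack or WebCopy work internally, just the general idea. The start URL and port in the usage note are made up (use whatever address SurfOffline's server actually reports), and `LinkCollector`, `same_site_links` and `crawl` are names I invented for illustration.

```python
from html.parser import HTMLParser
from urllib.error import URLError
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects every href/src attribute value found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def same_site_links(base_url, html):
    """Return absolute, fragment-free links that stay on base_url's host."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlsplit(base_url).netloc
    found = []
    for raw in parser.links:
        absolute = urldefrag(urljoin(base_url, raw)).url
        if urlsplit(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


def crawl(start_url, limit=1000):
    """Breadth-first walk of one site; returns {url: raw bytes}."""
    seen, queue, pages = {start_url}, [start_url], {}
    while queue and len(pages) < limit:
        url = queue.pop(0)
        try:
            with urlopen(url) as resp:
                body = resp.read()
                is_html = "html" in resp.headers.get("Content-Type", "")
        except URLError:
            continue  # skip missing or broken resources
        pages[url] = body
        if is_html:
            for link in same_site_links(url, body.decode("utf-8", "replace")):
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return pages
```

Usage would be something like `pages = crawl("http://127.0.0.1:8080/")` while SurfOffline is running, then writing each entry of `pages` to disk. Real rippers also rewrite links inside the saved pages and follow URLs inside CSS, which this sketch skips.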

Whew, what a hassle!

zabadoh@lemmy.ml · 1 year ago

To continue my travails:

HTTrack didn’t do a great job: It was slow, even copying from the same machine, and it flattened the directory structure of the website in the copy it wrote out, making it almost un-navigable.
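
To show what "keeping the directory structure" means in practice, here is a small Python sketch that maps each URL to a nested local path instead of dumping everything into one folder. This is only an illustration, not what any of these tools do internally. The `url_to_local_path` name, the `mirror` root folder, and the port in the example are all my own assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def url_to_local_path(url, root="mirror"):
    """Map a URL to a relative file path that keeps the site's folders."""
    parts = urlsplit(url)
    path = unquote(parts.path)
    # Treat directory-style URLs as that directory's index page.
    if path == "" or path.endswith("/"):
        path += "/index.html"
    # Drop empty, "." and ".." segments so nothing escapes the root folder.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    # Colons are not allowed in Windows folder names, so host:port becomes host_port.
    site_dir = parts.netloc.replace(":", "_")
    return str(PurePosixPath(root, site_dir, *segments))
```

With this, `http://127.0.0.1:8080/gallery/2019/pic.jpg` would land in `mirror/127.0.0.1_8080/gallery/2019/pic.jpg`, so relative links between pages keep working. Query strings are ignored here, which would need handling on sites that use them to distinguish pages.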

Here’s where Cyotek WebCopy shines: It’s copying the website from SurfOffline’s database-backed web server quickly, so I should have the entire website re-extracted very soon!