Site Downloader?
I recently had a client who had lost contact with the person who previously maintained their site. Unfortunately, they had no passwords or anything, and needed to switch hosting to be able to update the site.
The site was fairly small, so I just went through and downloaded each page through my browser. The images and other assets ended up in directories like "index_files", so I had to move them into folders like "images" and update the hrefs and other paths.
Anyway, before doing that, I went online and saw that there were a bunch of programs to download entire sites. But the first couple I tried didn't quite fit the bill. This was a small site, so I did it by hand to save time, but if I ever have to do this with a larger site, I'll be way better off using one of those programs.
Does anyone know of a good site downloader? I'd like it to be able to download either all files or just specified file types, preserve relative locations, and limit downloads to one or more specified domains. And free would be nice, though I'd probably be willing to pay up to $50 or so if I had to.
karmaman posted this at 15:48 — 30th March 2006.
He has: 82 posts
Joined: Nov 2003
Hi Tim, I think HTTrack will do what you want. It can be found at httrack.com. Hope this helps.
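If you end up using the command-line version, a minimal sketch would look something like this (the URL and output folder are just placeholders, and the trailing filter keeps the download on that one domain):

# mirror the site into ./mysite, staying within example.com
httrack "http://www.example.com/" -O ./mysite "+*.example.com/*"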
Busy posted this at 21:17 — 30th March 2006.
He has: 6,151 posts
Joined: May 2001
Is Go!Zilla still around?
Most of them are free. Here is a list of the ones I block; these are only the user agent (UA) strings, but they could help you find one:
almaden
Anarchie
ASPSeek
attach
BackWeb
Bandit
BatchFTP
Bot\ mailto:[email protected]
BlackWidow
Buddy
CherryPicker
ChinaClaw
Collector
Copier
CICC
Crescent
Custo
DA
DISCo\ Pump
Download\ Demon
Download\ Wonder
Downloader
Drip
DSurf15a
dts\ agent$
EasyDL
eCatch
EirGrabber
EmailSiphon
EmailWolf
EyeNetIE
Express\ WebPictures
ExtractorPro
FileHound
FlashGet
frontpage
^GetRight
GetSmart
GetWeb!
gigabaz
Go!Zilla
Go-Ahead-Got-It
gotit
Grabber
GrabNet
Grafula
grub-client
HMView
HTTrack
httpdown
ia_archiver
Image\ Stripper
Image\ Sucker
Indy\ Library
InterGET
Internet\ Ninja
Iria
Irvine
Java
JetCar
JOC
JustView
^larbin
LeechFTP
LexiBot
lftp
linkwalker
likse
marcopolo
Magnet
Mag-Net
Mass\ Downloader
Memo
MIDown\ tool
Mirror
Mister\ PiX
MJ12bot
Moozilla
MS\ FrontPage
MSIECrawler
MSProxy
NaverRobot
^NaverBot
Navroad
NearSite
NetAnts
NetSpider
Net\ Vampire
Netwu
NetZip
Ninja
NICErsPRO
NPBot
obot
Octopus
Offline\ Explorer
Offline\ Navigator
PageGrabber
Papa\ Foto
pavuk
pcBrowser
Pockey
Program\ Shareware
^psbot
Pump
QuepasaCreep
RealDownload
Reaper
Recorder
ReGet
Siphon
SiteSnagger
SlySearch
SmartDownload
Snake
SpaceBison
^Steeler
Stripper
Sucker
SuperBot
SuperHTTP
Surfbot
tAkeOut
Teleport\ Pro
^TurnitinBot
Tutorial\ Crawler
Vacuum
VoidEYE
WebCapture
WebCopier
Webster
^WebSauger
Web(\ Image|\ Sucker|Auto|bandit|Fetch|site|ZIP|.*er)
Wget
Whacker
Widow
WISEnutbot
Xaldon
Zao
zyborg
This list is cut down from my .htaccess, so ignore the backslashes, the ^ and $, etc.
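For what it's worth, here's a rough sketch of how entries like these are typically wired up in .htaccess with mod_rewrite; the handful of names below are just pulled from the list above:

RewriteEngine On
# return 403 Forbidden when the user agent matches any listed pattern (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} (HTTrack|Wget|Teleport\ Pro|WebCopier|^GetRight) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Offline\ Explorer|BlackWidow|SiteSnagger) [NC]
RewriteRule .* - [F,L]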
Lord Maverick posted this at 05:29 — 19th May 2006.
They have: 34 posts
Joined: May 2006
Busy, if anybody decides to download your site, .htaccess will not help you.
Busy posted this at 11:23 — 19th May 2006.
He has: 6,151 posts
Joined: May 2001
It's like trying to stop a car thief: you can have lock nuts on your wheels, an alarm, even a pit bull, but if they want it they are going to take it. You can just slow them down some.
The .htaccess does stop a lot of the kiddies who can only do it with downloaded programs.
Greg K posted this at 14:50 — 19th May 2006.
He has: 2,145 posts
Joined: Nov 2003
I use GetRight myself, and even though it is in Busy's list, it has an option under Advanced to change the user agent. (It also has options to automatically figure out the referer to send when following links.)
-Greg
02bunced posted this at 15:22 — 19th May 2006.
He has: 412 posts
Joined: May 2005
Wget is nice and powerful (since you have a Mac, you can install the command-line version). To use it, just change your command line to the directory you want the files saved in and use the following format:
wget http://www.domain.com/ -r
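If you want something closer to the full wish list (specific file types, links fixed up for local browsing, staying on one domain), a sketch along these lines should work; the domain is just a placeholder:

# recursive download, rewriting links for local viewing, staying on one domain
wget --recursive --convert-links --page-requisites --no-parent --domains=example.com http://www.example.com/

# or grab only certain file types (wget still fetches the HTML pages to find the links)
wget --recursive --no-parent --accept=jpg,gif,png http://www.example.com/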
Serfaksan posted this at 03:47 — 19th June 2006.
They have: 18 posts
Joined: Jun 2006
Wow, I never thought about downloading a site, but now that you mention it, and looking at all the options, I think I'm going to try it on some pages XD