Track old uploaded files on the web server. How do you know what is no longer used?
Users can upload files to the server that are stored effectively forever.
I want to know if anyone has an idea for tracking down orphan files. Some of my ideas involve logging every download, but the files are usually linked from HTML pages, which is not easy to track.
A file may sit undownloaded while a page still links to it. I could do a full-text search, but that's pretty brute force.
Do I just give up and let them age?
I don't know your situation, but what I have done in the past is move all the old files (images) into a backup folder located in the same images folder, and then use Xenu to check the links on all my HTML pages. At the end of the link check, Xenu returned a list of 404s. Then I wrote a script that used the list of 404s to move those files from the backup back into the images folder.
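I no longer have the original script, but a minimal Python sketch of that restore step could look roughly like this. It assumes the 404 list is a sequence of URLs, one per broken link, and that the images folder is flat (no subfolders); `restore_from_404s`, `backup_dir`, and `images_dir` are names made up for illustration.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse, unquote


def restore_from_404s(urls, backup_dir, images_dir):
    """Move every file named in a 404 URL from backup_dir back to images_dir.

    Returns the list of file names that were restored.
    """
    backup_dir, images_dir = Path(backup_dir), Path(images_dir)
    restored = []
    for url in urls:
        # Keep only the file name; this assumes a flat images folder.
        name = Path(unquote(urlparse(url.strip()).path)).name
        if not name:
            continue
        src = backup_dir / name
        # 404s for files that were never in the backup are skipped.
        if src.is_file():
            shutil.move(str(src), str(images_dir / name))
            restored.append(name)
    return restored
```

Whatever is still sitting in the backup folder after this runs is a candidate orphan, at least as far as the crawled pages are concerned.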
It worked great. I still monitored the log files for a couple of weeks afterward, just in case I had missed something.
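The log check can be scripted too. Here is a rough sketch, assuming the server writes access logs in the Common Log Format and that the files live under `/images/`; the function name and the prefix are my own placeholders, so adjust them to your setup.

```python
import re

# Matches the request and status fields of a Common Log Format line, e.g.
# 1.2.3.4 - - [10/Oct/2020:13:55:36 +0000] "GET /images/a.png HTTP/1.1" 404 209
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" (\d{3})\b')


def missing_paths(log_lines, prefix="/images/"):
    """Return the sorted, de-duplicated paths under prefix that got a 404."""
    found = set()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m and m.group(2) == "404" and m.group(1).startswith(prefix):
            # Drop any query string so each file is reported once.
            found.add(m.group(1).split("?", 1)[0])
    return sorted(found)
```

Any path this reports is a file that something still requests, so it should be moved back out of the backup.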
Xenu, BTW, is a free app that finds broken links: you give it a start page, and it follows the links on that page to crawl your entire site. You will need to give it additional start pages if some of the pages linking to these files would not otherwise be found during the crawl.
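To show why the start pages matter, here is a toy Python sketch of that kind of crawl. This is not how Xenu is implemented, just the general idea. `fetch` is a hypothetical callable you would supply: it returns a page's HTML, an empty string for non-HTML files such as images, or `None` for a 404.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_urls, fetch):
    """Breadth-first crawl from start_urls; returns (visited, broken)."""
    queue = deque(start_urls)
    visited, broken = set(), set()
    while queue:
        url = queue.popleft()
        if url in visited or url in broken:
            continue
        html = fetch(url)
        if html is None:  # hypothetical convention for a 404
            broken.add(url)
            continue
        visited.add(url)
        parser = LinkCollector()
        parser.feed(html)
        for link in parser.links:
            # Resolve relative links and drop #fragments.
            queue.append(urldefrag(urljoin(url, link)).url)
    return visited, broken
```

A page that nothing links to is never reached from the home page, so the files it references never show up in either set unless that page is passed in as an extra start URL. A real crawler would also restrict itself to your own host, which this sketch does not do.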