WebClient.DownloadFile 404 errors with HTML characters in URI?
I am using the WebClient class to download files from a website and have a couple of questions.
- When the URIs have HTML characters in the URI path (e.g. http://foo.com/path1&path2.pdf) I get 404 (not found) errors. How can I prevent this? I thought HTML characters were safe?
- When the URIs represent a directory (e.g. http://foo.com/path) I get 403 (forbidden) errors. I understand why this is happening, but how can I check my URI to see if it represents a directory without an index page?
1 answer
- HTML-encoded characters are not URL-safe. If your data is HTML-encoded, use HttpUtility.HtmlDecode to get a properly formed URL (e.g. foo.com/page?foo=1&bar=2). If special characters such as ampersands need to appear in the URL without being part of the query string, you need to URL-encode them. Use HttpUtility.UrlEncode.
- You can't.
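A minimal sketch of both points, under some assumptions: `DownloadHelper` and its method names are hypothetical, `WebUtility.HtmlDecode` from System.Net stands in for `HttpUtility.HtmlDecode` so that no System.Web reference is needed, and `Uri.EscapeDataString` is used for a literal path segment instead of `HttpUtility.UrlEncode`, which encodes spaces as `+`. Since the directory case cannot be detected from the URI alone, the sketch simply attempts the download and inspects the status code on failure.

```csharp
using System;
using System.Net;

// Hypothetical helper class, for illustration only.
public static class DownloadHelper
{
    // Undo HTML encoding (e.g. "&amp;" becomes "&") before the URL is requested.
    public static string ToRequestUrl(string htmlEncodedUrl)
    {
        return WebUtility.HtmlDecode(htmlEncodedUrl);
    }

    // Percent-encode a literal value that must appear inside one path segment,
    // so that characters such as '&' are not interpreted as URL syntax.
    public static string EncodeSegment(string segment)
    {
        return Uri.EscapeDataString(segment);
    }

    // Try the download; return false when the server answers 403 or 404
    // instead of letting the WebException escape. Other failures are rethrown.
    public static bool TryDownload(string htmlEncodedUrl, string targetPath)
    {
        try
        {
            using (var client = new WebClient())
            {
                client.DownloadFile(ToRequestUrl(htmlEncodedUrl), targetPath);
            }
            return true;
        }
        catch (WebException ex)
        {
            var response = ex.Response as HttpWebResponse;
            if (response != null &&
                (response.StatusCode == HttpStatusCode.Forbidden ||
                 response.StatusCode == HttpStatusCode.NotFound))
            {
                return false;
            }
            throw;
        }
    }
}
```

For example, `DownloadHelper.TryDownload("http://foo.com/page?foo=1&amp;bar=2", "page.html")` would request the decoded URL and report a 403 or 404 as `false` rather than as an exception. Whether a 403 actually means "directory without an index page" still depends on how the server is configured.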