What is the best way to determine the MIME type of an HTTP file upload?

Suppose you have an HTML form with an input tag of type "file". When a file is uploaded to the server, it is stored locally along with its associated metadata.

I can think of three ways to determine the MIME type:

  • Use the MIME type specified in the multipart/form-data payload.
  • Use the filename provided in the multipart/form-data payload and look up the MIME type based on the file extension.
  • Scan the raw file data and use a MIME type guessing library.
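
As a rough illustration of the first two options, here is a minimal Python sketch that uses only the standard library. The request bytes, the boundary string, and the helper name client_supplied_types are made up for the example. It parses a multipart/form-data body with the email package and reads both the declared Content-Type of the part and the type implied by the filename extension. Note that both values come from the client, so they agree here even though the payload is not image data.

```python
import mimetypes
from email.parser import BytesParser
from email.policy import default

# Hypothetical upload: the request's Content-Type header plus a one-part body.
RAW_REQUEST = (
    b"Content-Type: multipart/form-data; boundary=XyZ\r\n"
    b"\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="upload"; filename="holiday.jpg"\r\n'
    b"Content-Type: image/jpeg\r\n"
    b"\r\n"
    b"this is not really JPEG data\r\n"
    b"--XyZ--\r\n"
)


def client_supplied_types(raw_request):
    """Return (filename, declared type, extension-based type) for each part."""
    message = BytesParser(policy=default).parsebytes(raw_request)
    results = []
    for part in message.iter_parts():
        filename = part.get_filename()
        # Option 1: the type the client declared for this part.
        declared = part.get_content_type()
        # Option 2: the type implied by the file extension, if there is one.
        from_extension = mimetypes.guess_type(filename)[0] if filename else None
        results.append((filename, declared, from_extension))
    return results


if __name__ == "__main__":
    for row in client_supplied_types(RAW_REQUEST):
        print(row)
```

Neither value says anything about the bytes that were actually uploaded, which is why the third option exists.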

None of these solutions are perfect.

What's the most accurate solution?
Is there another, better option?

2 answers


If you are using PHP, you can use

http://pecl.php.net/package/Fileinfo

which checks many aspects of the file. For Python, you can use

http://pypi.python.org/pypi/python-magic/0.1



Both are bindings for libmagic on Linux/Unix (and possibly Windows) systems. See:

man magic
man libmagic

on Linux. libmagic uses magic-number tests to try to identify a file's MIME type.

I like the magic number method because it sees through wrong extensions and a lot of trickery, which matters if you are processing files on a web server that accepts uploads. These tests usually run once per file, so the cost of reading through the file is negligible.
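
To make the magic number idea concrete, here is a small Python sketch. It is not libmagic and not the API of either library linked above. The signature table is a tiny hand-picked subset of well-known file headers, and the function name sniff_mime_type is made up for this example. A real deployment would more likely call libmagic through a binding, which knows far more formats.

```python
# A few well-known file signatures (magic numbers). Illustrative subset only.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_mime_type(data):
    """Return a MIME type based on the leading bytes, or None if unknown."""
    for magic, mime_type in SIGNATURES:
        if data.startswith(magic):
            return mime_type
    return None


if __name__ == "__main__":
    # A PNG header is recognized no matter what the upload was named.
    print(sniff_mime_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))
    print(sniff_mime_type(b"%PDF-1.7\n"))
    print(sniff_mime_type(b"just some text"))
```

Only the first few bytes are inspected, so it is enough to pass the head of the uploaded file rather than the whole thing.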


I don't think you can rely on any one of them to say definitively "I am MIME type x". The problem with the first two is that the supplied content type or filename might be wrong, either because of a problem with the client (browser or otherwise) or because the request is deliberately misleading (various hacking attempts, etc.).

Therefore, you should probably try to combine the information from each method and develop some level of confidence. If the file extension says .doc and the MIME type is application/msword, then there is a pretty good chance it is a Word document, but run it through a MIME type detection utility to be sure.
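
One way to picture "some level of confidence" is a weighted vote across the three signals. The Python sketch below is only an illustration. The weights and the function name combine_signals are arbitrary assumptions, and the sniffed type is expected to come from whatever detection utility you use. The content-based result gets the largest weight because it is the only signal not controlled by the client.

```python
import mimetypes

# Hypothetical weights: content sniffing counts more than client-supplied data.
WEIGHTS = {"declared": 2, "extension": 2, "sniffed": 6}


def combine_signals(declared, filename, sniffed):
    """Return (most likely MIME type, confidence between 0 and 1)."""
    from_extension = mimetypes.guess_type(filename)[0] if filename else None
    signals = {
        "declared": declared,
        "extension": from_extension,
        "sniffed": sniffed,
    }
    scores = {}
    for name, mime_type in signals.items():
        if mime_type:
            scores[mime_type] = scores.get(mime_type, 0) + WEIGHTS[name]
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(WEIGHTS.values())


if __name__ == "__main__":
    # All three signals agree: high confidence.
    print(combine_signals("application/msword", "letter.doc", "application/msword"))
    # Client says PNG, content looks like a PDF: sniffed type wins, lower confidence.
    print(combine_signals("image/png", "cat.png", "application/pdf"))
```

A low confidence value is a hint to reject the upload or to store it with a generic type rather than trusting any single signal.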

There should be a magic-number MIME detection library available for the language you are using, but you did not mention which one that is. They all generally work by looking at the first few bytes/characters of the file and matching them against a MIME type lookup table. Some also use the file extension to help with this. They often fall back to plain text if the MIME type cannot be found.



If you want to take a platform independent approach, take a look at the various existing Java libraries:
