Can I get the MD5sum of a directory from Perl?

Question


I am writing a Perl script (on Windows) that uses File::Find to index a network filesystem. It works great, but it takes a very long time to scan the filesystem. I thought it would be nice to get a checksum of each directory before traversing it and, if the checksum matches the one from the previous run, skip that directory. This would eliminate most of the processing, since the files on this filesystem do not change frequently.
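A minimal sketch of the per-directory signature idea, assuming a "directory checksum" means an MD5 over the directory's immediate entries (name, size and modification time of each). The helper name `dir_signature` is hypothetical; `Digest::MD5` ships with Perl. Note that it covers one level only and still needs a `stat` of every entry in that directory.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: an MD5 "signature" of ONE directory level, built from
# each entry's name, size and modification time. Subdirectories are not
# descended into, so changes deeper in the tree are not captured here.
sub dir_signature {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @entries = sort grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);

    my $ctx = Digest::MD5->new;
    for my $name (@entries) {
        my @st = stat("$dir/$name");
        # stat fields 7 and 9 are size and mtime; use '' if stat failed
        my ($size, $mtime) = @st ? @st[7, 9] : ('', '');
        $ctx->add(join("\0", $name, $size, $mtime), "\n");
    }
    return $ctx->hexdigest;
}
```

The result would then be compared with the value saved from the previous run (for example in a hash persisted between runs); how that storage works is left open here.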

On my AIX box, I use this command:

csum -h MD5 /directory

which returns something like this:

5cfe4faf4ad739219b6140054005d506  /directory

The command takes very little time:

time csum -h MD5 /directory
5cfe4faf4ad739219b6140054005d506  /directory

real    0m0.00s
user    0m0.00s
sys     0m0.00s

I have searched CPAN for a module that does this, but all the modules I found compute the MD5 sum of each file in a directory, not of the directory itself.

Is there a way to get the MD5 sum of a directory in Perl, or failing that, a Windows command I could call from Perl?

Thanks in advance!

0

perl md5 checksum

BrianH May 26 '09 at 15:47


5 answers

To get a checksum you have to read the files, which means you have to walk the filesystem, which puts you right back in the boat you are trying to get out of.
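To make that cost concrete, here is a sketch of what a content-based "directory MD5" would have to do, assuming it is defined as one digest over every regular file's relative path, size and bytes. The name `tree_md5` is hypothetical; `File::Find` and `Digest::MD5` are core modules. Every file is opened and read in full.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Find;

# Hypothetical "directory MD5": one digest over the relative path, size and
# full contents of every regular file under $root. It must open and read
# every file, which is the expensive walk the asker wants to avoid.
sub tree_md5 {
    my ($root) = @_;
    my @files;
    find({ no_chdir => 1,
           wanted   => sub { push @files, $File::Find::name if -f $_ } },
         $root);

    my $ctx = Digest::MD5->new;
    for my $path (sort @files) {              # sort for a deterministic order
        (my $rel = $path) =~ s/^\Q$root\E//;  # path relative to $root
        open(my $fh, '<', $path) or die "Cannot read $path: $!";
        binmode($fh);
        # include name and size so file boundaries are unambiguous
        $ctx->add("$rel\0" . (-s $path || 0) . "\0");
        $ctx->addfile($fh);
        close($fh);
    }
    return $ctx->hexdigest;
}
```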

+2

Chas. Owens May 26 '09 at 16:00


As far as I know, you cannot get the MD5 of a directory. md5sum on other systems complains when you give it a directory. csum most likely hashes the directory file itself (the entry list of the top-level directory) without traversing the tree.

You can grab the modified times for the files and hash them however you like by doing something like this:

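use strict;
use warnings;

# Walk $dir recursively and print each file's age in days (-M).
sub dirModified {
    my ($dir) = @_;
    opendir(my $dh, $dir) or do { warn "Cannot open $dir: $!\n"; return };
    my @dircontents = readdir($dh);
    closedir($dh);

    foreach my $item (@dircontents) {
        next if $item eq '.' || $item eq '..';   # skip self and parent only
        my $path = "$dir/$item";                 # file tests need the full path
        if ( -f $path ) {
            my $age = -M $path;                  # days since last modification
            print "$age : $path - do stuff here\n";
        } elsif ( -d $path ) {
            dirModified($path);
        }
    }
}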

Yes, it will still take some time to run.
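As one way to "hash them however you like", here is a sketch that folds the path, size and modification time of every file and directory into a single MD5, assuming metadata changes are a good enough proxy for content changes. The name `tree_mtime_signature` is hypothetical. Only `stat` is called, so no file contents are read, but every directory is still visited.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Find;

# Hypothetical helper: collect path, size and mtime for every entry under
# $root and fold them into one MD5. No file contents are read.
sub tree_mtime_signature {
    my ($root) = @_;
    my @records;
    find({ no_chdir => 1, wanted => sub {
        my @st = stat($_) or return;          # skip entries we cannot stat
        push @records, join("\0", $File::Find::name, @st[7, 9]);
    } }, $root);

    my $ctx = Digest::MD5->new;
    $ctx->add("$_\n") for sort @records;      # sort for a deterministic order
    return $ctx->hexdigest;
}
```

Sorting the records keeps the signature independent of the order in which the filesystem returns entries.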

+1

moshen May 26 '09 at 16:16


In addition to the other good answers, let me add this: if you want a checksum, use a checksum algorithm instead of a (broken!) hash function.

I don't think you need a cryptographically secure hash function in your file indexer. What you need is a way to see whether the directory listings have changed without storing each entire listing. Checksum algorithms do exactly this: they almost always return a different output when the input changes. They can also be faster, since they are simpler than cryptographic hash functions.
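A sketch of that suggestion, assuming the sorted list of entry names is what should be compared between runs. It uses CRC-32 via `Compress::Zlib` (bundled with Perl since 5.10) in place of MD5; the helper name `listing_crc32` is hypothetical.

```perl
use strict;
use warnings;
use Compress::Zlib ();

# Hypothetical helper: CRC-32 over the sorted entry names of one directory.
# CRC-32 is a plain checksum, not a cryptographic hash: accidental changes
# almost always alter it, but deliberate collisions are easy to construct.
sub listing_crc32 {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return Compress::Zlib::crc32(join("\0", @names));
}

printf "%08x\n", Compress::Zlib::crc32("123456789");   # prints cbf43926
```

The last line prints the standard CRC-32 check value, which is a quick way to confirm the checksum behaves as expected.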

It is true that a user could change the directory in a way the checksum would not detect. However, they would have to choose the new filenames deliberately to do so, since ordinary changes to filenames will (most likely) produce a different checksum. Is it worth defending against this "attack"?

You should always consider the consequences of each attack and choose the appropriate tools.

+1

Martin Geisler Jun 3 at 22:26


I wrote one of these in Python, if you're interested:

http://akiscode.com/articles/sha-1directoryhash.shtml

0

user199486 Oct 30 '09 at 7:36


SpliFF · Accepted Answer · 2009-05-26T16:02:35+0000

Can't you just read the last-modified dates of the files and folders? Wouldn't that be faster than computing an MD5?
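A minimal sketch of the modification-date check, assuming `%$seen` maps each directory path to the mtime recorded on the previous run (how it is persisted between runs is left open). The name `dir_changed` is hypothetical. On most filesystems a directory's mtime changes when entries are added, removed or renamed, but not when an existing file's contents change, and a parent's mtime does not reflect changes inside its subdirectories, so each directory still needs its own `stat`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $dir's mtime differs from the one
# recorded in %$seen (or none was recorded), 0 otherwise, and updates %$seen.
sub dir_changed {
    my ($dir, $seen) = @_;
    my $mtime = (stat $dir)[9];
    return 1 unless defined $mtime;            # cannot stat: treat as changed
    my $changed = !defined $seen->{$dir} || $seen->{$dir} != $mtime;
    $seen->{$dir} = $mtime;                    # remember for the next run
    return $changed ? 1 : 0;
}
```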

