Monday, March 24, 2014

Comparing two MD5 files

Using md5sum to compare two individual files is straightforward. Comparing many files recursively across directories is not. An MD5 listing of every file under a directory can be created with:

find ./path -type f -exec md5sum -b '{}' \; > output.md5
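If find and md5sum aren't available (on Windows, for instance), a similar listing can be produced from Perl itself. The sketch below is one possible approach under stated assumptions: it relies on the core File::Find and Digest::MD5 modules, and the helper name md5manifest is made up for this example. It emits lines in the same "hash *path" layout that md5sum -b writes, so the comparison script should accept its output, but it does not reproduce md5sum's escaping of filenames that contain backslashes or newlines.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;    # core module: recursive directory traversal
use Digest::MD5;   # core module: MD5 digests

# Hypothetical helper: return "md5sum -b"-style lines ("<hex> *<path>")
# for every regular file under $root, sorted by path.
sub md5manifest
{
  my ( $root ) = @_;
  my %digest;

  find( { no_chdir => 1, wanted => sub
  {
    # Regular files only; skip directories and symlinks, like "find -type f"
    return unless -f $_ && !-l $_;
    open my $fh, '<', $_ or do { warn "Unable to read $_: $!\n"; return };
    binmode $fh;   # hash the raw bytes
    $digest{$_} = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
  } }, $root );

  return map { "$digest{$_} *$_" } sort keys %digest;
}

# Usage: perl md5manifest.pl ./path > output.md5
if ( @ARGV == 1 )
{
  print "$_\n" foreach md5manifest( $ARGV[0] );
}
```

Because no_chdir is set, each path is reported relative to the directory given on the command line (for example "./path/file"), matching what the find command produces.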

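But how do you compare two of these output files? This short Perl script compares two md5sum listings and writes the deleted, new, and changed files to a new file called diff.txt, one "path: STATUS" line per file, where STATUS is DELETED, NEW, or CHANGED. Keep in mind that files are matched by path, so both listings should be generated from the same relative location. If an identical file appears under different paths in the two listings, it will be reported as DELETED under the old path and NEW under the new one.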
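To make the matching rules concrete, here is a small, self-contained sketch using made-up data. The classify helper is hypothetical (the script's own version is comparemd5); it applies the same rules to two { path => md5 } hashes, and the sample shows an unchanged file that moved from docs/ to archive/ coming out as a DELETED/NEW pair.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: classify paths from two { path => md5 } hashrefs.
#   path only in old                -> DELETED
#   path only in new                -> NEW
#   path in both, MD5 values differ -> CHANGED
# Unchanged paths are left out of the result.
sub classify
{
  my ( $old, $new ) = @_;
  my %status;

  foreach my $path ( keys %$old )
  {
    if ( !exists $new->{$path} )             { $status{$path} = "DELETED"; }
    elsif ( $old->{$path} ne $new->{$path} ) { $status{$path} = "CHANGED"; }
  }
  foreach my $path ( keys %$new )
  {
    $status{$path} = "NEW" unless exists $old->{$path};
  }
  return \%status;
}

# Made-up sample data: docs/a.txt moved to archive/a.txt without changing,
# b.txt is untouched, and c.txt was edited in place.
my %old_listing = ( "docs/a.txt" => "aaaa", "b.txt" => "bbbb", "c.txt" => "cccc" );
my %new_listing = ( "archive/a.txt" => "aaaa", "b.txt" => "bbbb", "c.txt" => "dddd" );

my $status = classify( \%old_listing, \%new_listing );
print "$_: $status->{$_}\n" foreach sort keys %$status;
```

This prints:

archive/a.txt: NEW
c.txt: CHANGED
docs/a.txt: DELETED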

#!/usr/bin/perl -w
 
use strict;
 
# Read MD5 file and return results into a hash 
sub readmd5
{
  my ( %hash, $md5, $filename );

  # Three-argument open with a lexical filehandle, so a filename such as
  # ">out" or "cmd|" is never mistaken for a mode or a pipe
  open my $fh, '<', $_[0] or die "Unable to read file $_[0]: $!\n";

  while ( <$fh> ) # Read lines from file
  {
    chomp; # remove newline; the regex would skip it anyway, but the warning below prints $_
    ($md5, $filename) = m%
                         ^               # beginning of line
                         ([0-9a-fA-F]+)  # hex value
                         \s[\s*]         # separator: " " (text mode) or " *" (binary mode)
                         (?:\./)?        # ignore "./"
                         (.+)            # rest of line is filename and path
                         $               # end of line
                         %x;
    if ( defined $md5 && defined $filename )  # Ensure regex worked
    {
      $hash{$filename} = $md5; # create hash with filename as the key and md5 as value
    }
    else
    {
      print STDERR "Warning: entry in ", $_[0], " not recognized:\n", $_, "\n";
    }
  }
 
  return \%hash;
}
 
# Compare two hashes of file/MD5 entries.
sub comparemd5
{
  my ( %old, %new, %diff, $filename );
 
  # Create new local hashes using data from referenced "old" and "new" hashes
  %old = ( %{$_[0]} );
  %new = ( %{$_[1]} );
 
  foreach $filename (keys %old)
  {
    if ( exists $new{$filename} ) # Detect changed and deleted entries
    {
      $diff{$filename} = "CHANGED" if ( $old{$filename} ne $new{$filename} );
      delete $new{$filename}; # Eliminate all common keys from "new" hash
    }
    else
    {
      $diff{$filename} = "DELETED";
    }
  }
 
  foreach $filename (keys %new) # The remaining entries in the "new" hash are new files
  {
    $new{$filename} = "NEW";
  }
 
  %diff = ( %diff, %new ); # Combine deleted and changed with new file entries.
 
  return \%diff;
}
 
# Save diff hash to a file called diff.txt
sub savediff
{
  my ( $diff, $filename );

  # Copy hash reference
  $diff = $_[0];

  open my $fh, '>', 'diff.txt' or die "Unable to write file diff.txt: $!\n";

  # Dictionary sort (case-insensitive)
  foreach $filename ( sort { lc($a) cmp lc($b) } keys %$diff )
  {
    print $fh $filename, ": ", $diff->{$filename}, "\n";
  }

  close $fh or die "Unable to close diff.txt: $!\n";
}
 
 
# main
 
my ( $filename, $old, $new, $diff );
my ( $usage );
 
# Check for two command line parameters
$usage = "Usage: $0 old.md5 new.md5\n";
@ARGV == 2 or die $usage;
 
# Read old and new MD5 files and store in hashes
# Returns references to the hashes
$old = readmd5 ( $ARGV[0] );
$new = readmd5 ( $ARGV[1] );
 
# Perform diff of two md5 hashes
$diff = comparemd5 ( $old, $new );
 
# Save results in a file called diff.txt
savediff ( $diff );
# end main
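The path caveat can be softened with a little post-processing. The following is a minimal sketch, not part of the original script: the hypothetical findmoves helper takes the two hashes returned by readmd5 plus the diff returned by comparemd5, and pairs each DELETED path with a NEW path that has the same MD5 value, reporting it as a probable move or rename. A matching MD5 only suggests identical content, and when several files share a digest (empty files, for example) the pairing is simply first-come in sorted order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical post-processing step (not in the original script).
# $old, $new: { path => md5 } hashrefs, as returned by readmd5
# $diff:      { path => status } hashref, as returned by comparemd5
# Returns a list of [ old_path, new_path ] pairs whose MD5 values match.
sub findmoves
{
  my ( $old, $new, $diff ) = @_;
  my ( %newbymd5, @moves );

  # Index the NEW paths by MD5 (several paths may share one digest)
  foreach my $path ( sort keys %$diff )
  {
    push @{ $newbymd5{ $new->{$path} } }, $path if $diff->{$path} eq "NEW";
  }

  # Give each DELETED path the first unclaimed NEW path with the same MD5
  foreach my $path ( sort keys %$diff )
  {
    next unless $diff->{$path} eq "DELETED";
    my $candidates = $newbymd5{ $old->{$path} };
    next unless $candidates && @$candidates;
    push @moves, [ $path, shift @$candidates ];
  }

  return @moves;
}

# Made-up example data
my %old  = ( "docs/a.txt" => "aaaa", "c.txt" => "cccc", "gone.txt" => "eeee" );
my %new  = ( "archive/a.txt" => "aaaa", "c.txt" => "dddd", "fresh.txt" => "ffff" );
my %diff = ( "docs/a.txt" => "DELETED", "gone.txt" => "DELETED",
             "archive/a.txt" => "NEW", "fresh.txt" => "NEW", "c.txt" => "CHANGED" );

# Prints: MOVED: docs/a.txt -> archive/a.txt
print "MOVED: $_->[0] -> $_->[1]\n" foreach findmoves( \%old, \%new, \%diff );
```

In the script itself, a similar print line could be placed in main right after comparemd5, passing the existing $old, $new, and $diff references directly.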
