Is there a safe way to run a diff on two zip compressed files?

0 votes
asked Feb 25, 2009 by applepieisgood

It seems this would not be deterministic. Is there a way to do it reliably?

12 Answers

0 votes
answered Jan 25, 2009 by devin-jeanpierre

Reliable: unzip both, diff.

I have no idea if that answer's good enough for your use, but it works.
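For what it's worth, the unzip-then-compare idea can be sketched in Python with only the standard library. This is a rough illustration, not a polished tool: the names list_files and compare_extracted and the shape of the return value are made up for this example. It extracts both archives to temporary directories and compares the resulting files byte for byte.

```python
import filecmp
import os
import tempfile
import zipfile


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def compare_extracted(zip_a, zip_b):
    """Extract both archives and compare the extracted files.

    Returns (only_in_a, only_in_b, differing) as sorted lists of paths.
    """
    with tempfile.TemporaryDirectory() as dir_a, \
            tempfile.TemporaryDirectory() as dir_b:
        with zipfile.ZipFile(zip_a) as za:
            za.extractall(dir_a)
        with zipfile.ZipFile(zip_b) as zb:
            zb.extractall(dir_b)

        files_a, files_b = list_files(dir_a), list_files(dir_b)
        # shallow=False forces a content comparison instead of
        # trusting size and modification time.
        differing = sorted(
            rel for rel in files_a & files_b
            if not filecmp.cmp(os.path.join(dir_a, rel),
                               os.path.join(dir_b, rel), shallow=False)
        )
        return sorted(files_a - files_b), sorted(files_b - files_a), differing
```

Calling compare_extracted("a.zip", "b.zip") gives three lists: files only in the first archive, files only in the second, and files present in both whose contents differ.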

0 votes
answered Jan 25, 2009 by lieven-keersmaekers

Beyond Compare has no problem with this.

0 votes
answered Jan 25, 2009 by chaos

Well, I imagine zdiff would be some use to you.

0 votes
answered Jan 25, 2009 by ruudkok

WinMerge (Windows only) has lots of features, and one of them is:

  • Archive file support using 7-Zip

0 votes
answered Jan 26, 2009 by cheeso

In general, you cannot avoid decompressing and then comparing. Different compressors will result in different DEFLATEd byte streams, which when INFLATEd result in the same original text. You cannot simply compare the DEFLATEd data, one to another. That will FAIL in some cases.

But in a ZIP scenario, there is a CRC32 calculated and stored for each entry. So if you want to check files, you can simply compare the stored CRC32 associated with each DEFLATEd stream. Keep in mind that CRC32 is only a 32-bit checksum, so two different files can, in rare cases, share the same value. It may fit your needs to compare the FileName and the CRC.

You would need a ZIP library that reads zip files and exposes those things as properties on the "ZipEntry" object. DotNetZip will do that for .NET apps.
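The same idea can be sketched outside .NET. The example below swaps in Python's standard zipfile module instead of DotNetZip; the function names crc_index and compare_by_crc are invented for illustration, and comparing the uncompressed size alongside the CRC is an extra assumption on top of the name-and-CRC check described above. Nothing is decompressed; only the archive's directory entries are read.

```python
import zipfile


def crc_index(path):
    """Map each entry name to its stored (CRC-32, uncompressed size)."""
    with zipfile.ZipFile(path) as archive:
        return {info.filename: (info.CRC, info.file_size)
                for info in archive.infolist()}


def compare_by_crc(zip_a, zip_b):
    """Compare two archives using only entry names and stored checksums.

    Returns (only_in_a, only_in_b, changed) as sorted lists of entry names.
    """
    a, b = crc_index(zip_a), crc_index(zip_b)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(name for name in a.keys() & b.keys()
                     if a[name] != b[name])
    return only_a, only_b, changed
```

Because the CRC is computed over the uncompressed data, two archives built with different compressors or compression levels still compare as equal when their contents match.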

0 votes
answered Feb 25, 2009 by eduffy

If you're using gzip, you can do something like this:

# diff <(zcat file1.gz) <(zcat file2.gz)

0 votes
answered Jan 13, 2010 by mrabbitt

This isn't particularly elegant, but you can use the FileMerge application that comes with Mac OS X developer tools to compare the contents of zip files using a custom filter.

Create a script ~/bin/zip_filemerge_filter.bash with contents:

#!/bin/bash
##
#  List the size, CRC-32 checksum, and file path of each file in a zip archive,
#  sorted in order by file path.
##
unzip -v -l "${1}" | cut -c 1-9,59-,49-57 | sort -k3
exit $?

Make the script executable (chmod +x ~/bin/zip_filemerge_filter.bash).

Open FileMerge, open the Preferences, and go to the "Filters" tab. Add an item to the list with: Extension:"zip", Filter:"~/bin/zip_filemerge_filter.bash $(FILE)", Display: Filtered, Apply*: No. (I've also added the filter for .jar and .war files.)

Then use FileMerge (or the command line "opendiff" wrapper) to compare two .zip files.

This won't let you diff the contents of files within the zip archives, but will let you quickly see which files appear in only one archive and which files exist in both but have different content (i.e. different size and/or checksum).

0 votes
answered Jan 19, 2013 by user48678

Actually gzip and bzip2 both come with dedicated tools for doing that.

With gzip:

$ zdiff file1.gz file2.gz

With bzip2:

$ bzdiff file1.bz2 file2.bz2

But keep in mind that for very large files, you might run into memory issues (I originally came here to find out about how to solve them, so I don't have the answer yet).
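If all you need to know is whether two large gzip files decompress to the same bytes, one possible workaround is to stream both in fixed-size chunks so memory use stays bounded. The sketch below is an assumption-laden illustration in Python, not a replacement for zdiff: the function name gz_contents_equal and the default chunk size are made up, and it only answers "same or different" rather than producing a line diff.

```python
import gzip


def gz_contents_equal(path_a, path_b, chunk_size=1 << 20):
    """Return True if two gzip files decompress to identical bytes.

    Only chunk_size bytes of each decompressed stream are held in
    memory at a time.
    """
    with gzip.open(path_a, "rb") as fa, gzip.open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:  # both streams ended at the same point
                return True
```

For example, gz_contents_equal("file1.gz", "file2.gz") returns True when the uncompressed contents match, even if the two files were compressed at different levels.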

0 votes
answered Jan 23, 2015 by taavi-ilves

I found relief with this simple Perl script: diffzips.pl

It recursively diffs every zip file inside the original zip, which is especially useful for different Java package formats: jar, war, and ear.

zipcmp uses a simpler approach and doesn't recurse into archived zips.
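To give a feel for what recursing into nested archives involves, here is a small Python sketch. It is not a port of diffzips.pl and makes no claim about how that script works: the names NESTED_SUFFIXES, nested_index and diff_nested, the "!/" path separator, and the choice to compare stored CRC-32 values are all assumptions made for this example. Nested archives are read fully into memory, so it is only suitable for moderately sized files.

```python
import io
import zipfile

# Entry name suffixes treated as nested archives (an assumption).
NESTED_SUFFIXES = (".zip", ".jar", ".war", ".ear")


def nested_index(archive, prefix=""):
    """Map entry paths to CRC-32 values, descending into nested archives."""
    index = {}
    for info in archive.infolist():
        path = prefix + info.filename
        if info.filename.lower().endswith(NESTED_SUFFIXES):
            data = io.BytesIO(archive.read(info))
            if zipfile.is_zipfile(data):
                with zipfile.ZipFile(data) as inner:
                    index.update(nested_index(inner, path + "!/"))
                continue  # the nested archive's own CRC is not recorded
        index[path] = info.CRC
    return index


def diff_nested(zip_a, zip_b):
    """Compare two archives entry by entry, including nested archives."""
    with zipfile.ZipFile(zip_a) as za, zipfile.ZipFile(zip_b) as zb:
        a, b = nested_index(za), nested_index(zb)
    return {
        "only_in_first": sorted(a.keys() - b.keys()),
        "only_in_second": sorted(b.keys() - a.keys()),
        "changed": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
    }
```

The point of descending is that a rebuilt inner jar usually differs byte for byte from the old one even when its contents are identical, so comparing only the outer entry would report a change that is not really there.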

0 votes
answered Sep 15, 2017 by serv-inc

A python solution for zip files:

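import difflib
import zipfile


def diff(filename1, filename2):
    """Print differences between two zip archives.

    Returns 0 if the archives have the same contents, 1 otherwise.
    """
    differs = False

    # ZipFile opens the given path in binary mode itself.
    with zipfile.ZipFile(filename1) as z1, zipfile.ZipFile(filename2) as z2:
        if len(z1.infolist()) != len(z2.infolist()):
            print("number of archive elements differ: {} in {} vs {} in {}".format(
                len(z1.infolist()), z1.filename, len(z2.infolist()), z2.filename))
            return 1
        names2 = set(z2.namelist())
        for zipentry in z1.infolist():
            name = zipentry.filename
            if name not in names2:
                print("no file named {} found in {}".format(name, z2.filename))
                differs = True
                continue
            data1, data2 = z1.read(name), z2.read(name)
            if data1 == data2:
                continue
            differs = True
            # Assumes text entries: decode for a line-by-line diff,
            # replacing any bytes that are not valid UTF-8.
            lines1 = data1.decode("utf-8", "replace").splitlines(True)
            lines2 = data2.decode("utf-8", "replace").splitlines(True)
            delta = ''.join(x[2:] for x in difflib.ndiff(lines1, lines2)
                            if x.startswith('- ') or x.startswith('+ '))
            print("content for {} differs:\n{}".format(name, delta))
    if not differs:
        print("all files are the same")
        return 0
    return 1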

Use as

diff(filename1, filename2)

It compares files line-by-line in memory and shows changes.
