Module Zipc

ZIP archives.

Consult the quick start and limitations.

References.

Archive members

type compression =
  | Bzip2
  | Deflate  (* Via Zipc_deflate. *)
  | Lzma
  | Stored  (* No compression. *)
  | Xz
  | Zstd
  | Other of int

The type for compression formats.

Zipc only handles Stored and Deflate, but third-party libraries can be used to support other formats or to plug in an alternate implementation of Deflate.

val pp_compression : Stdlib.Format.formatter -> compression -> unit

pp_compression formats compression formats.

module Fpath : sig ... end

File paths and modes.

module Ptime : sig ... end

POSIX time.

module File : sig ... end

Archive file data.

module Member : sig ... end

Archive members.

Archives

type t

The type for ZIP archives.

val empty : t

empty is an empty archive.

val is_empty : t -> bool

is_empty z is true iff z is empty.

val mem : Fpath.t -> t -> bool

mem p z is true iff z has a member with path p.

val find : Fpath.t -> t -> Member.t option

find p z is the member with path p of z (if any).

val fold : (Member.t -> 'a -> 'a) -> t -> 'a -> 'a

fold f z acc folds f over the members of z starting with acc in increasing lexicographic member path order. In particular this means that directory members, if they exist, are folded over before any of their content (assuming paths without relative segments).
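The ordering claim can be checked without Zipc. The following is a minimal sketch that uses a standard-library string map as a stand-in for an archive; the paths and placeholder values are hypothetical, and it only illustrates why a directory path such as "docs/" is visited before "docs/readme.txt".

```ocaml
module String_map = Map.Make (String)

(* Hypothetical member paths; the values are placeholder descriptions,
   not Zipc members. *)
let members =
  String_map.empty
  |> String_map.add "docs/readme.txt" "file"
  |> String_map.add "docs/" "directory"
  |> String_map.add "a.txt" "file"

(* Map.fold visits keys in increasing order. The accumulator is built in
   reverse, so it is reversed once to recover the visiting order. *)
let visit_order =
  List.rev (String_map.fold (fun path _ acc -> path :: acc) members [])
```

Since "docs/" is a strict prefix of "docs/readme.txt", it compares lower and is visited first.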

val add : Member.t -> t -> t

add member z is z with member added. Overrides a previous member with the same path in z (if any).

val remove : Fpath.t -> t -> t

remove p z is z with the member with path p removed (if any).

val member_count : t -> int

member_count z is the number of members in z.

val to_string_map : t -> Member.t Stdlib.Map.Make(Stdlib.String).t

to_string_map z is z as a map from each member's Member.path to the member itself.

val of_string_map : Member.t Stdlib.Map.Make(Stdlib.String).t -> t

of_string_map map is map as a ZIP archive.

Warning. It is assumed that in map each key k maps to a member m with Member.path m = k. This is not checked by the function.
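Since the invariant is not checked, callers may want to verify it themselves before converting a map. The following is a sketch of such a check, written against the standard library only: keys_match_paths and fake_member are hypothetical names, and the path argument stands in for a projection such as Member.path.

```ocaml
module String_map = Map.Make (String)

(* [path] stands in for a projection such as Member.path; it is a
   parameter so that the sketch does not depend on Zipc. *)
let keys_match_paths ~path map =
  String_map.for_all (fun key member -> String.equal key (path member)) map

(* Hypothetical stand-in for a member type. *)
type fake_member = { fake_path : string }

let ok_map = String_map.singleton "a.txt" { fake_path = "a.txt" }
let bad_map = String_map.singleton "a.txt" { fake_path = "b.txt" }
```

With Zipc linked, the same function could presumably be applied as keys_match_paths ~path:Zipc.Member.path map before calling of_string_map.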

Decode

val string_has_magic : string -> bool

string_has_magic s is true iff s has at least 4 bytes and starts with PK\x03\x04 or PK\x05\x06 (empty archive).
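For illustration, the rule can be restated as a standalone function. This is a sketch of the documented behaviour, not Zipc's own code; has_zip_magic is a hypothetical name.

```ocaml
(* Standalone restatement of the rule: at least 4 bytes, and the first
   4 bytes are a local file header signature or an end of central
   directory signature. *)
let has_zip_magic s =
  String.length s >= 4
  &&
  match String.sub s 0 4 with
  | "PK\x03\x04" | "PK\x05\x06" -> true
  | _ -> false
```

A check like this only inspects the first four bytes; it says nothing about whether the rest of the string decodes as an archive.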

val of_binary_string : string -> (t, string) Stdlib.result

of_binary_string s decodes a ZIP archive from s.

Note. The integrity constraints of ZIP archives are unclear. For now, based on sanity and on certain archives found in the wild that the unzip tool supports, the following is done:

  • As a rule of thumb, all member metadata is determined only from the archive's central directory file header; local file headers and data descriptors are ignored.
  • If a directory member pretends to have file data this data is ignored.
  • If a path is defined more than once, the second definition takes over.
  • If the central directory CRC-32 of a file member is 0 we look up and use the value found in its local file header.
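The duplicate path rule can be pictured as adding entries to a map in order. The following sketch uses only the standard library; the entries and their payloads are hypothetical and do not reflect how Zipc represents members internally.

```ocaml
module String_map = Map.Make (String)

(* Hypothetical (path, payload) entries in the order they are read. *)
let entries = [ ("a.txt", "first"); ("b.txt", "only"); ("a.txt", "second") ]

(* Adding in order lets a later entry replace an earlier one that has
   the same path. *)
let resolved =
  List.fold_left
    (fun acc (path, payload) -> String_map.add path payload acc)
    String_map.empty entries
```

Here resolved holds two bindings, and "a.txt" is bound to the later definition.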

Encode

val encoding_size : t -> int

encoding_size z is the number of bytes needed to encode z.

val to_binary_string : ?first:Fpath.t -> t -> (string, string) Stdlib.result

to_binary_string z is the encoding of archive z. Error _ is returned with a suitable error message in case z has more members than Member.max.

If a member with path first exists in z then this member's data is written first in the ZIP archive. It defaults to "mimetype" to support the EPUB OCF ZIP container constraint (you are however in charge of making sure this member is not compressed in this case).
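As a small illustration of encoding and decoding, the following sketch round-trips the empty archive. It assumes the zipc library is installed and linked, and it uses only functions documented on this page; round_trip_empty is a hypothetical helper name.

```ocaml
(* Requires the zipc library. Encodes the empty archive, checks the
   magic bytes, decodes the result and reports two facts: whether the
   decoded archive is empty and whether the encoded length agrees with
   encoding_size. *)
let round_trip_empty () =
  match Zipc.to_binary_string Zipc.empty with
  | Error e -> Error e
  | Ok s ->
      if not (Zipc.string_has_magic s) then Error "missing ZIP magic"
      else
        match Zipc.of_binary_string s with
        | Error e -> Error e
        | Ok z ->
            let same_size = String.length s = Zipc.encoding_size Zipc.empty in
            Ok (Zipc.is_empty z, same_size)
```

The same shape should apply to non-empty archives built with add; constructing members is covered by the Member and File modules and is left out here.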

Note.

  • Member.mtime that are before the Ptime.dos_epoch are silently truncated to that date.
  • Except for first, member data is encoded in the (deterministic) increasing lexical order of their path.
  • The encoding does not use data descriptors, so bit 3 of File.gp_flags is always set to 0 on encoding.
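The truncation of early modification times amounts to a clamp. The sketch below assumes timestamps are integer POSIX seconds and that Ptime.dos_epoch corresponds to 1980-01-01 00:00:00 UTC; both the local dos_epoch constant and clamp_mtime are hypothetical names, not part of Zipc.

```ocaml
(* 1980-01-01 00:00:00 UTC in POSIX seconds: 3652 days * 86400 s.
   Assumed here to match Ptime.dos_epoch. *)
let dos_epoch = 315_532_800

(* Times before the DOS epoch are replaced by the DOS epoch; later
   times are kept unchanged. *)
let clamp_mtime mtime = if mtime < dos_epoch then dos_epoch else mtime
```

Callers who care about preserving earlier timestamps would need to detect this case before encoding, since the encoder does it silently.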

val write_bytes : ?first:Fpath.t -> t -> ?start:int -> bytes -> (unit, string) Stdlib.result

write_bytes z ~start b writes the result of to_binary_string z to b, starting at index start (defaults to 0).

Raises Invalid_argument if b is too small.
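One way to avoid the Invalid_argument case is to size the buffer with encoding_size. The following sketch assumes the zipc library is installed and linked; encode_into_fresh_buffer is a hypothetical helper name.

```ocaml
(* Requires the zipc library. Allocates a buffer of exactly
   encoding_size bytes, writes the archive at offset 0 and returns the
   buffer contents as a string. *)
let encode_into_fresh_buffer z =
  let b = Bytes.create (Zipc.encoding_size z) in
  match Zipc.write_bytes z ~start:0 b with
  | Ok () -> Ok (Bytes.to_string b)
  | Error e -> Error e
```

When writing at a non-zero start, the buffer needs at least start + encoding_size z bytes.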

Limitations

Up to the limitations listed below Zipc is suitable for the following:

It is not the aim of Zipc to be able to read every ZIP archive out there. The format is quite loose, highly denormalized, has plenty of ways to encode metadata and allows many modern and legacy compression algorithms to be used. Hence take into account the following points: