Module Net.Url

Sloppy URL processing.

URL standards are in a sorry state. This module takes a sloppy approach to URL processing. It only breaks URLs into their components and classifies them.

Warning. None of the functions here perform percent encoding or decoding. Use Percent when deemed appropriate.

URLs

module Authority : sig ... end

Sloppy authority processing.

type scheme = string

The type for schemes, without the ':' separator.

type path = string

The type for paths.

type query = string

The type for queries, without the '?' separator.

type fragment = string

The type for fragments, without the '#' separator.

type t = string

The type for URLs.

val scheme : t -> scheme option

scheme u is the scheme of u, if any.

val authority : t -> Authority.t option

authority u is the authority of u, if any.

val path : t -> path option

path u is the path of u, if any.

val query : t -> query option

query u is the query of u, if any.

val fragment : t -> fragment option

fragment u is the fragment of u, if any.
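As a rough illustration of what breaking a URL into components can look like, here is a minimal sketch using only the standard library. The names sketch_scheme, sketch_query and sketch_fragment are hypothetical and are not part of this module. The scheme character rules follow the RFC 3986 grammar, which is an assumption about what the module accepts, and edge cases may be handled differently by the real functions.

```ocaml
(* Hypothetical helpers for illustration, not this module's implementation. *)

let is_alpha c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')

(* Characters allowed in a scheme by RFC 3986 (assumed here). *)
let is_scheme_char c =
  is_alpha c || ('0' <= c && c <= '9') || c = '+' || c = '-' || c = '.'

(* The text before the first ':' if it looks like a scheme. *)
let sketch_scheme (u : string) : string option =
  match String.index_opt u ':' with
  | None | Some 0 -> None
  | Some i ->
      let rec ok k = k >= i || (is_scheme_char u.[k] && ok (k + 1)) in
      if is_alpha u.[0] && ok 1 then Some (String.sub u 0 i) else None

(* Everything after the first '#', without the separator. *)
let sketch_fragment (u : string) : string option =
  match String.index_opt u '#' with
  | None -> None
  | Some i -> Some (String.sub u (i + 1) (String.length u - i - 1))

(* Everything between the first '?' and the fragment, without the separator. *)
let sketch_query (u : string) : string option =
  let stop = match String.index_opt u '#' with
    | None -> String.length u
    | Some i -> i
  in
  match String.index_opt u '?' with
  | Some i when i < stop -> Some (String.sub u (i + 1) (stop - i - 1))
  | _ -> None
```

With these definitions, "http://example.org/a?b=c#d" yields the scheme "http", the query "b=c" and the fragment "d". No percent decoding is performed, in line with the warning above.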

Derived components

val target : t -> string option

target u is the contenation of path, query and fragment, that is everything that comes after the scheme and authority in an URL.

Kinds

type relative_kind =
  | Scheme
  | Absolute_path
  | Relative_path
  | Empty

The type for kinds of relative references. Represents this alternation.

type kind =
  | Absolute
  | Relative of relative_kind

The type for kinds of URLs. Represents this alternation.

val kind : t -> kind

kind u determines the kind of u. It decides that u is absolute if u starts with a scheme followed by ':'.
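The classification can be pictured with the following sketch. It redefines the two types locally so that it stands alone, and sketch_kind and starts_with_scheme are hypothetical names. The reading of the relative kinds (a leading "//" for Scheme, a leading "/" for Absolute_path, the empty string for Empty) is an assumption based on the relative reference grammar of RFC 3986; the real function may decide differently, for example on references that consist only of a query or fragment.

```ocaml
(* Local copies of the types so that the sketch is self-contained. *)
type relative_kind = Scheme | Absolute_path | Relative_path | Empty
type kind = Absolute | Relative of relative_kind

let is_alpha c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
let is_scheme_char c =
  is_alpha c || ('0' <= c && c <= '9') || c = '+' || c = '-' || c = '.'

(* True if the text before the first ':' looks like a scheme. *)
let starts_with_scheme (u : string) : bool =
  match String.index_opt u ':' with
  | None | Some 0 -> false
  | Some i ->
      let rec ok k = k >= i || (is_scheme_char u.[k] && ok (k + 1)) in
      is_alpha u.[0] && ok 1

(* Hypothetical classifier that only looks at the start of [u]. *)
let sketch_kind (u : string) : kind =
  let len = String.length u in
  if starts_with_scheme u then Absolute
  else if len = 0 then Relative Empty
  else if len >= 2 && u.[0] = '/' && u.[1] = '/' then Relative Scheme
  else if u.[0] = '/' then Relative Absolute_path
  else Relative Relative_path
```

Under these assumptions "https://example.org/" is Absolute, "//example.org/a" is Relative Scheme, "/a/b" is Relative Absolute_path and "a/b" is Relative Relative_path.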

Operations

val of_url : t -> ?scheme:scheme option -> ?authority:Authority.t option -> ?path:path option -> ?query:query option -> ?fragment:fragment option -> unit -> t

of_url u () is a new URL with unspecified components defaulting to those of u. If a component is specified with None, it is deleted.

val append : t -> t -> t

append root u is u if kind u is Absolute. Otherwise it uses root to make u absolute according to its relative_kind. The result is guaranteed to be absolute if root is absolute. If root isn't, the result may be surprising or nonsensical (FIXME can't we characterize that more?).

val to_absolute : scheme:scheme -> root_path:path option -> t -> t

to_absolute ~scheme ~root_path u transforms u depending on the value of kind u:

  • If Absolute then this is u itself.
  • If Relative Scheme then u is given the scheme scheme.
  • If Relative Absolute_path then u is given the scheme scheme.
  • If Relative Relative_path then u is given the scheme scheme and the path of u is prepended by root_path (if any).
  • If Relative Empty then u is given the scheme scheme and the path is root_path (if any).

Percent encoding

val is_likely_percent_decoded : t -> bool

is_likely_percent_decoded u tries to guess if u is percent decoded. It returns

  • true if there is a byte in u such that Char.Ascii.is_graphic is false.
  • false otherwise. In this case u is likely encoded or needs no encoding.

If u is the result of Percent.encode Uri this always returns false, but on foreign data the test may fail, e.g. it wrongly returns false on http://example.org/zoom/100%.

Note. This function can likely be improved.
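The test described above amounts to a scan for a byte that is not a graphic ASCII character. The sketch below is self-contained: is_ascii_graphic is a local stand-in for Char.Ascii.is_graphic and assumes that graphic means the byte range 0x21 to 0x7E, and sketch_is_likely_percent_decoded is a hypothetical name.

```ocaml
(* Local stand-in for Char.Ascii.is_graphic, assumed to be 0x21..0x7E. *)
let is_ascii_graphic c = '\x21' <= c && c <= '\x7E'

(* True as soon as one byte of [u] is not a graphic ASCII character. *)
let sketch_is_likely_percent_decoded (u : string) : bool =
  let rec loop i =
    i < String.length u && (not (is_ascii_graphic u.[i]) || loop (i + 1))
  in
  loop 0
```

A space or a UTF-8 encoded character makes the sketch return true, while "http://example.org/zoom/100%" returns false because every byte is graphic, which reproduces the misclassification mentioned above.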

module Percent : sig ... end

Percent-encoding codecs according to RFC 3986.

Predicates and comparisons

val equal : t -> t -> bool

equal tests URLs for binary equality.

val compare : t -> t -> int

compare is a total order on URLs compatible with equal.

Converting

val to_endpoint : supported_schemes:(scheme * Authority.port) list -> t -> ([> `Host of Authority.host * Authority.port ], string) Stdlib.result

to_endpoint ~supported_schemes url is:

  • Ok (`Host (host, port)) iff url has a scheme and it can be found in supported_schemes and url has an authority. The host value is the Authority.host and port the Authority.port or the default port specified in supported_schemes if absent.
  • Error _, if url has no scheme or it can't be found in supported_schemes or if url has no authority. The error message is of the form "URL <url>: …"

Raises Invalid_argument if supported_schemes is empty.
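The decision logic above, in particular the fallback to the default port, can be sketched as follows. This is not the module's code: sketch_endpoint is a hypothetical function that takes already extracted components, with hosts modelled as strings and ports as integers, and the error strings are placeholder wording rather than the module's actual messages.

```ocaml
(* Hypothetical sketch: scheme, host and port are assumed to be
   already extracted from the URL. Hosts are strings, ports are ints. *)
let sketch_endpoint ~supported_schemes ~scheme ~host ~port =
  if supported_schemes = [] then invalid_arg "supported_schemes is empty" else
  match scheme, host with
  | None, _ -> Error "no scheme"          (* placeholder wording *)
  | _, None -> Error "no authority"       (* placeholder wording *)
  | Some s, Some h ->
      match List.assoc_opt s supported_schemes with
      | None -> Error ("unsupported scheme: " ^ s)  (* placeholder wording *)
      | Some default ->
          (* Use the explicit port if present, the scheme default otherwise. *)
          Ok (`Host (h, Option.value port ~default))

(* Example list of supported schemes with their default ports. *)
let example_schemes = ["http", 80; "https", 443]
```

With example_schemes, an "https" URL without an explicit port resolves to port 443, while an explicit port such as 8080 takes precedence over the default.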

Formatting

val pp : Stdlib.Format.formatter -> t -> unit

pp formats an URL. For now this is just Format.pp_print_string.

val pp_kind : Stdlib.Format.formatter -> kind -> unit

pp_kind formats an unspecified representation of kinds.

Scraping

val list_of_text_scrape : ?root:t -> string -> t list

list_of_text_scrape ?root s roughly finds absolute and relative URLs in the ASCII compatible (including UTF-8) textual data s by looking in order:

  1. For the next href or src substring and then tries to parse the content of an HTML attribute. This may result in relative or absolute paths.
  2. For the next http substring in s and then delimits an URL depending on the previous characters and checks that the delimited URL starts with http:// or https://.

Relative URLs are appended to root if provided. Otherwise they are kept as is. The result may have duplicates.
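To give an idea of the second step, here is a deliberately simplified sketch that scans for http:// and https:// and cuts each URL at the next whitespace, quote or angle bracket. The delimiter set and the names has_prefix_at, is_delim and sketch_scrape_http are assumptions made for illustration. The real function chooses its delimiters from the preceding characters and also handles href and src attributes, neither of which is attempted here.

```ocaml
(* Does [s] contain [prefix] starting at index [i]? *)
let has_prefix_at (s : string) (i : int) (prefix : string) : bool =
  let n = String.length prefix in
  i + n <= String.length s && String.sub s i n = prefix

(* Assumed delimiter set: whitespace, quotes and angle brackets. *)
let is_delim c =
  c = ' ' || c = '\t' || c = '\n' || c = '\r'
  || c = '"' || c = '\'' || c = '<' || c = '>'

(* Hypothetical scraper for absolute http(s) URLs only. *)
let sketch_scrape_http (s : string) : string list =
  let len = String.length s in
  let rec find_end j =
    if j < len && not (is_delim s.[j]) then find_end (j + 1) else j
  in
  let rec loop acc i =
    if i >= len then List.rev acc
    else if has_prefix_at s i "http://" || has_prefix_at s i "https://" then
      let j = find_end i in
      loop (String.sub s i (j - i) :: acc) j
    else loop acc (i + 1)
  in
  loop [] 0
```

Like the documented function, the sketch returns URLs in order of appearance and keeps duplicates.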