Net.Url
Sloppy URL processing.
URL standards are in a sorry state. This module takes a sloppy approach to URL processing. It only breaks URLs into their components and classifies them.
Warning. None of the functions here perform percent encoding or decoding. Use Percent when deemed appropriate.
module Authority : sig ... end
Sloppy authority processing.
val authority : t -> Authority.t option
authority u is the authority of u, if any.
val target : t -> string option
The type for kinds of relative references. Represents this alternation.
The type for kinds of URLs. Represents this alternation.
kind u determines the kind of u. It decides that u is absolute if u starts with a scheme and :.
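The classification can be illustrated with a minimal, self-contained sketch on plain strings. It is not the module's implementation: the constructor names follow the ones used on this page, but the scheme syntax check (a letter followed by letters, digits, +, - or ., as in RFC 3986) and the rules for telling relative kinds apart (a leading // for Scheme, a leading / for Absolute_path) are assumptions.

```ocaml
(* Hypothetical types mirroring the constructor names used on this page. *)
type relative_kind = Scheme | Absolute_path | Relative_path | Empty
type kind = Absolute | Relative of relative_kind

let is_alpha c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')

let is_scheme_char c =
  is_alpha c || ('0' <= c && c <= '9') || c = '+' || c = '-' || c = '.'

(* [has_scheme u] is [true] iff [u] starts with a scheme followed by ':'.
   Assumption: scheme syntax as in RFC 3986. *)
let has_scheme u =
  let len = String.length u in
  let rec loop i =
    if i >= len then false else
    if u.[i] = ':' then true else
    if is_scheme_char u.[i] then loop (i + 1) else false
  in
  len > 0 && is_alpha u.[0] && loop 1

(* Assumed rules for the relative kinds; the real module may differ. *)
let kind u =
  let len = String.length u in
  if has_scheme u then Absolute else
  if len = 0 then Relative Empty else
  if len >= 2 && u.[0] = '/' && u.[1] = '/' then Relative Scheme else
  if u.[0] = '/' then Relative Absolute_path else
  Relative Relative_path
```

With these assumptions, kind "https://example.org/a" is Absolute, kind "//example.org/a" is Relative Scheme and kind "a/b" is Relative Relative_path.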
val of_url :
  t ->
  ?scheme:scheme option ->
  ?authority:Authority.t option ->
  ?path:path option ->
  ?query:query option ->
  ?fragment:fragment option ->
  unit ->
  t
of_url u () is a new URL with unspecified components defaulting to those of u. If a component is specified with None, it is deleted.
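The signature uses optional arguments whose values are themselves options, which is what lets a caller distinguish "keep the component of u" (argument omitted) from "delete the component" (argument given as None). The sketch below shows the idiom on a hypothetical three-field record; the record type and the function name update are illustrative only and say nothing about how the module represents URLs.

```ocaml
(* Hypothetical record standing in for a URL broken into components. *)
type url =
  { scheme : string option; path : string option; query : string option }

(* Each optional argument defaults to the corresponding component of [u].
   Passing [None] explicitly deletes the component. *)
let update u ?(scheme = u.scheme) ?(path = u.path) ?(query = u.query) () =
  { scheme; path; query }

let u = { scheme = Some "http"; path = Some "/a"; query = Some "x=1" }
let same = update u ()                             (* all components kept *)
let no_query = update u ~query:None ()             (* query deleted *)
let secure = update u ~scheme:(Some "https") ()    (* scheme replaced *)
```

The trailing unit argument is what allows the optional arguments to be omitted at the call site.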
append root u is u if kind u is Absolute. Otherwise it uses root to make u absolute according to its relative_kind. The result is guaranteed to be absolute if root is; the result may be surprising or nonsensical if root isn't (FIXME can't we characterize that more ?).
to_absolute ~scheme ~root_path u transforms u depending on the value of kind u:
- Absolute then this is u itself.
- Relative Scheme then u is given the scheme scheme.
- Relative Absolute_path then u is given the scheme scheme.
- Relative Relative_path then u is given the scheme scheme and the path of u is prepended by root_path (if any).
- Relative Empty then u is given the scheme scheme and the path is root_path (if any).
val is_likely_percent_decoded : t -> bool
is_likely_percent_decoded u tries to guess if u is percent decoded. It returns
- true if there is a byte in u such that Char.Ascii.is_graphic is false.
- false otherwise. In this case u is likely encoded or needs no encoding.
If u is the result of Percent.encode Uri this always returns false, but on foreign data the test may fail, e.g. it wrongly returns false on http://example.org/zoom/100%.
Note. This function can likely be improved.
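The heuristic of is_likely_percent_decoded can be sketched in a few lines on plain strings. This is an illustration rather than the module's code: the local is_graphic is an assumption that treats the bytes 0x21 to 0x7E as graphic, and String.exists requires OCaml 4.13 or later.

```ocaml
(* Assumption: a graphic ASCII character is a byte in the range 0x21-0x7E. *)
let is_graphic c = '\x21' <= c && c <= '\x7E'

(* [true] iff some byte of [u] is not graphic, e.g. a space or a UTF-8 byte. *)
let is_likely_percent_decoded u =
  String.exists (fun c -> not (is_graphic c)) u

let decoded = is_likely_percent_decoded "http://example.org/a b"
let encoded = is_likely_percent_decoded "http://example.org/a%20b"
let fooled = is_likely_percent_decoded "http://example.org/zoom/100%"
```

Here decoded is true because of the space, while encoded and fooled are both false. The last case is the false negative mentioned above: the trailing % is a graphic character, so nothing signals that the URL still needs encoding.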
val to_endpoint :
supported_schemes:(scheme * Authority.port) list ->
t ->
  ([> `Host of Authority.host * Authority.port ], string) Stdlib.result
to_endpoint ~supported_schemes url is:
- Ok (`Host (host, port)) iff url has a scheme that can be found in supported_schemes and url has an authority. The host value is the Authority.host and port is the Authority.port or, if absent, the default port specified in supported_schemes.
- Error _ if url has no scheme, if the scheme can't be found in supported_schemes, or if url has no authority. The error message is of the form "URL <url>: …".
Raises Invalid_argument if supported_schemes is empty.
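To make the success and error cases concrete, here is a simplified, self-contained sketch of the same decision procedure on plain strings. It is not the module's implementation: hosts are strings, ports are ints, the helper find_sub is hypothetical, the error messages are invented, and userinfo and bracketed IPv6 hosts are deliberately not handled.

```ocaml
(* Hypothetical helper: index of the first occurrence of [sub] in [s]. *)
let find_sub ~sub s =
  let n = String.length sub and len = String.length s in
  let rec loop i =
    if i + n > len then None else
    if String.sub s i n = sub then Some i else loop (i + 1)
  in
  loop 0

let to_endpoint ~supported_schemes url =
  if supported_schemes = [] then invalid_arg "supported_schemes is empty";
  let error msg = Error (Printf.sprintf "URL %s: %s" url msg) in
  match find_sub ~sub:"://" url with
  | None -> error "no scheme or no authority"
  | Some i ->
      let scheme = String.lowercase_ascii (String.sub url 0 i) in
      match List.assoc_opt scheme supported_schemes with
      | None -> error ("unsupported scheme " ^ scheme)
      | Some default_port ->
          let start = i + 3 and len = String.length url in
          (* The authority ends at the first '/', '?' or '#', if any. *)
          let rec stop j =
            if j >= len then j else
            match url.[j] with '/' | '?' | '#' -> j | _ -> stop (j + 1)
          in
          let authority = String.sub url start (stop start - start) in
          if authority = "" then error "no authority" else
          match String.rindex_opt authority ':' with
          | None -> Ok (`Host (authority, default_port))
          | Some k ->
              let host = String.sub authority 0 k in
              let port =
                String.sub authority (k + 1) (String.length authority - k - 1)
              in
              match int_of_string_opt port with
              | Some port -> Ok (`Host (host, port))
              | None -> error ("invalid port " ^ port)

let schemes = [ "http", 80; "https", 443 ]
let default = to_endpoint ~supported_schemes:schemes "https://example.org/a"
let explicit = to_endpoint ~supported_schemes:schemes "http://localhost:8080/x"
```

In this sketch default is Ok (`Host ("example.org", 443)), since the port falls back to the one listed for https, and explicit is Ok (`Host ("localhost", 8080)).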
val pp : Stdlib.Format.formatter -> t -> unit
pp formats a URL. For now this is just Format.pp_print_string.
val pp_kind : Stdlib.Format.formatter -> kind -> unit
pp_kind formats an unspecified representation of kinds.
list_of_text_scrape ?root s roughly finds absolute and relative URLs in the ASCII compatible (including UTF-8) textual data s by looking, in order, for:
- href or src substrings, then tries to parse the content of an HTML attribute. This may result in relative or absolute paths.
- http substrings in s, then delimits a URL depending on the previous characters and checks that the delimited URL starts with http:// or https://.
Relative URLs are appended to root if provided. Otherwise they are kept as is. The result may have duplicates.
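The second scraping step can be sketched as a small scanner over plain strings. This is only one plausible reading of "delimits a URL depending on the previous characters", not the module's algorithm: the function name scrape_http and the delimiter rules (a URL preceded by a quote, < or ( ends at the matching closer, any other URL ends at whitespace) are assumptions, and the href/src step is not covered.

```ocaml
(* [true] iff [prefix] occurs in [s] at index [i]. *)
let starts_at s i prefix =
  let n = String.length prefix in
  i + n <= String.length s && String.sub s i n = prefix

let is_space c = c = ' ' || c = '\t' || c = '\n' || c = '\r'

let scrape_http s =
  let len = String.length s in
  (* Assumed rule: the character before the URL selects its closing delimiter. *)
  let closer i =
    if i = 0 then None else
    match s.[i - 1] with
    | '"' -> Some '"' | '\'' -> Some '\'' | '<' -> Some '>' | '(' -> Some ')'
    | _ -> None
  in
  let rec url_end stop j =
    if j >= len || is_space s.[j] || Some (s.[j]) = stop then j
    else url_end stop (j + 1)
  in
  let rec loop acc i =
    if i >= len then List.rev acc else
    if starts_at s i "http://" || starts_at s i "https://" then
      let j = url_end (closer i) i in
      loop (String.sub s i (j - i) :: acc) j
    else loop acc (i + 1)
  in
  loop [] 0

let found =
  scrape_http
    {|see <a href="https://example.org/a">x</a> and http://example.com/b now|}
```

On this input found is ["https://example.org/a"; "http://example.com/b"]. Like the documented function, the sketch keeps duplicates.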