B0_url
Sloppy URL processing.
URL standards are in a sorry state. This module takes a sloppy approach to URL processing. It only breaks URLs into their components and classifies them.
Warning. None of the functions here perform percent encoding or decoding. Use Percent when deemed appropriate.
The type for kinds of relative references. Represents this alternation.
The type for kinds of URLs. Represents this alternation.
kind u determines the kind of u. It decides that u is absolute if u starts with a scheme and :.
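The rule above can be sketched in a few lines. This is a minimal, standalone illustration and not the module's code: the name sketch_kind and the helper predicates are hypothetical, the set of scheme characters is an assumption, and relative references are not classified any further.

```ocaml
(* Hypothetical sketch, not B0_url's code: a URL string is deemed absolute
   when it starts with a scheme followed by ':'. *)

let is_alpha = function 'a' .. 'z' | 'A' .. 'Z' -> true | _ -> false

(* Assumed set of characters allowed in a scheme after its first letter. *)
let is_scheme_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '+' | '-' | '.' -> true
  | _ -> false

(* [sketch_kind u] is [`Abs] if [u] starts with a letter, continues with
   scheme characters and then has a ':'. Otherwise it is [`Rel]. *)
let sketch_kind (u : string) : [ `Abs | `Rel ] =
  let len = String.length u in
  let rec loop i =
    if i >= len then `Rel
    else if u.[i] = ':' then `Abs
    else if is_scheme_char u.[i] then loop (i + 1)
    else `Rel
  in
  if len > 0 && is_alpha u.[0] then loop 1 else `Rel
```

With this sketch, "http://example.org/a" is classified as absolute, while "/a/b" and "a/b:c" are classified as relative because no scheme precedes the first ':'.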
val update :
?scheme:scheme option ->
?authority:string option ->
?path:path option ->
?query:query option ->
?fragment:fragment option ->
t ->
t
update u updates the specified components of u. A component that is left unspecified is kept as in u; a component updated with None is deleted from u.
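The three cases (omitted, None, Some v) follow from optional arguments whose type is itself an option. The sketch below illustrates the pattern on a hypothetical two-field record; the type url and the function sketch_update are assumptions made for illustration and are not the module's definitions.

```ocaml
(* Hypothetical record standing in for a parsed URL. The module's own
   type [t] is not shown in this section. *)
type url = { scheme : string option; fragment : string option }

(* An omitted optional argument keeps the component, [None] deletes it
   and [Some v] replaces it. Inside the function each optional argument
   has type [string option option]. *)
let sketch_update ?scheme ?fragment (u : url) : url =
  let pick current = function None -> current | Some v -> v in
  { scheme = pick u.scheme scheme; fragment = pick u.fragment fragment }

let u = { scheme = Some "http"; fragment = Some "top" }

(* The fragment is deleted, the scheme is kept. *)
let u_no_fragment = sketch_update ~fragment:None u

(* The scheme is replaced, the fragment is kept. *)
let u_https = sketch_update ~scheme:(Some "https") u
```

The extra option layer is what lets a caller distinguish "leave this alone" from "remove this", at the cost of writing ~scheme:(Some "https") rather than ~scheme:"https".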
append root u is u if kind u is `Abs. Otherwise it uses root to make u absolute according to its relative_kind. The result is guaranteed to be absolute if root is; the result may be surprising or nonsensical if root isn't (FIXME can't we characterize that more?).
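To make the idea concrete, here is a rough sketch of resolving a reference against a root. It is not the module's algorithm: the function names are hypothetical, the four relative cases handled (empty, leading //, leading /, plain relative path) are an assumption about what a relative kind may distinguish, root is assumed to look like scheme://authority/path without query or fragment, and no dot segment removal is done.

```ocaml
(* Hypothetical sketch, not B0_url's code. *)

let starts_with ~prefix s =
  let pl = String.length prefix in
  String.length s >= pl && String.sub s 0 pl = prefix

(* [is_abs u] holds if scheme-like characters are followed by ':'. *)
let is_abs u =
  match String.index_opt u ':' with
  | None | Some 0 -> false
  | Some i ->
      let ok = function
        | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '+' | '-' | '.' -> true
        | _ -> false
      in
      let rec all j = j >= i || (ok u.[j] && all (j + 1)) in
      all 0

let sketch_append (root : string) (u : string) : string =
  let len = String.length root in
  let colon =
    match String.index_opt root ':' with Some i -> i | None -> len
  in
  (* Start of the path of [root], after "scheme://authority". *)
  let path_start =
    if colon + 3 > len then len else
    match String.index_from_opt root (colon + 3) '/' with
    | Some i -> i
    | None -> len
  in
  if is_abs u then u
  else if u = "" then root
  else if starts_with ~prefix:"//" u then String.sub root 0 colon ^ ":" ^ u
  else if starts_with ~prefix:"/" u then String.sub root 0 path_start ^ u
  else
    (* Relative path: replace what follows the last '/' of root's path. *)
    match String.rindex_opt root '/' with
    | Some i when i >= path_start -> String.sub root 0 (i + 1) ^ u
    | _ -> String.sub root 0 path_start ^ "/" ^ u
```

With root "http://example.org/a/b", the reference "/c" resolves to "http://example.org/c" and "c/d" resolves to "http://example.org/a/c/d". If root has no scheme the sketch still returns a string, but not a meaningful URL, which mirrors the caveat above.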
list_of_text_scrape ?root s roughly finds absolute and relative URLs in the ASCII compatible (including UTF-8) textual data s by looking in order:
1. For href or src substrings, then tries to parse the content of an HTML attribute. This may result in relative or absolute paths.
2. For http substrings in s, and then delimits a URL depending on the previous characters and checks that the delimited URL starts with http:// or https://.
Relative URLs are appended to root if provided. Otherwise they are kept as is. The result may have duplicates.
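The second lookup step can be approximated as follows. This is a simplified sketch rather than the module's implementation: sketch_scrape_http and its helpers are hypothetical names, the href and src step is not covered, and a fixed set of delimiter characters is assumed instead of choosing the delimiter from the characters that precede the URL.

```ocaml
(* Hypothetical sketch, not B0_url's code: collect substrings that start
   with http:// or https:// and run up to a delimiter character. *)

(* Assumed delimiters: whitespace, quotes, angle brackets and ')'. *)
let is_delim = function
  | ' ' | '\t' | '\n' | '\r' | '"' | '\'' | '<' | '>' | ')' -> true
  | _ -> false

let has_prefix_at s i prefix =
  let pl = String.length prefix in
  i + pl <= String.length s && String.sub s i pl = prefix

let sketch_scrape_http (s : string) : string list =
  let len = String.length s in
  (* Index of the first delimiter at or after [j], or [len]. *)
  let rec find_end j =
    if j < len && not (is_delim s.[j]) then find_end (j + 1) else j
  in
  let rec loop acc i =
    if i >= len then List.rev acc
    else if has_prefix_at s i "http://" || has_prefix_at s i "https://" then
      let j = find_end i in
      loop (String.sub s i (j - i) :: acc) j
    else loop acc (i + 1)
  in
  loop [] 0
```

URLs are returned in order of appearance and a URL that occurs twice in s is returned twice, in line with the note on duplicates above. A caller that needs unique results could pass the list through List.sort_uniq String.compare.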
val pp : Stdlib.Format.formatter -> t -> unit
pp formats a URL. For now this is just Format.pp_print_string.
val pp_kind : Stdlib.Format.formatter -> kind -> unit
pp_kind formats an unspecified representation of kinds.