B0_url
Sloppy URL processing.
URL standards are in a sorry state. This module takes a sloppy approach to URL processing. It only breaks URLs into their components and classifies them.
Warning. None of the functions here perform percent encoding or decoding. Use Percent when deemed appropriate.
The type for kinds of relative references. Represents this alternation.
The type for kinds of URLs. Represents this alternation.
kind u determines the kind of u. It decides that u is absolute if u starts with a scheme followed by a colon (:).
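The classification rule above (absolute if the text starts with a scheme and a colon) can be sketched as follows. This is a minimal illustration, not B0_url's implementation: starts_with_scheme and its two helpers are hypothetical names, and the set of characters accepted in a scheme is an assumption taken from the RFC 3986 grammar.

```ocaml
(* Hypothetical helpers, not part of B0_url. *)

(* Characters assumed legal inside a scheme after its first letter
   (RFC 3986: ALPHA / DIGIT / "+" / "-" / "."). *)
let is_scheme_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '+' | '-' | '.' -> true
  | _ -> false

let is_alpha = function 'a' .. 'z' | 'A' .. 'Z' -> true | _ -> false

(* [starts_with_scheme s] is [true] if [s] begins with a letter, followed
   by zero or more scheme characters, followed by ':'. *)
let starts_with_scheme s =
  let len = String.length s in
  let rec loop i =
    if i >= len then false
    else if s.[i] = ':' then true
    else if is_scheme_char s.[i] then loop (i + 1)
    else false
  in
  len > 0 && is_alpha s.[0] && loop 1
```

Under this sketch, "http://example.org" and "mailto:x" count as absolute, while "/a/b", "//example.org/x" and "a/b:c" do not.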
val of_url :
t ->
?scheme:scheme option ->
?authority:authority option ->
?path:path option ->
?query:query option ->
?fragment:fragment option ->
unit ->
t
of_url u () is a new URL with unspecified components defaulting to those of u. If specified with None the given component is deleted.
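The signature uses optional arguments whose values are themselves options, which is what distinguishes "not specified" from "specified with None". The sketch below shows that pattern on a hypothetical three-field record. The url type and the update function are stand-ins invented for illustration. They are not B0_url's representation or API.

```ocaml
(* Hypothetical stand-in for a URL value; B0_url's own representation is
   not shown in this documentation. *)
type url = {
  scheme : string option;
  path : string option;
  query : string option;
}

(* Each optional argument defaults to the matching component of [u].
   Passing [None] deletes the component, [Some v] replaces it. *)
let update u ?(scheme = u.scheme) ?(path = u.path) ?(query = u.query) () =
  { scheme; path; query }

let u = { scheme = Some "https"; path = Some "/a"; query = Some "q=1" }

(* No component specified: everything is inherited from [u]. *)
let unchanged = update u ()

(* Component specified with [None]: the query is deleted. *)
let no_query = update u ~query:None ()

(* Component specified with [Some _]: the path is replaced. *)
let moved = update u ~path:(Some "/b") ()
```

The trailing unit argument is what lets the compiler know that omitted optional arguments should take their defaults.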
append root u is u if kind u is `Abs. Otherwise it uses root to make u absolute according to its relative_kind. The result is guaranteed to be absolute if root is; the result may be surprising or nonsensical if root isn't (FIXME can't we characterize that more?).
module Authority : sig ... end
Sloppy authority processing.
list_of_text_scrape ?root s roughly finds absolute and relative URLs in the ASCII compatible (including UTF-8) textual data s by looking in order:
1. For href or src substrings, then trying to parse the content of an HTML attribute. This may result in relative or absolute paths.
2. For http substrings in s, then delimiting a URL depending on the previous characters and checking that the delimited URL starts with http:// or https://.
Relative URLs are appended to root if provided. Otherwise they are kept as is. The result may have duplicates.
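The second scraping step (finding http substrings and delimiting a URL) can be sketched roughly as below. This is a simplified, hypothetical scraper and not what B0_url does internally: scrape_http_urls, has_prefix_at and is_delimiter are invented names, and the fixed delimiter set (whitespace, quotes, angle brackets) is an assumption that replaces the "depending on the previous characters" logic described above.

```ocaml
(* Hypothetical, simplified scraper; not B0_url's implementation. *)

(* [has_prefix_at s i prefix] is [true] if [prefix] occurs in [s] at [i]. *)
let has_prefix_at s i prefix =
  let plen = String.length prefix in
  i + plen <= String.length s && String.sub s i plen = prefix

(* Assumed delimiters: a URL stops at whitespace, quotes or angle brackets. *)
let is_delimiter = function
  | ' ' | '\t' | '\n' | '\r' | '"' | '\'' | '<' | '>' -> true
  | _ -> false

(* Returns the http:// and https:// URLs found in [s], in order of
   appearance. Duplicates are kept. *)
let scrape_http_urls s =
  let len = String.length s in
  let rec url_end j =
    if j < len && not (is_delimiter s.[j]) then url_end (j + 1) else j
  in
  let rec loop acc i =
    if i >= len then List.rev acc
    else if has_prefix_at s i "http://" || has_prefix_at s i "https://" then
      let j = url_end i in
      loop (String.sub s i (j - i) :: acc) j
    else loop acc (i + 1)
  in
  loop [] 0
```

Like the function documented above, this sketch makes no attempt to remove duplicates, so callers that need unique URLs would have to deduplicate the result themselves.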
val pp : Stdlib.Format.formatter -> t -> unit
pp formats a URL. For now this is just Format.pp_print_string.
val pp_kind : Stdlib.Format.formatter -> kind -> unit
pp_kind formats an unspecified representation of kinds.