Webs_url
Sloppy URL processing.
URL standards are in a sorry state. This module takes a sloppy approach to URL processing. It only breaks URLs into their components and classifies them.
Warning. None of the functions here perform percent encoding or decoding.
The type for kinds of relative references. Represents this alternation.
The type for kinds of URLs. Represents this alternation.
kind u determines the kind of u. It decides that u is absolute if u starts with a scheme followed by a colon (:).
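The absolute test described above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: the name starts_with_scheme is hypothetical, and the scheme grammar (a letter followed by letters, digits, +, - or .) is assumed from RFC 3986.

```ocaml
(* Hypothetical helper, not part of Webs_url. Assumes the RFC 3986
   scheme grammar: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ). *)
let is_alpha c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')

let is_scheme_char c =
  is_alpha c || ('0' <= c && c <= '9') || c = '+' || c = '-' || c = '.'

(* [true] iff [u] starts with a non-empty scheme immediately followed
   by ':'. *)
let starts_with_scheme u =
  match String.index_opt u ':' with
  | None | Some 0 -> false
  | Some i ->
      is_alpha u.[0] &&
      (* Every character strictly before the first ':' must be a scheme
         character. *)
      (let rec loop k = k >= i || (is_scheme_char u.[k] && loop (k + 1)) in
       loop 1)
```

Under these assumptions, http://example.org and mailto:a@b.c pass the test, while /a/b, a/b:c and :x do not.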
absolute ~root url is url if kind url is `Abs. Otherwise it uses root to make url absolute according to its relative_kind. The result is guaranteed to be absolute if root is; if root isn't, the result may be surprising or nonsensical (FIXME maybe we should rather call that concat and make it like Fpath.concat). Warning. This doesn't resolve relative path segments.
authority u extracts a URL authority (HOST:PORT) from u, if any.
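As a rough illustration of what extracting an authority involves, here is a standalone sketch. It is not the module's code: the name authority_sketch, the option result and the delimiting rule (the authority follows a leading // or a scheme:// prefix and runs up to the next /, ? or #) are assumptions made for this example.

```ocaml
(* Hypothetical sketch, not Webs_url's implementation. *)
let authority_sketch u =
  let len = String.length u in
  let is_delim c = c = '/' || c = '?' || c = '#' in
  (* Index of the first character satisfying [p] at or after [i], or
     [len] if there is none. *)
  let rec find p i = if i >= len || p u.[i] then i else find p (i + 1) in
  let has_slashes i = i + 1 < len && u.[i] = '/' && u.[i + 1] = '/' in
  let start =
    if has_slashes 0 then Some 2 else
    (* Only accept a ':' that occurs before any '/', '?' or '#'. *)
    let i = find (fun c -> c = ':' || is_delim c) 0 in
    if i < len && u.[i] = ':' && has_slashes (i + 1)
    then Some (i + 3) else None
  in
  match start with
  | None -> None
  | Some s -> Some (String.sub u s (find is_delim s - s))
```

With this sketch, http://example.org:8080/a?b yields Some "example.org:8080", //example.org/a yields Some "example.org", and /a/b or mailto:a@b.c yield None. Like the functions of this module, it performs no percent decoding.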
val update :
?scheme:scheme option ->
?authority:string option ->
?path:path option ->
?query:query option ->
?fragment:fragment option ->
t ->
t
update u updates the specified components of u. An unspecified component is kept as in u; a component updated with None is deleted from u.
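The three cases (unspecified, deleted, replaced) follow from combining an optional argument with an option type. The sketch below shows that idiom on a hypothetical two-field record; the type url and this update function are stand-ins for illustration, not the module's t or its update.

```ocaml
(* Hypothetical record standing in for a parsed URL. *)
type url = { scheme : string option; fragment : string option }

(* Inside the function each optional argument has type
   [string option option]:
   - [None]          the argument was not passed, keep the field
   - [Some None]     delete the component
   - [Some (Some s)] replace the component with [s] *)
let update ?scheme ?fragment u =
  let pick field = function None -> field | Some v -> v in
  { scheme = pick u.scheme scheme; fragment = pick u.fragment fragment }

let u = { scheme = Some "http"; fragment = Some "top" }

(* Fragment deleted, scheme kept. *)
let u1 = update ~fragment:None u

(* Scheme replaced, fragment kept. *)
let u2 = update ~scheme:(Some "https") u
```

The cost of this design is that callers must wrap new values in Some; the benefit is that a single function covers keeping, setting and deleting any component.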
list_of_text_scrape ?root s roughly finds absolute and relative URLs in s by looking, in order, for:
1. href or src substrings. It then tries to parse the content of an HTML attribute. This may result in relative or absolute paths.
2. http substrings in s. It then delimits a URL depending on the previous characters and checks that the delimited URL starts with http:// or https://.
Relative URLs are made absolute with root if provided. Otherwise they are kept as is. The result may have duplicates.
val pp : Stdlib.Format.formatter -> t -> unit
pp formats a URL. For now this is just Format.pp_print_string.
val pp_kind : Stdlib.Format.formatter -> kind -> unit
pp_kind formats an unspecified representation of kinds.