Tap and drag to input strokes. Use backspace or double-tap to clear the gesture.
The $P multistroke gesture recognizer is a simple prototyping algorithm to recognize gesticulating humans. You can test it on the gray surface above with your mouse or fingers to recognize the symbols on the right. The following is a short explanation and straightforward OCaml implementation of $P.
The idea behind $P is to only consider a gesture's underlying point set. Discarding the information about the user input time line and directions makes it easier to recognize a symbol regardless of how it has been drawn by the user: with a single stroke, with multiple strokes, from left to right, from right to left, etc.
Given a set of gesture templates
T, the task of the recognizer is to
find the best element
T that matches a user input gesture
g. Seeing gestures as point sets, this calls for a notion of
matching between point sets and a matching metric to evaluate its
A matching function between point sets
t is an injective map
m : g -> t that associates each point of
g to a distinct point of
t. Assuming gestures and templates have the same number
points, which can been achieved by resampling, we define a matching
m as a bijective map between
To evaluate the fitness of
m our matching metric sums the euclidian
distance between the points of
t matched by
Σ ‖g(i) - m(g(i))‖ with
The smaller this matching distance, the better the matching. We
define the distance between two gestures
t as being the
smallest matching distance of all the
n! matchings between
Now to match a user input
g against a set of gesture templates
compute for each template in
T the distance to
g and take the one
yielding the smallest value as being the recognized template for
T; a 1-nearest neighbor classification.
This is the high-level view. Making this simple to implement and computationally practical requires considerations and trade-offs that were investigated by $P's designers and whose conclusions are described next.
We start by defining our API. The signature below should be mostly self-explaining.
open Gg module Gest : sig type stroke = p2 list type t (* Gestures *) val of_strokes : stroke list -> t type 'a reco (* Recognizers for symbols of type ['a] *) val empty_reco : 'a reco val add_sym : t -> 'a -> 'a reco -> 'a reco val sym_gests : 'a reco -> 'a -> t list val recognize : 'a reco -> t -> 'a option * float (* score *) end = struct type stroke = p2 list
The $P matching algorithm is easier to write with point set indexing so we
represent gestures as array of points of fixed length
let n = 32 let n_float = float n type t = p2 array (* of length [n] *)
All gestures have the same number
n of points. The resampling
function takes a raw gesture, represented by a list of strokes, and
tries to resample them to a point set with
n mostly equidistant
points spread on the strokes. $P's resampling algorithm doesn't work
on single point strokes (taps); we massage a bit the data to remove
let stroke_len : stroke -> float = fun ps -> let rec loop l prev = function |  -> l | p :: ps -> loop (l +. V2.(norm @@ p - prev)) p ps in if ps =  then 0. else loop 0. (List.hd ps) (List.tl ps) let resample : stroke list -> t = fun ss -> let rec massage len acc = function |  :: ss -> massage len acc ss | [tap] :: ss -> massage len acc ss | s :: ss -> massage (len +. stroke_len s) (s :: acc) ss |  -> len, acc in let len, ss = massage 0.  ss in let ival = len /. (n_float -. 1.) in let first = match ss with (p :: _) :: _ -> p | _ -> P2.o in let g = Array.make n first in let rec loop d last ss i = if i >= n then g else match ss with |  -> g |  :: (p :: ps) :: ss -> (* start *) loop 0. p (ps::ss) i |  :: ss -> (* end *) loop 0. last ss i | (p :: ps) :: ss' -> let walk = V2.(p - last) in let dist = V2.norm walk in let d' = d +. dist in match d' < ival with | true -> loop d' p (ps :: ss') i | false -> let t = (ival -. d) /. dist in let q = V2.(last + t * walk) in g.(i) <- q; loop 0. q ss (i + 1) in loop 0. first ss 1
$P's recognition is insensitive to scale and position. For this we first map a gesture to the unit square [0;1]×[0;1] with a, uniform, shape preserving scaling transform that maps the largest axis-aligned extent of the point set to the unit. Then we follow by translating the centroid of the point set to the origin. We perform these normalizations in place.
let to_unit_square : t -> unit = fun g -> let pmin, max_extent = let rec loop xmin xmax ymin ymax g i = match i < n with | false -> (P2.v xmin ymin), max (xmax -. xmin) (ymax -. ymin) | true -> let px, py = P2.x g.(i), P2.y g.(i) in loop (min xmin px) (max xmax px) (min ymin py) (max ymax py) g (i + 1) in loop infinity neg_infinity infinity neg_infinity g 0 in if max_extent = 0. then () (* avoid degen div by zero *) else let rescale = 1. /. max_extent in let rescale p = V2.(rescale * (p - pmin)) in Array.iteri (fun i p -> g.(i) <- rescale p) g let centroid : t -> P2.t = fun g -> let rec loop cx cy g i = match i < n with | false -> V2.v (cx /. n_float) (cy /. n_float) | true -> let p = g.(i) in loop (cx +. V2.x p) (cy +. V2.y p) g (i + 1) in loop 0. 0. g 0 let center : t -> unit = fun g -> let c = centroid g in Array.iteri (fun i p -> g.(i) <- V2.(p - c)) g
We are now ready to convert our raw gestures to gesture values.
let of_strokes : stroke list -> t = fun ss -> let g = resample ss in (to_unit_square g; center g; g)
Gesture values being properly normalized we proceed to compute the
distance between two of them. As mentioned above this means finding
the matching with the smallest matching distance among the
possible matching functions.
This is an instance of the combinatorial optimization assignment problem for which the Hungarian Method provides an O(n3) algorithm. From the point of view of $P's designers this algorithm is too expensive both from a time and implementation complexity perspective. As such $P implements a simpler O(n2.5) heuristic algorithm that doesn't yield the optimal matching but provides good empirical results.
The algorithm matches points between
t as follows. Starting
with the first point of
g it finds the closest point in
t and takes
it as the matching point. It then proceeds with the next point of
and selects the closest point in
t that is not yet matched and so on
until we get to the final point of
g for which there's no choice but
a single remaining point of
t to match with.
We do not generate the list of actual matching pairs as we are only
interested in the matching distance. Instead we directly compute the
distance between matched points and sum it on the fly. During this
process the algorithm weights the distance between matched points by a
value in [0;1]. The weight expresses a degree of confidence in the
point match. The first point of
g has a match confidence
w(0) of 1
since it is compared to all the points of the template. Subsequent
points do not see all the points of the template so the match
w(i) decreases according to the following formula:
w(i) = 1 - (i / n) with
The final result of this computation is sensitive to the starting
g and the order in which points are considered. To
counteract this, the procedure is run with nϵ (with ϵ in
[0;1], 0.5 in practice) different starting points of
g and the
smallest resulting value is kept — the order is however not altered.
This computation is also asymmetric, so we measure
g and keep again the smallest value.
let gest_dist_at : start:int -> t -> t -> float = fun ~start g t -> (* distance by starting at [start] in [g]. *) let matched = Array.make n false in let match_dmin p t = (* unmatched closest to [p] in [t]. *) let rec loop dmin_idx dmin k = match k < n with | false -> dmin_idx, dmin | true -> let next = k + 1 in if matched.(k) then loop dmin_idx dmin next else let d = V2.(norm @@ p - t.(k)) in match d < dmin with | true -> loop k d next | false -> loop dmin_idx dmin next in let dmin_idx, dmin = loop (-1) infinity 0 in (matched.(dmin_idx) <- true; dmin) in let rec loop sum i = let dmin = match_dmin g.(i) t in let w = 1. -. (float @@ (i - start + n) mod n) /. n_float in let sum = sum +. w *. dmin in let next = (i + 1) mod n in if next = start then sum else loop sum next in loop 0. start let gest_dist : t -> t -> float = let eps = 0.5 in let step = truncate (n_float ** (1. -. eps)) in fun g t -> let rec loop dmin i = match i < n with | false -> dmin | true -> let d0 = gest_dist_at ~start:i g t in let d1 = gest_dist_at ~start:i t g in loop (min d0 d1) (i + step) in loop infinity 0
We fill in the trivial bits of the recognizer type:
type 'a reco = (t * 'a) list let empty_reco =  let add_sym t sym r = (t, sym) :: r let sym_gests r sym = List.(map fst @@ find_all (fun (_, sym') -> sym = sym') r)
The function to recognize the symbol associated to an input gesture simply selects the symbol of the template with the smallest distance. It also returns a score in the range [0;1] with 1 denoting a perfect match.
let recognize : 'a reco -> t -> 'a option * float = fun r g -> let score dmin = max (0.5 *. (2. -. dmin)) 0. in let rec loop fnd dmin = function |  -> fnd, score dmin | (t, sym) :: ts -> let d = gest_dist g t in match d < dmin with | true -> loop (Some sym) d ts | false -> loop fnd dmin ts in loop None infinity r end
These are the only software gesticulations we needed. The $P gesture recognizer is described and analyzed in this paper; empirical usage guidelines such as number of symbols and adequate number of templates per symbol are also provided. Related information and works can be found on the algorithm's homepage.
Full source (gist) with self-contained webpage of the above interface.