Type inference in Transd


Introduction

Transd is a statically and strongly typed language. During the compilation of a Transd progam, all the program's expressions and objects are assigned with a type at the place of their first occurence in the code, and this type remains unchangeable during the whole life-time of the object.

The two advantages that static typing brings to the language are: 1) gain in performance, 2) gain in program's safety.

The types of program objects either should be explicitly specified by the programmer, or the explicit type annotations in many cases may be omitted, when the type of an object can be automatically deduced. Such automatic deducing of types (also called type inference) serves for two purposes: 1) it makes the program code less cluttered with explicit type annotations; 2) it frees the programer from typing the explicit type annotations.

Type inference

The following Transd code dosn't have a single type annotation and may look like written in a dynamically typed language:

#lang transd

MainModule: {
v: [0,1,2,3,4,4,6,7,8,9],
v1: ["a", "b", "c", "d", "e", "f", "g", "g"],
s: "aabcdefgh",
_start: (λ 
    (with pos (find-adjacent v)
        (if (neq pos (end v)) 
            (lout (get-el pos))))
    (with pos (find-adjacent v1)
        (if (neq pos (end v1)) 
            (lout (get-el pos))))
    (with pos (find-adjacent s)
        (if (neq pos (end s)) 
            (lout (get-el pos))))
)}

This code searches for the first pairs of adjacent identical elements in three sequence containers, and outputs those elements if they were found. The output of the code is:

4
g
a

Despite the absence of type annotations, all the six variables, three find-adjacent functions and three end functions in this program have their own strictly defined types, which are automatically deduced at the compilation stage:

v : Vector<Int>, 
v1 : Vector<String>,
s : String,
pos : Position<Int>,
pos : Position<String>,
pos : Position<Char>

Each of the pos variables is a local variable, whose scope is limited to the with block where it's been declared. Each of the find-adjacent and end functions has the return type coinciding with the type of the pos variable to the left of it.

If we change the _start function and merge the last with block with the previous one as follows:

    _start: (λ 
        (with pos (find-adjacent v)
            (if (neq pos (end v)) 
                (lout (get-el pos))))
        (with pos (find-adjacent v1)
            (if (neq pos (end v1)) 
                (lout (get-el pos))))
            (set pos (find-adjacent s)) // compilation error!
            (if (neq pos (end s)) 
                (lout (get-el pos)))
    )

A compilation error will be generated, since the type of the pos variable, which was deduced and assigned to the variable from the return type of (find-adjacent v1) function, doesn't match the return type of the (find-adjacent s) function.

If needed, the code could be placed into one scope block as follows:

    _start: (λ 
        (with pos1 (find-adjacent v)
              pos2 (find-adjacent v1)
              pos3 (find-adjacent s)

            (if (neq pos1 (end v)) 
                (lout (get-el pos1)))
            (if (neq pos2 (end v1)) 
                (lout (get-el pos2)))
            (if (neq pos3 (end s)) 
                (lout (get-el pos3))))
    )

Type inference in Transd provides for flexibility and polymorphism of operations, while keeping them strictly typed and type verifiable at the stage of compilation.

The next example demonstrates how type inference is used in type-safe data operations on datasets, containing different data types.

Before reviewing the example, a brief description of language features used in the example is in place.

typealias statement in Transd is similar to typedef in C++: it is used for introducing short aliases for lengthy type definitions:

LongType: typealias(Index<String Vector<Tuple<Int Int Vector<String>>>>),
vec1: Vector<LongType>

Tuple is a compound container type, which is an array of fixed size, capable to store elements of different types. The types and the order of tuple members are an integral part of the type definition; they are specified or automatically deduced at the point of a tuple object declaration and that object can only contain the elements of these types at their respective placeѕ within the array. For example:

#lang transd

MainModule: {
t1: Tuple<Int String>( [[1, "abc"]] ),

_start: (λ 
    (lout t1) // => [[1, "abc"]]
    (set-el t1 0 123)
    (lout t1) // => [[123, "abc"]]
    (set-el t1 0 "ABC") // compilation error: wrong type!
)
}

Vectors of tuples can be used for quick handling of table-like datasets, where tuple objects store table records (rows), and the vector of tuples keeps the whole table. Type inference makes it possible to operate on such table in type-safe manner. For example:

#lang transd

MainModule: {
Row : typealias(Tuple<Int String Int String>()),
v: Vector<Row>(),
r1 : [[6, "y", 3, "b"]],
r2 : [[1, "t", 5, "d"]],
r3 : [[3, "h", 7, "a"]],
r4 : [[2, "e", 4, "c"]],

_start: (λ 
    (append v r1)(append v r2)(append v r3)(append v r4)
    (sort v
        (lambda l Row() r Row() -> Bool()
            (ret (less (get l 3) (get r 3)))))
    (lout v)
)
}

OUTPUT:

[[[3, h, 7, a]], 
 [[6, y, 3, b]], 
 [[2, e, 4, c]], 
 [[1, t, 5, d]]]

This example shows how a vector of tuples, containing a table with differently typed fields, can be sorted with a custom comparison predicate. In this case the vector is sorted by the last tuple field in ascending order.

Note, that all operations with tuple fields are type safe and type verifiable at compile time. Example:

#lang transd

MainModule: {
    tp : [[12, "ab", 34, "cd"]],

    concatStr: (λ l String() r String() (ret (+ l r))),

    _start: (λ 
        (lout (concatStr (get tp 1) (get tp 3)))
    )
}

OUTPUT:

abcd

Here, a tp variable is declared, whose type Tuple<Int String Int String> is deduced automatically. Then, tp's fields 1 and 3, which contain strings, are passed to concatStr function, which accepts two arguments of String() type and returns a new string, which is the concatenation of arguments. If a pair of other fields is passed to concatStr, say, fields 0 and 2, or fields 1 and 2, a compilation type error will be generated.

Type inference in Transd allows to omit in source code the annotations of function return types. E.g., in the example abobe, the return type of concatStr function is unambigously deducible at compile time from the types of its arguments and expressions in the function's body.