Transformation and Analysis of Structured Data

Language for organizing data

Compact. Fast. Secure.

Transd (pronounced "trans-dee") is a statically typed, general-purpose programming language with extended functionality for data processing. It uses a novel execution model, virtual compilation, which helps reduce the size of the implementation and improve performance.

Albert Berger
author of Transd.

#lang transd

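// Speaker stores a String -> String function passed to its constructor
// and applies it to the argument of "call"
class Speaker : {
    say: Lambda<String String>(),
    @init: (λ f Lambda<String String>() (rebind say f )),
    call: (λ s String() (ret (exec say s)))
}

MainModule: {
    greeter: Lambda<String String>(λ s String() (ret (+ "Hello, " s "!"))),

    _start: (λ 
        // Create a Speaker that uses 'greeter' and print the result of calling it
        (with sp Speaker( greeter )
            (textout (call sp "World"))))
}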
// Output: Hello, World!

Data processing

In addition to traditional control flow structures such as "for", "while" and "if", Transd makes it possible to manipulate data uniformly, whether it is a list, a multi-column table or a collection of semi-structured records, using three data queries: "select", "update" and "reduce".

#lang transd

MainModule: {
    v: ["apple", "orange", "avocado", "peach"],
    n: 0,

    _start: (lambda
        // 'tabdata' is assumed to be a string holding the table (its definition is not shown)
        (with tabl Table()
            (load-table tabl tabdata)
            (build-index tabl "Salary")

            (with rows (tsd tabl 
                      select: ["Name", "Department"]
                      as: [[String(),String()]]
                      where: "Salary > 20000"
                      sortby: "Name")
                (for row in rows do (lout row))))
        // OUTPUT:
        // <= ["Bob", "Marketing"]
        // <= ["Susan", "HR"]

        // select: keep the elements of 'v' that start with "a"
        (lout (tsd v :select satisfying: (λ s String() (starts-with s "a"))))
        // <= [["apple"], ["avocado"]]

        // update: replace each element of 'v' with its reverse
        (tsd v :update set: (λ s String() (reverse s)))
        (lout v)
        // <= ["elppa", "egnaro", "odacova", "hcaep"]

        // reduce: add the size of each element to 'n'
        (tsd v reduce: ["(size col1)"] using: (λ i Int() (+= n i)))
        (lout n)
        // <= 23
    )
}


Transd is resource-savvy. Particular care has been taken to keep both the source code and the binary free of overhead, so Transd is small in physical size and simple to build on both Windows and Linux.

Building the Transd virtual compiler TREE3 ("Transd Expression Evaluator") on Linux:

$ cd transd/tree3/src
$ g++ -O2 -std=c++14 transd.cpp main.cpp -D__LINUX__ -lpthread -o tree3
$ ls --size --human-readable
total 3.0M
8.0K main.cpp  448K transd.cpp  176K transd.hpp  2.3M tree3
$ ./tree3
_) (textout "Hello, Transd!")
Hello, Transd!

The whole language compiles with Clang on both Linux and Windows, without any prerequisites, using essentially the same command:

Linux:

$ clang++ -std=c++14 -O3 -D__LINUX__ -DNDEBUG transd.cpp main.cpp -lpthread -o tree3

Windows (PowerShell):

PS> clang++ -std=c++14 -O3 -DWIN32 -DNDEBUG transd.cpp main.cpp -o tree3.exe


The performance of different parts of Transd currently ranges from that of the fastest interpreted languages that do not use Just-In-Time compilation to machine code, up to speeds comparable to, or even close to, native code. For example, a SELECT query on a table with 100,000 records completes in a fraction of a second (see the performance test).

Transd does not use Just-In-Time compilation to machine code, yet even in its beta version its performance is quite good. This suggests that, as the implementation matures, its performance will lie between native speed and that of the traditional interpretive model, in which a virtual machine executes bytecode.

Quick Survey of Transd Programming Language

What tasks can Transd solve?

Extending the functionality of programs, processing data in various formats, general programming/scripting, etc.

How is Transd used?

Transd can be integrated into a program as a two-file C++ library or can be bundled as a small executable file.

Why Transd?

Transd is a full-fledged programming language with many advanced features. It's cross-platform, fast, and very compact.

Examples of use cases for Transd

Text data processing

Out of the box, Transd can work with structured or semi-structured text data (CSV tables, JSON-like objects, etc.).

Moderately large data sets can be processed with Transd using very little code. A program of just a few lines can read a CSV file or other data into memory and run queries on it similar to SQL queries.

A complete working example:

#lang transd

MainModule: {
    tabfile: "/mnt/data/employees.csv",
    tabstr: "",

    _start: (λ 
//-- Read a CSV file into a string --
        (with fs FileStream()
            (open fs tabfile) (textin tabstr fs))

        (with tabl Table()
//-- Load table and build indexes --
            (load-table tabl tabstr)
            (build-index tabl "Age in Company (Years)")
            (build-index tabl "Salary")

//-- Do a SELECT query
            (with rows (tsd-query tabl 
                    select: ["Name Prefix", "First Name", "Last Name",
                             "Age in Company (Years)", "Salary"]
                    as: [[String(),String(),String(),Double(),Int()]]
                    where: "\"Age in Company (Years)\" > 35.0 AND 
                        Salary < 43000"
                    sortby: "Salary")
//-- Print result
                (for row in rows do (lout row)))
    ))
}

Output:

[Drs., Cameron, Diggs, 36.35, 40119]
[Mr., Cory, Coyle, 37.62, 41078]
[Mr., Carol, Vangundy, 36.59, 41724]
[Mrs., Kristi, Beliveau, 38.39, 41796]
[Ms., Particia, Blair, 35.06, 41819]
[Mr., Wilber, Ransome, 37.67, 41994]
[Ms., Cathern, Pettit, 36.36, 42453]
[Mr., Lamar, Parson, 35.41, 42458]

table loading: 20.41 sec;
running query: 0.006 sec.

This program reads a 37-column table containing 100,000 rows of sample data and runs a SELECT query on it.

You can run this example by following the instructions here.

Cross-platform scripting language

Transd can serve as a bundled or embedded cross-platform, general-purpose programming language for various programming tasks where other solutions are too large, not cross-platform, and so on.

An example of using Transd as a cross-platform, general-purpose language is the Transd interpreter, TREE3. Its executable is less than 3 MB on Windows and less than 5 MB on Linux (statically linked).

Handling an advanced configuration

Complex configurations may include definitions of custom user classes, the use of types, variables and expressions in config files, and definitions of custom user functions that Transd can execute.

Complex configuration is handled by associating the configuration files with a custom handler written in Transd. Such a handler usually needs considerably less code than one written in the program's native language, thanks to Transd's built-in support for the format the config files are written in, the Transd Structured Data (TSD) format.

An example of handling a moderately complex configuration can be seen in a demonstration program.

Extending the program's functionality

Transd can be used to create libraries of user-defined functions, extensions, add-ons and plugins.

With Transd as an extension language, users can write their own functions from scratch, or download a third-party add-on, and run it either through the interpreter or internally, if Transd is included in the program as a C++ library.

An example of extending a graphics program with a rendering function written in Transd can be found here.

Handling a simple configuration file

Programs that need only simple configuration can use Transd to process their config files with a minimum of coding. The configuration files for such a program can be organized as named sections containing lists of name/value items:

network : {
    siteName: "localhost",
    requestPassword: true
}

users : {
    usernames: ["alice", "bob", "tom"],
    userquotas: [35, 35, 30]
}

This format has built-in support in Transd and can be processed with almost no customization. Values in name/value pairs can be strings, integers, floats, booleans, or lists of these. Transd reads and validates the configuration files and returns the processed results via its C++ API or standard output.