Using Transd for working with data

Transd's data processing capabilities include a built-in data query language (TQL) for querying datasets in a database-like manner, and data deserialization, which makes it possible to create program objects and initialize them directly from text data files.

Transd Query Language

Transd Query Language (TQL) is a built-in tool in Transd for working with datasets of various types by making database-like queries for selecting and updating data. TQL can be used for working with the following dataset types:

Tabular data (rows and columns):

  ID    Name             Salary
  234   Edward Smith     60000
  235   Harriet Willers  70000

Records made of name/value fields:

Device: Tablet
Screen: 10.1
Weight: 350

Device:   Notebook
RAM (Gb): 16
HD (Tb):  3

Example of a TQL 'select' query:

(tql table1 
    select: ["Book_title", "Price"]
    as: [[String(), Double()]]
    where: "Price < 50"
    satisfying: (lambda Author String() (match Author ".*Lovecraft.*"))
    sortby: "Price" :ascending
    limit: 5)

Data importing

The idea of data importing (deserializing) is that objects of custom classes in a program can be created and initialized from data stored in a simpler or more universal format. In other words, data is converted into objects. An example of such a conversion is the deserializing of JSON data into program objects in languages such as Java, JavaScript, C#, etc.

Transd helps to make this conversion more automatic and reduces the amount of code needed to handle it. Data deserializing in Transd also offers additional features, such as field values that are variables referring to other objects in the same data file, field values that are objects of other classes, and so on. For data deserializing Transd uses the Transd Structured Data (TSD) format, which resembles JSON but differs from it in a number of ways.


Data processing

In this example we have a file with tabular data (in CSV format) and we want to run some queries on it. The Transd program reads the data into a table and then uses TQL to select the needed information.

The data file contains records with information about employees: name, ID, salary, department. We want to select the two employees with the highest salaries in each department.

data file: data.csv

Tyler Bennett,  E10297,32000,D101
John Rappl,     E21437,47000,D050
George Woltman, E00127,53500,D101
Adam Smith,     E63535,18000,D202
Claire Buckman, E39876,27800,D202
David McClellan,E04242,41500,D101
Rich Holcomb,   E01234,49500,D202
Nathan Adams,   E41298,21900,D050
Richard Potter, E43128,15900,D101
David Motsinger,E27002,19250,D202
Tim Sampair,    E03033,27000,D101
Kim Arlich,     E10001,57000,D190
Timothy Grove,  E16398,29900,D190

program file

#lang transd

MainModule: {

tabfile: "data.csv",
tabstr: "",

_start: (λ 
  (with fs FileStream()
     (open-r fs tabfile) (textin from: fs tabstr))

  (with tabl Table()
      (load-table tabl tabstr :firstRowColNames)
      (build-index tabl "Department")
      (with rows (tql tabl 
              select: ["Department"] 
                  as: [[String()]] 
              sortby: "Department" )

          (for row in rows do
              (with recs (tql tabl 
                      select: all 
                          as: [[String(), String(), Int(), String()]]
                  satisfying: (lambda Department String() (eq Department (get row 0))) 
                      sortby: "Salary" :desc 
                       limit: 2)

                  (for rec in recs do (textout rec "\n")))))))
}


//["John Rappl", "E21437", 47000, "D050"]
//["Nathan Adams", "E41298", 21900, "D050"]
//["George Woltman", "E00127", 53500, "D101"]
//["David McClellan", "E04242", 41500, "D101"]
//["Kim Arlich", "E10001", 57000, "D190"]
//["Timothy Grove", "E16398", 29900, "D190"]
//["Rich Holcomb", "E01234", 49500, "D202"]
//["Claire Buckman", "E39876", 27800, "D202"]

The code works as follows. First, the data file is read into the tabstr string. Then the tabl object of the Table class is created and loaded with data from tabstr using the (load-table) method. After that, we build a database index for the column that serves as the selection criterion: "Department". Finally, we accomplish the task by querying the table with TQL: the first query retrieves the department names, and for each of them a second query selects the two records with the highest salaries.
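As a variation on the program above, the following minimal sketch lists the three highest salaries across all departments. It reuses only the constructs already shown (load-table, tql with select:, as:, sortby: and limit:) and is an illustration rather than a definitive implementation. It assumes, as the :firstRowColNames option suggests, that data.csv begins with a header row that names a "Salary" column; the limit of 3 is an arbitrary choice for the illustration.

```transd
#lang transd

MainModule: {

tabfile: "data.csv",
tabstr: "",

_start: (λ 
    // read the whole CSV file into the tabstr string
    (with fs FileStream()
        (open-r fs tabfile) (textin from: fs tabstr))

    (with tabl Table()
        // assumption: the first row of the file holds the column names
        (load-table tabl tabstr :firstRowColNames)

        // no 'satisfying:' clause: all rows take part in the sort
        (with recs (tql tabl 
                select: all 
                    as: [[String(), String(), Int(), String()]]
                sortby: "Salary" :desc 
                 limit: 3)

            (for rec in recs do (textout rec "\n")))))
}
```

No index is built here because the query does not filter on any column.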

Data importing

Basic example

In this example we have a custom class Point and we create and initialize a vector of Point objects, using a vector of raw data.

#lang transd

class Point : {
    x: Double(),
    y: Double(),
    @init: (λ v Vector<Double>() (= x (get v 0)) (= y (get v 1))),
    print: (λ (textout "Point(" x "; " y ")" ))
}

MainModule: {
    v_: [[1.0, 2.0], [3.0, 4.0]],

    _start: (λ 
        (with v Vector<Point>(v_)
          (for p in v do (print p) (lout ""))
    )   )
}

// Point(1; 2)
// Point(3; 4)

Importing TSD objects

This example expands the previous one and shows how to use an external file with data in TSD format for data deserializing. Here an object of the Square class has as data members a string, a side length, and a vector of Point objects. It also has a draw method.

TSD format, in general, is a list of name/value pairs, and in this it resembles JSON, but in other aspects it has a number of differences. In this example a TSD file contains a description of an object of Square class:

data file: data.tsd

"square1" : {
class: "Square",
color: "green",
sideLen: 5.0,
coors: [[1.0, 2.0], [3.0, 4.0]]
}

program file

#lang transd

class Point : {
    x: Double(),
    y: Double(),
    @init: (λ v Vector<Double>() (= x (get v 0)) (= y (get v 1))),
    print: (λ (textout "Point(" x "; " y ")" ))
}

class Square : {
    coors: Vector<Point>(),
    sideLen: Double(),
    color: String(),

    draw: (λ (lout "(Square::draw): \nColor: " color) 
            (textout "Side: " sideLen "\nCoors: ")
            (for pt in coors do (print pt) (textout " " )))
}

MainModule: {

objFile: "data.tsd",
squareName: "square1",
squareObj: Square(),
objs: Index<String Vector<Object>>(),

_start: (λ 
    (rebind objs (group-by 
                   (read-tsd-file objFile) 
                   (λ ob Object() -> String() 
                     (get-String ob "name"))))
    (load-from-object squareObj (get (snd (get objs squareName)) 0))
    (draw squareObj))
}

// Output:
// (Square::draw): 
// Color: green
// Side: 5
// Coors: Point(1; 2) Point(3; 4)

The code works as follows. The contents of the TSD file are read with the (read-tsd-file) function into a vector of TSD objects. Then the group-by method is called on this vector, which returns an index (Index is the name for associative arrays in Transd) of TSD objects keyed by their names. This index is assigned to the objs variable.

Then the Square object squareObj is initialized from a TSD object with the (load-from-object) function: the object named "square1" (the name is held in the squareName variable) is retrieved from the objs index and passed to the function along with squareObj.

After initialization of squareObj, the (draw) method is called on it.
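To illustrate how the object name ties the data file to the program, here is a hedged sketch of a second data file. The file name data2.tsd, the object name "square2", and all field values are hypothetical; the layout simply mirrors data.tsd above.

```transd
"square2" : {
class: "Square",
color: "red",
sideLen: 2.5,
coors: [[0.0, 0.0], [2.5, 2.5]]
}
```

Under these assumptions, the only changes needed in the program would be to set objFile to "data2.tsd" and squareName to "square2". The rest of the code, including the Point and Square classes, stays the same.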

Further examples

Using Transd to work with a table of 3.7 million values.

A demo program "AFC" (Audio Flow Combiner) uses data deserializing for importing a hierarchy of program objects from a configuration file.

An example program "Knorg" (Knowledge Organizer) works with loosely organized data.