Containers
Two new container types have been added: Set and HashIndex.
Data importing
Data importing functionality has been added. Data importing (or object deserialization) allows quick creation and initialization of program objects of custom classes from external textual data. An example of this feature in other languages is the deserialization of JSON data into custom class objects.
Data importing in Transd is similar to JSON object deserialization, but enhanced with some essential features, so that whole hierarchies of custom objects with complex structure can be loaded into a program from data files with minimal coding.
'locals:' keyword
The keyword locals: placed just after the function signature in a lambda declaration introduces local function variables whose scope is the function body.
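As an illustration, a declaration might look roughly like this (a sketch only: the exact form of the name/initial-value pairs after locals: is assumed here, by analogy with module member declarations):
#lang transd
MainModule: {
    _start: (λ locals: cnt 0 greeting "hello"
        (+= cnt 1)                      // 'cnt' and 'greeting' are visible only inside this lambda
        (textout greeting " " cnt))     // assumed output: hello 1
}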
Built-in methods
The (String::strip) method removes the specified characters from one or both ends of a string.
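For instance, a call might look roughly like this (a sketch only; the exact signature and whether strip works in place are assumptions):
(with s "--hello--"
    (strip s "-")          // assumed: removes '-' characters from both ends of s
    (textout s))           // expected output: hello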
REDUCE data query
The data-processing functionality of the language has been expanded with the "REDUCE" data query type. Like the other query types (SELECT and UPDATE), REDUCE works the same on large multi-column datasets and on one-dimensional vectors, which are treated as one-column datasets.
#lang transd
MainModule: {
    v: ["apple", "orange", "avocado", "peach"],
    n: 0,
    _start: (λ
        (tsd v reduce: ["(size col1)"] using: (λ i Int() (+= n i)))
        (textout n)
    )
    // Output:
    // 23
}
Lambdas: capturing
The Lambda type has been improved by adding the ability for lambda objects to capture variables (including other lambdas) from the environment at the point of the lambda definition, thus enabling lambdas to act as closures.
#lang transd
MainModule: {
    Lii: typealias(Lambda<Int Int>()),
    la: Lii(λ z Int() (ret (+ z 100))),
    makeclos1: (λ fn Lii()
        (with clos Lii(λ [[fn]] i Int()
                (textout (exec fn i)))
            (ret clos)
        )),
    _start: (λ
        (with clos (makeclos1 la)
            (exec clos 7)  // prints 107
        ))
}
More examples can be found in the reference test suite: Lambdas.
Accurate floating point calculations
The (incr) and (decr) methods have been added to the Double type to enable accurate calculations with interval arithmetic. For more information and an example of using these methods, see here.
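As a rough sketch of the idea (it is assumed here that (incr) and (decr) step a Double value to the adjacent representable number, which is what interval bounds require):
(with lo 0.1 hi 0.1
    (decr lo)                         // assumed: lo becomes the nearest smaller representable value
    (incr hi)                         // assumed: hi becomes the nearest larger representable value
    (textout lo " <= 0.1 <= " hi))    // the exact value 0.1 is guaranteed to lie within [lo, hi]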
Built-in functions
Several built-in functions have been added to the core language.
(first) takes any number of similarly typed arguments and returns the first argument that evaluates to an equivalent of 'true'.
The number method (within) takes two numeric arguments and returns a Boolean value indicating whether the subject lies in the interval bounded by those arguments.
The container method (is-subset) takes another container and returns a Boolean value indicating whether that container is a subset of the subject (that is, whether each element of the argument has an identical counterpart in the subject, regardless of ordering).
Examples of using these functions can be seen here and here. Or, as usual, they can be found in the test suite with the command:
grep -r <FUNC_NAME> <TEST_SUIT_DIRECTORY>
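For a quick impression, the calls below sketch the intended usage; the argument order for (within) and (is-subset) is assumed here from the descriptions above:
(with v1 [1, 2, 3, 4, 5] v2 [4, 2]
    (textout (first 0 0 7) "\n")      // assumed: prints 7, the first argument equivalent to 'true'
    (textout (within 5 1 10) "\n")    // assumed: true, since 5 lies in the interval bounded by 1 and 10
    (textout (is-subset v1 v2)))      // assumed: true, every element of v2 has a counterpart in v1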
Lambdas
Transd supports anonymous function objects ("lambda" functions), which can be used in a great many places, for example:
(with v ["C", "a", "D", "b", "A"]
(sort v (lambda l String() r String() -> Bool()
(ret (less (tolower l) (tolower r)))))
(lout v)) //<= [a, A, b, C, D]
Now, anonymous lambdas are supplemented with named variables of Lambda type. Lambda objects can be copied, passed as arguments to functions, returned from functions, etc. That is, they are just another data type.
(with lam Lambda<String Null>(λ s String()
        (textout "Hello, " s "!"))
    (exec lam "World") //<= Hello, World!
)
Pipeline calling order
A pipeline evaluation operator has been added. It makes it possible to compose functions by chaining function calls instead of nesting them. For example, suppose we have some string as data and we want to split it into words, sort them, and print the result.
Arranged in the usual way, this data processing flow looks as follows:
(textout
    (sort
        (split someStr " ")))
The pipeline operator -|
allows us to avoid deep nesting and to write function calls in the same order as operations follow logically:
(-| (split someStr " ")
    (sort)
    (textout))
The necessary condition for combining function calls in this way is that the return value of each function must be admissible as an argument to the next function.
The inclusion of the pipeline operator among the language constructs arguably makes the parenthesis-grouped syntax one of the cleanest and clearest.
Formatted output
Formatting capabilities of text streams have been expanded with the following manipulators: fill:, :left, :right, :internal, prec:, :fixed, :boolalpha, :noboolalpha.
(with pr1 3.7 pr2 1.5
    (lout width: 20 fill: "." :left "Fried chicken"
        prec: 2 :fixed pr1 " USD")
    (lout width: 20 "Ice-cream" pr2 " USD")
    (lout width: 28 fill: "-")
    (lout width: 20 fill: " " :right "Total: " (+ pr1 pr2) " USD")
)
OUTPUT:
Fried chicken.......3.70 USD
Ice-cream...........1.50 USD
----------------------------
             Total: 5.20 USD
Container methods
Another method, (coincide), has been added to the generic container methods. It takes two containers and returns the length of their common prefix (or suffix), that is, the number of equal elements counted from the beginning (or the end) of the containers.
(with v1 [0,1,2,3,4,5,6,7] v2 [0,1,2,3,4,4,5,6,7]
    (textout (coincide v1 v2)) // <= 5
)
Expanding data processing capabilities
As the core of Transd has acquired its stable long-term shape, the emphasis of development has shifted to strengthening and expanding specialized parts of the language.
The data processing section has been supplemented with the Table class, and now includes the following classes:
Object - represents a text block of name/value pairs, treated as a single object. Such objects in Transd are called "TSD objects" ("TSD" is short for "Transd Structured Data").
TSDBase - a collection of TSD objects, which can be viewed as an ad-hoc "NoSQL" database and which supports "SELECT" and "UPDATE" data queries.
Table - a class for working with tabular data (e.g. CSV files). Table objects can be viewed as one-table databases, and they support "SELECT" and "UPDATE" data queries.
Performance increase
A good amount of laborious optimization and profiling has brought positive results in the form of considerable performance gains in both the low-level and high-level operations of the language. The results can be seen here.
Portability increase
With the addition of Clang to the list of build tools with which Transd compiles seamlessly, the portability of the language has reached its practical ceiling: on both platforms Transd can be compiled with essentially the same command:
Linux:
$ clang++ -std=c++14 -O3 src/transd.cpp src/main.cpp -D__LINUX__ -lpthread -o tree3
Windows:
PS clang++ -std=c++14 -O3 src/transd.cpp src/main.cpp -DWIN32 -o tree3
(tsd-query) : UPDATE
Data query functionality has been expanded with the "UPDATE" query type. An example of using the UPDATE query can be seen here: Merge aggregate datasets.
DateTime type
Another type has been added to the type system: DateTime. Among other uses, this type is indispensable in data processing. An example of usage can be found via the link in the previous paragraph.
Example Transd program
Audio Flow Combiner, written in Transd, has reached its first level of maturity. It plays the most intricate and interwoven flows smoothly for hours, and its memory usage is astoundingly low (around 5 MB of committed memory on Linux).
For demonstration purposes, a first program in Transd has been created: "Audio Flow Combiner". This program illustrates one way of using Transd as a front-end language. Serving as a front end for the popular and venerable "SoX" audio program, AFC can create finely grained audio flows from one or several audio files.
This program demonstrates many features of the language and the main principles of structuring a Transd program. Its source code can be used as a tutorial in Transd programming, and as a reference for the particular task of scripting a program's behaviour through the command line. An analysis of the program's source code can be found here.
A new Object type has been added to the type system. This type represents a Transd Structured Data (TSD) object: a named block of text data structured in the form of name/value pairs:
"order_255" : {
"Orange Juice": 0.95,
"Lunch Herb Crusted Salmon": 3.95,
"Orange Chicken": 1.95,
"Side of French Fries": 0.95,
total: 7.80
}
A text file with many such objects can be processed with Transd in the following way:
objs: Index<String Vector<Object>>(),
(rebind objs (group-by
    (read-tsd-file "restaurant_orders")
    (λ ob Object() -> String()
        (get-String ob "name"))))
And in the objs variable we have an Index of TSD objects, addressable by name (e.g. "order_255") and ready to be placed into a TSDBase or processed in some other way.
What gives this feature much more power is that it can be used for initializing Transd objects with text data.
Suppose we have a program that defines several classes and uses objects of those classes. We can then initialize these objects from a single text file, which can play the role of a database, an advanced configuration file, etc. Our program can look like this:
class ClassA : {
    field1: String(),
    field2: Int(),
    meth1: (lambda ... )
}
class ClassB : {
    field1: Vector<String>(),
    field2: Double(),
    meth1: (lambda ... )
}
MainModule: {
    objs: Index<String Vector<Object>>(),
    objA_1: ClassA(),
    objA_2: ClassA(),
    objB: ClassB(),
    _start: (λ
        (rebind objs (group-by
            (read-tsd-file "database1")
            (λ ob Object() -> String()
                (get-String ob "name"))))
        (load-from-object objA_1 (get (snd (get objs "objA_1")) 0))
        (load-from-object objA_2 (get (snd (get objs "objA_2")) 0))
        (load-from-object objB (get (snd (get objs "objB")) 0))
        ...
And the program's data file will look like this:
"objA_1": {
class: "ClassA",
field1: "string1",
field2: 25
}
"objA_2": {
class: "ClassA",
field1: "string2",
field2: 37
}
"objB": {
class: "ClassB",
field1: ["string3", "string4", "string5"],
field2: 14.1
}
Thus, with the addition of the TSD Object type, it is now possible in Transd to implement, with a minimum of code, the chain "TEXT_DATA --> TSD Object --> TSDBase database". That is, with a small amount of code we can define objects of custom structure, read them in from a text file, and work with them in a TSDBase in a database-like way: sorting, querying, selecting, etc.
The type system has become much closer to its production shape. All fundamental types now have fixed sizes instead of platform-dependent ones (previously, long int was 4 bytes on Windows and 8 bytes on Linux). Strings remain platform dependent, since wchar_t is irreplaceable for Unicode handling, and its size differs between Windows and Linux.
The built-in types are:
Byte - unsigned, 1 byte
Int - signed, 4 bytes
Long - signed, 8 bytes
ULong - unsigned, 8 bytes
String - UTF-16 on Windows, UTF-32 on Linux
ByteArray - container for native (unboxed) unsigned bytes
Vector<> - generic sequence container
Index<> - generic associative array
HashIndex<> - generic hash table
In other news, a new Rosetta task has been implemented, in which the first usage of formatted output mentioned in the previous blog entry can be seen (as well as the new type system features): AKS test for primes
As the release nears, the design decisions that were postponed until the later stages are being made. The concept of how to do formatted output in Transd has now been worked out in detail.
I decided to go with the C++ model, which uses stream manipulators for output formatting, since the concept of manipulators aligns ideally with Transd's already existing markers. Manipulators define how the nearest item after them should be formatted. So the Transd code for formatted output of, e.g., a string and a double can look like this:
(textout width: 6 "Pi:" :sign prec: 4 PI)
Which will produce the output:
Pi: +3.1416
This illustrates how important it is to make design decisions at the proper time. Markers appeared relatively late in the language, and if at an early stage some other model (e.g. Python's) had been chosen for formatted output, it certainly would not look and act so uniformly with the rest of the syntax. Compare the above example with a call of the 'substr' function:
(substr s from: after: last: "/" to: last: ".")
Containers have received another upgrade: Transd now supports pipeline semantics for staged data processing.
An example of pipelined data processing is an implementation of the "Anagrams" Rosetta task. The contents of an English dictionary are read into the 'words' string, this data is then passed through several stages of processing, and as a result the list of words with the maximum number of anagrams is output:
#lang transd
MainModule: {
    _start: (λ
        (with fs FileStream() words String()
            (open fs "/mnt/proj/tmp/unixdict.txt")
            (textin words fs)
            (textout
                (snd (max-element
                    (regroup-by
                        (group-by
                            (split words)
                            (λ s String() -> String() (sort s)))
                        (λ v Vector<String>() -> Int() (size v))))))))
}
Output:
[[abel, able, bale, bela, elba],
[caret, carte, cater, crate, trace],
[angel, angle, galen, glean, lange],
[alger, glare, lager, large, regal],
[elan, lane, lean, lena, neal],
[evil, levi, live, veil, vile]]
Transd now supports three types of streams:
StringStream - for Unicode text data;
ByteStream - for raw bytes;
FileStream - for file I/O.
All stream types work in a uniform way, support automatic conversion between strings and bytes, and can serve as source or destination for data. For example, the following call:
(textout to: MyStream "Some text")
can output text to any of the three stream types as well as to StdOut. For additional examples, see the "Type system/Streams" folder in the test suite.
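As a sketch of this uniformity (assuming a StringStream can be read back with (textin ...) in the same way as the FileStream in the "Anagrams" example above):
(with ss StringStream() s String()
    (textout to: ss "Hello, " "streams!")
    (textin s ss)                     // assumed: reads the stream's accumulated text into s
    (textout s))                      // expected output: Hello, streams!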
Another new feature has been added: type aliases. Type aliasing creates an additional name for a data type without creating a new type. It is used to simplify the syntax of declaring complex compound types or to provide descriptive names for types in a specific context.
Example:
#lang transd
MainModule: {
    Tuis : typealias( Tuple( Int() String() ) ),
    v:  Vector( Tuis() ),
    v1: Vector( Tuis() ),
    uv1: [[6, "a"]],
    uv2: [[1, "c"]],
    uv3: [[3, "h"]],
    uv4: [[2, "e"]],
    _start: (λ
        (add v uv1) (add v uv2) (add v uv3) (add v uv4)
        (set v1 0 uv1) (set v1 1 uv2) (set v1 2 uv3) (set v1 3 uv4)
        (textout "v: " v "\n")
        (textout "v1: " v1 "\n")
        (textout (sort v :asc
            (lambda l Tuis() r Tuis() -> Bool()
                (ret (less<Int> (get l 0) (get r 0))))) "\n")
    )
}