Data importing


Introduction

Data importing is a mechanism that factors out the code which translates input data into the program's state, that is, the code which stores input data in the program's variables, objects, etc. Programming tasks that handle static external data, whether the program's configuration parameters or data representing the state of some external object or process, can benefit from being written in the OOP paradigm and using data importing.

Data importing helps make programs smaller by removing the need for code that initializes objects with external data. Importing means more than assigning values to objects declared in the program code: the whole hierarchy of objects used in a particular program run can be defined in an external data file, without declaring these objects in the program code.

The mechanism of data importing in Transd is based on the "Transd Structured Data" (TSD) data format.

TSD is a textual format for unified representation of various kinds of data. It represents data in the form of named blocks of name/value pairs (such blocks are called TSD objects):

"someObject_1" : {
    someString: "string1",
    someInteger: 25,
    someFloat: 1.23,
    someStrings: ["string2", "string3", "string4"],
    someIntegers: [30, 40, 50]
}

Transd has built-in support for quickly assimilating data from textual TSD objects into the program state.
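To make the idea concrete, here is a minimal sketch of the same pattern in Python rather than Transd. It is an analogy only: SomeObject and load_from_mapping are hypothetical names, not Transd built-ins, and the TSD object is transcribed by hand into a Python dict instead of being parsed from a file.

```python
# Python analogy only: SomeObject and load_from_mapping are hypothetical names,
# not Transd built-ins, and no real TSD parsing is done here.

class SomeObject:
    """Target object whose fields are filled from external data."""

    def __init__(self):
        self.someString = ""
        self.someInteger = 0
        self.someFloat = 0.0
        self.someStrings = []
        self.someIntegers = []


def load_from_mapping(target, data):
    """Copy each name/value pair into the attribute of the same name."""
    for name, value in data.items():
        if not hasattr(target, name):
            # Reject names the class does not declare instead of silently adding them.
            raise KeyError(name)
        setattr(target, name, value)
    return target


# The TSD object shown above, transcribed by hand into a Python dict.
some_object_1 = {
    "someString": "string1",
    "someInteger": 25,
    "someFloat": 1.23,
    "someStrings": ["string2", "string3", "string4"],
    "someIntegers": [30, 40, 50],
}

obj = load_from_mapping(SomeObject(), some_object_1)
print(obj.someInteger, obj.someStrings)
```

The point of the sketch is that no per-field initialization code is written: the names in the data drive the assignment.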

Data importing works most efficiently when the task performed by the program is described in terms of a hierarchy of objects. "Hierarchy" here means not the inheritance relations between classes, where one class is a subclass of another, but a containment hierarchy, where one object is part of another object while their classes may be unrelated. When objects are composed into such a hierarchy, it is sufficient to declare only the top object in the program, and all lower levels of contained objects are instantiated automatically as needed.

Example

AFC (short for "Audio Flow Combiner") is an example of a program that performs its task with a hierarchy of objects and loads its state at runtime from an external data file. The source code of the program can be found here. The program's manual is located here.

"Audio Flow Combiner" is an audio file player which plays files by serially combining their fragments into one audio flow. E.g., if An is a fragment of file A, Bn is a fragment of file B, and X is silence (a pause), then AFC can produce an audio flow which looks like: A1 X B1 X A2 X B2 .... The flow can be customized by several parameters: the number of files, the fragment length for each file, how often fragments of each file appear in the flow, audio parameters (pitch, speed, volume, audio effects), etc. AFC uses the "SoX" program as the audio back-end.

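The A1 X B1 X A2 X B2 pattern can be illustrated with a short Python sketch. This is not AFC's code: combine is a hypothetical helper, fragments are plain labels rather than audio, and fragment length, frequency and audio parameters are ignored.

```python
from itertools import zip_longest


def combine(files, pause="X"):
    """Interleave fragments round-robin, appending a pause after each fragment."""
    flow = []
    # zip_longest takes the n-th fragment of every file in turn and pads
    # shorter files with None, which is skipped below.
    for same_index_fragments in zip_longest(*files):
        for fragment in same_index_fragments:
            if fragment is not None:
                flow.extend([fragment, pause])
    return flow


flow = combine([["A1", "A2"], ["B1", "B2"]])
print(" ".join(flow))  # A1 X B1 X A2 X B2 X
```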
The whole run of the program is organized as an interaction of hierarchically related objects. The top object (Flow class) represents the whole flow. It contains one or more stream objects (Stream class), each stream representing an audio file. A stream contains a reference to the object describing how the file should be divided into fragments (Fragment class). The Fragment class is separated from the Stream class for reuse: different Stream objects can use the same Fragment object (or define their own). Each Fragment object contains references to one or more play objects (Play class), which hold the audio parameters for playing a fragment, such as volume, playback speed, etc.
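The containment relations just described can be sketched with Python dataclasses. The class names follow the text, but every field name (volume, speed, length_sec, file_name and so on) is an assumption made for illustration and is not taken from the AFC source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Play:
    # Audio parameters for playing one fragment (hypothetical field names).
    volume: float = 1.0
    speed: float = 1.0


@dataclass
class Fragment:
    # How a file is divided into fragments, plus the play objects it refers to.
    length_sec: float
    plays: List[Play] = field(default_factory=list)


@dataclass
class Stream:
    # One audio file; holds a reference to a Fragment, which may be shared.
    file_name: str
    fragment: Fragment


@dataclass
class Flow:
    # Top object of the hierarchy: the whole audio flow.
    streams: List[Stream] = field(default_factory=list)


# Two streams reuse the same Fragment object, as the text allows.
shared = Fragment(length_sec=5.0, plays=[Play(volume=0.8)])
flow = Flow(streams=[Stream("a.mp3", shared), Stream("b.mp3", shared)])
print(flow.streams[0].fragment is flow.streams[1].fragment)  # True
```

Note that none of the classes inherits from another: the hierarchy is purely one of containment.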

The program source code contains class definitions for the objects in the hierarchy, some startup code, and the declaration of the top Flow object. The hierarchy of program objects itself is defined entirely in an external data file (the "flowlist" file).

The hierarchy of program objects is transferred from the data file into the program as follows. First, the whole data file is read in with the (read-tsd-file) built-in Transd function. This function reads the TSD objects contained in the data file and returns a list of these objects (call it the flowlist). The flowlist is then passed to the (init) method of the top Flow object. The top object finds its definition in the flowlist and initializes its data members from this definition, including the list of Stream objects, with the help of the (load-from-object) built-in function. It then instantiates a new Stream object for each element in this list and calls the (init) method of each newly instantiated Stream object, passing a reference to the flowlist as the argument. This process repeats recursively for each level of the hierarchy until the whole hierarchy is constructed.

After the construction is done, the hierarchy is set to work by calling the (play) method of the top Flow object.
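The recursive construction can be sketched in Python as follows. This is an analogy under stated assumptions, not the AFC implementation: the flowlist is modeled as a dict keyed by object name, child objects are referred to by name, and the Node base class, the children table and all field names are hypothetical. The Transd built-ins (read-tsd-file) and (load-from-object) are not reproduced here.

```python
class Node:
    # Maps a field name to the class of the contained objects (hypothetical scheme).
    children = {}

    def __init__(self, name):
        self.name = name

    def init(self, flowlist):
        """Find this object's definition in the flowlist and build its subtree."""
        definition = flowlist[self.name]
        for field_name, value in definition.items():
            child_class = self.children.get(field_name)
            if child_class is None:
                # Plain data member: copy the value as is.
                setattr(self, field_name, value)
            elif isinstance(value, list):
                # List of child names: instantiate and initialize each child.
                setattr(self, field_name,
                        [child_class(n).init(flowlist) for n in value])
            else:
                # Single child name.
                setattr(self, field_name, child_class(value).init(flowlist))
        return self


class Play(Node):
    pass


class Fragment(Node):
    children = {"plays": Play}


class Stream(Node):
    children = {"fragment": Fragment}


class Flow(Node):
    children = {"streams": Stream}


# Stand-in for the parsed data file; in this simplified sketch a Fragment
# named by two streams is instantiated once per stream.
flowlist = {
    "flow": {"streams": ["stream_a", "stream_b"]},
    "stream_a": {"file": "a.mp3", "fragment": "frag"},
    "stream_b": {"file": "b.mp3", "fragment": "frag"},
    "frag": {"length": 5, "plays": ["normal"]},
    "normal": {"volume": 1.0, "speed": 1.0},
}

# Only the top object is created explicitly; the rest follows from the data.
top = Flow("flow").init(flowlist)
print(top.streams[1].fragment.plays[0].volume)  # 1.0
```

Only the single Flow("flow") call appears in the program text; every Stream, Fragment and Play object exists because the data names it.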

In conclusion, this particular program turned out to be six times smaller in source code size than its analog written some time ago in another high-level language. If a program is designed from scratch using the OOP paradigm, the data importing technique can bring big gains in reducing the code size and, in some cases, makes it possible to use external tools for preparing the input dataset, thus creating a data processing pipeline.