Column paths¶

Link columns¶

A link column stores row identifiers of another table. Its purpose is to provide access to the data stored in another table. For example, a Persons table might store such attributes like name directly in this table while address information could be stored in another table called Addresses. In this case, the Persons table should have a column which references records of the Addresses table. For example, this link column could be called address. In contrast to other columns, its values are not used directly but rather are used to access values in the referenced table.

Column paths¶

If a table has a link column then it can be used to access columns in the referenced table. This is done by specifying a column path which is a sequence of link columns where each next column starts from the table where previous column ends. The last column in a column path is a normal table column or attribute the data of which has to be processed. Syntactically, Prosto uses double colon to separate segments in a column path. For example, in order to access a street column from the Persons table we write the following column path: address::street. This column path starts from the Persons table and ends in the Addresses table.

Defining link columns¶

Link columns are defined by means of the link operation. Given two tables, this operation will produce a new link column the values of which will reference the second (target) table from the first table. The criterion of matching is equality of the specified columns. For example, in Column-SQL a link column could be defined as follows:

ctx.column_sql("LINK  SourceTable(A, B) -> link_column -> TargetTable (A, B)")

Once this link column has been defined, it can be used in all operations where the SourceTable columns are used in the same way as simple columns, for example: link_column::C for accessing the C columns of the TargetTable.

Link columns are also automatically created by the project operation. This operation not only projects a source table by producing a new target table, but also creates a link between these two tables. For example, if the TargetTable was not available, then it could be created by projecting the SourceTable by finding all unique combinations of the A and B attributes:

ctx.column_sql("PROJECT  SourceTable(A, B) -> link_column -> TargetTable (A, B)")

Use of column paths¶

Column paths can be used where a computation needs to be done with some data in another table. For example, we might want to define a calculate column using data in a referenced table:

ctx.column_sql(
    "CALCULATE  Persons(position::salary) -> adjusted_salary",
    lambda x: x['position::salary'] * 1.1
)

Here the salary is stored in a table with all existing positions while Persons have a link to the position.

Link columns are used also in aggregate columns for grouping by assuming that all referencing one record means belonging to one group. For example, we could find mean age for each position as follows:

ctx.column_sql(
    "AGGREGATE  Persons (age) -> position -> Positions (mean_age)",
    lambda x: x.mean()
)