The basic design concepts
The design is based on two fundamental, related concepts: datasets and dimensions. A dataset is a matrix with a list of identifiers for its rows and another list of identifiers for its columns. The dataset also records which dimension its rows belong to and which dimension its columns belong to.
Dimension
A dimension is simply the name of a domain that your data contains. Each element along a dimension is identified by a name, and that name is defined to be unique across every dataset, plot and other program element that contains the dimension.
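Because an identifier means the same thing everywhere in its dimension, two datasets that share a dimension can be matched up by identifier alone, whatever order their rows happen to be in. The sketch below illustrates that idea under stated assumptions: the function `align_rows` and the variable names are hypothetical and are not taken from the program itself.

```python
from typing import Dict, List, Tuple


def align_rows(ids_a: List[str], ids_b: List[str]) -> List[Tuple[str, int, int]]:
    """Pair up rows of two datasets whose rows lie in the same dimension.

    Hypothetical helper: returns (identifier, row in A, row in B) for every
    identifier present in both lists, in the order it appears in A.
    """
    # Index B by identifier so each lookup is a dictionary access.
    index_b: Dict[str, int] = {ident: i for i, ident in enumerate(ids_b)}
    return [
        (ident, i, index_b[ident])
        for i, ident in enumerate(ids_a)
        if ident in index_b
    ]


# Two hypothetical datasets whose rows are both in the "samples" dimension.
expression_rows = ["patient1", "patient2", "patient3"]
clinical_rows = ["patient3", "patient1"]

print(align_rows(expression_rows, clinical_rows))
# [('patient1', 0, 1), ('patient3', 2, 0)]
```

Identifiers found in only one dataset (here `patient2`) are simply left out of the pairing.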
For example, if a dimension named samples contains the identifier patient1, then whenever this identifier is used in the samples dimension, it is assumed to refer to the same entity.
This allows the program to map between different plots and datasets: when the identifier patient1 in the samples dimension is selected in one plot, the selection can propagate to every other place in the program that displays information about samples.
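One way to picture this propagation is a shared selection per dimension that notifies every view registered for that dimension. The following is a minimal observer-pattern sketch, assuming a hypothetical `SelectionManager` class and plain callbacks standing in for plots; it is not the program's actual mechanism.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set

# A listener receives the dimension name and the currently selected identifiers.
Listener = Callable[[str, Set[str]], None]


class SelectionManager:
    """Hypothetical sketch: one shared selection per dimension, broadcast to listeners."""

    def __init__(self) -> None:
        self._selected: DefaultDict[str, Set[str]] = defaultdict(set)
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, dimension: str, listener: Listener) -> None:
        # A plot registers for every dimension it displays.
        self._listeners[dimension].append(listener)

    def select(self, dimension: str, identifiers: Set[str]) -> None:
        # Replace the selection for this dimension, then notify its listeners.
        self._selected[dimension] = set(identifiers)
        for listener in self._listeners[dimension]:
            listener(dimension, set(self._selected[dimension]))

    def selected(self, dimension: str) -> Set[str]:
        return set(self._selected.get(dimension, set()))


manager = SelectionManager()
scatter_log: List[List[str]] = []
heatmap_log: List[List[str]] = []

# Both hypothetical plots show samples; only the heatmap also shows genes.
manager.subscribe("samples", lambda dim, ids: scatter_log.append(sorted(ids)))
manager.subscribe("samples", lambda dim, ids: heatmap_log.append(sorted(ids)))
manager.subscribe("genes", lambda dim, ids: heatmap_log.append(sorted(ids)))

# Selecting patient1 in one plot reaches every view of the samples dimension.
manager.select("samples", {"patient1"})
print(scatter_log)  # [['patient1']]
print(heatmap_log)  # [['patient1']]
```

Keying listeners by dimension name, rather than by dataset, is what lets a selection made in one plot reach views built from entirely different datasets.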
Dataset
A dataset is a matrix where both the rows and the columns are associated with a dimension. For example, a gene analysis study may have a dataset where the rows are tissue samples in the samples dimension and the columns are the measured genes in the genes dimension.
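The samples-by-genes example can be sketched as a small class that stores the matrix together with a dimension name and an identifier list for each axis, so cells are addressed by identifier rather than by position. This is a hypothetical sketch: the class layout, the identifiers `geneA` to `geneC`, and the numbers are arbitrary illustrations, not the program's data structures or real measurements.

```python
from typing import Dict, List, Sequence


class Dataset:
    """Hypothetical sketch: a matrix whose rows and columns each belong to a dimension."""

    def __init__(self, values: List[List[float]],
                 row_dim: str, row_ids: Sequence[str],
                 col_dim: str, col_ids: Sequence[str]) -> None:
        # One identifier per row and per column, unique within each list.
        if len(values) != len(row_ids) or any(len(r) != len(col_ids) for r in values):
            raise ValueError("identifier lists must match the matrix shape")
        if len(set(row_ids)) != len(row_ids) or len(set(col_ids)) != len(col_ids):
            raise ValueError("identifiers must be unique within a dimension")
        self.values = values
        self.row_dim, self.col_dim = row_dim, col_dim
        # Map identifiers to positions so callers address cells by name.
        self._row_pos: Dict[str, int] = {r: i for i, r in enumerate(row_ids)}
        self._col_pos: Dict[str, int] = {c: j for j, c in enumerate(col_ids)}

    def get(self, row_id: str, col_id: str) -> float:
        return self.values[self._row_pos[row_id]][self._col_pos[col_id]]


# Arbitrary illustrative values: tissue samples along rows, genes along columns.
expression = Dataset(
    values=[[5.1, 0.3, 2.2],
            [4.8, 1.1, 2.0]],
    row_dim="samples", row_ids=["patient1", "patient2"],
    col_dim="genes", col_ids=["geneA", "geneB", "geneC"],
)
print(expression.row_dim, expression.col_dim)  # samples genes
print(expression.get("patient2", "geneB"))     # 1.1
```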
Annotations
Sometimes we want additional information associated with the identifiers along a dimension, for display purposes. A gene, for instance, is often represented by an identifier that is not very meaningful until it is looked up in a database. If we also want extra information, such as the name of the gene, it is stored as annotations along the genes dimension.
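Annotations can be pictured as a lookup from dimension to identifier to named fields, with the raw identifier used as a fallback label when no annotation exists. The sketch below assumes a plain nested-dictionary layout and a hypothetical `display_label` helper; neither is the program's actual storage format. `ENSG00000141510` is the Ensembl identifier for the gene TP53, and `g42` is a made-up identifier with no annotation.

```python
from typing import Dict

# Hypothetical layout: dimension -> identifier -> field name -> value.
Annotations = Dict[str, Dict[str, Dict[str, str]]]


def display_label(annotations: Annotations, dimension: str,
                  identifier: str, field: str = "name") -> str:
    """Return a human-readable label, falling back to the raw identifier."""
    record = annotations.get(dimension, {}).get(identifier, {})
    return record.get(field, identifier)


annotations: Annotations = {
    "genes": {
        "ENSG00000141510": {"name": "TP53"},
    }
}

print(display_label(annotations, "genes", "ENSG00000141510"))  # TP53
print(display_label(annotations, "genes", "g42"))              # g42
```

Keeping annotations separate from the data matrix means a plot can swap opaque identifiers for readable names without altering any dataset.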