# The basic design concepts
The design is based on two fundamental (and related) concepts: *datasets* and *dimensions*. A dataset is a matrix with a list of identifiers for its rows and another list of identifiers for its columns. The dimension of the rows and the dimension of the columns are stored with the dataset as well.
## Dimension
A dimension is simply the name of a domain that your data contain. Each element along a dimension is identified by a name, and that name is defined to refer to the same element in every dataset, plot and other program element that uses the dimension.
For example, if a dimension named `samples` contains the identifier `patient1`, then every use of `patient1` in the `samples` dimension is assumed to refer to the same entity.
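
Because an identifier denotes the same entity wherever its dimension appears, positions in one dataset can be translated into positions in another simply by matching identifiers. The following is a minimal Python sketch under that assumption. The helper `index_map` and the example identifier lists are hypothetical and are not part of the program described here. It also assumes identifiers are unique within each list.

```python
from typing import Dict, List


def index_map(source_ids: List[str], target_ids: List[str]) -> Dict[int, int]:
    """Map positions in `source_ids` to positions in `target_ids`.

    Hypothetical helper: both lists are assumed to hold identifiers along the
    same dimension, so equal identifiers denote the same entity. Identifiers
    that are missing from the target are left out of the result.
    """
    # Lookup from identifier to its position in the target list.
    target_pos = {ident: i for i, ident in enumerate(target_ids)}
    return {
        i: target_pos[ident]
        for i, ident in enumerate(source_ids)
        if ident in target_pos
    }


# Two hypothetical datasets whose rows are both in the `samples` dimension,
# listed in different orders and only partially overlapping.
expression_rows = ["patient1", "patient2", "patient3"]
clinical_rows = ["patient3", "patient1", "patient4"]

print(index_map(expression_rows, clinical_rows))  # {0: 1, 2: 0}
```

No translation table is needed because the shared dimension already guarantees that `patient1` means the same thing in both lists.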
This lets the program map between different plots and datasets, so that when `patient1` in the `samples` dimension is selected in one plot, the selection can propagate to every other place in the program that displays information about samples.
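
Selection propagation of this kind can be sketched as a publish/subscribe hub keyed by dimension name. This is an illustrative Python sketch, not the program's actual mechanism. The class `SelectionHub`, its methods and the view names are all hypothetical. Views register for the dimensions they display, and a selection made in one dimension is forwarded unchanged to every view of that dimension.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set, Tuple

Listener = Callable[[Set[str]], None]


class SelectionHub:
    """Hypothetical hub that shares selections per dimension."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, dimension: str, listener: Listener) -> None:
        """Register a view that displays elements of `dimension`."""
        self._listeners[dimension].append(listener)

    def select(self, dimension: str, identifiers: Set[str]) -> None:
        # Only views of this dimension are notified. The identifiers need no
        # conversion because they mean the same entity in every view.
        for listener in self._listeners.get(dimension, []):
            listener(set(identifiers))


hub = SelectionHub()
received: List[Tuple[str, Set[str]]] = []

# Hypothetical views: two show samples, one shows genes.
hub.subscribe("samples", lambda ids: received.append(("scatter plot", ids)))
hub.subscribe("samples", lambda ids: received.append(("heatmap", ids)))
hub.subscribe("genes", lambda ids: received.append(("gene list", ids)))

# Selecting patient1 in one plot reaches every view of `samples`.
hub.select("samples", {"patient1"})
print(received)
# [('scatter plot', {'patient1'}), ('heatmap', {'patient1'})]
```

In this sketch the view where the selection originated is notified too; a real implementation might skip it or let it ignore its own updates.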
## Dataset
A dataset is a matrix whose rows and columns are each associated with a dimension. For example, a gene analysis study may have a dataset where the rows are tissue samples in the `samples` dimension and the columns are the measured genes in the `genes` dimension.
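
A dataset as described here can be sketched as a small Python data class that keeps the matrix together with one identifier list and one dimension name per axis. The class name, its fields, the `value` method, the uniqueness check and all example identifiers and numbers are hypothetical, chosen only to illustrate the concept.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    """Hypothetical sketch: a matrix plus identifiers and a dimension per axis."""

    values: List[List[float]]
    row_dimension: str
    row_ids: List[str]
    col_dimension: str
    col_ids: List[str]

    def __post_init__(self) -> None:
        # The identifier lists must match the matrix shape.
        if len(self.values) != len(self.row_ids):
            raise ValueError("one row identifier is required per row")
        if any(len(row) != len(self.col_ids) for row in self.values):
            raise ValueError("one column identifier is required per column")
        # Assumption: identifiers are unique along each axis.
        if len(set(self.row_ids)) != len(self.row_ids):
            raise ValueError("duplicate row identifier")
        if len(set(self.col_ids)) != len(self.col_ids):
            raise ValueError("duplicate column identifier")

    def value(self, row_id: str, col_id: str) -> float:
        """Look up a cell by its row and column identifiers."""
        return self.values[self.row_ids.index(row_id)][self.col_ids.index(col_id)]


# Hypothetical gene analysis dataset: tissue samples x measured genes.
expression = Dataset(
    values=[[1.5, 0.2], [0.9, 3.1]],
    row_dimension="samples",
    row_ids=["patient1", "patient2"],
    col_dimension="genes",
    col_ids=["gene_a", "gene_b"],
)
print(expression.value("patient2", "gene_b"))  # 3.1
```

Storing the dimension names on the dataset is what lets other parts of the program recognise that its rows are samples and its columns are genes.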
## Annotations
Sometimes we want to associate additional information with the identifiers along a dimension for display purposes. For example, a gene is often represented by an identifier that is not very meaningful unless it is looked up in a database. Extra information, such as the name of the gene, is therefore stored in *annotations* along the `genes` dimension.
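
One way to picture annotations is as a nested mapping from dimension to identifier to named fields, consulted only when a label is displayed. The Python sketch below assumes that layout; the type alias `Annotations`, the function `display_label`, the `"name"` field and the gene identifiers are hypothetical. Falling back to the raw identifier keeps every element displayable even when it has no annotation.

```python
from typing import Dict, Optional

# Hypothetical layout: dimension -> identifier -> annotation field -> value.
Annotations = Dict[str, Dict[str, Dict[str, str]]]


def display_label(
    annotations: Annotations,
    dimension: str,
    identifier: str,
    field: str = "name",
) -> str:
    """Return a readable label for an identifier along a dimension.

    Falls back to the identifier itself when no annotation is available.
    """
    fields = annotations.get(dimension, {}).get(identifier, {})
    value: Optional[str] = fields.get(field)
    return value if value is not None else identifier


# Hypothetical gene identifiers and names, for illustration only.
annotations: Annotations = {
    "genes": {
        "G0001": {"name": "gene alpha"},
        "G0002": {"name": "gene beta"},
    }
}

print(display_label(annotations, "genes", "G0002"))  # gene beta
print(display_label(annotations, "genes", "G0003"))  # prints G0003, since it has no annotation
```

Keeping annotations separate from the datasets means the identifiers themselves never change; only their on-screen labels do.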