Reference

magicsoup.world

This module defines the World class. A World is the main API for controlling a simulation. It stores a simulation's state and provides methods for advancing a simulation. World itself is a faccade that uses other classes such as Kinetics and Genetics.

World(chemistry, map_size=128, abs_temp=310.0, mol_map_init='randn', start_codons=('TTG', 'GTG', 'ATG'), stop_codons=('TGA', 'TAG', 'TAA'), device='cpu', batch_size=None)

The object of this class is the main API for running the simulation. It holds the state of a simulation and offers methods for advancing it.

Parameters:
  • chemistry (Chemistry) –

    Chemistry with Molecules and reactions.

  • map_size (int, default: 128 ) –

    Size of the 2D world map as number of pixels in x- and y-direction.

  • abs_temp (float, default: 310.0 ) –

    Absolute temperature in Kelvin. Will influence reaction equilibriums. Higher temperatures give concentration gradients higher importance.

  • mol_map_init (str, default: 'randn' ) –

    How to give initial molecule concentrations to molecule maps (randn or zeros). randn is normally distributed N(10, 1), zeros is all zeros.

  • start_codons (tuple[str, ...], default: ('TTG', 'GTG', 'ATG') ) –

    Codons which start a coding sequence

  • stop_codons (tuple[str, ...], default: ('TGA', 'TAG', 'TAA') ) –

    Codons which stop a coding sequence

  • device (str, default: 'cpu' ) –

    Device to use for tensors (see pytorch CUDA semantics). Use this to move most calculations to a GPU.

  • batch_size (int | None, default: None ) –

    Optional parameter for reducing memory peaks when cell parameters are updated. By iteratively calculating cell paremeters of maximum batch_size cells at once, one can reduce memory peaks during functions that update cell parameters.

Most attributes on this class describe the current state of molecules and cells. Both cells and molecules are referred to by index. Whenever cells are listed or represented in one dimension they are ordered consistently. E.g. at any point in the simulation world.cell_genomes[i] and world.cell_labels[i] refer to the genome and label of cell with index i. As cell number grows and shrinks cell indexes are not constant throughout the simulation. Whenever an operation modifies the number of cells (like kill_cells() or divide_cells()) cell indexes are updated. Likewise, molecules are always ordered as in Chemistry. E.g. world.chemistry.molecules[i] refers to the ith Molecule object and world.molecule_map[i] refers to a 2D map describing molecule concentrations of the ith molecule. Molecule indexes stay constant throughout a simulation.

The simulation works with lists and PyTorch Tensor representations of cell proteomes and molecule concentrations. You can get a Cell object from get_cell() with easy-to-interpret attributes (see Cell docs). However, for performance reasons only the lists and tensor representations are updated during a simulation. get_cell() is intended for analyzing simulation results.

For customizing a simulation one can interact with the working lists and tensors of the simulation. They could be used for reading cell and molecule information, or for modifying it. The following attributes are always updated during the simulation:

  • cell_genomes: A list of strings representing cell genomes ordered by current cell index.
  • labels: List of strings representing cell labels ordered by current cell index. Labels are strings that can be used to track cell origins. When spawning new cells (spawn_cell()) a random label is assigned to each cell. If a cell divides, its descendants get the same label.
  • cell_map: Boolean 2D tensor describing which pixels on the map are occupied by a cell. Dimension 0 represents the x, dimension 1 the y direction.
  • molecule_map: Float 3D tensor describing concentrations (in mM by default) for each Molecule. Dimension 0 represents the Molecule ordered as in Chemistry. Dimension 1 represents the x, dimension 2 the y direction. So, world.molecule_map[0, 1, 2] is the concentration of the 0th Molecule on pixel of the world map.
  • cell_molecules: Float 2D tensor describing intracellular concentrations (in mM by default) for each Molecule. Dimension 0 represents the cell ordered by current cell index. Dimension 1 represents the Molecule ordered as in Chemistry. So, world.cell_molecules[0, 1] represents the intracellular concentration in mM of the 1st Molecule in the cell which currently has index 0.
  • cell_lifetimes: Integer 1D tensor describing how many time steps each cell survived since it was spawned or since its last division ordered by current cell index.
  • cell_divisions: Integer 1D tensor describing how many times each cell's ancestors divided ordered by current cell index.

Some methods are provided to advance the simulation and interact with it:

Furthermore, there are methods for saving and loading a simulation. For any new simulation use save() once to save the whole world object to a pickle file (including its Chemistry, Genetics, Kinetics). You can restore it with from_file() later on. Then, to save the world's state you can use save_state(). This is a quick, lightweight save, but it only saves things that change during the simulation. Use load_state() to re-load a certain state.

PyTorch is heavily used in this simulation. Some important attributes are PyTorch Tensors (see above). By setting the device parameter most calculations can be moved to a GPU. This heavily increases performance and is recommended. When working with tensors one should try to use tensor methods. That way computations and data stay on the GPU. Here is a tutorial for working with tensors. Try to avoid sending data back and forth between CPU and GPU.

World object carries a Genetics instance on world.genetics, a Kinetics instance on world.kinetics, and a Chemistry instance on world.chemistry (which is just a reference to the Chemistry object that was used when initializing World). Usually, you don't need to touch them. But, if you want to override them or look into some details, see the docstrings of their classes for more information.

By default in this simulation molecule numbers can be thought of as being in mM, time steps in s, energies J/mol. Eventually, they are just numbers and can be interpreted as anything. However, with the default parameters in Kinetics and the way Molecule objects are defined it makes sense to interpret them in mM, s, and J.

get_cell(by_idx=None, by_position=None)

Get a Cell object with information about its current environment. Raises ValueError if cell was not found.

Parameters:
  • by_idx (int | None, default: None ) –

    Get cell by current cell index in World

  • by_position (tuple[int, int] | None, default: None ) –

    Get cell by position (x, y) on the world map.

Returns:
  • Cell

    The searched cell described as Cell object.

The simulation works with tensor representations of cell proteomes and molecule concentrations. The return value of this function is a helper which makes interpreting a cell easier. See the documentation of Cell for more details.

get_neighbors(cell_idxs, nghbr_idxs=None)

For each cell from a list of cell indexes find all other cells that are in the Moore's neighborhood of this cell.

Parameters:
  • cell_idxs (list[int]) –

    Indexes of cells for which to find neighbors

  • nghbr_idxs (list[int] | None, default: None ) –

    Optional list of cells regarded as neighbors. If None (default) each cell in cell_idxs can form a pair with any other neighboring cell in cell_idxs. With nghbr_idxs provided each cell in cell_idxs can only form a pair with a neighboring cell that is in nghbr_idxs.

Returns:
  • list[tuple[int, int]]

    List of tuples of cell indexes for each unique neighboring pair.

Returned neighbors are unique. So, e.g. return value [(1, 4)] describes the neighbors of a cell with index 1 and a cell with index 4. (4, 1) is not returned.

spawn_cells(genomes)

Create new cells and place them randomly on the map. All lists and tensors that reference cells will be updated.

Parameters:
  • genomes (list[str]) –

    List of genome strings for newly added cells

Returns:
  • list[int]

    Cell indexes of successfully added cells.

Each cell is placed randomly on the map and receives half the molecules of the pixel where it is added. If too few unoccupied pixels are left on the cell map only the remaining unoccupied pixels will be filled with new cells. For each new cell all cell parameters are updated. Each new cell recieves a lifetime of 0, 0 divisions, and a random label.

add_cells(cells)

Place Cells randomly on the map. All lists and tensors that reference cells will be updated.

Parameters:
  • cells (list[Cell]) –

    List of Cell objects to be added

Returns:
  • list[int]

    The indexes of successfully added cells.

Each cell will be placed randomly on the map. If too few unoccupied pixels are left on the cell map only the remaining unoccupied pixels will be filled with cells. Added cells keep their genomes, proteomes, intracellular molecules, lifetimes, divisions, labels. Their indexes and positions are updated.

divide_cells(cell_idxs)

Bring cells to divide. All lists and tensors that reference cells will be updated.

Parameters:
  • cell_idxs (list[int]) –

    Cell indexes of the cells that should divide.

Returns:
  • list[tuple[int, int]]

    List of tuples (anc_idx, new_idx) of successfully divided cells

  • list[tuple[int, int]]

    where anc_idx is the ancestor cell index and new_idx

  • list[tuple[int, int]]

    is the newly created cell index.

Each cell divides by creating a clone of itself on a random pixel next to itself (Moore's neighborhood). If every pixel in its Moore's neighborhood is already occupied, it will not be able to divide. Both the original ancestor cell and the newly placed cell will become the descendants. They will have the same genome, proteome, and label. Molecule of the ancestor cell are shared evenly among descendants.

Both descendants get the ancestor's number of divisions + 1. So, e.g. if a cell with n_divisions=2 divides, its descendants both have n_divisions=3. Both descendants recieve a new lifetime of 0.

Of the list of descendant index tuples, the first descendant in each tuple is the cell that still lives on the same pixel. The second descendant in that tuple is the cell that was newly created.

update_cells(genome_idx_pairs)

Update existing cells with new genomes and derive cell proteomes for them.

Parameters:
  • genome_idx_pairs (list[tuple[str, int]]) –

    List of tuples (genome, idx) where genome is the cells genome and idx is its current index.

kill_cells(cell_idxs=None)

Remove existing cells. All lists and tensors referencing cells are updated.

Parameters:
  • cell_idxs (list[int] | None, default: None ) –

    Indexes of the cells that should die. None to kill all cells.

Cells that are killed dump their molecule contents onto the pixel they used to live on.

Cells are removed from all lists and tensors that reference cells. Thus, after killing cells the index of some living cells are updated. E.g. if there are 10 cells and you kill the cell with index 8 (the 9th cell), the cell that used to have index 9 (10th cell) will now have index 9.

move_cells(cell_idxs=None)

Move cells to a random position in their Moore's neighborhood. If every pixel in the cells' Moore neighborhood is occupied already the cell will not be moved.

Parameters:
  • cell_idxs (list[int] | None, default: None ) –

    Indexes of cells that should be moved. None to move all cells.

world.cell_positions and world.cell_map are updated.

reposition_cells(cell_idxs=None)

Reposition cells randomly on cell map without changing them.

Parameters:
  • cell_idxs (list[int] | None, default: None ) –

    Indexes of cells that should be repositioned. None to reposition all cells.

Cells are removed from their current pixels on the cell map and repositioned randomly on any free pixel on the cell map. world.cell_positions and world.cell_map are updated. I.e. genomes, proteomes, cell molecules stay the same.

enzymatic_activity()

Catalyze reactions for one time step (1s by default). This includes molecule transport into or out of the cell. world.molecule_map and world.cell_molecules are updated.

diffuse_molecules()

Let molecules in molecule map diffuse and permeate through cell membranes by one time step (1s by default). world.molecule_map and world.cell_molecules are updated.

By how much each molcule species diffuses around the world map and permeates into or out of cells is defined on Molecule in Chemistry.

degrade_molecules()

Degrade molecules in world map and cells by one time step. world.molecule_map and world.cell_molecules are updated.

How quickly each molecule species degrades depends on its half life which is defined on Molecule in Chemistry.

increment_cell_lifetimes()

Increment world.cell_lifetimes by 1. Cell lifetimes can be used to filter cells but, apart from what you implement, they don't have any effect in the simulation.

mutate_cells(cell_idxs=None, p=1e-06, p_indel=0.4, p_del=0.66)

Mutate cells with point mutations

Parameters:
  • cell_idxs (list[int] | None, default: None ) –

    Indexes of cells which are allowed to mutate. Leave None to allow all cells to mutate.

  • p (float, default: 1e-06 ) –

    Probability of a mutation per base pair

  • p_indel (float, default: 0.4 ) –

    Probability of any point mutation being an indel (inverse probability of it being a substitution)

  • p_del (float, default: 0.66 ) –

    Probability of any indel being a deletion (inverse probability of it being an insertion)

Convenience function that uses mutations.point_mutations() to mutate genomes, then updates cell parameters for cells whose genomes were changed.

recombinate_cells(cell_idxs=None, p=1e-07)

Recombinate neighbourig cells

Parameters:
  • cell_idxs (list[int] | None, default: None ) –

    Indexes of cells which are allowed to recombinate. Leave None to allow all cells to recombinate.

  • p (float, default: 1e-07 ) –

    Probability of a strand break per base pair during recombinataion.

Convenience function that uses mutations.recombinations() to recombinate genomes of neighbouring cells, then updates cell parameters for cells whose genomes were changed. get_neighbors() is used to identify neighbors.

save(rundir, name='world.pkl')

Write world object to pickle file

Parameters:
  • rundir (Path) –

    Directory of the pickle file

  • name (str, default: 'world.pkl' ) –

    Name of the pickle file

This is a big and slow save. It saves everything needed to restore world with its chemistry and genetics. Use from_file() to restore it. For a small and quick save use save_state().

from_file(rundir, name='world.pkl', device=None) classmethod

Restore previously saved world object from pickle file. It had to be saved with save().

Parameters:
  • rundir (Path) –

    Directory of the pickle file

  • name (str, default: 'world.pkl' ) –

    Name of the pickle file

  • device (str | None, default: None ) –

    Optionally set device to which tensors should be loaded. Default is the device they had when they were saved. (see pytorch CUDA semantics)

Returns:

save_state(statedir)

Save current state

Parameters:
  • statedir (Path) –

    Directory to store files in (there are multiple files per state)

This is a small and quick save. It only saves things which change during the simulation. Restore a state with load_state().

To restore a whole world object you need to save it once with save(). Then, world.save_state() can be used to save different states of that world object.

load_state(statedir, ignore_cell_params=False)

Load a saved world state. The state had to be saved with save_state() previously.

Parameters:
  • statedir (Path) –

    Directory that contains all files of that state

  • ignore_cell_params (bool, default: False ) –

    Whether to not update cell parameters as well. If you are only interested in the cells' genomes, molecules, or labels you can set this to True to make loading a lot faster.

magicsoup.containers

This module defines some data classes. Chemistry is the most important one. A Chemistry object is needed when initializing World. It describes Molecules and reactions of a simulation. Molecule defines a molecule species and each reaction is a grouping of molecule species.

Molecule(name, energy, half_life=100000, diffusivity=0.1, permeability=0.0)

Represents a molecule species which is part of the world, can diffuse, degrade, and be converted into other molecules.

Parameters:
  • name (str) –

    Used to uniquely identify this molecule species.

  • energy (float) –

    Energy for 1 mol of this molecule species (in J). The hypothetical amount of energy that is released if the molecule would be fully deconstructed.

  • half_life (int, default: 100000 ) –

    Half life of this molecule species in time steps (s by default). Molecules degrade by one step if you call degrade_molecules(). Must be > 0.0.

  • diffusivity (float, default: 0.1 ) –

    A measure for how quick this molecule species diffuses over the molecule map during each time step (s by default). Molecules diffuse when calling diffuse_molecules(). 0.0 means it doesn't diffuse at all. 1.0 means it is spread out equally around its Moore's neighborhood within one time step.

  • permeability (float, default: 0.0 ) –

    A measure for how quick this molecule species permeates cell membranes during each time step (s by default). Molecules permeate cell membranes when calling diffuse_molecules(). 0.0 means it can't permeate cell membranes. 1.0 means it spreads equally between cell and molecule map pixel within one time step.

Each molecule species which is supposed to be unique should have a unique name. In fact, if you initialize a molecule with the same name multiple times, only one instance of this molecule will be created. It allows you to define overlapping chemistries without creating multiple molecule instances of the same molecule species. You can also get a molecule instance from its name alone (Molecule.from_name()). However, this also means that if 2 molecules have the same name all attributes must match.

atp = Molecule("ATP", 10)
atp2 = Molecule("ATP", 10)
assert atp is atp2

atp3 = Molecule.from_name("ATP")
assert atp3 is atp

Molecule("ATP", 20)  # error because energy is different

By default in this simulation molecule numbers can be thought of as being in mM, time steps in s, energies in J/mol. Eventually, they are just numbers and can be interpreted as anything. However, together with the default parameters in Kinetics it makes sense to interpret them in mM, s, and J.

Molecule half life should represent the half life if the molecule is not actively deconstructed by a protein. Molecules degrade by one step whenever you call degrade_molecules(). You can setup the simulation to always call degrade_molecules() whenever a time step is finished.

Molecular diffusion in the 2D molecule map happens whenever you call diffuse_molecules(). The molecule map is world.molecule_map (see World). It is a 3D tensor where dimension 0 represents all molecule species of the simulation. They are ordered in the same way the attribute molecules is ordered in the Chemistry you defined. Dimension 1 represents x-positions and dimension 2 y-positions. Diffusion is implemented as a 2D convolution over the x-y tensor for each molecule species. This convolution has a 9x9 kernel. So, it alters the Moore's neighborhood of each pixel. How much of the center pixel's molecules are allowed to diffuse to the surrounding 8 pixels is defined by diffusivity. diffusivity is the ratio a/b when a is the amount of molecules diffusing to each of the 8 surrounding pixels, and b is the amount of molecules that stays on the center pixel. Thus, diffusivity=1.0 means all molecules of the center pixel are spread equally across the 9 pixels.

Molecules permeating cell membranes also happens with diffuse_molecules(). Cell molecules are defined in world.cell_molecules (see World). It is a 2D tensor where dimension 0 represents all cells and dimension 1 represents all molecule species. Again, molecule species are ordered in the same way the attribute molecules is ordered in the Chemistry you defined. Dimension 0 always changes its length depedning on how cells replicate or die. The cell index (cell.idx of Cell) for any cell equals the index in world.cell_molecules. E.g. the amount of molecule species currently in cell with index 100 is defined as world.cell_molecules[100]. permeability defines how much molecules can permeate from world.molecule_map into world.cell_molecules. Each cell lives on a certain pixel with x- and y-position. And although there are already molecules on this pixel, the cell has its own molecules. You could imagine the cell as a bag of molecule hovering over that pixel. permeability allows molecules from that pixel in the molecule map to permeate into the cell that lives on that pixel (and vice versa). So e.g. if cell 100 lives on pixel 12, 450 Molecules would be allowed to move from world.molecule_map[:, 12, 450] to world.cell_molecules[100, :]. Again, this happens separately for every molecule species depending on the permeability value. Specifically, permeability is the ratio of molecules that can permeate into the cell and the molecules that stay outside. Thus, a value of 1.0 means within one time step this molecule species spreads evenly between molecule map and cell.

from_name(name) classmethod

Get Molecule instance from its name (if it has already been defined)

Chemistry(molecules, reactions)

Represents the chemistry with molecules and reactions available in a simulation.

Parameters:
  • molecules (list[Molecule]) –

    List of all Molecules that are part of this simulation

  • reactions (list[tuple[list[Molecule], list[Molecule]]]) –

    List of all possible reactions in this simulation as a list of tuples: (substrates, products) where both substrates and products are lists of Molecules. All reactions can happen in both directions (left to right or vice versa).

molecules should include at least all molecule species that are mentioned in reactions. But it is possible to define more molecule species. Cells can use molecule species in transporer and regulatory domains, even if they are not included in any reaction. For stoichiometric coefficients > 1, list the molecule species multiple times. E.g. for 2A + B <-> C use reaction=([A, A, B], [C]) when A,B,C are molecule A, B, C instances.

Duplicate reactions and molecules will be removed on initialization. As any reaction can take place in both directions, it is not necessary to define both directions. You can use __and__ to combine multiple chemistries:

both = chemistry1 & chemistry2  # union of molecules and reactions

The chemistry object is used by World to know what molecule species exist. Reactions and molecule species are used to set up the world and create mappings for domains. On the world object there are some tensors that refer to molecule species (e.g. world.molecule_map). The molecule ordering in such tensors is always the same as the ordering in chemistry.molecules. E.g. if chemistry.molecules[2] is pyruvate, world.molecule_map[2] refers to pyruvate concentrations on the world molecule map. For convenience mappings are provided on this object:

  • chemistry.mol_2_idx to map a Molecule object to its index
  • chemistry.molname_2_idx to map a Molecule name string to its index

DomainType

Protocol for domains

CatalyticDomain(reaction, km, vmax, start, end)

Object describing a catalytic domain.

Parameters:
  • reaction (tuple[list[Molecule], list[Molecule]]) –

    Tuple (substrates, products) where both substrates and products are lists of Molecules which are involved in the reaction. Stoichiometric coefficients > 1 are defined by listing molecules multiple times.

  • km (float) –

    Michaelis Menten constant of the reaction (in mM).

  • vmax (float) –

    Maximum velocity of the reaction (in mmol/s).

  • start (int) –

    Domain start on the CDS

  • end (int) –

    Domain end on the CDS

The simulation works with tensor representations of cell proteomes. This class is a helper which makes interpreting a cell's proteome easier. You should not instantiate this class but instead get it from cell.proteome (see Cell).

Domain start and end describe the slice of the CDS python string. I.e. the index starts with 0, start is included, end is excluded. E.g. start=3, end=18 starts with the 4th and ends with the 18th base pair on the CDS.

to_dict()

Get dict representation of domain

from_dict(dct) classmethod

Convencience method for creating an instance from a dict. All parameters must be present as keys. Molecules are provided by their name.

TransporterDomain(molecule, km, vmax, is_exporter, start, end)

Object describing a transporter domain.

Parameters:
  • molecule (Molecule) –

    Molecule which can be transported by this domain.

  • km (float) –

    Michaelis Menten constant of the transport (in mM).

  • vmax (float) –

    Maximum velocity of the transport (in mmol/s).

  • is_exporter (bool) –

    Whether the transporter is exporting this molecule species out of the cell.

  • start (int) –

    Domain start on the CDS

  • end (int) –

    Domain end on the CDS

The simulation works with tensor representations of cell proteomes. This class is a helper which makes interpreting a cell's proteome easier. You should not instantiate this class but instead get it from cell.proteome (see Cell).

is_exporter is only relevant in combination with other domains on the same protein. It defines in which transport direction this domain is energetically coupled with others.

Domain start and end describe the slice of the CDS python string. I.e. the index starts with 0, start is included, end is excluded. E.g. start=3, end=18 starts with the 4th and ends with the 18th base pair on the CDS.

to_dict()

Get dict representation of domain

from_dict(dct) classmethod

Convencience method for creating an instance from a dict. All parameters must be present as keys. Molecules are provided by their name.

RegulatoryDomain(effector, hill, km, is_inhibiting, is_transmembrane, start, end)

Object describing a regulatory domain.

Parameters:
  • effector (Molecule) –

    Effector Molecules

  • hill (int) –

    Hill coefficient describing degree of cooperativity

  • km (float) –

    Ligand concentration producing half occupation (in mM)

  • is_inhibiting (bool) –

    Whether this is an inhibiting regulatory domain (otherwise activating).

  • is_transmembrane (bool) –

    Whether this is also a transmembrane domain. If true, the domain will react to extracellular molecules instead of intracellular ones.

  • start (int) –

    Domain start on the CDS

  • end (int) –

    Domain end on the CDS

The simulation works with tensor representations of cell proteomes. This class is a helper which makes interpreting a cell's proteome easier. You should not instantiate this class but instead get it from cell.proteome (see Cell).

Domain start and end describe the slice of the CDS python string. I.e. the index starts with 0, start is included, end is excluded. E.g. start=3, end=18 starts with the 4th and ends with the 18th base pair on the CDS.

to_dict()

Get dict representation of domain

from_dict(dct) classmethod

Convencience method for creating an instance from a dict. All parameters must be present as keys. Molecules are provided by their name.

Protein(domains, cds_start, cds_end, is_fwd)

Object describing a protein.

Parameters:
  • domains (list[DomainType]) –

    All domains of the protein as a list of CatalyticDomains, TransporterDomains, and RegulatoryDomains.

  • cds_start (int) –

    Start coordinate of its coding region

  • cds_end (int) –

    End coordinate of its coding region

  • is_fwd (bool) –

    Whether its CDS is in the forward or reverse-complement of the genome.

The simulation works with tensor representations of cell proteomes. This class is a helper which makes interpreting a cell's proteome easier. You should not instantiate this class but instead get it from cell.proteome (see Cell).

CDS start and end describe the slice of the genome python string. I.e. the index starts with 0, start is included, end is excluded. E.g. cds_start=2, cds_end=31 starts with the 3rd and ends with the 31st base pair on the genome.

is_fwd describes whether the CDS is found on the forward (hypothetical 5'-3') or the reverse-complement (hypothetical 3'-5') side of the genome. cds_start and cds_end always describe the parsing direction / the direction of the hypothetical transcriptase. So, if you want to visualize a is_fwd=False CDS on the genome in 5'-3' direction you have to do n - cds_start and n - cds_stop if n is the genome length.

to_dict()

Get dict representation of protein

from_dict(dct) classmethod

Create Protein instance from dict. Key must match arguments. Domains are set as a list of dicts {"type": ".", "spec": {}) where type is domain type "C" (catalytic), "T" (transporter), or "R" (regulatory) and spec is a dict with kwargs for each domain's from_dict().

Cell(world, genome, position=(-1, -1), idx=-1, label='C', n_steps_alive=0, n_divisions=0, proteome=None, int_molecules=None, ext_molecules=None)

Object describing a cell and its environment.

Parameters:
  • world (World) –

    Reference to the origin World object.

  • genome (str) –

    Genome sequence string of this cell.

  • position (tuple[int, int], default: (-1, -1) ) –

    Position on the cell map as tuple (x, y).

  • idx (int, default: -1 ) –

    Current cell index in World.

  • label (str, default: 'C' ) –

    Label of origin which can be used to track cells.

  • n_steps_alive (int, default: 0 ) –

    Number of time steps this cell has lived since last division.

  • n_divisions (int, default: 0 ) –

    Number of times this cell's ancestors already divided.

  • proteome (list[Protein] | None, default: None ) –

    List of Protein objects.

  • int_molecules (Tensor | None, default: None ) –

    Intracellular molecule concentrations. A 1D tensor that describes each Molecule in the same order as defined in Chemistry.

  • ext_molecules (Tensor | None, default: None ) –

    Extracellular molecule concentrations. A 1D tensor that describes each Molecule in the same order as defined in Chemistry. These are the molecules in world.molecule_map of the pixel the cell is currently living on.

The simulation works with tensor representations of cell proteomes and molecule concentrations. This class is a helper which makes interpreting a cell easier. You should not instantiate this class but instead get it from get_cell().

All parameters are exposed as attributes on the cell object. Some are calculated lazily just when they are accessed.

cell = world.get_cell(by_idx=3)
assert cell.idx == 5  # the cell's current index
cell.proteome  # proteome is calculated now

When a cell divides its genome and proteome are copied. Both descendants will recieve half of all molecules each. Both their n_divisions attributes are incremented. The cell's label will be copied as well. This way you can track how cells spread.

magicsoup.factories

This module defines classes for generating other objects. Most importantly it defines the GenomeFact class which can be used to generate genomes that encode desired proteomes. Desired proteomes can be defines by describing domains using factory classes CatalyticDomainFact, TransporterDomainFact, RegulatoryDomainFact.

DomainFactType

Protocol for domain factories

CatalyticDomainFact(reaction, km=None, vmax=None)

Factory for generating nucleotide sequences with CatalyticDomains.

Parameters:
  • reaction (tuple[list[Molecule], list[Molecule]]) –

    Tuple of (substrates, products) from Chemistry where both substrates and products are lists of Molecules.

  • km (float | None, default: None ) –

    Desired Michaelis Menten constant of the transport (in mM).

  • vmax (float | None, default: None ) –

    Desired Maximum velocity of the transport (in mmol/s).

For stoichiometric coefficients > 1, list the molecule species multiple times. E.g. for 2A + B <-> C use reaction=([A, A, B], [C]) when A,B,C are molecule A, B, C instances.

km and vmax are target values. Due to the way how codons are sampled for specific floats, the actual values for km and vmax might differ. The closest available value to the given value will be used.

If any optional argument is left out (None) it will be sampled randomly.

validate(world)

Validate this domain factory's attributes

gen_coding_sequence(world)

Generate a nucleotide sequence for this domain

from_dict(dct) classmethod

Generate domain factory from dct representation of a domain. A dct representation of this domain is returned by CatalyticDomain.to_dict().

TransporterDomainFact(molecule, km=None, vmax=None, is_exporter=None)

Factory for generating nucleotide sequences with TransporterDomains.

Parameters:
  • molecule (Molecule) –

    Molecules which can be transported into or out of the cell by this domain.

  • km (float | None, default: None ) –

    Desired Michaelis Menten constant of the transport (in mM).

  • vmax (float | None, default: None ) –

    Desired Maximum velocity of the transport (in mmol/s).

  • is_exporter (bool | None, default: None ) –

    Whether the transporter is exporting this molecule species out of the cell.

is_exporter is only relevant in combination with other domains on the same protein. It defines in which transport direction this domain is energetically coupled with others.

km and vmax are target values. Due to the way how codons are sampled for specific floats, the actual values for km and vmax might differ. The closest available value to the given value will be used.

If any optional argument is left out (None) it will be sampled randomly.

validate(world)

Validate this domain factory's attributes

gen_coding_sequence(world)

Generate a nucleotide sequence for this domain

from_dict(dct) classmethod

Generate domain factory from dct representation of a domain. A dct representation of this domain is returned by TransporterDomain.to_dict().

RegulatoryDomainFact(effector, is_transmembrane, is_inhibiting=None, km=None, hill=None)

Factory for generating nucleotide sequences with RegulatoryDomains.

Parameters:
  • effector (Molecule) –

    Effector Molecules

  • is_transmembrane (bool) –

    Whether this is also a transmembrane domain. If true, the domain will react to extracellular molecules instead of intracellular ones.

  • km (float | None, default: None ) –

    Desired ligand concentration producing half occupation (in mM).

  • hill (int | None, default: None ) –

    Hill coefficient describing degree of cooperativity (currently 1, 3, 5 available)

  • is_inhibiting (bool | None, default: None ) –

    Whether the effector will have an activating or inhibiting effect.

km is a target value. Due to the way how codons are sampled for specific values, the final value for km might differ. The closest available value to the given value will be used.

If any optional argument is left out (None) it will be sampled randomly.

validate(world)

Validate this domain factory's attributes

gen_coding_sequence(world)

Generate a nucleotide sequence for this domain

from_dict(dct) classmethod

Generate domain factory from dct representation of a domain. A dct representation of this domain is returned by RegulatoryDomain.to_dict().

GenomeFact(world, proteome, target_size=None)

Factory for generating genomes that translate into a desired proteome.

Parameters:
  • world (World) –

    World object in which the genome will be used

  • proteome (list[list[DomainFactType]]) –

    Desired proteome as a list of lists of domain factories. Each list of domain factories represents a protein.

  • target_size (int | None, default: None ) –

    Optional target size of generated genome. Smallest possible genome size will be generated if None. If supplied, domains and proteins are padded with random sequences.

The desired proteome is given as a list of proteins where each protein is represented by a list of domain factories. As domain factories CatalyticDomainFact, TransporterDomainFact, RegulatoryDomainFact can be used. Use generate() to sample and generate a genome. There are always multiple base pair sequences which can encode the same proteome. Thus each generated genome might be different.

While the generated genome will always have the desired domains and proteins it might also include proteins which were not defined. This factory assembles a genome in one reading frame and direction. There is always the possibility that in another reading frame or on the reverse-complement of the genome other proteins are encoded. The larger the final genome, the higher the chances of this happening.

generate()

Generate a genome with the desired proteome

from_dicts(dcts, world) classmethod

Create a genome factory from a list of protein dct representations. A protein dct representation is returned by Protein.to_dict().

magicsoup.genetics

This module provides functions mainly for translation and transcription. Most of it is defined on the Genetics class. This class is used by World. Usually, a user does't need to use this class directly.

Genetics(start_codons=('TTG', 'GTG', 'ATG'), stop_codons=('TGA', 'TAG', 'TAA'), p_catal_dom=0.01, p_transp_dom=0.01, p_reg_dom=0.01, n_dom_type_codons=2)

Class holding logic about transcribing and translating nucleotide sequence pairs.

Parameters:
  • start_codons (tuple[str, ...], default: ('TTG', 'GTG', 'ATG') ) –

    Codons which start a coding sequence

  • stop_codons (tuple[str, ...], default: ('TGA', 'TAG', 'TAA') ) –

    Codons which stop a coding sequence

  • p_catal_dom (float, default: 0.01 ) –

    Chance of encountering a CatalyticDomain in a random nucleotide sequence.

  • p_transp_dom (float, default: 0.01 ) –

    Chance of encountering a TransporterDomain in a random nucleotide sequence.

  • p_reg_dom (float, default: 0.01 ) –

    Chance of encountering a RegulatoryDomain in a random nucleotide sequence.

  • n_dom_type_codons (int, default: 2 ) –

    Number of codons that encode the domain type: CatalyticDomain, TransporterDomain, or RegulatoryDomain.

During the simulation translate_genomes() is used. Its return value can be used to update cell parameters using Kinetics.set_cell_params() or to translate its abstract return value into a human readable Proteins description using Kinetics.get_proteome().

Translation happens only within coding sequences (CDSs). A CDS starts at every start codon and ends with the first in-frame encountered stop codon. Un-stopped CDSs are discarded. Both the forward and reverse complement of the nucleotide sequence are considered. Each CDS represents one Protein. All domains found within a CDS will be added as domains to that protein.

If you want to use your own genetics object for the simulation you can just assign it after creating World. E.g.:

world = World(chemistry=chemistry)
my_genetics = Genetics(p_transp_dom=0.1, stop_codons=("TGA",))
world.genetics = my_genetics

translate_genomes(genomes)

Translate all genomes into proteomes

Parameters:
  • genomes (list[str]) –

    list of base pair sequences

Returns:
  • list[list[ProteinSpecType]]

    List of proteomes. This is a list (proteomes) of list (proteins)

  • list[list[ProteinSpecType]]

    where each protein is a tuple (domains, cds_start, cds_end, is_fwd).

  • list[list[ProteinSpecType]]

    domains is a list of tuples (indices, start, end) with indices which

  • list[list[ProteinSpecType]]

    will be mapped to specific domain specifications by

  • list[list[ProteinSpecType]]

The result of this function can be used to update cell parameters using Kinetics.set_cell_params() or to translate this abstract return type into a human readable Proteins description using Kinetics.get_proteome().

Both forward and reverse-complement are considered. CDSs are extracted and a protein is translated for every CDS. Unviable proteins (no domains or only regulatory domains) are discarded.

CDS start and end describe the slice of the genome python string. I.e. the index starts with 0, start is included, end is excluded. E.g. cds_start=2, cds_end=31 starts with the 3rd and ends with the 31st base pair on the genome.

is_fwd describes whether the CDS is found on the forward (hypothetical 5'-3') or the reverse-complement (hypothetical 3'-5') side of the genome. cds_start and cds_end always describe the parsing direction / the direction of the hypothetical transcriptase. So, if you want to visualize a is_fwd=False CDS on the genome in 5'-3' direction you have to do n - cds_start and n - cds_stop if n is the genome length.

magicsoup.kinetics

This module provides functions mainly for reaction kinetics. Most of it is defined on the Kinetics class. This class is used by World. Usually, a user does't need to use this class directly.

Kinetics(chemistry, abs_temp=310.0, km_range=(0.01, 100.0), vmax_range=(0.001, 100.0), device='cpu', scalar_enc_size=64 - 3, vector_enc_size=4096 - 3 * 64)

Class holding logic for simulating protein work. Usually this class is instantiated automatically when initializing World. You can access it on world.kinetics.

Parameters:
  • chemistry (Chemistry) –

    Simulation Chemistry

  • abs_temp (float, default: 310.0 ) –

    Absolute temperature in Kelvin. Will influence reaction equilibriums. Higher temperatures give concentration gradients higher importance.

  • km_range (tuple[float, float], default: (0.01, 100.0) ) –

    The range from which to sample Michaelis Menten constants for domains (in mM). They are sampled from a uniform distribution with its reciprocal. All values must be > 0.

  • vmax_range (tuple[float, float], default: (0.001, 100.0) ) –

    The range from which to sample maximum velocities for domains (in mM/s). They are sampled from a lognormal distribution, so all values must be > 0.

  • device (str, default: 'cpu' ) –

    Device to use for tensors (see pytorch CUDA semantics). This has to be the same device that is used by World.

  • scalar_enc_size (int, default: 64 - 3 ) –

    Number of tokens that can be used to encode the scalars Vmax, Km, and sign. This should be the output of max(genetics.one_codon_map.values()) (Genetics).

  • vector_enc_size (int, default: 4096 - 3 * 64 ) –

    Number of tokens that can be used to encode the vectors for reactions and molecules. This should be the output of max(genetics.two_codon_map.values()) (Genetics).

There are c cells, p proteins, s signals. Signals are basically Molecules, but we have to differentiate between intra- and extracellular molecules. So, there are twice as many signals as Molecules. Molecules are always ordered as in Chemistry: first, the intracellular ones, then the extracellular ones. Cells are ordered as in World. Proteins are ordered how they were found by Genetics.translate_genomes() in each cell.

Attributes on an object of this class describe cell parameters:

  • Kmf, Kmb Affinities to all signals processed by each protein in each cell (c, p). Ratios of (f)orward and (b)ackward reaction affinities are their equilibrium constants.
  • Kmr Affinity of each regulating signal for each protein and each cell (c, p, s) exponentiated by their hill coefficients.
  • Vmax Maximum velocities of each protein in each cell (c, p).
  • Ke Equilibrium constants derived from standard reaction free energy for each protein in each cell (c, p).
  • N Stoichiometric number for each signal that is processed by each protein in each cell (c, p, s). Additionally, there are Nf and Nb which describe only forward and backward stoichiometric coefficients. This is needed in addition to N to properly describe reactions that involve co-factors (i.e. n=0).
  • A Allosteric modulation for each signal in each protein in each cell (c, p, s). The number represents the hill coefficient. Positive coefficients have an activating effect, negative have an inhibiting effect.

The main method is integrate_signals(). When calling World.enzymatic_activity(), a matrix X of signals (c, s) is prepared and then integrate_signals(X) is called. Updated signals are returned and World.enzymatic_activity() writes them back to world.cell_molecules and world.molecule_map.

Another method, which ended up here, is set_cell_params() which reads proteomes and updates cell parameters accordingly. This is called whenever the genomes of some cells have changed. Currently, this is also the main performance bottleneck.

When this class is initialized it generates the mappings from indices to domain parameters by random sampling. These mappings are then used throughout the simulation. If you initialize this class again, these mappings will be different. Initializing World will also create one Kinetics instance on world.kinetics. If you want to access indices to domain mappings in your simulation, you should use world.kinetics.

Note: All default values are based on the assumption that energies are in J, a time step represents 1s, and molecule numbers are in mM. If you change the defaults, you need to reconsider how these numbers should be interpreted.

The kinetics used here can theoretically never create negative molecule concentrations or make a reaction overshoot its rquilibrium state. However, this simulation computes these things one step at a time. This and the numerical limits of data types used here can create edge cases that need to be dealth with: reaction quotients overshooting the equilibrium state and negative concentrations. Both are caused by high with low values. With the default and ranges and stoichiometric coefficients below 100, the heuristics in integrate_signals() are close enough for a meaningful simulation.

If an enzyme is far away from its equilibrium state and substrate concentrations are far above it will progress its reaction at full speed . This rate can be so high that, within one step, surpasses . In the next step it will move at full speed into the opposite direction, overshooting again, and so on. Reactions with high stoichiometric coefficients are more prone to this as their rate functions are sharper. To combat this integrate_signals() works by iteratively approaching on multiple levels of the computation. The approach was tuned to compute fast and give satifisfying reults with certain conditions in mind. , , and concentrations generally not much higher than 100. Violating these assumptions could lead to reaction quotients constantly overshooting their equilibrium.

get_proteome(proteome)

Translate and return cell parameters for a single proteome

Parameters:
  • proteome (list[ProteinSpecType]) –

    proteome which should be translated and returned

Retruns

List Proteins that describe the cell's proteome.

set_cell_params(cell_idxs, proteomes)

Translate and set cell parameters for proteomes

Parameters:
  • cell_idxs (list[int]) –

    Indexes of cells which proteomes belong to

  • proteomes (list[list[ProteinSpecType]]) –

    List of proteomes which should translated and set

proteomes is a a list (proteomes) of lists (proteins) of domain specifications. It is derived from Genetics.translate_genomes(). The domain specification themself are tuples which are mapped to concrete values (molecule species, Km, Vmax, reactions, ...). cell_idxs refer to the World's cell indexes.

unset_cell_params(cell_idxs)

Set cell parameters for cells to zero.

Parameters:
  • cell_idxs (list[int]) –

    Indexes of cells

Indexes refer to the World's cell indexes.

copy_cell_params(from_idxs, to_idxs)

Copy paremeters from a list of cells to another list of cells

Parameters:
  • from_idxs (list[int]) –

    List of cell indexes to copy from

  • to_idxs (list[int]) –

    List of cell indexes to copy to

from_idxs and to_idxs must have the same length. Indexes refer to the World's cell indexes.

remove_cell_params(keep)

Remove cells from cell params

Parameters:
  • keep (Tensor) –

    Bool tensor (c,) which is true for every cell that should not be removed and false for every cell that should be removed.

keep must have length world.n_cells. Indexes of keep refer to the World's cell indexes.

increase_max_cells(by_n)

Increase the cell dimension of all cell parameters

Parameters:
  • by_n (int) –

    By how many rows to increase the cell dimension

increase_max_proteins(max_n)

Increase the protein dimension of all cell parameters

Parameters:
  • max_n (int) –

    The maximum number of rows required in the protein dimension

integrate_signals(X)

Simulate protein work by integrating all signals.

Parameters:
  • X (Tensor) –

    2D tensor of every signal in every cell (c,s). Must all be >= 0.0.

Returns:
  • Tensor

    New tensor of the same shape which represents the updated signals for every cell.

Dimension 0 indexes of X refer to the World's cell indexes. Dimension 1 indexes of X refer to Molecule indexes on Chemistry. The order of signals is first all intracellular molecule species in the same order as chemistry.molecules, then again all molecule species in the same order but this time describing extracellular molecule species. The number of intracellular molecules comes from world.cell_molecules for any particular cell. The number of extracellular molecules comes from world.molecule_map from the pixel the particular cell currently lives on.

magicsoup.mutations

This is a supporting module with some functions for efficiently creating mutations in nucleotide sequence strings.

point_mutations(seqs, p=1e-06, p_indel=0.4, p_del=0.66)

Add point mutations to a list of base pair sequences. Mutations are substitutions and indels.

Parameters:
  • seqs (list[str]) –

    base pair sequences

  • p (float, default: 1e-06 ) –

    probability of a mutation per base pair

  • p_indel (float, default: 0.4 ) –

    probability of any point mutation being an indel (inverse probability of it being a substitution)

  • p_del (float, default: 0.66 ) –

    probability of any indel being a deletion (inverse probability of it being an insertion)

Returns:
  • list[tuple[str, int]]

    List of mutated sequences and their indices from seqs.

The returned list only contains sequences which experienced at least one mutation. The new sequences are returned together with their seqs index. E.g. ("...", 5) means seqs[5] was mutated and the resulting sequence is in the tuple.

recombinations(seq_pairs, p=1e-07)

Add random recombinations to pairs of base pair sequences. The recombination happens by creating random strand breaks in the input sequence pairs and randomly re-joining them.

Parameters:
  • seq_pairs (list[tuple[str, str]]) –

    base pair sequence pairs

  • p (float, default: 1e-07 ) –

    probability of a strand break per base pair

Returns:
  • list[tuple[str, str, int]]

    List of mutated sequence pairs and their indices.

The returned list only contains sequence pairs that experienced a recombination. The new sequence paris are returned and in addition to that their seq_pairs index. E.g. if it's 5, it means seq_pairs[5] was mutated and the resulting sequences are in this 3-tuple.

magicsoup.util

Some more helper functions

round_down(d, to=3)

Round down to declared integer

closest_value(values, key)

Get closest value to key in values

randstr(n=12)

Generate random string of length n.

With n=12 and the string consisting of 62 different characters, there's a 50% chance of encountering one collision after 5e10 draws. (birthday paradox)

random_genome(s=500, excl=None)

Generate a random nucleotide sequence string

Parameters:
  • s (int, default: 500 ) –

    Length of genome in nucleotides or base pairs

  • excl (list[str] | None, default: None ) –

    Exclude certain sequences from the genome

Returns:
  • str

    Generated genome as string

If excl is given, all sequences in excl will be removed. However, these sequences might still appear in the reverse-complement of the resulting genome. If you also want to get rid of those, you have to also provide their reverse-complement in excl.

variants(seq)

Generate all possible nucleotide sequences from a template string.

Apart from nucleotides, the template string can include special characters: - N refers to any nucleotide - R refers to purines (A or G) - Y refers to pyrimidines (C or T)

codons(n, excl_codons=None)

Return all possible nucleotide sequences of n codons, optionally excluding codons in excl_codons.

dist_1d(a, b, m)

Distance between a and b on circular 1D line of size m

free_moores_nghbhd(x, y, positions, map_size)

For position x, y get positions in Moore's neighborhood on circular 2D map of size map_size which are not already occupied as indicated by positions