# sequence assembly

Imagine going for a stroll, through a forested glade, without a care in the world. With a camera slung over your shoulder and a handful of breadcrumbs, you leave your worries behind. You discover new trees, animals, and a curious set of bridges. You are lost in wonder and majesty, holding only fragments of the experience in your mind.

\
Strangely, you find that the files on your camera are duplicated, shuffled, and erroneously renamed. Here is what you see:

<shuffle.jpg> <histo.jpg> <renamed.jpg>

- All possible orderings of images grows large.

<histo_with_edges.jpg>

- A 4-digit lock has possible (4-digit) combinations.

Each combination has 4-digits, thereby requiring 40,000 key presses! However, if the 4 keys are accepted continuosly, the total size of the sequence is the 10,000 digits + 3(a wrap-around from the first three), giving 10,003.

<stream.gif> <debruijn.png>

- An Eulers walk over a graph is a path over all nodes, where each edge(bridge) is traversed only once. In the keypad example, the edges provide a constraint, of length size 4; the total number of combinations of the sets of numbers up to length 4 would be 10^1 + 10^2 + 10^3 +10^4. All subsequences of b(10,3) are contained in the debruijn sequence of b(10,4).

In term of dna sequencing, longer kmers are more specific leading to lower coverage, while lower kmer length give higher coverage but more error. In fact, setting the graph constraints at a higher kmer(breaking the edges) and then assembling at a lower kmer provides better accuracy. <files/doodle.png>

- These graphs chiefly contain even ordered nodes, are bipartite.

## dx

- How can this problem be formulated in terms of calculus?

An indefinite integral with respect to the assembled sequence as the antiderivative which represents all possible debruijn paths and the sequence partially ordered by graph edges. The debruijn graph provides the position or each contig and this represents the assembly.

## kmer

The debruijn sequence, transitions between the states of each k-word. This k-word value can be encoded as a position, for example a 3-bit number(000=1,001=2,011=,…,111=8).

In the context of bioinformatics, rather than searching through a debruijn sequence for the correct assembly, we reverse the problem, using the encoded position within the debruijn sequence to assemble the fragmented parts. First, the k-mers are made into edges, with nodes as prefix/suffix. The hamiltonian path(inefficient) or euler cycle(efficient) can be thought of as the incrementor, the nodes it connects are the debruin sequence, and a representation of an assembled, mapped(ordered) genome.

In the case of a 4-digit key code, the positions are known, we seek the code value. If a fragment is linked to two unconnected bridges (splice junction), the debruijn sequence provides us with the position in the sequence that fragment can be found, ie a mapping/assembly. Card tricks over debruijn sequences work because the ordering is maintained even if the deck is cut.

## vinidium
- map with mines with edges(distance) between
- map_init=nxn,dbj_init=[[0]xn]
- encounter player/mine splice_junction
- create edge on map between current dbj_state and map(ie splice junction)
- determine new dbj, follow new ordering
- `idea`

is to explore known_map as Euler’s walk (traverse each edge at most once)
- mines repr as nodes
- other players repr as edges
- max mines/min opponents

## card trick

`Modulus incrementer:`

Let’s use a card trick to better understand what is going on. Briefly,- for 2^6 bits there are 64 values
- 4bits => value, 2bits => suite

- increment first/last cards to logically generate color(red/black) of next++
- the next++ card’s value is the 4bit encoding
- this value is the subsequent value in a debrujin sequence.

- for 2^6 bits there are 64 values

```
def deBruijn(k,n):
'''
input: k equals size of alphabet
n equals length of subsequence
output: de bruijn sequence, no repeating subsequence, permutes all combinations
'''
# permute combos, each iteration, gets encoded with bit-value-position
# bit size k^n
a = [0] * k * n
```

## palindromes and game boards

- palindromes:
Runtime. what is runtime for palindrome subsequence. what is runtime for all perms w/o repeat sub-seq?
- generate all: o(n)
- prune: o(n^2)

- Vindinium AI is a competition to best traverse a 2d landscape for mines(gold), taverns(hit points), and opponents(attack).
- Given a random landscape (fully mapped), strategy is based on Euler cycle, to limit exposure to attack.
- for each round shortest edges between each player, neutral mine, and tavern is constructed.
- being attacked is avoided by seeking minimal edge traversal
- for each round, the euler cycle is recalculated as an encoded debruijn sequence

- upsilon toroids refactor as debruijn

## refs - Thinking about sequence alignment:

```
- string reconstruction
- overlap sequence problem;
- hamiltonian path (each node)
- euler/debrujian
- kmer as edge, merge nodes,
- dbj encode position, without state
- de bruijn
- shotgun sequence
- chess board (vinidium)
- loaded deck of cards
- meta-analysis paper nature 50yrs twin study polderman et. al
- coursera pavel ucsd
- 4 russians
- speed up 2log
- vinidium
- DBJ-toroid
- row/col wrap around
- B(k,window_x, window_y)
- vinidium
- moves as reads
- encounter player, split the edge into smaller k-mer
- what does genome represent? how does it evolve? what info encoded? what determined? how behavior encoded?
```