Last time I talked about the issue with the method used by Daaaaave [1] to map from enzyme-level data to reaction-level data. Given enzyme (or gene) levels A = 4, B = 3 and C = 2 units, we find:

| Reaction | GPR | Daaaaave | Gimme |
|---|---|---|---|
| 4 | (A and B) or (A and C) | min(4, 3) + min(4, 2) = **5** | max(min(4, 3), min(4, 2)) = **3** |
| 5 | A and (B or C) | min(4, 3 + 2) = **4** | min(4, max(3, 2)) = **3** |

The problem with applying the min/plus rule to GPRs is that reactions 4 and 5 have logically equivalent GPRs (they differ only in bracketing), yet Daaaaave assigns them different values. As Nikos pointed out, the min/max rule used by Gimme [2] does not make this mistake. However, I think we really should be adding the activities of alternative catalysts rather than taking their maximum; indeed, some networks, such as “Yeast 1” [3], use separate reactions in place of “or” statements. Any mapping must therefore be robust to equivalent representations.
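To make the comparison in the table concrete, here is a minimal Python sketch that evaluates a GPR under either rule. The nested-tuple encoding of GPRs and the `evaluate` helper are illustrative assumptions, not how either tool stores GPRs internally.

```python
# A GPR is either a gene name (str) or a tuple (op, child, child, ...),
# where op is "and" or "or". This encoding is a hypothetical choice.

def evaluate(gpr, levels, and_rule, or_rule):
    """Recursively map enzyme levels onto a GPR expression."""
    if isinstance(gpr, str):
        return levels[gpr]
    op, *children = gpr
    values = [evaluate(child, levels, and_rule, or_rule) for child in children]
    if op == "and":
        return and_rule(values)
    if op == "or":
        return or_rule(values)
    raise ValueError(f"unknown operator {op!r}")

levels = {"A": 4, "B": 3, "C": 2}
r4 = ("or", ("and", "A", "B"), ("and", "A", "C"))  # (A and B) or (A and C)
r5 = ("and", "A", ("or", "B", "C"))                 # A and (B or C)

for name, gpr in [("r4", r4), ("r5", r5)]:
    min_plus = evaluate(gpr, levels, min, sum)  # min/plus rule (Daaaaave)
    min_max = evaluate(gpr, levels, min, max)   # min/max rule (Gimme)
    print(name, min_plus, min_max)
# r4 5 3
# r5 4 3
```

The two equivalent GPRs give different min/plus values (5 and 4) but identical min/max values (3 and 3), reproducing the table.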

Let’s step back a bit. Reaction 4 is catalysed by alternative complexes, A:B and A:C.

$$r_{4} \rightarrow (A \text{ and } B) \text{ or } (A \text{ and } C)$$

There is less of A (4) than the total amount of B (3) and C (2), so some B or C must be “wasted” when forming the two complexes. There is an infinite number of arrangements here (we could have A:B/A:C = 3/1, 2/2, 2½/1½, …), but their maximum total activity is 4 units. Daaaaave overestimates this value (5), whereas Gimme underestimates it (3).
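As a quick sanity check of this reasoning, the sketch below searches a quarter-unit grid of arrangements, assuming only that each complex's activity is limited by its parts and that the two complexes share the 4 units of A. The grid step is an arbitrary choice for illustration; exact fractions avoid rounding noise.

```python
from fractions import Fraction

A, B, C = 4, 3, 2
step = Fraction(1, 4)  # grid resolution (arbitrary, for illustration only)

best, arrangements = 0, []
x = Fraction(0)
while x <= min(A, B):          # x = activity of complex A:B
    y = Fraction(0)
    while y <= min(A - x, C):  # y = activity of A:C, using the A left over
        total = x + y
        if total > best:
            best, arrangements = total, [(x, y)]
        elif total == best:
            arrangements.append((x, y))
        y += step
    x += step

print(best)  # 4
print([(float(x), float(y)) for x, y in arrangements])
# [(2.0, 2.0), (2.25, 1.75), (2.5, 1.5), (2.75, 1.25), (3.0, 1.0)]
```

Every optimal arrangement on the grid reaches a total of 4, and the A:B share can sit anywhere between 2 and 3 units, matching the examples above.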

We can frame the verbal reasoning above mathematically. Each mention of an enzyme in a GPR, anywhere in the network, is really a separate entity

$$r_{4} \rightarrow (A_{1} \text{ and } B_{1}) \text{ or } (A_{2} \text{ and } C_{1})$$

that together make up the total enzyme level

$$A_{1} + A_{2} + \dots = A = 4.$$

We can replace “and” relationships by introducing new variables $X_{i} \ge 0$ that represent complexes

$$r_{4} \rightarrow X_{1} \text{ or } X_{2}$$

whose activities can be no more than any of their parts

$$X_{1} \le A_{1}, \quad X_{1} \le B_{1}.$$

Similarly, we can replace “or” relationships by introducing new variables $Y_{i}$ that represent alternative catalysts

$$r_{4} = Y_{1}$$

whose activities are the sum of their parts

$$Y_{1} = X_{1} + X_{2}.$$

Finally, we want as little wastage as possible; one way to achieve this is to maximise the total activity across all reactions

$$\text{maximise: } r_{1} + r_{2} + \dots$$
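Written out in full for reaction 4 on its own (assuming, for illustration, that A, B and C are split only among the mentions in this one GPR), the problem reads:

$$
\begin{aligned}
\text{maximise: } & r_{4} \\
\text{subject to: } & r_{4} = Y_{1}, \quad Y_{1} = X_{1} + X_{2}, \\
& X_{1} \le A_{1}, \quad X_{1} \le B_{1}, \quad X_{2} \le A_{2}, \quad X_{2} \le C_{1}, \\
& A_{1} + A_{2} = 4, \quad B_{1} = 3, \quad C_{1} = 2, \\
& \text{all variables} \ge 0.
\end{aligned}
$$

Since $r_{4} = X_{1} + X_{2} \le A_{1} + A_{2} = 4$, the optimum is $r_{4} = 4$, attained for example by $A_{1} = X_{1} = 2.5$ and $A_{2} = X_{2} = 1.5$, the 2½/1½ arrangement mentioned earlier.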

This optimisation is a linear programming (LP) problem, so it can be solved efficiently even for genome-scale networks. Indeed, it is of the same computational complexity as running FBA over the network, which is itself an LP. Most importantly, this mapping makes the most of the available data.
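Here is a minimal Python sketch of this LP, assuming SciPy's `linprog` with the HiGHS backend is available. The nested-tuple GPR encoding and the `map_levels` name are hypothetical, not taken from any existing tool; each gene mention, “and” node and “or” node becomes its own non-negative variable, exactly as in the substitutions above.

```python
from scipy.optimize import linprog

def map_levels(gprs, levels):
    """Sketch: build the GPR LP described above and solve it.

    gprs   -- {reaction: tree}; a tree is a gene name (str) or a tuple
              ("and" | "or", child, child, ...)  (hypothetical encoding)
    levels -- {gene: measured level}
    Returns {reaction: activity}. All variables are >= 0 (linprog's default).
    """
    n_vars = 0
    mentions = {}              # gene -> indices of its per-mention variables
    ub_rows, eq_rows = [], []  # (coefficients {index: value}, rhs)

    def new_var():
        nonlocal n_vars
        n_vars += 1
        return n_vars - 1

    def build(node):
        if isinstance(node, str):          # one mention of a gene, e.g. A_1
            v = new_var()
            mentions.setdefault(node, []).append(v)
            return v
        op, *children = node
        parts = [build(child) for child in children]
        v = new_var()
        if op == "and":                    # complex X: X <= each part
            for p in parts:
                ub_rows.append(({v: 1.0, p: -1.0}, 0.0))
        elif op == "or":                   # alternatives Y: Y = sum of parts
            row = {v: 1.0}
            for p in parts:
                row[p] = -1.0
            eq_rows.append((row, 0.0))
        else:
            raise ValueError(f"unknown operator {op!r}")
        return v

    roots = {r: build(tree) for r, tree in gprs.items()}
    for gene, idx in mentions.items():     # A_1 + A_2 + ... = A
        eq_rows.append(({i: 1.0 for i in idx}, float(levels[gene])))

    def dense(rows):
        if not rows:
            return None, None
        matrix = [[0.0] * n_vars for _ in rows]
        for line, (coeffs, _) in zip(matrix, rows):
            for i, value in coeffs.items():
                line[i] = value
        return matrix, [rhs for _, rhs in rows]

    A_ub, b_ub = dense(ub_rows)
    A_eq, b_eq = dense(eq_rows)
    c = [0.0] * n_vars
    for v in roots.values():
        c[v] = -1.0                        # linprog minimises, so negate
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return {r: res.x[v] for r, v in roots.items()}


levels = {"A": 4, "B": 3, "C": 2}
r4 = map_levels({"r4": ("or", ("and", "A", "B"), ("and", "A", "C"))}, levels)
r5 = map_levels({"r5": ("and", "A", ("or", "B", "C"))}, levels)
print(round(r4["r4"], 6), round(r5["r5"], 6))
```

Solved separately, both bracketings of the GPR receive 4 units (up to solver tolerance), so the mapping is robust to the equivalent representations. If both reactions are passed in a single call they compete for the same 4 units of A, so their activities sum to 4, although the split between them is not unique.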

### References

- Lee D, Smallbone K, Dunn WB, Murabito E, Winder CL, Kell DB, Mendes P, Swainston N (2012) “Improving metabolic flux predictions using absolute gene expression data” BMC Syst Biol 6:73. doi:10.1186/1752-0509-6-73
- Becker SA, Palsson BØ (2008) “Context-specific metabolic networks are consistent with experiments” PLoS Comput Biol 4:e1000082. doi:10.1371/journal.pcbi.1000082
- Herrgård MJ, Swainston N, Dobson P, Dunn WB, Arga KY, Arvas M, Blüthgen N, Borger S, Costenoble R, Heinemann M, Hucka M, Le Novère N, Li P, Liebermeister W, Mo ML, Oliveira AP, Petranovic D, Pettifer S, Simeonidis E, Smallbone K, Spasić I, Weichart D, Brent R, Broomhead DS, Westerhoff HV, Kirdar B, Penttilä M, Klipp E, Palsson BØ, Sauer U, Oliver SG, Mendes P, Nielsen J, Kell DB (2008) “A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology” Nat Biotechnol 26:1155-1160. doi:10.1038/nbt1492