
Supporting Information

Reliable Strategy for Analysis of Complex Biosensor Data

Patrik Forssén a, Evgen Multia b, Jörgen Samuelsson a, Marie Andersson a, Teodor Aastrup c, Samuel Altun c, Daniel Wallinder c, Linus Wallbing c, Thanaporn Liangsupree b, Marja-Liisa Riekkola b,*, Torgny Fornstedt a

a Department of Engineering and Chemical Sciences, Karlstad University, SE-651 88 Karlstad, Sweden
b Department of Chemistry, P.O. Box 55, FI-00014 University of Helsinki, Finland
c Attana AB, Björnäsvägen 21, SE-114 19 Stockholm, Sweden

CONTENTS

Theory
    The n-to-one Kinetic Model
    Rate Constant Distributions
Blank Injections
Synthetic Data
High Affinity Complex
Medium High Affinity Complex
Low Affinity Complex
References

Theory

The n-to-one Kinetic Model

Here we assume a biosensor system whose kinetics are described by an n-to-one model, i.e., the system contains n interactions of the following type,

$$
A_i + L \underset{k_{d,i}}{\overset{k_{a,i}}{\rightleftharpoons}} A_iL, \tag{1}
$$

where [A_i] is an injected analyte, [L] is a ligand immobilized on the biosensor chip surface, [A_iL] is the complex formed between the analyte and the ligand, and $k_{a,i}$ and $k_{d,i}$ are the corresponding association and dissociation rate constants, respectively. Let

$$
K_i(t) =
\begin{cases}
0, & t \le t_0, \\[4pt]
\dfrac{k_{a,i}C}{k_{a,i}C + k_{d,i}} \left(1 - \exp\left[-\left(k_{a,i}C + k_{d,i}\right)\left(t - t_0\right)\right]\right), & t_0 < t \le t_0 + t_{inj}, \\[8pt]
K_{s,i} \exp\left[-k_{d,i}\left(t - t_0 - t_{inj}\right)\right], & t > t_0 + t_{inj},
\end{cases}
\qquad
K_{s,i} = \dfrac{k_{a,i}C}{k_{a,i}C + k_{d,i}} \left(1 - \exp\left[-\left(k_{a,i}C + k_{d,i}\right) t_{inj}\right]\right). \tag{2}
$$

Here $t$ is time, $C$ is the analyte concentration, $t_0$ is the time when the injection of the analyte begins, and $t_{inj}$ is the injection duration. The total response $R_{tot}$ at time $t$ is then

$$
R_{tot}(t) =
\begin{cases}
\displaystyle\sum_{i=1}^{n} R_{max,i} K_i(t), & t \le t_0 \ \text{or} \ t > t_0 + t_{inj}, \\[8pt]
R_I + \displaystyle\sum_{i=1}^{n} R_{max,i} K_i(t), & t_0 < t \le t_0 + t_{inj},
\end{cases} \tag{3}
$$

where $R_{max,i}$ is the maximum analyte binding capacity and $R_I$ is an optional bulk effect parameter that accounts for the fact that the biosensor base response might change during injection of analyte. For an n-interaction system, the contribution $c_i$ of each interaction to the total response can, for example, be calculated as the mean of its contributions to the association and dissociation phases,

$$
c_i = \frac{100}{2} \left(
\frac{\int_{t_0}^{t_0 + t_{inj}} R_{max,i} K_i(t)\,dt}{\sum_{j=1}^{n} \int_{t_0}^{t_0 + t_{inj}} R_{max,j} K_j(t)\,dt}
+
\frac{\int_{t_0 + t_{inj}}^{t_{end}} R_{max,i} K_i(t)\,dt}{\sum_{j=1}^{n} \int_{t_0 + t_{inj}}^{t_{end}} R_{max,j} K_j(t)\,dt}
\right), \tag{4}
$$

where $t_{end}$ is the experiment duration.

In a practical situation, we want to estimate the parameters $k_a$, $k_d$, $R_{max}$, and possibly $R_I$, from experimental sensorgram data $R_{tot}(t)$. This is usually done by fitting the model in eq 3, with some prechosen number of interactions, to the experimental data in a least-squares sense, e.g., by using a trust-region-reflective least-squares algorithm (ref 1). Conventionally one assumes one, sometimes two, interactions in the system and fits to all measured sensorgram data simultaneously, assuming that $k_a$, $k_d$, and $R_{max}$ are equal for all sensorgrams (but $R_I$ may differ).

Assuming the n-to-one kinetic model in eq 3, a simple and useful tool for indicating whether the biosensor system has one or several interactions is the dissociation graph, in which $\ln[R_{tot}(t)/R_{tot}(t_0 + t_{inj})]$ is plotted against $t > t_0 + t_{inj}$. For a single interaction, eq 2 makes this quantity equal to $-k_d (t - t_0 - t_{inj})$, a straight line. Hence, if the curve is close to the top-left to bottom-right diagonal there is only one interaction in the system, but if it is convex there are at least two different interactions in the system.

Rate Constant Distributions

Given experimental sensorgram data $R_{tot}(t)$, we want to estimate the rate constants $k_a$, $k_d$ and the maximum analyte binding capacities $R_{max}$ for the system. This can be done by using eq 3,

$$
R_{tot}(t) = \sum_{i=1}^{n} R_{max,i} K_i(t), \tag{5}
$$

where, for clarity of presentation, we assume that $R_I = 0$ (it is straightforward to also include these constants in eq 6). If $R_{tot}$ has been measured at $m$ time points $t_j$, eq 5 can be written as an $m \times n$ non-negativity-constrained linear system, $\mathbf{K}\mathbf{R}_{max} = \mathbf{R}_{tot}$, where

$$
\mathbf{K} =
\begin{bmatrix}
K_1(t_1) & \cdots & K_n(t_1) \\
\vdots & \ddots & \vdots \\
K_1(t_m) & \cdots & K_n(t_m)
\end{bmatrix},
\quad
\mathbf{R}_{max} =
\begin{bmatrix}
R_{max,1} \\ \vdots \\ R_{max,n}
\end{bmatrix}
\ge 0,
\quad
\mathbf{R}_{tot} =
\begin{bmatrix}
R_{tot}(t_1) \\ \vdots \\ R_{tot}(t_m)
\end{bmatrix}. \tag{6}
$$

Given $\mathbf{K}$ and $\mathbf{R}_{tot}$ we can solve eq 6 to get $\mathbf{R}_{max}$. However, as this is an ill-posed problem, regularization should be applied. Here we use Tikhonov regularization with the identity matrix $\mathbf{I}$.

To estimate $k_a$, $k_d$ and $R_{max}$ we assume that the rate constants lie in the domain $\Omega = [k_{a,\min}, k_{a,\max}] \times [k_{d,\min}, k_{d,\max}] \subset \mathbb{R}^2$. One can discretize $\Omega$ using a fixed equidistant grid (ref 2), but here we use a new finite-element-based algorithm called the Adaptive Interaction Distribution Algorithm (AIDA) (ref 3), which gives a finer discretization where $R_{max}$ varies strongly. We begin by making an initial Delaunay triangulation of the whole domain and, using the vertices of the triangulation in eq 6, obtain an estimate of the corresponding $R_{max}$. We then adaptively add new triangles to the triangulation and estimate a new corresponding $R_{max}$. This is repeated iteratively until some maximum number of triangle vertices is reached, and we call the estimated points $R_{max}(k_a, k_d)$ a discrete Rate Constant Distribution (RCD). The RCD contains a number of discrete distributions, and the modes of these distributions can be viewed as estimates of the rate constants of the biosensor system; see, for example, the TOC graphic and Figure 1c.
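The construction in eqs 2 and 6 can be illustrated with a minimal Python sketch. It is not AIDA: it uses the simpler fixed equidistant grid in log10 space mentioned above, and it imposes Tikhonov regularization with the identity matrix by stacking the regularization rows onto the linear system before calling SciPy's non-negative least-squares solver. The function names `k_response` and `rcd_fixed_grid`, the regularization weight `lam`, the grid limits, and all numerical values in the example (capacities 80 and 20, injection start and duration) are assumptions chosen for illustration only.

```python
import numpy as np
from scipy.optimize import nnls


def k_response(t, ka, kd, conc, t0, t_inj):
    """Unit-capacity response K_i(t) of a single interaction (eq 2)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    k_obs = ka * conc + kd                       # apparent rate during injection
    k_eq = ka * conc / k_obs                     # plateau for an infinitely long injection
    k_s = k_eq * (1.0 - np.exp(-k_obs * t_inj))  # level at the end of the injection
    out = np.zeros_like(t)                       # zero before the injection starts
    assoc = (t > t0) & (t <= t0 + t_inj)
    dissoc = t > t0 + t_inj
    out[assoc] = k_eq * (1.0 - np.exp(-k_obs * (t[assoc] - t0)))
    out[dissoc] = k_s * np.exp(-kd * (t[dissoc] - t0 - t_inj))
    return out


def rcd_fixed_grid(t, r_tot, conc, t0, t_inj, log_ka, log_kd, lam):
    """Non-negative, Tikhonov-regularized solution of K R_max = R_tot (eq 6)."""
    ka_grid, kd_grid = np.meshgrid(10.0 ** log_ka, 10.0 ** log_kd, indexing="ij")
    # One column of K per grid point (ka, kd).
    K = np.column_stack([
        k_response(t, ka, kd, conc, t0, t_inj)
        for ka, kd in zip(ka_grid.ravel(), kd_grid.ravel())
    ])
    n = K.shape[1]
    # min ||K x - r||^2 + lam^2 ||I x||^2 with x >= 0, written as one NNLS problem.
    K_aug = np.vstack([K, lam * np.eye(n)])
    r_aug = np.concatenate([r_tot, np.zeros(n)])
    r_max, _ = nnls(K_aug, r_aug, maxiter=10 * n)
    return r_max.reshape(ka_grid.shape)          # rcd[i, j] <-> (log_ka[i], log_kd[j])


# Hypothetical noise-free sensorgram built from two interactions (R_I = 0).
t = np.linspace(0.0, 400.0, 201)
t0, t_inj, conc = 20.0, 100.0, 24e-9
r_tot = (80.0 * k_response(t, 10.0 ** 6.0, 10.0 ** -2.5, conc, t0, t_inj)
         + 20.0 * k_response(t, 10.0 ** 4.5, 10.0 ** -1.5, conc, t0, t_inj))
log_ka = np.linspace(3.0, 7.0, 9)
log_kd = np.linspace(-4.0, 0.0, 9)
rcd = rcd_fixed_grid(t, r_tot, conc, t0, t_inj, log_ka, log_kd, lam=1e-3)
```

The result depends strongly on `lam` and on the grid, which is consistent with the caution expressed about regularization; the sketch is meant to show the structure of the linear system, not to reproduce the published RCDs.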
Calculating an RCD using AIDA (ref 3) is considerably faster than fitting to the experimental data, and the number of interactions in the system does not have to be selected beforehand. However, RCDs should be used with caution, as they are the solution of an ill-posed and ill-conditioned inverse problem and the solution depends heavily on the amount and type of regularization applied. One generally needs several sensorgrams when calculating an RCD in order to get a reliable result, and it is also advisable to check the results by model fitting. With several sensorgrams and proper regularization, one can get a good estimate of the number of interactions and their corresponding rate constants from the peak maxima. These can, for example, be used as input to the fitting algorithm, but one should never try to draw any conclusions about the system from the peak shapes. In the proposed strategy we calculate RCDs separately for each sensorgram, and the resulting RCDs can then only be used as a guide for deciding the number of interactions and their rate constants.
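Using RCD peak maxima as input to the fitting algorithm can be sketched as follows. The example fits a two-interaction model (eq 3 with R_I = 0) to a single sensorgram with SciPy's `least_squares` and its trust-region-reflective method (`method="trf"`). The parametrization in log10 rate constants, the bounds, the helper names `r_model` and `fit_local`, and the starting values (standing in for values read off an RCD) are all assumptions made for illustration; the authors' actual implementation is not described in this document.

```python
import numpy as np
from scipy.optimize import least_squares


def k_response(t, ka, kd, conc, t0, t_inj):
    """Unit-capacity response K_i(t) of a single interaction (eq 2)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    k_obs = ka * conc + kd
    k_eq = ka * conc / k_obs
    k_s = k_eq * (1.0 - np.exp(-k_obs * t_inj))
    out = np.zeros_like(t)
    assoc = (t > t0) & (t <= t0 + t_inj)
    dissoc = t > t0 + t_inj
    out[assoc] = k_eq * (1.0 - np.exp(-k_obs * (t[assoc] - t0)))
    out[dissoc] = k_s * np.exp(-kd * (t[dissoc] - t0 - t_inj))
    return out


def r_model(params, t, conc, t0, t_inj):
    """R_tot(t) from eq 3 with R_I = 0.

    params holds one triple (log10 ka, log10 kd, R_max) per interaction.
    """
    total = np.zeros(np.size(t))
    for log_ka, log_kd, r_max in np.reshape(params, (-1, 3)):
        total += r_max * k_response(t, 10.0 ** log_ka, 10.0 ** log_kd,
                                    conc, t0, t_inj)
    return total


def fit_local(t, r_meas, conc, t0, t_inj, start):
    """Fit one sensorgram; the bounds are illustrative assumptions."""
    n = len(start) // 3
    lower = np.tile([0.0, -6.0, 0.0], n)
    upper = np.tile([9.0, 1.0, np.inf], n)
    return least_squares(
        lambda p: r_model(p, t, conc, t0, t_inj) - r_meas,
        start, bounds=(lower, upper), method="trf",
    )


# Hypothetical noise-free data and a starting guess (e.g. read from RCD peaks).
t = np.linspace(0.0, 400.0, 401)
t0, t_inj, conc = 20.0, 100.0, 220e-9
true_params = [6.0, -2.5, 80.0, 4.5, -1.5, 20.0]
r_meas = r_model(true_params, t, conc, t0, t_inj)
start = [5.8, -2.3, 60.0, 4.8, -1.8, 30.0]
fit = fit_local(t, r_meas, conc, t0, t_inj, start)
```

Fitting in log10 space keeps the rate constants positive and puts parameters spanning many orders of magnitude on comparable scales. A global fit would instead sum the residuals of all sensorgrams while sharing the rate constants and capacities between them.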

Blank Injections

Figure S1: Blank injections corresponding to the system in Figure 2: (a) the injection channel and (b) the reference channel.

Figure S2: The (unused) blank injections for the system in Figure 3.

Figure S3: Blank injections for the system in Figure 4.

Synthetic Data

Figure S4: (a) Sensorgrams for a perfect synthetic system at different analyte concentration levels. (b) Dissociation graph for a 220 nM injection. (c) Rate Constant Distribution (RCD) for a 24 nM injection. (d) Rate constants obtained by fitting a two-interaction model to the sensorgrams one by one. In panel d, the circle areas are proportional to the relative contributions, the crosses indicate the median rate constants, and the stars are the rate constants estimated by global fitting to a two-interaction model (here they overlap).
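A dissociation graph of the kind shown in panel b plots ln[R_tot(t)/R_tot(t0 + t_inj)] against time during the dissociation phase. The following Python sketch computes that curve for hypothetical one- and two-interaction dissociation signals; the helper name `dissociation_graph`, the amplitudes, and the timing values are illustrative assumptions, not the data behind Figure S4.

```python
import numpy as np


def dissociation_graph(t, r_tot, t0, t_inj):
    """Return (t, ln[R_tot(t) / R_tot(t0 + t_inj)]) for the dissociation phase.

    The first sample at or after t0 + t_inj is used as the reference response.
    """
    t = np.asarray(t, dtype=float)
    r_tot = np.asarray(r_tot, dtype=float)
    mask = t >= t0 + t_inj
    t_d, r_d = t[mask], r_tot[mask]
    return t_d, np.log(r_d / r_d[0])


# Hypothetical dissociation-phase samples (injection ends at t = 120 s).
t0, t_inj = 20.0, 100.0
t = np.linspace(120.0, 300.0, 181)
kd1, kd2 = 10.0 ** -2.5, 10.0 ** -1.5
r_one = 50.0 * np.exp(-kd1 * (t - 120.0))
r_two = 40.0 * np.exp(-kd1 * (t - 120.0)) + 10.0 * np.exp(-kd2 * (t - 120.0))

t_d, y_one = dissociation_graph(t, r_one, t0, t_inj)
_, y_two = dissociation_graph(t, r_two, t0, t_inj)

# One interaction: straight line with slope -kd. Two interactions: convex curve.
slope_one = np.polyfit(t_d, y_one, 1)[0]
curvature_two = np.diff(y_two, 2)   # second differences, >= 0 for a convex curve
```

With measured data the second differences are dominated by noise, so in practice the curve is judged visually or after smoothing rather than by raw differences.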

Figure S5: (a) Simulated sensorgrams (solid curves) at some analyte concentration levels for the deteriorating synthetic system in Figure 1, together with globally fitted sensorgrams using a two-interaction model (dotted curves); (b) the residuals for the fits in (a). Panels (c) and (d) are the same as (a) and (b) but for one-by-one fitting. The vertical lines indicate the injection duration.

Table S1: For the synthetic systems, seen in Figure S4 (perfect) and Figure 1 (deteriorating), median rate constants and dissociation equilibrium constants with 95% confidence intervals, estimated using global or local (one by one) fitting to a two-interaction model, together with the corresponding mean contribution (c) and Root Mean Square Error Normalized (RMSEN).

System         Fit     Interaction  log10(ka) [(Ms)^-1]  log10(kd) [s^-1]      log10(KD) [M]         c     RMSEN
True                   #1           6.00                 -2.50                 -8.50                 86.4
                       #2           4.50                 -1.50                 -6.00                 13.6
Perfect        Global  #1           6.00 ± 1.8×10^-7     -2.50 ± 2.1×10^-7     -8.50 ± 2.0×10^-7     86.4  0.00025
                       #2           4.50 ± 5.3×10^-6     -1.50 ± 6.9×10^-7     -6.00 ± 3.8×10^-6     13.6
Perfect        Local   #1           6.00 [6.00, 6.00]    -2.50 [-2.50, -2.50]  -8.50 [-8.50, -8.50]  85.3  0.34
                       #2           4.50 [4.50, 4.50]    -1.50 [-1.50, -1.50]  -6.00 [-6.00, -6.00]  14.7
Deteriorating  Global  #1           6.14 ± 0.006         -2.50 ± 0.007         -8.64 ± 0.007         89.2  8.5
                       #2           5.17 ± 0.06          -1.66 ± 0.03          -6.83 ± 0.05          10.8
Deteriorating  Local   #1           6.00 [6.00, 6.00]    -2.50 [-2.50, -2.50]  -8.50 [-8.50, -8.50]  86.2  0.32
                       #2           4.50 [4.50, 4.50]    -1.50 [-1.50, -1.50]  -6.00 [-6.00, -6.00]  8.8
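As a consistency check on the tabulated values, the dissociation equilibrium constant follows from the rate constants through the standard relation K_D = k_d/k_a. This relation is assumed here, since the text does not state it explicitly. On the logarithmic scale used in the table it becomes a simple difference:

```latex
\begin{align*}
K_D &= \frac{k_d}{k_a}
\quad\Longrightarrow\quad
\log_{10} K_D = \log_{10} k_d - \log_{10} k_a, \\
\text{True, interaction \#1:}\quad & -2.50 - 6.00 = -8.50, \\
\text{True, interaction \#2:}\quad & -1.50 - 4.50 = -6.00, \\
\text{Deteriorating, global fit, \#1:}\quad & -2.50 - 6.14 = -8.64, \\
\text{Deteriorating, global fit, \#2:}\quad & -1.66 - 5.17 = -6.83.
\end{align*}
```

The same difference can be used to spot transcription errors in the rate constant columns of the later tables, keeping in mind that medians and rounded values need not satisfy it exactly.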

High Affinity Complex

Figure S6: For the trastuzumab-HER2 system in Figure 2, fits (dotted curves) to measured sensorgrams (solid curves) at some analyte concentration levels in (a, c, e), with the corresponding residual plots in (b, d, f). (a, b) Global fitting to a one-interaction model, (c, d) global fitting to a two-interaction model, and (e, f) local (one by one) fitting to a two-interaction model. The vertical lines indicate the injection duration. Note that the measured sensorgrams are here adjusted by the estimated start of the injection, t0.

Table S2: For the trastuzumab-HER2 system, seen in Figure 2, median rate constants and dissociation equilibrium constants with 95% confidence intervals, estimated using global fitting to one- and two-interaction models or local (one by one) fitting to a two-interaction model, together with the corresponding mean contribution (c) and Root Mean Square Error Normalized (RMSEN).

Fit     Interactions  Interaction  log10(ka) [(Ms)^-1]  log10(kd) [s^-1]      log10(KD) [M]         c      RMSEN
Global  1             #1           5.75 ± 0.001         -3.43 ± 0.01          -9.18 ± 0.008         100.0  4.1
Global  2             #1           5.82 ± 0.003         -21.2 ± 2.3×10^2      -27.1 ± 1.6×10^6      87.7   2.7
                      #2           5.07 ± 0.03          -2.49 ± 0.02          -7.56 ± 0.03          12.3
Local   2             #1           5.85 [5.64, 5.92]    -3.62 [-3.67, -3.56]  -9.42 [-9.55, -9.35]  97.2   1.1
                      #2           5.19 [5.14, 5.78]    -1.53 [-1.82, -1.41]  -7.28 [-7.89, -7.22]  3.1

Medium High Affinity Complex

Figure S7: For the IDL-VLDL-anti-apoB-100 system in Figure 3, fits (dotted curves) to measured sensorgrams (solid curves) at some analyte concentration levels in (a, c, e), with the corresponding residual plots in (b, d, f). (a, b) Global fitting to a one-interaction model, (c, d) global fitting to a two-interaction model, and (e, f) local (one by one) fitting to a two-interaction model. The vertical lines indicate the injection duration. Note that the measured sensorgrams are here adjusted by the estimated start of the injection, t0.

Table S3: For the IDL-VLDL-anti-apoB-100 system, seen in Figure 3, median rate constants and dissociation equilibrium constants with 95% confidence intervals, estimated using global fitting to one- and two-interaction models or local (one by one) fitting to a two-interaction model, together with the corresponding mean contribution (c) and Root Mean Square Error Normalized (RMSEN).

Fit     Interactions  Interaction  log10(ka) [(Ms)^-1]  log10(kd) [s^-1]      log10(KD) [M]         c      RMSEN
Global  1             #1           5.62 ± 0.002         -3.02 ± 0.005         -8.64 ± 0.004         100.0  6.1
Global  2             #1           5.62 ± 0.002         -3.07 ± 0.006         -8.69 ± 0.005         92.8   5.5
                      #2           6.09 ± 0.05          -1.09 ± 0.054         -7.18 ± 0.05          7.2
Local   2             #1           5.75 [5.74, 5.82]    -3.07 [-3.09, -3.07]  -8.88 [-8.91, -8.83]  95.7   1.2
                      #2           3.57 [3.09, 3.66]    -1.44 [-1.50, -1.36]  -5.11 [-5.94, -4.97]  4.3

Low Affinity Complex

Figure S8: For the PTH-PTH1R system in Figure 4, fits (dotted curves) to measured sensorgrams (solid curves) at some analyte concentration levels in (a, c, e), with the corresponding residual plots in (b, d, f). (a, b) Global fitting to a one-interaction model, (c, d) global fitting to a two-interaction model, and (e, f) local (one by one) fitting to a two-interaction model. The vertical lines indicate the injection duration. Note that the measured sensorgrams are here adjusted by the estimated start of the injection, t0.

Table S4: For the PTH-PTH1R system, seen in Figure 4, median rate constants and dissociation equilibrium constants with 95% confidence intervals, estimated using global fitting to one- and two-interaction models or local (one by one) fitting to a two-interaction model, together with the corresponding mean contribution (c) and Root Mean Square Error Normalized (RMSEN).

Fit     Interactions  Interaction  log10(ka) [(Ms)^-1]  log10(kd) [s^-1]      log10(KD) [M]         c      RMSEN
Global  1             #1           4.13 ± 0.009         -1.15 ± 0.005         -5.28 ± 0.007         100.0  14.1
Global  2             #1           4.32 ± 0.007         -0.91 ± 0.005         -5.23 ± 0.006         58.7   7.3
                      #2           3.40 ± 0.01          -2.26 ± 0.02          -5.66 ± 0.01          41.3
Local   2             #1           4.20 [4.05, 4.43]    -0.92 [-0.93, -0.91]  -5.14 [-5.36, -5.04]  58.1   5.5
                      #2           3.66 [3.38, 4.11]    -2.34 [-2.39, -2.04]  -5.90 [-6.44, -5.54]  43.5

References

(1) Coleman, T. F.; Li, Y. SIAM J. Optim. 1996, 6 (2), 418-445.
(2) Svitel, J.; Balbo, A.; Mariuzza, R. A.; Gonzales, N. R.; Schuck, P. Biophys. J. 2003, 84 (6), 4062-4077.
(3) Zhang, Y.; Forssén, P.; Fornstedt, T.; Gulliksson, M.; Dai, X. Inverse Probl. Sci. Eng. 2017, 1-26.