*Note: this is part 2 of the Science Integrity series*

At the AAIC conference in July 2021, Cassava Sciences presented a poster showcasing encouraging results from SavaDx, their novel plasma diagnostic/biomarker to detect Alzheimer’s disease (AD). SavaDx measures changes in levels of a protein named altered filamin A (FLNA), which is linked to the AD disease mechanism. The scaffolding of FLNA is also critically related to the amyloid and tau pathologies in AD. Over a 28-day Phase 2 trial period, FLNA levels fell significantly in the two Simufilam treatment groups compared with the placebo group. The poster also validated SavaDx results by showing that the treated patients in the same clinical study exhibited significant reductions in plasma P-tau181, a key biomarker known to correlate highly with cognitive measures and disease progression.

Since the conference presentation, the SavaDx poster has come under heavy scrutiny for displaying two mutually incongruent plasma P-tau181 data charts, namely the spaghetti plot (bipartite graph) and the scatter plot. Some absent data points and a misplaced outlier were highlighted (amongst other complaints) in a citizen petition to the FDA asking for Phase 3 trials of Simufilam to be halted. A supplement to the petition disputes the p-value claims in the poster with alternate, speculative scenarios of the outlier placement. The poster has also been a subject of hot debate on social media, with Dr. Elisabeth Bik leading the criticism on Twitter as well as on her blog. On Sept 3, Cassava Sciences released a public statement with corrected plasma P-tau181 charts.

In the rest of this article, we take a deep dive into the extracted raw data and address misconceptions perpetuated on social media platforms (Twitter, Reddit, etc.), relying on independent analysis from two data scientists. **In summary, the underlying plasma P-tau181 data, with the minor corrections incorporated, shows no sign of deliberate manipulation and, in fact, reinforces the optimism about good cognitive endpoint outcomes expected from upcoming Phase 3 trials.**

**Validity of Our Extracted Data**

Two contributing data scientists of this blog independently extracted raw plasma P-tau181 values from the spaghetti and scatter plots (Figures 5 and 4 respectively in the SavaDx poster). The subsequent corrections made in Cassava Sciences’ public statement, i.e., the insertion of the two missing lines in the spaghetti plot and the removal of the outlier from all analysis, were incorporated. Note that the corrected 100mg spaghetti plot has 17 data points after the outlier is removed.

Data scientist 1 (DS1) digitally placed fine-grained horizontal scales on the image to obtain the relative pixel heights of the data points. Data scientist 2 (DS2) used a web tool to precisely annotate the dots/lines in the poster and obtain their 2D pixel positions.
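Pixel readings like these become data values through an axis calibration. The sketch below is only an illustration of that step, assuming a linear y-axis and two reference ticks whose pixel rows and data values are known. The function name `pixel_to_value` and every number in it are hypothetical, not values taken from the poster.

```python
def pixel_to_value(pixel_y, ref_pixel_a, ref_value_a, ref_pixel_b, ref_value_b):
    """Map a pixel row to a data value on a linear axis using two reference ticks."""
    if ref_pixel_a == ref_pixel_b:
        raise ValueError("reference ticks must sit on different pixel rows")
    # Fraction of the way from tick A to tick B, measured in pixels.
    fraction = (pixel_y - ref_pixel_a) / (ref_pixel_b - ref_pixel_a)
    # Same fraction of the way from value A to value B, measured in data units.
    return ref_value_a + fraction * (ref_value_b - ref_value_a)


# Hypothetical calibration: the 0 tick sits at pixel row 500, the 10 tick at row 100.
value = pixel_to_value(300, ref_pixel_a=500, ref_value_a=0.0,
                       ref_pixel_b=100, ref_value_b=10.0)
print(value)  # 5.0
```

Because image rows usually grow downward, the higher tick has the smaller pixel row; the interpolation handles that without a special case.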

From figures 1 and 2, it should be immediately clear that the two data scientists’ independently extracted values closely track each other for the 100mg group. The same holds for placebo and 50mg. In the Appendix, we publish all our extracted raw values for placebo, 50mg, and 100mg for transparency, along with the raw data Dr. Bik shared as a screenshot in a tweet. Overall, this establishes high confidence that the raw data extracted by our data scientists represents the ground truth, a core assumption we make. We use the average of the two raw values for each data point in the rest of our analysis.
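As a rough sketch of the averaging step, the snippet below averages two extractions point by point and reports the largest gap between them. It assumes both extractors list the same data points in the same order; the function name and the readings are hypothetical.

```python
def average_extractions(values_ds1, values_ds2):
    """Average two independent extractions point by point and report their largest gap."""
    if len(values_ds1) != len(values_ds2):
        raise ValueError("both extractions must cover the same data points")
    averaged = [(a + b) / 2 for a, b in zip(values_ds1, values_ds2)]
    largest_gap = max(abs(a - b) for a, b in zip(values_ds1, values_ds2))
    return averaged, largest_gap


# Hypothetical readings of the same three points by two extractors.
ds1 = [10.5, 6.75, 2.0]
ds2 = [11.5, 6.25, 2.0]
averaged, largest_gap = average_extractions(ds1, ds2)
print(averaged)     # [11.0, 6.5, 2.0]
print(largest_gap)  # 1.0
```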

Oddly enough, Dr. Bik’s raw data (marked EB in figures 1 and 2) has only 15 values for 100mg (which doesn’t match the corresponding scatter plot either) and deviates more from the ground truth. The divergence is understandable, as the online extraction tool she used may not have spotted all the overlapping lines in the bottom half of the spaghetti plot. However, it does raise some questions about her due diligence in applying the proverbial fine-toothed comb here, a quality she is admired for in the scientific critiques she shares publicly on Twitter.

**Unfounded Manipulation Concerns**

In figures 3, 4, and 5 below, we co-plot the sorted % improvements (CFB, change from baseline) obtained from two distinct chart sources in the SavaDx poster. For the scatter plot, we derive them from the pairwise differences between the raw Day 1 and Day 28 plasma P-tau values. The spaghetti chart contains the CFB values directly.
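The derivation from Day 1 and Day 28 values can be sketched as follows. This is a minimal illustration that assumes CFB is the percent change relative to Day 1 and that each Appendix row pairs one patient’s two values; the function name is ours.

```python
def percent_change_from_baseline(day1, day28):
    """Percent change from baseline for one patient; negative means the value fell."""
    if day1 == 0:
        raise ValueError("baseline must be non-zero")
    return (day28 - day1) / day1 * 100.0


# First three rows of the 100mg columns in the Appendix table (Day 1, Day 28).
pairs = [(10.98, 4.28), (6.70, 3.29), (1.97, 1.20)]
cfb = sorted(percent_change_from_baseline(d1, d28) for d1, d28 in pairs)
print([round(x, 1) for x in cfb])  # [-61.0, -50.9, -39.1]
```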

In each of the figures, notice how closely the lines of sorted CFB values coincide. Ideally the lines would coincide exactly. However, they are expected to diverge slightly because of the inherent inaccuracies in our raw extraction process. The visual agreement is supported by a low RMSE (root mean square error) and a high correlation coefficient between the two lines in each of the graphs. Additionally, we ran an unpaired t-test with the null hypothesis that the two sets of values have the same mean. The high p-values (summarized in Table 1) fail to reject the null hypothesis, although that alone does not prove the two sets are identical.

| Unpaired t-test results | p-value | t-value | Degrees of freedom (N1 + N2 - 2) |
|---|---|---|---|
| Placebo | 0.7518 | 0.3093 | 36 |
| 50mg | 0.9116 | 0.1120 | 28 |
| 100mg | 0.8691 | 0.1662 | 32 |

*Table 1*: High p-values fail to reject the null hypothesis that the two CFB curves share the same mean
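The RMSE and correlation figures mentioned above can be computed along these lines. This is a sketch under assumptions: the two curves have equal length and are sorted before comparison, and the sample values are hypothetical rather than our extracted data.

```python
import math


def rmse(curve_a, curve_b):
    """Root mean square error between two equally long curves."""
    squared = [(a - b) ** 2 for a, b in zip(curve_a, curve_b)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(curve_a, curve_b):
    """Pearson correlation coefficient between two equally long curves."""
    n = len(curve_a)
    mean_a = sum(curve_a) / n
    mean_b = sum(curve_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(curve_a, curve_b))
    var_a = sum((a - mean_a) ** 2 for a in curve_a)
    var_b = sum((b - mean_b) ** 2 for b in curve_b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical sorted CFB curves (%) from the two chart sources.
scatter_cfb = sorted([-40.0, -10.0, 5.0, 30.0])
spaghetti_cfb = sorted([-38.0, -12.0, 6.0, 31.0])
print(round(rmse(scatter_cfb, spaghetti_cfb), 2))  # 1.58
print(pearson_r(scatter_cfb, spaghetti_cfb) > 0.99)  # True
```

Sorting both curves first means the comparison is between the two sets of values as a whole, not between matched patients.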

We assert that this observation eliminates the doubts raised about whether the Cassava team modified the underlying data for either the scatter or the spaghetti plot. Such tampering would be hard to conceal, because making each plot suit a chosen narrative while also preserving the invariance described above is extremely difficult.

**Evidence of Significant Treatment Effects**

Table 2 highlights the statistical significance of the differences in mean CFB for 50mg vs placebo and 100mg vs placebo in the corrected data, where the single outlier is removed from all analysis. Statistical significance was measured by an unpaired t-test. Note that any p-value below the 0.05 threshold is considered evidence of statistical significance. Though the mean CFB of 100mg (-16%) isn’t very different from that of 50mg (-15%), when compared to placebo (mean CFB +14%) the 100mg treatment effect carries a 2.6x lower p-value than the 50mg effect, though both are significant.

| Unpaired t-test results relative to Placebo | p-value | t-value | Degrees of freedom (N1 + N2 - 2) |
|---|---|---|---|
| 50mg | 0.0342 | 2.2120 | 32 |
| 100mg | 0.0130 | 2.6205 | 34 |

*Table 2*: Statistical significance of CFB mean differences between treatment groups and placebo, as measured by unpaired t-test.
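For readers who want to see the mechanics, here is a minimal sketch of a pooled-variance (equal variance) two-sample t statistic. We assume this is the variant behind the N1 + N2 - 2 degrees of freedom in Table 2; for instance, a placebo group of 19 and a 100mg group of 17 give 19 + 17 - 2 = 34. The sample values below are hypothetical, not study data.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Student's two-sample t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_value = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err
    return t_value, df


# Hypothetical CFB values (%), chosen so the arithmetic is easy to follow.
placebo = [10.0, 20.0, 30.0]
treated = [-10.0, 0.0, 10.0]
t_value, df = pooled_t_statistic(placebo, treated)
print(round(t_value, 3), df)  # 2.449 4
```

Turning the t statistic into a p-value needs the t-distribution, which the Python standard library does not provide; `scipy.stats.ttest_ind` is one commonly used option.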

It is interesting to observe in Figure 6 below that even though placebo’s mean CFB (i.e., CFB computed for each patient across 28 days and then averaged) is +14%, the means of the absolute plasma P-tau values at Day 1 and Day 28 (across patients) are similar to each other. We read this as a sign that the placebo group is not really shifting from its baseline plasma P-tau values. On the other hand, the means of the absolute plasma P-tau values at Day 1 and Day 28 for the 50mg and 100mg groups show clear drops, indicating real treatment effects. Figure 7 shows a similar pattern for standard deviation. It further indicates positive treatment effects, as the 50mg and 100mg groups exhibit faster convergence/regression to lower plasma P-tau values. We also include histograms of the Day 1 and Day 28 plasma P-tau values in figures 8 and 9.
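The gap between a mean of per-patient percent changes and the change in the group mean is plain arithmetic, and a toy example makes it concrete. The two-patient numbers below are hypothetical and chosen only to show the effect.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical Day 1 / Day 28 values for two patients.
day1 = [2.0, 10.0]
day28 = [4.0, 8.0]

# Per-patient percent change: +100% for the first patient, -20% for the second.
per_patient_cfb = [(after - before) / before * 100
                   for before, after in zip(day1, day28)]
mean_cfb = mean(per_patient_cfb)

# Percent change of the group mean: both days average 6.0, so no change.
change_of_means = (mean(day28) - mean(day1)) / mean(day1) * 100

print(mean_cfb, change_of_means)  # 40.0 0.0
```

A patient with a low baseline contributes a large percentage for a small absolute move, which is how a positive mean CFB can coexist with nearly unchanged group means.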

**Missing Data Points**

There has been confusion regarding some missing data points in the SavaDx poster’s plasma P-tau charts. The trial included 64 patients, but only 52 plasma P-tau points were plotted across the groups, leaving 12 unaccounted for. The following excerpt from the SavaDx poster, which explains the exclusion criteria, seems to have eluded the critics who insinuated that only favourable data points were cherry-picked for presentation.

*Plasma P-tau181 was measured in duplicate by SIMOA®, a digital ELISA platform. Data with CVs >11% were repeated and excluded if >15% on repeat.*
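To make the quoted rule concrete, here is one possible reading of it as code. Several points are assumptions on our part: that the CV is the sample standard deviation of the duplicate divided by its mean, and that a repeat whose CV falls between 11% and 15% is kept. The function names and sample values are hypothetical.

```python
import statistics


def cv_percent(replicate_a, replicate_b):
    """Coefficient of variation (%) of a duplicate measurement (sample SD / mean)."""
    values = [replicate_a, replicate_b]
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def keep_sample(first_run, repeat_run=None, repeat_cv=11.0, exclude_cv=15.0):
    """Return True if the sample stays in the analysis under the quoted rule."""
    if cv_percent(*first_run) <= repeat_cv:
        return True  # the first duplicate is tight enough, no repeat needed
    if repeat_run is None:
        raise ValueError("CV above threshold: a repeat measurement is required")
    # Assumption: the sample is excluded only if the repeat itself exceeds 15%.
    return cv_percent(*repeat_run) <= exclude_cv


print(keep_sample((10.0, 10.5)))                # True
print(keep_sample((10.0, 13.0), (10.0, 13.0)))  # False
print(keep_sample((10.0, 13.0), (10.0, 10.5)))  # True
```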

Note that Cassava Sciences acknowledged the error with respect to a couple of missing data points and added them to the Placebo group in the corrected spaghetti plot.

**Conclusion**

We conclude that the controversy surrounding the plasma P-tau181 data is much ado about nothing. There are no indications of deliberate manipulation. We thank all the critics for prompting independent efforts from our data scientists, which not only validate the data but also bolster the argument that we can expect positive cognitive endpoint outcomes from upcoming Phase 3 trials.

**Appendix**

Averaged plasma P-tau raw values extracted by our data scientists

| Data point index | placebo D1 | placebo D28 | 50mg D1 | 50mg D28 | 100mg D1 | 100mg D28 |
|---|---|---|---|---|---|---|
| 1 | 13.44 | 8.44 | 5.72 | 2.57 | 10.98 | 4.28 |
| 2 | 9.53 | 12.19 | 14.92 | 7.28 | 6.70 | 3.29 |
| 3 | 7.81 | 5.47 | 1.46 | 0.78 | 1.97 | 1.20 |
| 4 | 6.72 | 7.19 | 7.64 | 4.53 | 6.25 | 3.86 |
| 5 | 5.31 | 1.25 | 7.12 | 4.43 | 6.54 | 4.06 |
| 6 | 5.00 | 3.44 | 10.09 | 6.80 | 4.85 | 3.33 |
| 7 | 3.81 | 6.25 | 4.13 | 3.01 | 5.72 | 4.06 |
| 8 | 3.81 | 4.53 | 3.67 | 2.87 | 2.42 | 1.73 |
| 9 | 3.28 | 4.69 | 6.98 | 5.72 | 4.08 | 3.33 |
| 10 | 3.28 | 3.28 | 3.11 | 2.57 | 2.82 | 2.48 |
| 11 | 2.97 | 3.59 | 2.63 | 2.57 | 4.36 | 3.96 |
| 12 | 2.97 | 3.59 | 2.75 | 3.23 | 3.57 | 3.25 |
| 13 | 2.81 | 2.50 | 4.23 | 5.13 | 3.69 | 3.98 |
| 14 | 2.19 | 4.06 | 2.11 | 2.75 | 7.57 | 8.28 |
| 15 | 2.11 | 3.28 | 1.86 | 3.23 | 1.58 | 1.89 |
| 16 | 2.11 | 2.11 | 1.44 | 1.89 | | |
| 17 | 2.03 | 2.03 | 2.54 | 3.53 | | |
| 18 | 1.88 | 2.03 | | | | |
| 19 | 1.66 | 2.03 | | | | |

Screenshots of Dr. Bik’s tweet about her raw data values and the embedded image.

Screenshot of the digitally placed horizontal lines used by Data Scientist 1 to extract raw data.

Screenshot of the web tool (https://apps.automeris.io/wpd/) used by Data Scientist 2 to extract raw data.

This report is professional, not emotional. It is clear in its efforts to address the current claims brought forth by individuals and entities claiming a bias to effectuate a desired result. I find this closer to the truth due to the independence of its authors. If I did not own this stock, I would after reading it and seeing how dramatically it has declined based on the commentary/opinion and actions that have recently taken place.

What is most impressive to me about this data is that both 50 and 100 mg of Simufilam clearly have a significant effect on pTau – the biomarker which tends to correlate most with cognitive decline. These samples were measured by Quanterix and clearly the outlier removal, (using standard statistical techniques), in the 100 mg group is justified – how could 50 mg have a statistically significant effect and not 100 mg? This data directly refutes several points currently contributing to the FUD surrounding Cassava Sciences; not only that all biomarker data was fabricated by Dr. Wang (again, this data was measured by Quanterix) but also the supposition that high affinity binding to Filamin is not possible and the mechanism of action elucidated by Dr. Wang and Dr. Burns has been made-up. Clearly the 50 mg dose is having a major effect on this critical downstream biomarker – what other reasonable theory can explain this?

The above findings and analysis are well put and clear. Impressive work and research! Thank you so much for your effort and for sharing the truth. Cassava Sciences will prevail and Simufilam will be a breakthrough drug to treat Alzheimer’s.

This is a very impressive and detailed analysis of the accusations against Cassava Sciences regarding some questions that came up about a visual. I like the professionalism and transparency of this analysis, so everybody can follow and repeat the exercise if they wish. It’s also good to see that the readout of the plots was made in two different ways.

I think this is a very good example of how a scientific discourse should work: transparent and from a neutral position.

As a retail investor I’m looking forward to seeing more analysis of this kind.

Who is this team of internal doctors and scientists?

If figures 8 and 9 used the same X-axis origin and scale, they would be much easier to read.

Interesting post.

Regarding Table 1: Isn’t the null hypothesis for a t-test that the means of the two populations are the same, and doesn’t a low p-value from a t-test show that the means of the two data sets are different? I.e., that the two compared data sets (spaghetti vs scatter) are two different populations?

Regarding Table 2: Isn’t it more correct to use a multiple comparison test to compare placebo, 50mg and 100mg, so as to include all the variation observed in the study? And shouldn’t the test be a paired test?

Whichever way you look at it, the clinical data so far is pretty good. The only argument against it, of course, is that the sample size is not large enough to make a true statistical estimate. But that is what the Phase 3 is for. Instead of making that critique, the shorts want Phase 3 to be stopped, citing fudged arguments. There is absolutely no safety problem known so far with Simufilam, so why would someone request a Phase 3 halt? Clear agenda!

Anyway, here is what you are looking for (paired Placebo and 100mg): Also see here https://twitter.com/MITGrad3/status/1435770272016048133

Two sample t-test (equal variance)

----------------------------------

Population details:

parameter of interest: Mean difference

value under h_0: 0

point estimate: -37.4885

95% confidence interval: (-65.748, -9.2291)

Test summary:

outcome with 95% confidence: reject h_0

two-sided p-value: 0.0108

Details:

number of observations: [17,20]

t-statistic: -2.6931082710449474

degrees of freedom: 35

empirical standard error: 13.920170146432913

Amazing work !

A truly detailed, comprehensive analysis and a complete rebuke of the false allegations of fraud or data manipulation. The level of analysis is consistent with the rigor that any NIH or FDA scientist would also apply in assessing the raw data. Any of the data supporting efficacy (including biomarker data) would undoubtedly be submitted to the FDA prior to Phase 3 initiation. It is absolutely no surprise that the FDA green-lighted the Cassava SPA agreement. Well done, and a showcase of how proper data evaluation is conducted, i.e., not on Twitter!

Well written piece. I think this officially debunks the “fraud” & “manipulation” thesis. I also think it is your civic duty to send a letter to the FDA with this analysis. The last amendment to the citizen petition actually includes these misconceptions.

Thank you!

Thanks for the analysis. I have shared this on twitter, FYI in reply to Dr. Bik. The data and analysis pretty much concur with my analysis too. https://twitter.com/MITGrad3/status/1435770272016048133

Nice comprehensive work. I wish it held the same power to move the stock price as the initial citizens petition. However, once doubts are introduced they do not dispel readily but unbiased pieces like this help.

I wonder what t-test you used in your analyses. I grabbed your data and ran the t-tests myself, for the change in the biomarker from day 0 to day 28 both in absolute terms and in % terms relative to baseline. For absolute change, there was no significant difference relative to placebo in either the 5mg or the 10mg group in the two-sided test. A two-sided test is the hypothesis test required by the FDA in clinical trials!! Even in the one-sided test, the 10mg group is still not significantly better than placebo. Looking at the % change, the data look better, but the 5mg group is still not significantly different from placebo in the two-sided test.

A few things to notice: 1) % change should not be treated as the primary endpoint, as a change from 0.2 to 0.1 is vastly different from a change from 200 to 100. In clinical trials, the absolute change of the biomarker should be treated as the primary endpoint, on which SAVA failed completely. SAVA’s use of % change as the comparison criterion is very deceptive. 2) The data you show reveal a much less rosy picture than SAVA claimed in the original poster (p-values <0.01 in both groups). 3) The 150% data point in the 10 mg group was dropped. More explanation needs to be given for dropping such a bad data point from a treatment group. With this data point added back, the 10mg group would have been no different from placebo.

//////////////////////////

absolute change of biomarker

5 mg vs placebo

. ttest v_dif, by(g2) unp, if time==2 & g3!=1

Two-sample t test with equal variances

| Group | Obs | Mean | Std. Err. | Std. Dev. | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) |
|---|---|---|---|---|---|---|
| 0 | 19 | -.0405263 | .4556606 | 1.986179 | -.9978337 | .9167811 |
| 1 | 15 | -1.396667 | .5949419 | 2.3042 | -2.67269 | -.1206432 |
| combined | 34 | -.6388235 | .3785175 | 2.207118 | -1.408923 | .1312762 |
| diff | | 1.35614 | .7360938 | | -.1432336 | 2.855514 |

diff = mean(0) - mean(1), t = 1.8423

Ho: diff = 0, degrees of freedom = 32

Ha: diff != 0: Pr(|T| > |t|) = 0.0747. Ha: diff > 0: Pr(T > t) = 0.0374

10mg vs placebo

. ttest v_dif, by(g3) unp, if time==2 & g2!=1

Two-sample t test with equal variances

| Group | Obs | Mean | Std. Err. | Std. Dev. | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) |
|---|---|---|---|---|---|---|
| 0 | 19 | -.0405263 | .4556606 | 1.986179 | -.9978337 | .9167811 |
| 1 | 17 | -1.098824 | .45696 | 1.884094 | -2.067535 | -.1301117 |
| combined | 36 | -.5402778 | .3307688 | 1.984613 | -1.211774 | .1312186 |
| diff | | 1.058297 | .6472692 | | -.257112 | 2.373706 |

diff = mean(0) - mean(1), t = 1.6350

Ho: diff = 0, degrees of freedom = 34

Ha: diff != 0: Pr(|T| > |t|) = 0.1113. Ha: diff > 0: Pr(T > t) = 0.0556

///////////////////////////////////////////////////////////////////

% change of biomarker

5 mg vs placebo

. ttest v_difpct, by(g2) unp, if time==2 & g3!=1

Two-sample t test with equal variances

| Group | Obs | Mean | Std. Err. | Std. Dev. | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) |
|---|---|---|---|---|---|---|
| 0 | 19 | 9.886048 | 8.769015 | 38.22325 | -8.536968 | 28.30907 |
| 1 | 15 | -13.85603 | 9.267852 | 35.89424 | -33.73359 | 6.021535 |
| combined | 34 | -.5883978 | 6.612584 | 38.55766 | -14.0418 | 12.865 |
| diff | | 23.74208 | 12.85641 | | -2.445562 | 49.92972 |

diff = mean(0) - mean(1), t = 1.8467

Ho: diff = 0, degrees of freedom = 32

Ha: diff != 0: Pr(|T| > |t|) = 0.0741. Ha: diff > 0: Pr(T > t) = 0.0370

10 mg vs placebo

. ttest v_difpct, by(g3) unp, if time==2 & g2!=1

Two-sample t test with equal variances

| Group | Obs | Mean | Std. Err. | Std. Dev. | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) |
|---|---|---|---|---|---|---|
| 0 | 19 | 9.886048 | 8.769015 | 38.22325 | -8.536968 | 28.30907 |
| 1 | 17 | -15.14871 | 6.954728 | 28.67508 | -29.89207 | -.4053461 |
| combined | 36 | -1.935921 | 5.981301 | 35.8878 | -14.07861 | 10.20676 |
| diff | | 25.03476 | 11.37257 | | 1.922917 | 48.1466 |

diff = mean(0) - mean(1), t = 2.2013

Ho: diff = 0, degrees of freedom = 34

Ha: diff != 0: Pr(|T| > |t|) = 0.0346. Ha: diff > 0: Pr(T > t) = 0.0173
