More data is not necessarily better data, whatever the source.
Claims based on large amounts of data can be misleading. Such data is sometimes called “big data” (data from large databases) or “real world data” (routinely collected data). Unfortunately, routinely collected data often does not include information about “confounders”.
Confounders are factors other than the interventions being compared that can affect the outcomes. For example, a study might compare two crop varieties to find out which has the higher yield. If one variety is only ever grown and tested on sandy soils, and the other only on clay-based soils, soil type would be a confounder, since it can affect yield.
When using routinely collected data, it is only possible to control for confounders that were already known and were measured when the data was collected. So we cannot be sure that an association between an intervention and an outcome means that the intervention caused the outcome; unknown or unmeasured confounders could be responsible instead.
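To make the crop example concrete, here is a minimal sketch in Python. All of the numbers are invented for illustration: the plots, the yields, and the helper name `mean_yield_by` are assumptions, not data from any real study. In this made-up data, yield depends only on soil type, and variety A happens to be grown mostly on clay while variety B is grown mostly on sandy soil.

```python
from collections import defaultdict

# Hypothetical plots: (variety, soil, yield in tonnes per hectare).
# By construction, yield depends only on soil type, never on variety.
PLOTS = [
    ("A", "clay", 6.0), ("A", "clay", 6.0), ("A", "clay", 6.0), ("A", "sandy", 4.0),
    ("B", "clay", 6.0), ("B", "sandy", 4.0), ("B", "sandy", 4.0), ("B", "sandy", 4.0),
]


def mean_yield_by(plots, key):
    """Average yield for each group, where key(plot) names the group."""
    groups = defaultdict(list)
    for plot in plots:
        groups[key(plot)].append(plot[2])
    return {name: sum(values) / len(values) for name, values in groups.items()}


# Naive comparison: ignore soil and compare the varieties directly.
naive = mean_yield_by(PLOTS, key=lambda p: p[0])

# Controlled comparison: compare the varieties within each soil type.
by_soil = mean_yield_by(PLOTS, key=lambda p: (p[0], p[1]))

print(naive)    # {'A': 5.5, 'B': 4.5}
print(by_soil)  # 6.0 for both varieties on clay, 4.0 for both on sandy soil
```

The naive comparison suggests that variety A yields more, but the difference disappears once plots on the same soil are compared. That check is only possible because soil type was recorded and both varieties were grown on both soils. If soil type had not been measured, or if each variety had only ever been grown on one soil, the data alone could not separate the effect of the variety from the effect of the soil.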
REMEMBER: Think about whether you can be sure that there aren’t other reasons for the association.