New AI methods advance causal discovery for complex, noisy, and large-scale data
By PulseAugur Editorial
Summary by gemini-2.5-flash-lite
from 31 sources
Several recent arXiv papers introduce novel methods and benchmarks for causal discovery, a field focused on identifying cause-and-effect relationships from data. These advancements include techniques for handling noisy or incomplete data, integrating expert knowledge, and improving scalability for large datasets. New benchmarks and testing frameworks are also being developed to rigorously evaluate the robustness of existing causal discovery algorithms against various assumption violations, particularly in time-series data and natural language reasoning.
AI
IMPACT
Advances in causal discovery methods could lead to more reliable AI systems capable of understanding and reasoning about cause-and-effect relationships, particularly in complex or noisy environments.
RANK_REASON
Multiple arXiv papers published on May 7, 2026, detailing new methods and benchmarks for causal discovery.
Constraint-based causal discovery is widely used for learning causal structures, but heavy reliance on conditional independence (CI) testing makes it computationally expensive in high-dimensional settings. To mitigate this limitation, many divide-and-conquer frameworks have been …
In causal inference, confounders are variables that influence both treatment decisions and outcomes. However, unlike in randomized clinical trials, the treatment assignment mechanism in observational studies is not known, and it is thus unclear which covariates act as confound…
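To make the definition of a confounder concrete, the following is a minimal simulation sketch, not the method of the paper excerpted above. It assumes a hypothetical linear data-generating process in which a single covariate `z` drives both the treatment `t` and the outcome `y`, with a true treatment effect of 2.0. All variable names, coefficients, and the function `simulate_confounding` are illustrative assumptions.

```python
import numpy as np


def simulate_confounding(n=20_000, seed=0):
    """Return (naive, adjusted) OLS estimates of the treatment effect.

    Assumed toy model (illustrative only):
        z ~ N(0, 1)                  confounder
        t = z + noise                treatment depends on z
        y = 2*t + 3*z + noise        outcome depends on t and z
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    t = z + rng.normal(size=n)
    y = 2.0 * t + 3.0 * z + rng.normal(size=n)

    # Naive regression of y on t alone: the slope absorbs part of z's effect.
    naive_design = np.column_stack([np.ones(n), t])
    naive = np.linalg.lstsq(naive_design, y, rcond=None)[0][1]

    # Regression of y on t and z: conditioning on the confounder removes the bias.
    adjusted_design = np.column_stack([np.ones(n), t, z])
    adjusted = np.linalg.lstsq(adjusted_design, y, rcond=None)[0][1]
    return naive, adjusted


naive_effect, adjusted_effect = simulate_confounding()
print(f"naive estimate:    {naive_effect:.2f}")
print(f"adjusted estimate: {adjusted_effect:.2f}")
```

In this toy setup the naive slope is biased upward because `z` is omitted, while the adjusted slope lands near the assumed true effect. In an observational study the difficulty is that the analyst does not know in advance which covariates play the role of `z`.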
Recent work on causal abstraction, in particular graphical approaches focusing on causal structure between clusters of variables, aims to summarize a high-dimensional causal structure in terms of a low-dimensional one. Existing methods for learning such summaries from data assume…
Causal inference, especially in observational studies, relies on untestable assumptions about the true data-generating process. Sensitivity analysis helps us determine how robust our conclusions are when we alter these underlying assumptions. Existing frameworks for sensitivity a…
arXiv:2605.05743v1 Announce Type: cross Abstract: Gaussian process marginal likelihood scores and kernel conditional independence tests are theoretically appealing for nonlinear causal discovery but computationally prohibitive at scale. We present two complementary RFF-based meth…
arXiv cs.LG
TIER_1·Marvin Sextro, Weronika Kłos, Gabriel Dernbach·
arXiv:2601.21092v3 Announce Type: replace Abstract: Planning effective interventions in biological systems requires treatment-effect models that adapt to unseen biological contexts by identifying their specific underlying mechanisms. Yet single-cell perturbation datasets span onl…
arXiv cs.LG
TIER_1·Adrick Tench, Thomas Demeester·
arXiv:2601.16715v2 Announce Type: replace Abstract: Would-be practitioners of causal discovery face a dizzying array of algorithms without a clear best choice. This abundance of competitive methods makes ensembling a natural strategy for practical applications. At the same time, …
arXiv cs.LG
TIER_1·Shicheng Fan, Nour Elhendawy, Jianle Sun, Ke Fang, Kun Zhang, Yihang Wang, Lu Cheng·
arXiv:2605.05524v1 Announce Type: new Abstract: Causal representation learning (CRL) seeks to recover latent variables with identifiability guarantees, typically up to permutation and component-wise reparameterization under appropriate assumptions. However, identifiability does n…
arXiv:2605.05568v1 Announce Type: cross Abstract: Despite the growing availability of large datasets, causal structure learning remains computationally prohibitive at scale. We revisit sparsest-permutation learning for linear structural equation models and show that exact Cholesk…
arXiv cs.LG
TIER_1·Bruno Petrungaro, Anthony C. Constantinou·
arXiv:2605.04081v1 Announce Type: new Abstract: Causal Bayesian Networks (CBNs) are a powerful tool for reasoning under uncertainty about complex real-world problems. Such problems evolve over time, responding to external shocks as they occur. To support decision-making, CBNs req…
arXiv cs.LG
TIER_1·Geert Mesters, Alvaro Ribot, Anna Seigal, Piotr Zwiernik·
arXiv:2605.04381v1 Announce Type: cross Abstract: Causal discovery methods such as LiNGAM identify causal structure from observational data by assuming mutually independent disturbances. This assumption is fragile: shared volatility, common scale effects, or other forms of depend…
arXiv cs.LG
TIER_1·Thomas S. Robinson, Ranjit Lall·
arXiv:2605.04838v1 Announce Type: cross Abstract: The standard constraint-based paradigm for causal discovery with incomplete data -- impute first, test second -- is frequently miscalibrated: any consistent conditional independence (CI) test rejects a true null with probability a…
arXiv:2605.04313v1 Announce Type: new Abstract: Causal reasoning in natural language requires identifying relevant variables, understanding their interactions, and reasoning about effects and interventions, often under noisy or ambiguous conditions. While large language models (L…
arXiv cs.LG
TIER_1·Gideon Stein, Niklas Penzel, Tristan Piater, Joachim Denzler·
arXiv:2605.03045v1 Announce Type: new Abstract: Causal Discovery (CD) is a powerful framework for scientific inquiry. Yet, its practical adoption is hindered by a reliance on strong, often unverifiable assumptions and a lack of robust performance assessment. To address these limi…
Causal discovery methods such as LiNGAM identify causal structure from observational data by assuming mutually independent disturbances. This assumption is fragile: shared volatility, common scale effects, or other forms of dependence can cause the methods to recover the wrong ca…
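The role of independent disturbances can be illustrated with a small two-variable sketch. This is not the LiNGAM algorithm and not the method of the paper excerpted above. It assumes a hypothetical linear model `y = x + e` with uniform (non-Gaussian) `x` and `e`, and uses the correlation between squared values as a crude stand-in for a proper independence test. The helper names `ols_residual` and `squared_dependence` are illustrative.

```python
import numpy as np


def ols_residual(target, regressor):
    """Residual of a simple least-squares regression of target on regressor."""
    slope = np.cov(regressor, target)[0, 1] / np.var(regressor, ddof=1)
    return target - target.mean() - slope * (regressor - regressor.mean())


def squared_dependence(a, b):
    """Crude dependence score: |corr(a^2, b^2)|. Zero correlation of squares
    does not prove independence; this is only a heuristic for illustration."""
    return abs(np.corrcoef(a**2, b**2)[0, 1])


rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1.0, 1.0, n)        # cause, non-Gaussian
y = x + rng.uniform(-1.0, 1.0, n)    # effect = cause + independent disturbance

# Causal direction: the residual of y given x is the disturbance itself.
forward = squared_dependence(ols_residual(y, x), x)
# Anti-causal direction: the residual of x given y stays dependent on y.
backward = squared_dependence(ols_residual(x, y), y)

print(f"x -> y residual dependence: {forward:.3f}")
print(f"y -> x residual dependence: {backward:.3f}")
```

Under these assumptions the forward score is close to zero and the backward score is clearly larger, which is the asymmetry that independence-based methods exploit. If the disturbances shared a common scale or volatility factor, the forward residual would no longer be independent of `x`, and this kind of check could point in the wrong direction.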
Causal reasoning in natural language requires identifying relevant variables, understanding their interactions, and reasoning about effects and interventions, often under noisy or ambiguous conditions. While large language models (LLMs) exhibit strong general reasoning abilities,…
arXiv stat.ML
TIER_1·Jin Du, Li Chen, Xun Xian, An Luo, Fangqiao Tian, Ganghua Wang, Charles Doss, Xiaotong Shen, Jie Ding·
arXiv:2505.13770v3 Announce Type: replace-cross Abstract: Reliable causal inference is essential for making decisions in high-stakes areas like medicine, economics, and public policy. However, it remains unclear whether large language models (LLMs) can handle rigorous and trustwo…
arXiv stat.ML
TIER_1·Oliver J. Hines, Caleb H. Miles·
arXiv:2510.16127v2 Announce Type: replace Abstract: The ratio of two probability density functions is a fundamental quantity that appears in many areas of statistics and machine learning, including causal inference, reinforcement learning, covariate shift, outlier detection, inde…
Causal sensitivity analysis aims to provide bounds for causal effect estimates in the presence of unobserved confounding. However, existing methods for causal sensitivity analysis are per-instance procedures, meaning that changes to the dataset, causal query, sensitivity level, o…
arXiv:2605.06993v1 Announce Type: cross Abstract: Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically costly. We study the problem of selecting, prior to observing experimental outcome…
arXiv stat.ML
TIER_1·Shakeel Gavioli-Akilagun, Kieran Wood, Francesco Quinzan·
arXiv:2605.05809v1 Announce Type: cross Abstract: We propose a framework for determining whether the causal dependence of an outcome $Y$ on a covariate $X$ changes at a given time point, given confounders $\boldsymbol{Z}$. For instance, in financial markets, the effect of a marke…
Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically costly. We study the problem of selecting, prior to observing experimental outcomes, a cost-constrained subset of experiments that m…
We propose a framework for determining whether the causal dependence of an outcome $Y$ on a covariate $X$ changes at a given time point, given confounders $\boldsymbol{Z}$. For instance, in financial markets, the effect of a market indicator on asset returns may causally change o…
Gaussian process marginal likelihood scores and kernel conditional independence tests are theoretically appealing for nonlinear causal discovery but computationally prohibitive at scale. We present two complementary RFF-based methods forming a practical toolkit for score-based, c…
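For readers unfamiliar with random Fourier features (RFF), the sketch below shows the generic construction for approximating an RBF kernel with an explicit finite-dimensional feature map. It is not the method of the paper excerpted above, whose details are truncated here. The function names, the lengthscale, and the number of features are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(x, lengthscale=1.0):
    """Exact RBF kernel matrix k(a, b) = exp(-||a - b||^2 / (2 * lengthscale^2))."""
    sq_norms = np.sum(x**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def random_fourier_features(x, num_features=4000, lengthscale=1.0, seed=0):
    """Map x (n, d) to features z (n, num_features) with z @ z.T ~ RBF kernel.

    Frequencies are drawn from the kernel's spectral density (a Gaussian with
    standard deviation 1 / lengthscale); phases are uniform on [0, 2*pi).
    """
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    freqs = rng.normal(scale=1.0 / lengthscale, size=(dim, num_features))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(x @ freqs + phases)


rng = np.random.default_rng(1)
data = rng.normal(size=(50, 3))
features = random_fourier_features(data)
max_error = np.max(np.abs(features @ features.T - rbf_kernel(data)))
print(f"max abs kernel approximation error: {max_error:.3f}")
```

The practical appeal is that downstream computations can work with the `n × num_features` matrix rather than an `n × n` kernel matrix, so cost grows linearly in the number of samples instead of quadratically or worse. The approximation error shrinks roughly with the inverse square root of `num_features`.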
Despite the growing availability of large datasets, causal structure learning remains computationally prohibitive at scale. We revisit sparsest-permutation learning for linear structural equation models and show that exact Cholesky factorization is unnecessary for structure recov…
The standard constraint-based paradigm for causal discovery with incomplete data -- impute first, test second -- is frequently miscalibrated: any consistent conditional independence (CI) test rejects a true null with probability approaching 1 when imputation error induces spuriou…
arXiv:2605.01669v1 Announce Type: new Abstract: External priors of unknown reliability create a brittle trade-off in causal discovery: blind trust amplifies errors, blind rejection wastes signal. Real priors are also heterogeneously reliable -- physical laws are trustworth…
External priors of unknown reliability create a brittle trade-off in causal discovery: blind trust amplifies errors, blind rejection wastes signal. Real priors are also heterogeneously reliable -- physical laws are trustworthy, LLM-suggested edges are speculative -- yet ex…