New submissions

New submissions for Fri, 1 Mar 24

[1]  arXiv:2402.18612 [pdf, ps, other]
Title: Understanding random forests and overfitting: a visualization and simulation study
Comments: 20 pages, 8 figures
Subjects: Methodology (stat.ME); Computers and Society (cs.CY); Machine Learning (cs.LG)

Random forests have become popular for clinical risk prediction modelling. In a case study on predicting ovarian malignancy, we observed training c-statistics close to 1. Although this suggests overfitting, performance was competitive on test data. We aimed to understand the behaviour of random forests by (1) visualizing data space in three real world case studies and (2) a simulation study. For the case studies, risk estimates were visualised using heatmaps in a 2-dimensional subspace. The simulation study included 48 logistic data generating mechanisms (DGM), varying the predictor distribution, the number of predictors, the correlation between predictors, the true c-statistic and the strength of true predictors. For each DGM, 1000 training datasets of size 200 or 4000 were simulated and RF models were trained with minimum node size 2 or 20 using the ranger package, resulting in 192 scenarios in total. The visualizations suggested that the model learned spikes of probability around events in the training set. A cluster of events created a bigger peak, while isolated events created local peaks. In the simulation study, median training c-statistics were between 0.97 and 1 unless there were 4 or 16 binary predictors with minimum node size 20. Median test c-statistics were higher with higher events per variable, higher minimum node size, and binary predictors. Median training slopes were always above 1, and were not correlated with median test slopes across scenarios (correlation -0.11). Median test slopes were higher with higher true c-statistic, higher minimum node size, and higher sample size. Random forests learn local probability peaks that often yield near perfect training c-statistics without strongly affecting c-statistics on test data. When the aim is probability estimation, the simulation results go against the common recommendation to use fully grown trees in random forest models.
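
For readers unfamiliar with the metric, the c-statistic in the abstract above is the probability that a randomly chosen event receives a higher predicted risk than a randomly chosen non-event. The following is a minimal sketch of that definition, assuming ties count as half a concordant pair; the function name and the toy data are illustrative and are not taken from the paper.

```python
def c_statistic(risks, outcomes):
    """Concordance (c-statistic) of predicted risks against binary outcomes.

    risks    -- predicted probabilities, one per subject
    outcomes -- 1 for an event, 0 for a non-event
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0   # concordant pair
            elif e == n:
                score += 0.5   # tie counts as half (assumption)
    return score / (len(events) * len(non_events))


# Toy data (hypothetical): three of the four event/non-event pairs are concordant.
toy_c = c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(toy_c)  # 0.75
```

A training c-statistic near 1 means almost every such pair is ordered correctly on the training data, which is the pattern the abstract reports for models with small minimum node size.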

[2]  arXiv:2402.18697 [pdf, other]
Title: Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Optimization and Control (math.OC); Statistics Theory (math.ST)

A common network inference problem, arising from real-world data constraints, is how to infer a dynamic network from its time-aggregated adjacency matrix and time-varying marginals (i.e., row and column sums). Prior approaches to this problem have repurposed the classic iterative proportional fitting (IPF) procedure, also known as Sinkhorn's algorithm, with promising empirical results. However, the statistical foundation for using IPF has not been well understood: under what settings does IPF provide principled estimation of a dynamic network from its marginals, and how well does it estimate the network? In this work, we establish such a setting, by identifying a generative network model whose maximum likelihood estimates are recovered by IPF. Our model both reveals implicit assumptions on the use of IPF in such settings and enables new analyses, such as structure-dependent error bounds on IPF's parameter estimates. When IPF fails to converge on sparse network data, we introduce a principled algorithm that guarantees IPF converges under minimal changes to the network structure. Finally, we conduct experiments with synthetic and real-world data, which demonstrate the practical value of our theoretical and algorithmic contributions.
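
As background for the abstract above, the classical IPF (Sinkhorn) iteration alternately rescales the rows and columns of a nonnegative seed matrix until its row and column sums match the target marginals. The sketch below shows only this textbook procedure for a single matrix; it is not the paper's dynamic network model or its convergence-repair algorithm, and the function name, tolerance and example values are assumptions made for illustration.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, max_iter=1000, tol=1e-9):
    """Classical iterative proportional fitting on a nonnegative matrix.

    Returns the scaled matrix and a flag saying whether both sets of
    marginals were matched to within `tol`.
    """
    x = np.asarray(seed, dtype=float).copy()
    r = np.asarray(row_targets, dtype=float)
    c = np.asarray(col_targets, dtype=float)
    if not np.isclose(r.sum(), c.sum()):
        raise ValueError("row and column targets must have the same total")

    for _ in range(max_iter):
        # Scale each row to its target sum (rows summing to zero are left as-is).
        row_sums = x.sum(axis=1)
        x *= np.divide(r, row_sums, out=np.zeros_like(r), where=row_sums > 0)[:, None]
        # Scale each column to its target sum.
        col_sums = x.sum(axis=0)
        x *= np.divide(c, col_sums, out=np.zeros_like(c), where=col_sums > 0)[None, :]

        rows_ok = np.allclose(x.sum(axis=1), r, rtol=0.0, atol=tol)
        cols_ok = np.allclose(x.sum(axis=0), c, rtol=0.0, atol=tol)
        if rows_ok and cols_ok:
            return x, True
    return x, False


# Hypothetical example: a strictly positive seed, for which the iteration converges.
fitted, converged = ipf([[1.0, 2.0], [3.0, 4.0]], [4.0, 6.0], [5.0, 5.0])
```

Zero entries of the seed stay zero under this scaling, which is why sparse inputs can make the targets unreachable and the iteration fail to converge, the situation the abstract mentions.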

[3]  arXiv:2402.18741 [pdf, other]
Title: Spectral Extraction of Unique Latent Variables
Subjects: Methodology (stat.ME)

Multimodal datasets contain observations generated by multiple types of sensors. Most works to date focus on uncovering latent structures in the data that appear in all modalities. However, important aspects of the data may appear in only one modality due to the differences between the sensors. Uncovering modality-specific attributes may provide insights into the sources of the variability of the data. For example, certain clusters may appear in the analysis of genetics but not in epigenetic markers. Another example is hyper-spectral satellite imaging, where various atmospheric and ground phenomena are detectable using different parts of the spectrum. In this paper, we address the problem of uncovering latent structures that are unique to a single modality. Our approach is based on computing a graph representation of datasets from two modalities and analyzing the differences between their connectivity patterns. We provide an asymptotic analysis of the convergence of our approach based on a product manifold model. To evaluate the performance of our method, we test its ability to uncover latent structures in multiple types of artificial and real datasets.

[4]  arXiv:2402.18745 [pdf, other]
Title: Degree-heterogeneous Latent Class Analysis for High-dimensional Discrete Data
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

The latent class model is a widely used mixture model for multivariate discrete data. Besides the existence of qualitatively heterogeneous latent classes, real data often exhibit additional quantitative heterogeneity nested within each latent class. The modern latent class analysis also faces extra challenges, including the high-dimensionality, sparsity, and heteroskedastic noise inherent in discrete data. Motivated by these phenomena, we introduce the Degree-heterogeneous Latent Class Model and propose a spectral approach to clustering and statistical inference in the challenging high-dimensional sparse data regime. We propose an easy-to-implement HeteroClustering algorithm. It uses heteroskedastic PCA with L2 normalization to remove degree effects and perform clustering in the top singular subspace of the data matrix. We establish an exponential error rate for HeteroClustering, leading to exact clustering under minimal signal-to-noise conditions. We further investigate the estimation and inference of the high-dimensional continuous item parameters in the model, which are crucial to interpreting and finding useful markers for latent classes. We provide comprehensive procedures for global testing and multiple testing of these parameters with valid error controls. The superior performance of our methods is demonstrated through extensive simulations and applications to three diverse real-world datasets from political voting records, genetic variations, and single-cell sequencing.

[5]  arXiv:2402.18748 [pdf, other]
Title: Fast Bootstrapping Nonparametric Maximum Likelihood for Latent Mixture Models
Comments: 6 pages (main article is 4 pages, one page of references, and one page Appendix). 5 figures and 4 tables. This paper supersedes a previously circulated technical report by S. Wang and M. Shin (arXiv:2006.00767v2.pdf)
Subjects: Methodology (stat.ME)

Estimating the mixing density of a latent mixture model is an important task in signal processing. Nonparametric maximum likelihood estimation is one popular approach to this problem. If the latent variable distribution is assumed to be continuous, then bootstrapping can be used to approximate it. However, traditional bootstrapping requires repeated evaluations on resampled data and is not scalable. In this letter, we construct a generative process to rapidly produce nonparametric maximum likelihood bootstrap estimates. Our method requires only a single evaluation of a novel two-stage optimization algorithm. Simulations and real data analyses demonstrate that our procedure accurately estimates the mixing density with little computational cost even when there are a hundred thousand observations.

[6]  arXiv:2402.18810 [pdf, ps, other]
Title: The numeraire e-variable
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

We consider testing a composite null hypothesis $\mathcal{P}$ against a point alternative $\mathbb{Q}$. This paper establishes a powerful and general result: under no conditions whatsoever on $\mathcal{P}$ or $\mathbb{Q}$, we show that there exists a special e-variable $X^*$ that we call the numeraire. It is strictly positive and for every $\mathbb{P} \in \mathcal{P}$, $\mathbb{E}_\mathbb{P}[X^*] \le 1$ (the e-variable property), while for every other e-variable $X$, we have $\mathbb{E}_\mathbb{Q}[X/X^*] \le 1$ (the numeraire property). In particular, this implies $\mathbb{E}_\mathbb{Q}[\log(X/X^*)] \le 0$ (log-optimality). $X^*$ also identifies a particular sub-probability measure $\mathbb{P}^*$ via the density $d \mathbb{P}^*/d \mathbb{Q} = 1/X^*$. As a result, $X^*$ can be seen as a generalized likelihood ratio of $\mathbb{Q}$ against $\mathcal{P}$. We show that $\mathbb{P}^*$ coincides with the reverse information projection (RIPr) when additional assumptions are made that are required for the latter to exist. Thus $\mathbb{P}^*$ is a natural definition of the RIPr in the absence of any assumptions on $\mathcal{P}$ or $\mathbb{Q}$. In addition to the abstract theory, we provide several tools for finding the numeraire in concrete cases. We discuss several nonparametric examples where we can indeed identify the numeraire, despite not having a reference measure. We end with a more general optimality theory that goes beyond the ubiquitous logarithmic utility. We focus on certain power utilities, leading to reverse Rényi projections in place of the RIPr, which also always exist.
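
The step from the numeraire property to log-optimality in the abstract above is not spelled out there. One standard way to see it, offered here as a reader's sketch rather than the paper's own argument, is Jensen's inequality for the concave logarithm, using the abstract's symbols $X$, $X^*$ and $\mathbb{Q}$:

```latex
\mathbb{E}_\mathbb{Q}\!\left[\log\frac{X}{X^*}\right]
\;\le\; \log \mathbb{E}_\mathbb{Q}\!\left[\frac{X}{X^*}\right]
\;\le\; \log 1 \;=\; 0 .
```

The first inequality is Jensen's, and the second uses the numeraire property $\mathbb{E}_\mathbb{Q}[X/X^*] \le 1$ together with the monotonicity of the logarithm. The ratio is well defined because $X^*$ is strictly positive.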

[7]  arXiv:2402.18900 [pdf, ps, other]
Title: Prognostic Covariate Adjustment for Logistic Regression in Randomized Controlled Trials
Comments: 27 pages, 1 figure, 9 tables
Subjects: Methodology (stat.ME); Applications (stat.AP); Machine Learning (stat.ML)

Randomized controlled trials (RCTs) with binary primary endpoints introduce novel challenges for inferring the causal effects of treatments. The most significant challenge is non-collapsibility, in which the conditional odds ratio estimand under covariate adjustment differs from the unconditional estimand in the logistic regression analysis of RCT data. This issue gives rise to apparent paradoxes, such as the variance of the estimator for the conditional odds ratio from a covariate-adjusted model being greater than the variance of the estimator from the unadjusted model. We address this challenge in the context of adjustment based on predictions of control outcomes from generative artificial intelligence (AI) algorithms, which are referred to as prognostic scores. We demonstrate that prognostic score adjustment in logistic regression increases the power of the Wald test for the conditional odds ratio under a fixed sample size, or alternatively reduces the necessary sample size to achieve a desired power, compared to the unadjusted analysis. We derive formulae for prospective calculations of the power gain and sample size reduction that can result from adjustment for the prognostic score. Furthermore, we utilize g-computation to expand the scope of prognostic score adjustment to inferences on the marginal risk difference, relative risk, and odds ratio estimands. We demonstrate the validity of our formulae via extensive simulation studies that encompass different types of logistic regression model specifications. Our simulation studies also indicate how prognostic score adjustment can reduce the variance of g-computation estimators for the marginal estimands while maintaining frequentist properties such as asymptotic unbiasedness and Type I error rate control. Our methodology can ultimately enable more definitive and conclusive analyses for RCTs with binary primary endpoints.

[8]  arXiv:2402.18904 [pdf, other]
Title: False Discovery Rate Control for Confounder Selection Using Mirror Statistics
Subjects: Methodology (stat.ME)

While data-driven confounder selection requires careful consideration, it is frequently employed in observational studies to adjust for confounding factors. Widely recognized criteria for confounder selection include the minimal set approach, which involves selecting variables relevant to both treatment and outcome, and the union set approach, which involves selecting variables for either treatment or outcome. These approaches are often implemented using heuristics and off-the-shelf statistical methods, where the degree of uncertainty may not be clear. In this paper, we focus on the false discovery rate (FDR) to measure uncertainty in confounder selection. We define the FDR specific to confounder selection and propose methods based on the mirror statistic, a recently developed approach for FDR control that does not rely on p-values. The proposed methods are free from p-values and require only the assumption of some symmetry in the distribution of the mirror statistic. They can be easily combined with sparse estimation and other methods that involve difficulties in deriving p-values. The properties of the proposed methods are investigated by extensive numerical experiments. Particularly in high-dimensional data scenarios, our methods outperform conventional methods.
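
To make the mirror-statistic idea in the abstract above concrete, the sketch below follows the generic data-splitting recipe from the earlier FDR literature: two independent coefficient estimates per variable are combined into a statistic that is large and positive when both agree in sign, and the count of strongly negative statistics estimates the number of false discoveries. This is not the confounder-specific FDR definition proposed in the paper; the function names, the choice f(u, v) = u + v and the numbers are assumptions made for illustration.

```python
def mirror_statistics(beta1, beta2):
    """Combine two independent coefficient estimates per variable.

    Uses sign(b1 * b2) * (|b1| + |b2|), one common choice of mirror statistic.
    """
    stats = []
    for a, b in zip(beta1, beta2):
        sign = (a * b > 0) - (a * b < 0)   # +1, -1 or 0
        stats.append(sign * (abs(a) + abs(b)))
    return stats


def select_by_mirror(stats, q):
    """Smallest threshold t whose estimated false discovery proportion is <= q.

    The estimate is #{M_j < -t} / max(#{M_j > t}, 1), relying on the assumed
    symmetry of the statistic around zero for null variables.
    """
    if not 0 <= q < 1:
        raise ValueError("q must lie in [0, 1)")
    candidates = [0.0] + sorted(abs(m) for m in stats)
    for t in candidates:
        false_est = sum(1 for m in stats if m < -t)
        selected = [j for j, m in enumerate(stats) if m > t]
        if false_est / max(len(selected), 1) <= q:
            return t, selected
    # Not reached for q >= 0: at the largest threshold the estimate is 0.
    return float("inf"), []


# Hypothetical coefficient estimates from two halves of a dataset.
b1 = [2.5, 2.0, 1.5, 1.0, -0.3, 0.1, 0.05]
b2 = [2.5, 2.0, 1.5, 1.5, 0.2, 0.1, -0.05]
tau, chosen = select_by_mirror(mirror_statistics(b1, b2), q=0.25)
print(chosen)  # [0, 1, 2, 3, 5]
```

No p-values appear anywhere in the procedure, which is the property the abstract emphasises.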

[9]  arXiv:2402.18921 [pdf, other]
Title: Semi-Supervised U-statistics
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)

Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the potential of unlabeled data. Responding to this demand, we introduce semi-supervised U-statistics enhanced by the abundance of unlabeled data, and investigate their statistical properties. We show that the proposed approach is asymptotically Normal and exhibits notable efficiency gains over classical U-statistics by effectively integrating various powerful prediction tools into the framework. To understand the fundamental difficulty of the problem, we derive minimax lower bounds in semi-supervised settings and showcase that our procedure is semi-parametrically efficient under regularity conditions. Moreover, tailored to bivariate kernels, we propose a refined approach that outperforms the classical U-statistic across all degeneracy regimes, and demonstrate its optimality properties. Simulation studies are conducted to corroborate our findings and to further demonstrate our framework.

[10]  arXiv:2402.19021 [pdf, other]
Title: Enhancing the Power of Gaussian Graphical Model Inference by Modeling the Graph Structure
Authors: Valentin Kilian, Tabea Rebafka (LPSM (UMR_8001)), Fanny Villers (LPSM (UMR_8001))
Subjects: Methodology (stat.ME)

For the problem of inferring a Gaussian graphical model (GGM), this work explores the application of a recent approach from the multiple testing literature for graph inference. The main idea of the method by Rebafka et al. (2022) is to model the data by a latent variable model, the so-called noisy stochastic block model (NSBM), and then use the associated $\ell$-values to infer the graph. The inferred graph controls the false discovery rate, meaning that the proportion of falsely declared edges does not exceed a user-defined nominal level. Here it is shown that any test statistic from the GGM literature can be used as input for the NSBM approach to perform GGM inference. To make the approach feasible in practice, a new, computationally efficient inference algorithm for the NSBM is developed, relying on a greedy approach to maximize the integrated complete-data likelihood. An extensive numerical study then illustrates that the NSBM approach outperforms the state of the art for each of the GGM test statistics considered here. In particular, a significant gain in power is observed in sparse settings and on real datasets.

[11]  arXiv:2402.19029 [pdf, ps, other]
Title: Essential Properties of Type III* Methods
Authors: Lynn Roy LaMotte
Subjects: Methodology (stat.ME)

Type III methods, introduced by SAS in 1976, formulate estimable functions that substitute, somehow, for classical ANOVA effects in multiple linear regression models. They have been controversial since, provoking wide use and satisfied users on the one hand and skepticism and scorn on the other. Their essential mathematical properties have not been established, although they are widely thought to be known: what those functions are, to what extent they coincide with classical ANOVA effects, and how they are affected by cell sample sizes, empty cells, and covariates. Those properties are established here.

[12]  arXiv:2402.19036 [pdf, other]
Title: Empirical Bayes in Bayesian learning: understanding a common practice
Subjects: Statistics Theory (math.ST)

In applications of Bayesian procedures, even when the prior law is carefully specified, it may be delicate to elicit the prior hyperparameters, so that it is often tempting to fix them from the data, usually by their maximum marginal likelihood estimates (MMLE), obtaining a so-called empirical Bayes posterior distribution. Although questionable, this is a common practice; but theoretical properties seem mostly only available on a case-by-case basis. In this paper we provide general properties for parametric models. First, we study the limit behavior of the MMLE and prove results in quite general settings, while also conceptualizing the frequentist context as an unexplored case of maximum likelihood estimation under model misspecification. We cover both identifiable models, illustrating applications to sparse regression, and non-identifiable models - specifically, overfitted mixture models. Finally, we prove higher order merging results. In regular cases, the empirical Bayes posterior is shown to be a fast approximation to the Bayesian posterior distribution of the researcher who, within the given class of priors, has the most information about the true model's parameters. This is a faster approximation than classic Bernstein-von Mises results. Given the class of priors, our work provides formal contents to common beliefs on this popular practice.

[13]  arXiv:2402.19046 [pdf, other]
Title: On the Improvement of Predictive Modeling Using Bayesian Stacking and Posterior Predictive Checking
Comments: 40 pages including abstract and references (23 pages without), 3 figures
Subjects: Methodology (stat.ME)

Model uncertainty is pervasive in real world analysis situations and is an often-neglected issue in applied statistics. However, standard approaches to the research process do not address the inherent uncertainty in model building and, thus, can lead to overconfident and misleading analysis interpretations. One strategy to incorporate more flexible models is to base inferences on predictive modeling. This approach provides an alternative to existing explanatory models, as inference is focused on the posterior predictive distribution of the response variable. Predictive modeling can advance explanatory ambitions in the social sciences and in addition enrich the understanding of social phenomena under investigation. Bayesian stacking is a methodological approach rooted in Bayesian predictive modeling. In this paper, we outline the method of Bayesian stacking but add to it the approach of posterior predictive checking (PPC) as a means of assessing the predictive quality of those elements of the stacking ensemble that are important to the research question. Thus, we introduce a viable workflow for incorporating PPC into predictive modeling using Bayesian stacking without presuming the existence of a true model. We apply these tools to the PISA 2018 data to investigate potential inequalities in reading competency with respect to gender and socio-economic background. Our empirical example serves as a rough guideline for practitioners who want to apply the concepts of predictive modeling and model uncertainty to similar research questions in their own work.

[14]  arXiv:2402.19109 [pdf, other]
Title: Confidence and Assurance of Percentiles
Authors: Sanjay M. Joshi
Comments: 5 pages, 4 Figures
Subjects: Methodology (stat.ME); Information Theory (cs.IT)

The confidence interval of the mean is often used when quoting statistics. The same rigor is often missing when quoting percentiles and tolerance or percentile intervals. This article derives the expression for confidence in percentiles of a sample population. Confidence intervals of the median are compared to those of the mean for a few sample distributions. The concept of assurance from reliability engineering is then extended to percentiles. The assurance level of sorted samples simply matches the confidence and percentile levels. A numerical method to compute assurance using Brent's optimization method is provided as an open-source Python package.
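
As context for the abstract above, there is a standard distribution-free result for the confidence that a population percentile lies between two order statistics of a sample: for a continuous distribution, the number of observations at or below the p-th quantile is Binomial(n, p), so the coverage is a binomial sum. The sketch below implements only this textbook formula; it is not necessarily the expression derived in the article and does not reproduce its assurance computation or its Python package. The function name is hypothetical.

```python
from math import comb


def order_stat_coverage(n, p, j, k):
    """Probability that the p-th quantile lies in [x_(j), x_(k)).

    x_(j) and x_(k) are the j-th and k-th smallest of n i.i.d. draws
    (1-indexed) from a continuous distribution. The quantile falls in
    this interval exactly when between j and k-1 draws are at or below it.
    """
    if not 1 <= j < k <= n:
        raise ValueError("need 1 <= j < k <= n")
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j, k))


# With 5 samples, the range from the minimum to the maximum covers the median
# unless all 5 draws fall on the same side of it: 1 - 2/32 = 30/32.
median_coverage = order_stat_coverage(5, 0.5, 1, 5)
print(median_coverage)  # 0.9375
```

The same function applies to any percentile, for example `order_stat_coverage(100, 0.9, 85, 96)` for the 90th percentile of 100 samples.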

[15]  arXiv:2402.19162 [pdf, other]
Title: A Bayesian approach to uncover spatio-temporal determinants of heterogeneity in repeated cross-sectional health surveys
Subjects: Applications (stat.AP); Methodology (stat.ME); Other Statistics (stat.OT)

In several countries, including Italy, a prominent approach to population health surveillance involves conducting repeated cross-sectional surveys at short intervals of time. These surveys gather information on the health status of individual respondents, including details on their behaviors, risk factors, and relevant socio-demographic information. While the collected data undoubtedly provides valuable information, modeling such data presents several challenges. For instance, in health risk models, it is essential to consider behavioral information, spatio-temporal dynamics, and disease co-occurrence. In response to these challenges, our work proposes a multivariate spatio-temporal logistic model for chronic disease diagnoses. Predictors are modeled using individual risk factor covariates and a latent individual propensity to the disease.
Leveraging a state space formulation of the model, we construct a framework in which spatio-temporal heterogeneity in regression parameters is informed by exogenous spatial information, corresponding to different spatial contextual risk factors that may affect health and the occurrence of chronic diseases in different ways. To explore the utility and effectiveness of our method, we analyze behavioral and risk factor surveillance data (PASSI) collected in Italy, a country known for marked administrative, social and territorial diversity, which is reflected in high variability in morbidity among population subgroups.

[16]  arXiv:2402.19209 [pdf, other]
Title: Call center data analysis and model validation
Subjects: Applications (stat.AP)

We analyze call center data on properties such as agent heterogeneity, customer patience and breaks. Then we compare simulation models that differ in the ways these properties are modeled. We classify them according to the extent to which they approach the actual service level and average waiting times. We obtain a theoretical understanding of how to distinguish between the model error and other aspects such as random noise. We conclude that explicitly modeling breaks and agent heterogeneity is crucial for obtaining a precise model.

[17]  arXiv:2402.19214 [pdf, other]
Title: A Bayesian approach with Gaussian priors to the inverse problem of source identification in elliptic PDEs
Authors: Matteo Giordano
Comments: 16 Pages. The reproducible code is available at: this https URL
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

We consider the statistical linear inverse problem of making inference on an unknown source function in an elliptic partial differential equation from noisy observations of its solution. We employ nonparametric Bayesian procedures based on Gaussian priors, leading to convenient conjugate formulae for posterior inference. We review recent results providing theoretical guarantees on the quality of the resulting posterior-based estimation and uncertainty quantification, and we discuss the application of the theory to the important classes of Gaussian series priors defined on the Dirichlet-Laplacian eigenbasis and Matérn process priors. We provide an implementation of posterior inference for both classes of priors, and investigate its performance in a numerical simulation study.

[18]  arXiv:2402.19268 [pdf, ps, other]
Title: Extremal quantiles of intermediate orders under two-way clustering
Subjects: Statistics Theory (math.ST); Econometrics (econ.EM)

This paper investigates extremal quantiles under two-way cluster dependence. We demonstrate that the limiting distribution of the unconditional intermediate order quantiles in the tails converges to a Gaussian distribution. This is remarkable as two-way cluster dependence entails potential non-Gaussianity in general, but extremal quantiles do not suffer from this issue. Building upon this result, we extend our analysis to extremal quantile regressions of intermediate order.

[19]  arXiv:2402.19346 [pdf, ps, other]
Title: Recanting witness and natural direct effects: Violations of assumptions or definitions?
Authors: Ian Shrier
Comments: 5 pages, 1 figure
Subjects: Methodology (stat.ME)

There have been numerous publications on the advantages and disadvantages of estimating natural (pure) effects compared to controlled effects. One of the main criticisms of natural effects is that they require an additional assumption for identifiability, namely that the exposure does not cause a confounder of the mediator-outcome relationship. However, every analysis in every study should begin with a research question expressed in ordinary language. Researchers then develop/use mathematical expressions or estimators to best answer these ordinary language questions. When a recanting witness is present, the paper illustrates that there are no violations of assumptions. Rather, using directed acyclic graphs, the paper shows that the typical estimators for natural effects are simply no longer answering any meaningful question. Although some might view this as semantics, the proposed approach illustrates why the more recent methods of path-specific effects and separable effects are more valid and transparent compared to previous methods for decomposition analysis.

[20]  arXiv:2402.19455 [pdf, other]
Title: Listening to the Noise: Blind Denoising with Gibbs Diffusion
Comments: 12+8 pages, 7+3 figures, 1+1 tables, code: this https URL
Subjects: Machine Learning (stat.ML); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP)

In recent years, denoising problems have become intertwined with the development of deep generative models. In particular, diffusion models are trained like denoisers, and the distribution they model coincides with denoising priors in the Bayesian picture. However, denoising through diffusion-based posterior sampling requires the noise level and covariance to be known, preventing blind denoising. We overcome this limitation by introducing Gibbs Diffusion (GDiff), a general methodology addressing posterior sampling of both the signal and the noise parameters. Assuming arbitrary parametric Gaussian noise, we develop a Gibbs algorithm that alternates sampling steps from a conditional diffusion model trained to map the signal prior to the family of noise distributions, and a Monte Carlo sampler to infer the noise parameters. Our theoretical analysis highlights potential pitfalls, guides diagnostic usage, and quantifies errors in the Gibbs stationary distribution caused by the diffusion model. We showcase our method for 1) blind denoising of natural images involving colored noises with unknown amplitude and spectral index, and 2) a cosmology problem, namely the analysis of cosmic microwave background data, where Bayesian inference of "noise" parameters means constraining models of the evolution of the Universe.

Cross-lists for Fri, 1 Mar 24

[21]  arXiv:2402.18579 (cross-list from cs.CV) [pdf, ps, other]
Title: Wilcoxon Nonparametric CFAR Scheme for Ship Detection in SAR Image
Authors: Xiangwei Meng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Applications (stat.AP)

Parametric constant false alarm rate (CFAR) detection algorithms, which are based on various statistical distributions such as the Gaussian, Gamma, Weibull, log-normal, G0 and alpha-stable distributions, are currently the most widely used methods for detecting ship targets in SAR images. However, the clutter background in SAR images is complicated and variable. When the actual clutter background deviates from the assumed statistical distribution, the performance of the parametric CFAR detector deteriorates. In addition to the parametric CFAR schemes, there is another class of nonparametric CFAR detectors, which can maintain a constant false alarm rate for target detection without assuming a known clutter distribution. In this work, the Wilcoxon nonparametric CFAR scheme for ship detection in SAR images is proposed and analyzed, and a closed form of the false alarm rate of the Wilcoxon nonparametric detector, used to determine the decision threshold, is presented. Comparison with several typical parametric CFAR schemes on Radarsat-2, ICEYE-X6 and Gaofen-3 SAR images reveals the robustness of the Wilcoxon nonparametric detector in maintaining good false alarm performance across different detection backgrounds, and its detection performance for weak ships on rough sea surfaces is improved to some extent. Moreover, the Wilcoxon nonparametric detector can suppress false alarms resulting from sidelobes to some degree, and its detection speed is fast.

[22]  arXiv:2402.18591 (cross-list from cs.LG) [pdf, ps, other]
Title: Stochastic contextual bandits with graph feedback: from independence number to MAS number
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Statistics Theory (math.ST)

We consider contextual bandits with graph feedback, a class of interactive learning problems with richer structures than vanilla contextual bandits, where taking an action reveals the rewards for all neighboring actions in the feedback graph under all contexts. Unlike the multi-armed bandits setting where a growing literature has painted a near-complete understanding of graph feedback, much remains unexplored in the contextual bandits counterpart. In this paper, we make inroads into this inquiry by establishing a regret lower bound $\Omega(\sqrt{\beta_M(G) T})$, where $M$ is the number of contexts, $G$ is the feedback graph, and $\beta_M(G)$ is our proposed graph-theoretical quantity that characterizes the fundamental learning limit for this class of problems. Interestingly, $\beta_M(G)$ interpolates between $\alpha(G)$ (the independence number of the graph) and $\mathsf{m}(G)$ (the maximum acyclic subgraph (MAS) number of the graph) as the number of contexts $M$ varies. We also provide algorithms that achieve near-optimal regrets for important classes of context sequences and/or feedback graphs, such as transitively closed graphs that find applications in auctions and inventory control. In particular, with many contexts, our results show that the MAS number completely characterizes the statistical complexity for contextual bandits, as opposed to the independence number in multi-armed bandits.

[23]  arXiv:2402.18651 (cross-list from cs.LG) [pdf, other]
Title: Quantifying Human Priors over Social and Navigation Networks
Comments: Published on Proceedings of the 40th International Conference on Machine Learning (ICML), PMLR 202:3063-3105, 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Neurons and Cognition (q-bio.NC); Methodology (stat.ME)

Human knowledge is largely implicit and relational -- do we have a friend in common? can I walk from here to there? In this work, we leverage the combinatorial structure of graphs to quantify human priors over such relational data. Our experiments focus on two domains that have been continuously relevant over evolutionary timescales: social interaction and spatial navigation. We find that some features of the inferred priors are remarkably consistent, such as the tendency for sparsity as a function of graph size. Other features are domain-specific, such as the propensity for triadic closure in social interactions. More broadly, our work demonstrates how nonclassical statistical analysis of indirect behavioral experiments can be used to efficiently model latent biases in the data.

[24]  arXiv:2402.18666 (cross-list from math.OC) [pdf, other]
Title: Linear shrinkage for optimization in high dimensions
Subjects: Optimization and Control (math.OC); Statistics Theory (math.ST)

In large-scale, data-driven applications, parameters are often only known approximately due to noise and limited data samples. In this paper, we focus on high-dimensional optimization problems with linear constraints under uncertain conditions. To find high quality solutions for which the violation of the true constraints is limited, we develop a linear shrinkage method that blends random matrix theory and robust optimization principles. It aims to minimize the Frobenius distance between the estimated and the true parameter matrix, especially when dealing with a large and comparable number of constraints and variables. This data-driven method excels in simulations, showing superior noise resilience and more stable performance in both obtaining high quality solutions and adhering to the true constraints compared to traditional robust optimization. Our findings highlight the effectiveness of our method in improving the robustness and reliability of optimization in high-dimensional, data-driven scenarios.

[25]  arXiv:2402.18689 (cross-list from cs.LG) [pdf, other]
Title: The VOROS: Lifting ROC curves to 3D
Comments: 38 pages, 19 figures
Subjects: Machine Learning (cs.LG); Metric Geometry (math.MG); Statistics Theory (math.ST); Methodology (stat.ME)

The area under the ROC curve is a common measure that is often used to rank the relative performance of different binary classifiers. However, as has been previously noted, it can poorly capture the benefits of different classifiers when either the true class values or misclassification costs are highly unbalanced between the two classes. We introduce a third dimension to capture these costs, and lift the ROC curve to a ROC surface in a natural way. We study this surface and introduce the VOROS, the volume over this ROC surface, as a 3D generalization of the 2D area under the ROC curve. For problems where there are only bounds on the expected costs or class imbalances, we restrict consideration to the volume of the appropriate subregion of the ROC surface. We show how the VOROS can better capture the costs of different classifiers on both a classical and a modern example dataset.
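
For orientation, the 2D baseline that this abstract generalizes can be sketched as follows. This is only the ordinary area under the ROC curve computed with the trapezoid rule, not the VOROS itself; the scores and labels are made-up illustrative data, and the sketch assumes no tied scores.

```python
def roc_points(scores, labels):
    # Sweep the decision threshold from the highest score downward.
    # Labels are 0/1; assumes both classes are present and scores are distinct.
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))  # (false positive rate, true positive rate)
    return points

def auc(points):
    # Trapezoid rule over consecutive ROC points.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
area = auc(roc_points(scores, labels))
print(round(area, 4))
```

The resulting number treats both error types symmetrically, which is the limitation under unbalanced costs that motivates adding a third dimension.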

[26]  arXiv:2402.18724 (cross-list from cs.LG) [pdf, other]
Title: Learning Associative Memories with Gradient Descent
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

This work focuses on the training dynamics of one associative memory module storing outer products of token embeddings. We reduce this problem to the study of a system of particles, which interact according to properties of the data distribution and correlations between embeddings. Through theory and experiments, we provide several insights. In overparameterized regimes, we obtain logarithmic growth of the ``classification margins.'' Yet, we show that imbalance in token frequencies and memory interferences due to correlated embeddings lead to oscillatory transitory regimes. The oscillations are more pronounced with large step sizes, which can create benign loss spikes, although these learning rates speed up the dynamics and accelerate the asymptotic convergence. In underparameterized regimes, we illustrate how the cross-entropy loss can lead to suboptimal memorization schemes. Finally, we assess the validity of our findings on small Transformer models.

[27]  arXiv:2402.18800 (cross-list from cs.LG) [pdf, other]
Title: BlockEcho: Retaining Long-Range Dependencies for Imputing Block-Wise Missing Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Block-wise missing data poses significant challenges in real-world data imputation tasks. Compared to scattered missing data, block-wise gaps exacerbate adverse effects on subsequent analytic and machine learning tasks, as the lack of local neighboring elements significantly reduces the interpolation capability and predictive power. However, this issue has not received adequate attention. Most SOTA matrix completion methods appear less effective, primarily due to overreliance on neighboring elements for predictions. We systematically analyze the issue and propose a novel matrix completion method, "BlockEcho", for a more comprehensive solution. This method integrates Matrix Factorization (MF) within Generative Adversarial Networks (GAN) to explicitly retain long-distance inter-element relationships in the original matrix. In addition, we incorporate an additional discriminator for GAN, comparing the generator's intermediate progress with pre-trained MF results to constrain high-order feature distributions. Subsequently, we evaluate BlockEcho on public datasets across three domains. Results demonstrate superior performance over both traditional and SOTA methods when imputing block-wise missing data, especially at higher missing rates. The advantage also holds for scattered missing data at high missing rates. We also provide theoretical justification for the optimality and convergence of fusing MF and GAN for block-wise missing data.

[28]  arXiv:2402.18805 (cross-list from cs.SI) [pdf, other]
Title: VEC-SBM: Optimal Community Detection with Vectorial Edges Covariates
Subjects: Social and Information Networks (cs.SI); Machine Learning (stat.ML)

Social networks are often associated with rich side information, such as texts and images. While numerous methods have been developed to identify communities from pairwise interactions, they usually ignore such side information. In this work, we study an extension of the Stochastic Block Model (SBM), a widely used statistical framework for community detection, that integrates vectorial edges covariates: the Vectorial Edges Covariates Stochastic Block Model (VEC-SBM). We propose a novel algorithm based on iterative refinement techniques and show that it optimally recovers the latent communities under the VEC-SBM. Furthermore, we rigorously assess the added value of leveraging edge's side information in the community detection process. We complement our theoretical results with numerical experiments on synthetic and semi-synthetic data.

[29]  arXiv:2402.18851 (cross-list from cs.LG) [pdf, other]
Title: Applications of 0-1 Neural Networks in Prescription and Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

A key challenge in medical decision making is learning treatment policies for patients with limited observational data. This challenge is particularly evident in personalized healthcare decision-making, where models need to take into account the intricate relationships between patient characteristics, treatment options, and health outcomes. To address this, we introduce prescriptive networks (PNNs), shallow 0-1 neural networks trained with mixed integer programming that can be used with counterfactual estimation to optimize policies in medium data settings. These models offer greater interpretability than deep neural networks and can encode more complex policies than common models such as decision trees. We show that PNNs can outperform existing methods in both synthetic data experiments and in a case study of assigning treatments for postpartum hypertension. In particular, PNNs are shown to produce policies that could reduce peak blood pressure by 5.47 mm Hg (p=0.02) over existing clinical practice, and by 2 mm Hg (p=0.01) over the next best prescriptive modeling technique. Moreover, PNNs were more likely than all other models to correctly identify clinically significant features, while existing models relied on potentially dangerous features such as patient insurance information and race that could lead to bias in treatment.

[30]  arXiv:2402.18884 (cross-list from cs.LG) [pdf, ps, other]
Title: Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features
Comments: 10 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Recent findings reveal that over-parameterized deep neural networks, trained beyond zero training-error, exhibit a distinctive structural pattern at the final layer, termed as Neural-collapse (NC). These results indicate that the final hidden-layer outputs in such networks display minimal within-class variations over the training set. While existing research extensively investigates this phenomenon under cross-entropy loss, there are fewer studies focusing on its contrastive counterpart, supervised contrastive (SC) loss. Through the lens of NC, this paper employs an analytical approach to study the solutions derived from optimizing the SC loss. We adopt the unconstrained features model (UFM) as a representative proxy for unveiling NC-related phenomena in sufficiently over-parameterized deep networks. We show that, despite the non-convexity of SC loss minimization, all local minima are global minima. Furthermore, the minimizer is unique (up to a rotation). We prove our results by formalizing a tight convex relaxation of the UFM. Finally, through this convex formulation, we delve deeper into characterizing the properties of global solutions under label-imbalanced training data.

[31]  arXiv:2402.18910 (cross-list from cs.LG) [pdf, other]
Title: DIGIC: Domain Generalizable Imitation Learning by Causal Discovery
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)

Causality has been combined with machine learning to produce robust representations for domain generalization. Most existing methods of this type require massive data from multiple domains to identify causal features by cross-domain variations, which can be expensive or even infeasible and may lead to misidentification in some cases. In this work, we make a different attempt by leveraging the demonstration data distribution to discover the causal features for a domain generalizable policy. We design a novel framework, called DIGIC, to identify the causal features by finding the direct cause of the expert action from the demonstration data distribution via causal discovery. Our framework can achieve domain generalizable imitation learning with only single-domain data and serve as a complement for cross-domain variation-based methods under non-structural assumptions on the underlying causal models. Our empirical study in various control tasks shows that the proposed framework evidently improves the domain generalization performance and has comparable performance to the expert in the original domain simultaneously.

[32]  arXiv:2402.18995 (cross-list from cs.LG) [pdf, other]
Title: Negative-Binomial Randomized Gamma Markov Processes for Heterogeneous Overdispersed Count Time Series
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Modeling count-valued time series has been receiving increasing attention since count time series naturally arise in physical and social domains. Poisson gamma dynamical systems (PGDSs) are newly-developed methods, which can well capture the expressive latent transition structure and bursty dynamics behind count sequences. In particular, PGDSs demonstrate superior performance in terms of data imputation and prediction, compared with canonical linear dynamical system (LDS) based methods. Despite these advantages, PGDS cannot capture the heterogeneous overdispersed behaviours of the underlying dynamic processes. To mitigate this defect, we propose a negative-binomial-randomized gamma Markov process, which not only significantly improves the predictive performance of the proposed dynamical system, but also facilitates the fast convergence of the inference algorithm. Moreover, we develop methods to estimate both factor-structured and graph-structured transition dynamics, which enable us to infer more explainable latent structure, compared with PGDSs. Finally, we demonstrate the explainable latent structure learned by the proposed method, and show its superior performance in imputing missing data and forecasting future observations, compared with the related models.

[33]  arXiv:2402.19442 (cross-list from cs.LG) [pdf, other]
Title: Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality
Comments: 141 pages, 7 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)

We study the dynamics of gradient flow for training a multi-head softmax attention model for in-context learning of multi-task linear regression. We establish the global convergence of gradient flow under suitable choices of initialization. In addition, we prove that an interesting "task allocation" phenomenon emerges during the gradient flow dynamics, where each attention head focuses on solving a single task of the multi-task model. Specifically, we prove that the gradient flow dynamics can be split into three phases -- a warm-up phase where the loss decreases rather slowly and the attention heads gradually build up their inclination towards individual tasks, an emergence phase where each head selects a single task and the loss rapidly decreases, and a convergence phase where the attention parameters converge to a limit. Furthermore, we prove the optimality of gradient flow in the sense that the limiting model learned by gradient flow is on par with the best possible multi-head softmax attention model up to a constant factor. Our analysis also delineates a strict separation in terms of the prediction accuracy of ICL between single-head and multi-head attention models. The key technique for our convergence analysis is to map the gradient flow dynamics in the parameter space to a set of ordinary differential equations in the spectral domain, where the relative magnitudes of the semi-singular values of the attention weights determine task allocation. To the best of our knowledge, our work provides the first convergence result for the multi-head softmax attention model.

[34]  arXiv:2402.19449 (cross-list from cs.LG) [pdf, other]
Title: Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Optimization and Control (math.OC); Machine Learning (stat.ML)

Adam has been shown to outperform gradient descent in optimizing large language transformers empirically, and by a larger margin than on other tasks, but it is unclear why this happens. We show that the heavy-tailed class imbalance found in language modeling tasks leads to difficulties in the optimization dynamics. When training with gradient descent, the loss associated with infrequent words decreases more slowly than the loss associated with frequent ones. As most samples come from relatively infrequent words, the average loss decreases slowly with gradient descent. On the other hand, Adam and sign-based methods do not suffer from this problem and improve predictions on all classes. To establish that this behavior is indeed caused by class imbalance, we show empirically that it persists across different architectures and data types, on language transformers, vision CNNs, and linear models. We further study this phenomenon on linear classification with cross-entropy loss, showing that heavy-tailed class imbalance leads to ill-conditioning, and that the normalization used by Adam can counteract it.
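
The ill-conditioning argument in this abstract can be illustrated with a toy problem. The sketch below is not the paper's experiment: it assumes a diagonal quadratic loss whose per-coordinate curvatures stand in for heavy-tailed class frequencies, and it uses plain sign descent (the limiting form of Adam with no momentum and no epsilon) in place of Adam. The curvature values and step sizes are arbitrary choices.

```python
def sign(x):
    # Returns -1, 0, or 1.
    return (x > 0) - (x < 0)

# Hypothetical curvatures: one "frequent" direction and several "rare" ones.
curvature = [1.0, 0.1, 0.01, 0.001]
steps = 100

def run(update):
    # Minimize 0.5 * sum(h_i * w_i^2) starting from w_i = 1 in every coordinate.
    w = [1.0] * len(curvature)
    for _ in range(steps):
        grad = [h * wi for h, wi in zip(curvature, w)]
        w = [update(wi, gi) for wi, gi in zip(w, grad)]
    return w

# Gradient descent with step size 1 / max(curvature).
w_gd = run(lambda wi, gi: wi - 1.0 * gi)
# Sign descent: every coordinate moves by the same fixed amount per step.
w_sign = run(lambda wi, gi: wi - 0.01 * sign(gi))

print("gradient descent:", [round(x, 4) for x in w_gd])
print("sign descent:    ", [round(abs(x), 4) for x in w_sign])
```

Under these assumptions gradient descent barely moves the low-curvature coordinates, while the normalized update makes the same progress on every coordinate regardless of curvature.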

[35]  arXiv:2402.19456 (cross-list from quant-ph) [pdf, other]
Title: Statistical Estimation in the Spiked Tensor Model via the Quantum Approximate Optimization Algorithm
Comments: 51 pages, 4 figures, 1 table
Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS); Probability (math.PR); Statistics Theory (math.ST)

The quantum approximate optimization algorithm (QAOA) is a general-purpose algorithm for combinatorial optimization. In this paper, we analyze the performance of the QAOA on a statistical estimation problem, namely, the spiked tensor model, which exhibits a statistical-computational gap classically. We prove that the weak recovery threshold of $1$-step QAOA matches that of $1$-step tensor power iteration. Additional heuristic calculations suggest that the weak recovery threshold of $p$-step QAOA matches that of $p$-step tensor power iteration when $p$ is a fixed constant. This further implies that multi-step QAOA with tensor unfolding could achieve, but not surpass, the classical computation threshold $\Theta(n^{(q-2)/4})$ for spiked $q$-tensors. Meanwhile, we characterize the asymptotic overlap distribution for $p$-step QAOA, finding an intriguing sine-Gaussian law verified through simulations. For some $p$ and $q$, the QAOA attains an overlap that is larger by a constant factor than the tensor power iteration overlap. Of independent interest, our proof techniques employ the Fourier transform to handle difficult combinatorial sums, a novel approach differing from prior QAOA analyses on spin-glass models without planted structure.

[36]  arXiv:2402.19460 (cross-list from cs.LG) [pdf, other]
Title: Benchmarking Uncertainty Disentanglement: Specialized Uncertainties for Specialized Tasks
Comments: 43 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Uncertainty quantification, once a singular task, has evolved into a spectrum of tasks, including abstained prediction, out-of-distribution detection, and aleatoric uncertainty quantification. The latest goal is disentanglement: the construction of multiple estimators that are each tailored to one and only one task. Hence, there is a plethora of recent advances with different intentions - that often entirely deviate from practical behavior. This paper conducts a comprehensive evaluation of numerous uncertainty estimators across diverse tasks on ImageNet. We find that, despite promising theoretical endeavors, disentanglement is not yet achieved in practice. Additionally, we reveal which uncertainty estimators excel at which specific tasks, providing insights for practitioners and guiding future research toward task-centric and disentangled uncertainty estimation methods. Our code is available at https://github.com/bmucsanyi/bud.

Replacements for Fri, 1 Mar 24

[37]  arXiv:1701.07078 (replaced) [pdf, ps, other]
Title: Measurement-to-Track Association and Finite-Set Statistics
Authors: Ronald Mahler
Comments: 7 pages, no figures
Subjects: Methodology (stat.ME)
[38]  arXiv:1806.05451 (replaced) [pdf, other]
Title: The committee machine: Computational to statistical gaps in learning a two-layers neural network
Comments: 18 pages + supplementary material, 3 figures. (v2: update to match the published version ; v3: clarification of the caption of Fig. 3)
Journal-ref: J. Stat. Mech. (2019) 124023. & NeurIPS 2018
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)
[39]  arXiv:1903.11198 (replaced) [pdf, other]
Title: Parallel Experimentation and Competitive Interference on Online Advertising Platforms
Subjects: General Economics (econ.GN); Applications (stat.AP)
[40]  arXiv:2006.10628 (replaced) [pdf, other]
Title: Offline detection of change-points in the mean for stationary graph signals
Comments: 16 pages, 2 figures, 1 table, 1 annex. 9 pages of main text
Subjects: Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
[41]  arXiv:2203.01360 (replaced) [pdf, other]
Title: Neural Galerkin Schemes with Active Learning for High-Dimensional Evolution Equations
Journal-ref: Journal of Computational Physics, Volume 496, 2024
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Machine Learning (stat.ML)
[42]  arXiv:2204.07672 (replaced) [pdf, other]
Title: Abadie's Kappa and Weighting Estimators of the Local Average Treatment Effect
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
[43]  arXiv:2210.14054 (replaced) [pdf, ps, other]
Title: Reduced-Dimension Surrogate Modeling to Characterize the Damage Tolerance of Composite/Metal Structures
Comments: 32 pages, 15 figures, 12 tables
Journal-ref: Modelling 2023, 4, 485-514
Subjects: Applications (stat.AP)
[44]  arXiv:2210.14484 (replaced) [pdf, other]
Title: Imputation of missing values in multi-view data
Comments: 48 pages, 15 figures. Major revisions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[45]  arXiv:2212.06669 (replaced) [pdf, ps, other]
Title: A scale of interpretation for likelihood ratios and Bayes factors
Authors: Frank Dudbridge
Journal-ref: PLoS ONE 19(2): e0297874 (2024)
Subjects: Methodology (stat.ME)
[46]  arXiv:2301.06297 (replaced) [pdf, other]
Title: Inference via robust optimal transportation: theory and methods
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
[47]  arXiv:2303.12407 (replaced) [pdf, ps, other]
Title: Non-asymptotic analysis of Langevin-type Monte Carlo algorithms
Authors: Shogo Nakakita
Subjects: Statistics Theory (math.ST); Probability (math.PR); Machine Learning (stat.ML)
[48]  arXiv:2305.01849 (replaced) [pdf, other]
Title: Semiparametric Discovery and Estimation of Interaction in Mixed Exposures using Stochastic Interventions
Subjects: Methodology (stat.ME)
[49]  arXiv:2305.04634 (replaced) [pdf, other]
Title: Neural Likelihood Surfaces for Spatial Processes with Computationally Intensive or Intractable Likelihoods
Comments: 65 pages, 20 figures
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
[50]  arXiv:2305.15991 (replaced) [pdf, ps, other]
Title: Finite sample rates for logistic regression with small noise or few samples
Subjects: Statistics Theory (math.ST)
[51]  arXiv:2306.10405 (replaced) [pdf, other]
Title: A semi-parametric estimation method for quantile coherence with an application to bivariate financial time series clustering
Comments: 39 pages, 11 figures
Subjects: Methodology (stat.ME); Computation (stat.CO)
[52]  arXiv:2306.15012 (replaced) [pdf, other]
Title: Statistical Component Separation for Targeted Signal Recovery in Noisy Mixtures
Comments: 13+17 pages, 6+8 figures, published in TMLR, code: this https URL
Subjects: Machine Learning (stat.ML); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Signal Processing (eess.SP)
[53]  arXiv:2309.12924 (replaced) [pdf, other]
Title: Automated grading workflows for providing personalized feedback to open-ended data science assignments
Comments: 24 pages, 3 figures
Subjects: Physics Education (physics.ed-ph); Computers and Society (cs.CY); Other Statistics (stat.OT)
[54]  arXiv:2309.16598 (replaced) [pdf, other]
Title: Cross-Prediction-Powered Inference
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[55]  arXiv:2310.01236 (replaced) [pdf, other]
Title: Mirror Diffusion Models for Constrained and Watermarked Generation
Comments: submitted to NeurIPS on 5/18 but did not arxiv per NeurIPS policy, accepted on 9/22
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[56]  arXiv:2310.11143 (replaced) [pdf, ps, other]
Title: Exploring a new machine learning based probabilistic model for high-resolution indoor radon mapping, using the German indoor radon survey data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
[57]  arXiv:2310.14720 (replaced) [pdf, other]
Title: Extended Deep Adaptive Input Normalization for Preprocessing Time Series Data for Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[58]  arXiv:2310.17273 (replaced) [pdf, other]
Title: Looping in the Human Collaborative and Explainable Bayesian Optimization
Comments: Accepted at AISTATS 2024, 24 pages, 11 figures
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
[59]  arXiv:2311.08168 (replaced) [pdf, other]
Title: Time-Uniform Confidence Spheres for Means of Random Vectors
Comments: 46 pages, 1 figure
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Methodology (stat.ME); Machine Learning (stat.ML)
[60]  arXiv:2312.02959 (replaced) [pdf, other]
Title: Detecting algorithmic bias in medical AI-models
Comments: 26 pages, 9 figures
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG); Applications (stat.AP)
[61]  arXiv:2402.03726 (replaced) [pdf, other]
Title: Learning Granger Causality from Instance-wise Self-attentive Hawkes Processes
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[62]  arXiv:2402.16326 (replaced) [pdf, other]
Title: A Provably Accurate Randomized Sampling Algorithm for Logistic Regression
Comments: To appear in the proceedings of AAAI 2024
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
[63]  arXiv:2402.17886 (replaced) [pdf, other]
Title: Zeroth-Order Sampling Methods for Non-Log-Concave Distributions: Alleviating Metastability by Denoising Diffusion
Comments: Figure 4 on page 13 corrected. Comments are welcome
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST); Methodology (stat.ME)
[64]  arXiv:2402.18510 (replaced) [pdf, other]
Title: RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
Comments: 42 pages, 5 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
[65]  arXiv:2402.18571 (replaced) [pdf, other]
Title: Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
Comments: The code and model are released at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)