Statistics
New submissions
New submissions for Fri, 1 Mar 24
 [1] arXiv:2402.18612 [pdf, ps, other]

Title: Understanding random forests and overfitting: a visualization and simulation study
Comments: 20 pages, 8 figures
Subjects: Methodology (stat.ME); Computers and Society (cs.CY); Machine Learning (cs.LG)
Random forests have become popular for clinical risk prediction modelling. In a case study on predicting ovarian malignancy, we observed training c-statistics close to 1. Although this suggests overfitting, performance was competitive on test data. We aimed to understand the behaviour of random forests by (1) visualizing data space in three real-world case studies and (2) a simulation study. For the case studies, risk estimates were visualised using heatmaps in a 2-dimensional subspace. The simulation study included 48 logistic data generating mechanisms (DGM), varying the predictor distribution, the number of predictors, the correlation between predictors, the true c-statistic, and the strength of true predictors. For each DGM, 1000 training datasets of size 200 or 4000 were simulated and RF models trained with minimum node size 2 or 20 using the ranger package, resulting in 192 scenarios in total. The visualizations suggested that the model learned spikes of probability around events in the training set. A cluster of events created a bigger peak; isolated events created local peaks. In the simulation study, median training c-statistics were between 0.97 and 1 unless there were 4 or 16 binary predictors with minimum node size 20. Median test c-statistics were higher with higher events per variable, higher minimum node size, and binary predictors. Median training slopes were always above 1, and were not correlated with median test slopes across scenarios (correlation 0.11). Median test slopes were higher with higher true c-statistic, higher minimum node size, and higher sample size. Random forests learn local probability peaks that often yield near-perfect training c-statistics without strongly affecting c-statistics on test data. When the aim is probability estimation, the simulation results go against the common recommendation to use fully grown trees in random forest models.
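The c-statistic that anchors this abstract is simply the probability that a randomly chosen event receives a higher predicted risk than a randomly chosen non-event. A minimal pure-Python sketch (illustrative only, not the authors' code; the function name is ours):

```python
def c_statistic(y_true, y_prob):
    """Concordance (c-statistic, equivalently AUC): the fraction of
    event/non-event pairs in which the event receives the higher
    predicted risk; ties count as half a concordant pair."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = sum((pe > pn) + 0.5 * (pe == pn)
                     for pe in events for pn in non_events)
    return concordant / (len(events) * len(non_events))
```

A model that memorizes spikes of probability around training events drives this quantity toward 1 on the training data, which is exactly the behaviour the case studies observed.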
 [2] arXiv:2402.18697 [pdf, other]

Title: Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Optimization and Control (math.OC); Statistics Theory (math.ST)
A common network inference problem, arising from real-world data constraints, is how to infer a dynamic network from its time-aggregated adjacency matrix and time-varying marginals (i.e., row and column sums). Prior approaches to this problem have repurposed the classic iterative proportional fitting (IPF) procedure, also known as Sinkhorn's algorithm, with promising empirical results. However, the statistical foundation for using IPF has not been well understood: under what settings does IPF provide principled estimation of a dynamic network from its marginals, and how well does it estimate the network? In this work, we establish such a setting, by identifying a generative network model whose maximum likelihood estimates are recovered by IPF. Our model both reveals implicit assumptions on the use of IPF in such settings and enables new analyses, such as structure-dependent error bounds on IPF's parameter estimates. When IPF fails to converge on sparse network data, we introduce a principled algorithm that guarantees IPF converges under minimal changes to the network structure. Finally, we conduct experiments with synthetic and real-world data, which demonstrate the practical value of our theoretical and algorithmic contributions.
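The classic IPF/Sinkhorn procedure the abstract builds on is short enough to sketch. The following is a minimal illustrative implementation in plain Python (not the authors' code): it alternately rescales the rows and columns of a nonnegative seed matrix until its margins match the target row and column sums.

```python
def ipf(A, row_sums, col_sums, iters=1000, tol=1e-10):
    """Iterative proportional fitting (Sinkhorn's algorithm):
    alternately rescale rows and columns of a nonnegative seed
    matrix until its margins match the targets."""
    X = [row[:] for row in A]
    n = len(X)
    for _ in range(iters):
        # Row step: rescale each row to match its target sum.
        for i, r in enumerate(row_sums):
            s = sum(X[i])
            if s > 0:
                X[i] = [x * r / s for x in X[i]]
        # Column step: rescale each column to match its target sum.
        for j, c in enumerate(col_sums):
            s = sum(X[i][j] for i in range(n))
            if s > 0:
                for i in range(n):
                    X[i][j] *= c / s
        # Stop once the row margins are (again) satisfied.
        if max(abs(sum(X[i]) - r) for i, r in enumerate(row_sums)) < tol:
            return X
    return X
```

On sparse data with zeros in the wrong places the alternation can fail to converge, which is precisely the failure mode the paper's modified algorithm addresses.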
 [3] arXiv:2402.18741 [pdf, other]

Title: Spectral Extraction of Unique Latent Variables
Subjects: Methodology (stat.ME)
Multimodal datasets contain observations generated by multiple types of sensors. Most works to date focus on uncovering latent structures in the data that appear in all modalities. However, important aspects of the data may appear in only one modality due to the differences between the sensors. Uncovering modality-specific attributes may provide insights into the sources of the variability of the data. For example, certain clusters may appear in the analysis of genetics but not in epigenetic markers. Another example is hyperspectral satellite imaging, where various atmospheric and ground phenomena are detectable using different parts of the spectrum. In this paper, we address the problem of uncovering latent structures that are unique to a single modality. Our approach is based on computing a graph representation of datasets from two modalities and analyzing the differences between their connectivity patterns. We provide an asymptotic analysis of the convergence of our approach based on a product manifold model. To evaluate the performance of our method, we test its ability to uncover latent structures in multiple types of artificial and real datasets.
 [4] arXiv:2402.18745 [pdf, other]

Title: Degree-heterogeneous Latent Class Analysis for High-dimensional Discrete Data
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
The latent class model is a widely used mixture model for multivariate discrete data. Besides the existence of qualitatively heterogeneous latent classes, real data often exhibit additional quantitative heterogeneity nested within each latent class. Modern latent class analysis also faces extra challenges, including the high-dimensionality, sparsity, and heteroskedastic noise inherent in discrete data. Motivated by these phenomena, we introduce the Degree-heterogeneous Latent Class Model and propose a spectral approach to clustering and statistical inference in the challenging high-dimensional sparse data regime. We propose an easy-to-implement HeteroClustering algorithm. It uses heteroskedastic PCA with L2 normalization to remove degree effects and perform clustering in the top singular subspace of the data matrix. We establish an exponential error rate for HeteroClustering, leading to exact clustering under minimal signal-to-noise conditions. We further investigate the estimation and inference of the high-dimensional continuous item parameters in the model, which are crucial to interpreting and finding useful markers for latent classes. We provide comprehensive procedures for global testing and multiple testing of these parameters with valid error controls. The superior performance of our methods is demonstrated through extensive simulations and applications to three diverse real-world datasets from political voting records, genetic variations, and single-cell sequencing.
 [5] arXiv:2402.18748 [pdf, other]

Title: Fast Bootstrapping Nonparametric Maximum Likelihood for Latent Mixture Models
Comments: 6 pages (main article is 4 pages, one page of references, and one page Appendix). 5 figures and 4 tables. This paper supersedes a previously circulated technical report by S. Wang and M. Shin (arXiv:2006.00767v2)
Subjects: Methodology (stat.ME)
Estimating the mixing density of a latent mixture model is an important task in signal processing. Nonparametric maximum likelihood estimation is one popular approach to this problem. If the latent variable distribution is assumed to be continuous, then bootstrapping can be used to approximate it. However, traditional bootstrapping requires repeated evaluations on resampled data and is not scalable. In this letter, we construct a generative process to rapidly produce nonparametric maximum likelihood bootstrap estimates. Our method requires only a single evaluation of a novel two-stage optimization algorithm. Simulations and real data analyses demonstrate that our procedure accurately estimates the mixing density with little computational cost even when there are a hundred thousand observations.
 [6] arXiv:2402.18810 [pdf, ps, other]

Title: The numeraire e-variable
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
We consider testing a composite null hypothesis $\mathcal{P}$ against a point alternative $\mathbb{Q}$. This paper establishes a powerful and general result: under no conditions whatsoever on $\mathcal{P}$ or $\mathbb{Q}$, we show that there exists a special e-variable $X^*$ that we call the numeraire. It is strictly positive and for every $\mathbb{P} \in \mathcal{P}$, $\mathbb{E}_\mathbb{P}[X^*] \le 1$ (the e-variable property), while for every other e-variable $X$, we have $\mathbb{E}_\mathbb{Q}[X/X^*] \le 1$ (the numeraire property). In particular, this implies $\mathbb{E}_\mathbb{Q}[\log(X/X^*)] \le 0$ (log-optimality). $X^*$ also identifies a particular sub-probability measure $\mathbb{P}^*$ via the density $d \mathbb{P}^*/d \mathbb{Q} = 1/X^*$. As a result, $X^*$ can be seen as a generalized likelihood ratio of $\mathbb{Q}$ against $\mathcal{P}$. We show that $\mathbb{P}^*$ coincides with the reverse information projection (RIPr) when additional assumptions are made that are required for the latter to exist. Thus $\mathbb{P}^*$ is a natural definition of the RIPr in the absence of any assumptions on $\mathcal{P}$ or $\mathbb{Q}$. In addition to the abstract theory, we provide several tools for finding the numeraire in concrete cases. We discuss several nonparametric examples where we can indeed identify the numeraire, despite not having a reference measure. We end with a more general optimality theory that goes beyond the ubiquitous logarithmic utility. We focus on certain power utilities, leading to reverse R\'enyi projections in place of the RIPr, which also always exist.
 [7] arXiv:2402.18900 [pdf, ps, other]

Title: Prognostic Covariate Adjustment for Logistic Regression in Randomized Controlled Trials
Comments: 27 pages, 1 figure, 9 tables
Subjects: Methodology (stat.ME); Applications (stat.AP); Machine Learning (stat.ML)
Randomized controlled trials (RCTs) with binary primary endpoints introduce novel challenges for inferring the causal effects of treatments. The most significant challenge is non-collapsibility, in which the conditional odds ratio estimand under covariate adjustment differs from the unconditional estimand in the logistic regression analysis of RCT data. This issue gives rise to apparent paradoxes, such as the variance of the estimator for the conditional odds ratio from a covariate-adjusted model being greater than the variance of the estimator from the unadjusted model. We address this challenge in the context of adjustment based on predictions of control outcomes from generative artificial intelligence (AI) algorithms, which are referred to as prognostic scores. We demonstrate that prognostic score adjustment in logistic regression increases the power of the Wald test for the conditional odds ratio under a fixed sample size, or alternatively reduces the necessary sample size to achieve a desired power, compared to the unadjusted analysis. We derive formulae for prospective calculations of the power gain and sample size reduction that can result from adjustment for the prognostic score. Furthermore, we utilize g-computation to expand the scope of prognostic score adjustment to inferences on the marginal risk difference, relative risk, and odds ratio estimands. We demonstrate the validity of our formulae via extensive simulation studies that encompass different types of logistic regression model specifications. Our simulation studies also indicate how prognostic score adjustment can reduce the variance of g-computation estimators for the marginal estimands while maintaining frequentist properties such as asymptotic unbiasedness and Type I error rate control. Our methodology can ultimately enable more definitive and conclusive analyses for RCTs with binary primary endpoints.
 [8] arXiv:2402.18904 [pdf, other]

Title: False Discovery Rate Control for Confounder Selection Using Mirror Statistics
Subjects: Methodology (stat.ME)
While data-driven confounder selection requires careful consideration, it is frequently employed in observational studies to adjust for confounding factors. Widely recognized criteria for confounder selection include the minimal set approach, which involves selecting variables relevant to both treatment and outcome, and the union set approach, which involves selecting variables relevant to either treatment or outcome. These approaches are often implemented using heuristics and off-the-shelf statistical methods, where the degree of uncertainty may not be clear. In this paper, we focus on the false discovery rate (FDR) to measure uncertainty in confounder selection. We define the FDR specific to confounder selection and propose methods based on the mirror statistic, a recently developed approach for FDR control that does not rely on p-values. The proposed methods are free from p-values and require only the assumption of some symmetry in the distribution of the mirror statistic. They can easily be combined with sparse estimation and other methods for which deriving p-values is difficult. The properties of the proposed methods are investigated through exhaustive numerical experiments. Particularly in high-dimensional data scenarios, our methods outperform conventional methods.
 [9] arXiv:2402.18921 [pdf, other]

Title: Semi-Supervised U-statistics
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the potential of unlabeled data. Responding to this demand, we introduce semi-supervised U-statistics enhanced by the abundance of unlabeled data, and investigate their statistical properties. We show that the proposed approach is asymptotically normal and exhibits notable efficiency gains over classical U-statistics by effectively integrating various powerful prediction tools into the framework. To understand the fundamental difficulty of the problem, we derive minimax lower bounds in semi-supervised settings and show that our procedure is semiparametrically efficient under regularity conditions. Moreover, tailored to bivariate kernels, we propose a refined approach that outperforms the classical U-statistic across all degeneracy regimes, and demonstrate its optimality properties. Simulation studies are conducted to corroborate our findings and to further demonstrate our framework.
 [10] arXiv:2402.19021 [pdf, other]

Title: Enhancing the Power of Gaussian Graphical Model Inference by Modeling the Graph Structure
Subjects: Methodology (stat.ME)
For the problem of inferring a Gaussian graphical model (GGM), this work explores the application of a recent approach from the multiple testing literature to graph inference. The main idea of the method by Rebafka et al. (2022) is to model the data by a latent variable model, the so-called noisy stochastic block model (NSBM), and then use the associated $\ell$-values to infer the graph. The inferred graph controls the false discovery rate, meaning that the proportion of falsely declared edges does not exceed a user-defined nominal level. Here it is shown that any test statistic from the GGM literature can be used as input for the NSBM approach to perform GGM inference. To make the approach feasible in practice, a new, computationally efficient inference algorithm for the NSBM is developed, relying on a greedy approach to maximize the integrated complete-data likelihood. An extensive numerical study then illustrates that the NSBM approach outperforms the state of the art for each of the GGM test statistics considered here. In particular, a significant gain in power is observed in sparse settings and on real datasets.
 [11] arXiv:2402.19029 [pdf, ps, other]

Title: Essential Properties of Type III* Methods
Authors: Lynn Roy LaMotte
Subjects: Methodology (stat.ME)
Type III methods, introduced by SAS in 1976, formulate estimable functions that substitute, somehow, for classical ANOVA effects in multiple linear regression models. They have been controversial ever since, provoking wide use and satisfied users on the one hand and skepticism and scorn on the other. Their essential mathematical properties have not been established, although they are widely thought to be known: what those functions are, to what extent they coincide with classical ANOVA effects, and how they are affected by cell sample sizes, empty cells, and covariates. Those properties are established here.
 [12] arXiv:2402.19036 [pdf, other]

Title: Empirical Bayes in Bayesian learning: understanding a common practice
Subjects: Statistics Theory (math.ST)
In applications of Bayesian procedures, even when the prior law is carefully specified, it may be delicate to elicit the prior hyperparameters, so it is often tempting to fix them from the data, usually by their maximum marginal likelihood estimates (MMLE), obtaining a so-called empirical Bayes posterior distribution. Although questionable, this is a common practice; but theoretical properties seem mostly only available on a case-by-case basis. In this paper we provide general properties for parametric models. First, we study the limit behavior of the MMLE and prove results in quite general settings, while also conceptualizing the frequentist context as an unexplored case of maximum likelihood estimation under model misspecification. We cover both identifiable models, illustrating applications to sparse regression, and non-identifiable models, specifically overfitted mixture models. Finally, we prove higher-order merging results. In regular cases, the empirical Bayes posterior is shown to be a fast approximation to the Bayesian posterior distribution of the researcher who, within the given class of priors, has the most information about the true model's parameters. This approximation is faster than in classic Bernstein-von Mises results. Given the class of priors, our work provides formal content to common beliefs about this popular practice.
 [13] arXiv:2402.19046 [pdf, other]

Title: On the Improvement of Predictive Modeling Using Bayesian Stacking and Posterior Predictive Checking
Comments: 40 pages including abstract and references (23 pages without), 3 figures
Subjects: Methodology (stat.ME)
Model uncertainty is pervasive in real-world analysis situations and is an often-neglected issue in applied statistics. However, standard approaches to the research process do not address the inherent uncertainty in model building and, thus, can lead to overconfident and misleading analysis interpretations. One strategy to incorporate more flexible models is to base inferences on predictive modeling. This approach provides an alternative to existing explanatory models, as inference is focused on the posterior predictive distribution of the response variable. Predictive modeling can advance explanatory ambitions in the social sciences and, in addition, enrich the understanding of the social phenomena under investigation. Bayesian stacking is a methodological approach rooted in Bayesian predictive modeling. In this paper, we outline the method of Bayesian stacking but add to it the approach of posterior predictive checking (PPC) as a means of assessing the predictive quality of those elements of the stacking ensemble that are important to the research question. Thus, we introduce a viable workflow for incorporating PPC into predictive modeling using Bayesian stacking, without presuming the existence of a true model. We apply these tools to the PISA 2018 data to investigate potential inequalities in reading competency with respect to gender and socioeconomic background. Our empirical example serves as a rough guideline for practitioners who want to apply the concepts of predictive modeling and model uncertainty to similar research questions in their own work.
 [14] arXiv:2402.19109 [pdf, other]

Title: Confidence and Assurance of Percentiles
Authors: Sanjay M. Joshi
Comments: 5 pages, 4 figures
Subjects: Methodology (stat.ME); Information Theory (cs.IT)
Confidence intervals for the mean are often used when quoting statistics, but the same rigor is often missing when quoting percentiles and tolerance or percentile intervals. This article derives the expression for confidence in percentiles of a sample population. Confidence intervals of the median are compared to those of the mean for a few sample distributions. The concept of assurance from reliability engineering is then extended to percentiles. The assurance level of sorted samples simply matches the confidence and percentile levels. A numerical method to compute assurance using Brent's optimization method is provided as an open-source Python package.
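The binomial argument behind confidence statements for percentiles of sorted samples can be sketched as follows (an illustrative reconstruction, not the author's package; the function name is ours). The k-th order statistic of n i.i.d. samples lies at or above the p-th population quantile exactly when fewer than k of the samples fall below that quantile, which is a Binomial(n, p) event:

```python
from math import comb

def percentile_confidence(n, k, p):
    """Confidence that the k-th order statistic (1-indexed) of n
    i.i.d. samples lies at or above the p-th population quantile,
    i.e. P(Binomial(n, p) <= k - 1)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
```

For example, the classic 95/95 rule follows from this: with n = 59 samples, the largest one bounds the 95th percentile with confidence 1 - 0.95^59, slightly above 0.95.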
 [15] arXiv:2402.19162 [pdf, other]

Title: A Bayesian approach to uncover spatio-temporal determinants of heterogeneity in repeated cross-sectional health surveys
Subjects: Applications (stat.AP); Methodology (stat.ME); Other Statistics (stat.OT)
In several countries, including Italy, a prominent approach to population health surveillance involves conducting repeated crosssectional surveys at short intervals of time. These surveys gather information on the health status of individual respondents, including details on their behaviors, risk factors, and relevant sociodemographic information. While the collected data undoubtedly provides valuable information, modeling such data presents several challenges. For instance, in health risk models, it is essential to consider behavioral information, spatiotemporal dynamics, and disease cooccurrence. In response to these challenges, our work proposes a multivariate spatiotemporal logistic model for chronic disease diagnoses. Predictors are modeled using individual risk factor covariates and a latent individual propensity to the disease.
Leveraging a state space formulation of the model, we construct a framework in which spatio-temporal heterogeneity in regression parameters is informed by exogenous spatial information, corresponding to different spatial contextual risk factors that may affect health and the occurrence of chronic diseases in different ways. To explore the utility and effectiveness of our method, we analyze behavioral and risk factor surveillance data collected in Italy (PASSI); Italy is well known for its distinctive administrative, social, and territorial diversity, which is reflected in high variability in morbidity among population subgroups.
 [16] arXiv:2402.19209 [pdf, other]

Title: Call center data analysis and model validation
Subjects: Applications (stat.AP)
We analyze call center data on properties such as agent heterogeneity, customer patience, and breaks. We then compare simulation models that differ in the ways these properties are modeled, classifying them according to the extent to which they reproduce the actual service level and average waiting times. We obtain a theoretical understanding of how to distinguish the model error from other aspects such as random noise. We conclude that explicitly modeling breaks and agent heterogeneity is crucial for obtaining a precise model.
 [17] arXiv:2402.19214 [pdf, other]

Title: A Bayesian approach with Gaussian priors to the inverse problem of source identification in elliptic PDEs
Authors: Matteo Giordano
Comments: 16 pages. The reproducible code is available at: this https URL
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
We consider the statistical linear inverse problem of making inference on an unknown source function in an elliptic partial differential equation from noisy observations of its solution. We employ nonparametric Bayesian procedures based on Gaussian priors, leading to convenient conjugate formulae for posterior inference. We review recent results providing theoretical guarantees on the quality of the resulting posteriorbased estimation and uncertainty quantification, and we discuss the application of the theory to the important classes of Gaussian series priors defined on the DirichletLaplacian eigenbasis and Mat\'ern process priors. We provide an implementation of posterior inference for both classes of priors, and investigate its performance in a numerical simulation study.
 [18] arXiv:2402.19268 [pdf, ps, other]

Title: Extremal quantiles of intermediate orders under two-way clustering
Subjects: Statistics Theory (math.ST); Econometrics (econ.EM)
This paper investigates extremal quantiles under two-way cluster dependence. We demonstrate that the limiting distribution of the unconditional intermediate order quantiles in the tails converges to a Gaussian distribution. This is remarkable because two-way cluster dependence entails potential non-Gaussianity in general, yet extremal quantiles do not suffer from this issue. Building upon this result, we extend our analysis to extremal quantile regressions of intermediate order.
 [19] arXiv:2402.19346 [pdf, ps, other]

Title: Recanting witness and natural direct effects: Violations of assumptions or definitions?
Authors: Ian Shrier
Comments: 5 pages, 1 figure
Subjects: Methodology (stat.ME)
There have been numerous publications on the advantages and disadvantages of estimating natural (pure) effects compared to controlled effects. One of the main criticisms of natural effects is that identifiability requires an additional assumption, namely that the exposure does not cause a confounder of the mediator-outcome relationship. However, every analysis in every study should begin with a research question expressed in ordinary language. Researchers then develop or use mathematical expressions or estimators to best answer these ordinary-language questions. Using directed acyclic graphs, the paper illustrates that when a recanting witness is present there are no violations of assumptions; rather, the typical estimators for natural effects simply no longer answer any meaningful question. Although some might view this as semantics, the proposed approach illustrates why the more recent methods of path-specific effects and separable effects are more valid and transparent than previous methods for decomposition analysis.
 [20] arXiv:2402.19455 [pdf, other]

Title: Listening to the Noise: Blind Denoising with Gibbs Diffusion
Comments: 12+8 pages, 7+3 figures, 1+1 tables, code: this https URL
Subjects: Machine Learning (stat.ML); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP)
In recent years, denoising problems have become intertwined with the development of deep generative models. In particular, diffusion models are trained like denoisers, and the distribution they model coincides with denoising priors in the Bayesian picture. However, denoising through diffusion-based posterior sampling requires the noise level and covariance to be known, preventing blind denoising. We overcome this limitation by introducing Gibbs Diffusion (GDiff), a general methodology addressing posterior sampling of both the signal and the noise parameters. Assuming arbitrary parametric Gaussian noise, we develop a Gibbs algorithm that alternates sampling steps from a conditional diffusion model trained to map the signal prior to the family of noise distributions, and a Monte Carlo sampler to infer the noise parameters. Our theoretical analysis highlights potential pitfalls, guides diagnostic usage, and quantifies errors in the Gibbs stationary distribution caused by the diffusion model. We showcase our method for 1) blind denoising of natural images involving colored noises with unknown amplitude and spectral index, and 2) a cosmology problem, namely the analysis of cosmic microwave background data, where Bayesian inference of "noise" parameters means constraining models of the evolution of the Universe.
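The alternating structure described above (one conditional sampling step for the signal, one for the noise parameters) is the standard Gibbs pattern. As a toy illustration of that pattern, here is a pure-Python Gibbs sampler for a bivariate normal target; this is our own example, not the paper's diffusion model:

```python
import random

def gibbs_bivariate_normal(rho, n_steps=10000, seed=0):
    """Toy Gibbs sampler for a standard bivariate normal with
    correlation rho: alternately draw x | y and y | x, mirroring
    the signal-step / noise-parameter-step alternation of
    Gibbs-within-diffusion schemes."""
    rng = random.Random(seed)
    sd = (1 - rho**2) ** 0.5
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_steps):
        x = rng.gauss(rho * y, sd)  # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, sd)  # y | x ~ N(rho * x, 1 - rho^2)
        samples.append((x, y))
    return samples
```

Each full sweep leaves the joint distribution invariant, so the chain's empirical correlation converges to rho; GDiff replaces the first conditional draw with a conditional diffusion model and the second with a Monte Carlo sampler.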
Cross-lists for Fri, 1 Mar 24
 [21] arXiv:2402.18579 (cross-list from cs.CV) [pdf, ps, other]

Title: Wilcoxon Nonparametric CFAR Scheme for Ship Detection in SAR Image
Authors: Xiangwei Meng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Applications (stat.AP)
The parametric constant false alarm rate (CFAR) detection algorithms, which are based on various statistical distributions such as the Gaussian, Gamma, Weibull, log-normal, G0, and alpha-stable distributions, are the most widely used to detect ship targets in SAR images at present. However, the clutter background in SAR images is complicated and variable. When the actual clutter background deviates from the assumed statistical distribution, the performance of the parametric CFAR detector deteriorates. In addition to the parametric CFAR schemes, there is another class of nonparametric CFAR detectors, which can maintain a constant false alarm rate for target detection without assuming a known clutter distribution. In this work, the Wilcoxon nonparametric CFAR scheme for ship detection in SAR images is proposed and analyzed, and a closed form of the false alarm rate for the Wilcoxon nonparametric detector to determine the decision threshold is presented. By comparison with several typical parametric CFAR schemes on Radarsat-2, ICEYE-X6 and Gaofen-3 SAR images, the robustness of the Wilcoxon nonparametric detector in maintaining good false alarm performance across different detection backgrounds is revealed, and its detection performance for weak ships in rough sea surfaces is improved to some extent. Moreover, the Wilcoxon nonparametric detector can suppress the false alarms resulting from sidelobes to some degree, and its detection speed is fast.
 [22] arXiv:2402.18591 (cross-list from cs.LG) [pdf, ps, other]

Title: Stochastic contextual bandits with graph feedback: from independence number to MAS number
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Statistics Theory (math.ST)
We consider contextual bandits with graph feedback, a class of interactive learning problems with richer structures than vanilla contextual bandits, where taking an action reveals the rewards for all neighboring actions in the feedback graph under all contexts. Unlike the multi-armed bandits setting, where a growing literature has painted a near-complete understanding of graph feedback, much remains unexplored in the contextual bandits counterpart. In this paper, we make inroads into this inquiry by establishing a regret lower bound $\Omega(\sqrt{\beta_M(G) T})$, where $M$ is the number of contexts, $G$ is the feedback graph, and $\beta_M(G)$ is our proposed graph-theoretical quantity that characterizes the fundamental learning limit for this class of problems. Interestingly, $\beta_M(G)$ interpolates between $\alpha(G)$ (the independence number of the graph) and $\mathsf{m}(G)$ (the maximum acyclic subgraph (MAS) number of the graph) as the number of contexts $M$ varies. We also provide algorithms that achieve near-optimal regrets for important classes of context sequences and/or feedback graphs, such as transitively closed graphs that find applications in auctions and inventory control. In particular, with many contexts, our results show that the MAS number completely characterizes the statistical complexity for contextual bandits, as opposed to the independence number in multi-armed bandits.
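For readers unfamiliar with the graph quantities involved: the independence number $\alpha(G)$ is the size of the largest set of mutually non-adjacent vertices. A brute-force sketch for tiny graphs (our own illustration, not the paper's code; exponential time, so only usable for demonstration):

```python
from itertools import combinations

def independence_number(n, edges):
    """Brute-force alpha(G): size of the largest vertex subset of
    {0, ..., n-1} containing no edge. Checks subsets from largest
    to smallest, so the first hit is the answer."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                return size
    return 0
```

For example, a triangle has independence number 1, while an edgeless graph on n vertices has independence number n.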
 [23] arXiv:2402.18651 (cross-list from cs.LG) [pdf, other]

Title: Quantifying Human Priors over Social and Navigation Networks
Authors: Gecia Bravo-Hermsdorff
Comments: Published in Proceedings of the 40th International Conference on Machine Learning (ICML), PMLR 202:3063-3105, 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Neurons and Cognition (q-bio.NC); Methodology (stat.ME)
Human knowledge is largely implicit and relational: do we have a friend in common? Can I walk from here to there? In this work, we leverage the combinatorial structure of graphs to quantify human priors over such relational data. Our experiments focus on two domains that have been continuously relevant over evolutionary timescales: social interaction and spatial navigation. We find that some features of the inferred priors are remarkably consistent, such as the tendency for sparsity as a function of graph size. Other features are domain-specific, such as the propensity for triadic closure in social interactions. More broadly, our work demonstrates how non-classical statistical analysis of indirect behavioral experiments can be used to efficiently model latent biases in the data.
 [24] arXiv:2402.18666 (cross-list from math.OC) [pdf, other]

Title: Linear shrinkage for optimization in high dimensions
Subjects: Optimization and Control (math.OC); Statistics Theory (math.ST)
In large-scale, data-driven applications, parameters are often only known approximately due to noise and limited data samples. In this paper, we focus on high-dimensional optimization problems with linear constraints under uncertainty. To find high-quality solutions for which the violation of the true constraints is limited, we develop a linear shrinkage method that blends random matrix theory and robust optimization principles. It aims to minimize the Frobenius distance between the estimated and the true parameter matrix, especially when the numbers of constraints and variables are large and comparable. This data-driven method excels in simulations, showing superior noise resilience and more stable performance, both in obtaining high-quality solutions and in adhering to the true constraints, compared to traditional robust optimization. Our findings highlight the effectiveness of our method in improving the robustness and reliability of optimization in high-dimensional, data-driven settings.
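The core idea, shrinking a noisy parameter matrix to reduce its Frobenius distance to the truth, can be illustrated with an oracle version of linear shrinkage. This is our own toy example: the oracle intensity below uses the unknown true matrix, whereas the paper's estimator chooses the intensity from data via random matrix theory.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 220
A = rng.normal(size=(n, m))                  # true (unknown) parameter matrix
A_hat = A + 0.5 * rng.normal(size=(n, m))    # noisy estimate from data

# oracle linear shrinkage toward zero: a* = argmin_a ||a * A_hat - A||_F,
# which has the closed form <A_hat, A> / ||A_hat||_F^2
a = np.sum(A_hat * A) / np.sum(A_hat ** 2)

err_raw = np.linalg.norm(A_hat - A)          # Frobenius error of the raw estimate
err_shrunk = np.linalg.norm(a * A_hat - A)   # error after shrinkage
print(f"a* = {a:.3f}, raw error = {err_raw:.1f}, shrunk error = {err_shrunk:.1f}")
```

Because $a^*$ minimizes the Frobenius error over all scalings and $a = 1$ recovers the raw estimate, the shrunk error is never worse, and with comparable $n$ and $m$ the optimal intensity is strictly below one.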
 [25] arXiv:2402.18689 (cross-list from cs.LG) [pdf, other]

Title: The VOROS: Lifting ROC curves to 3D
Comments: 38 pages, 19 figures
Subjects: Machine Learning (cs.LG); Metric Geometry (math.MG); Statistics Theory (math.ST); Methodology (stat.ME)
The area under the ROC curve is a common measure often used to rank the relative performance of different binary classifiers. However, as has also been noted previously, it can poorly capture the benefits of different classifiers when either the true class values or the misclassification costs are highly unbalanced between the two classes. We introduce a third dimension to capture these costs and lift the ROC curve to a ROC surface in a natural way. We study this surface and introduce the VOROS, the volume over the ROC surface, as a 3D generalization of the 2D area under the ROC curve. For problems where there are only bounds on the expected costs or class imbalances, we restrict consideration to the volume of the appropriate subregion of the ROC surface. We show how the VOROS can better capture the costs of different classifiers on both a classical and a modern example dataset.
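For reference, the 2D baseline that the VOROS generalizes is the ordinary area under the ROC curve, which equals the probability that a randomly chosen positive outranks a randomly chosen negative. A minimal rank-based sketch (our own illustration, not code from the paper):

```python
def roc_auc(labels, scores):
    # AUC = P(score of a random positive > score of a random negative),
    # counting ties as 1/2
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # -> 0.75
```

This probabilistic form makes the paper's criticism concrete: the AUC integrates over operating points without weighting them by class prevalence or misclassification cost, which is exactly the information the third dimension reintroduces.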
 [26] arXiv:2402.18724 (cross-list from cs.LG) [pdf, other]

Title: Learning Associative Memories with Gradient Descent
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
This work focuses on the training dynamics of one associative memory module storing outer products of token embeddings. We reduce this problem to the study of a system of particles, which interact according to properties of the data distribution and correlations between embeddings. Through theory and experiments, we provide several insights. In overparameterized regimes, we obtain logarithmic growth of the "classification margins." Yet, we show that imbalance in token frequencies and memory interference due to correlated embeddings lead to oscillatory transient regimes. The oscillations are more pronounced with large step sizes, which can create benign loss spikes, although these large learning rates speed up the dynamics and accelerate the asymptotic convergence. In underparameterized regimes, we illustrate how the cross-entropy loss can lead to suboptimal memorization schemes. Finally, we assess the validity of our findings on small Transformer models.
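An associative memory of this kind stores input-output token pairs as a sum of outer products and retrieves by a highest-score lookup. A minimal sketch of the stored object (our own illustration; we use orthonormal embeddings so that retrieval is exact, whereas trained embeddings are only approximately orthogonal, which is precisely the source of the interference the paper studies):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 10
# orthonormal token embeddings (rows of Q^T from a reduced QR factorization)
E_in = np.linalg.qr(rng.normal(size=(d, n)))[0].T
E_out = np.linalg.qr(rng.normal(size=(d, n)))[0].T
pairs = [(i, (i + 1) % n) for i in range(n)]  # memorize the map i -> i + 1 (mod n)

# associative memory: sum of outer products of output and input embeddings
W = sum(np.outer(E_out[j], E_in[i]) for i, j in pairs)

# retrieval for input token 3: the stored output embedding scores highest
scores = E_out @ (W @ E_in[3])
print(scores.argmax())  # -> 4
```

With orthonormal rows, W @ E_in[3] collapses exactly to E_out[4]; with correlated embeddings the cross terms do not vanish, producing the memory interference described in the abstract.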
 [27] arXiv:2402.18800 (cross-list from cs.LG) [pdf, other]

Title: BlockEcho: Retaining Long-Range Dependencies for Imputing Block-Wise Missing Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Block-wise missing data poses significant challenges in real-world data imputation tasks. Compared to scattered missing data, block-wise gaps exacerbate adverse effects on subsequent analytic and machine learning tasks, as the lack of local neighboring elements significantly reduces interpolation capability and predictive power. However, this issue has not received adequate attention. Most state-of-the-art matrix completion methods appear less effective here, primarily due to their over-reliance on neighboring elements for prediction. We systematically analyze the issue and propose a novel matrix completion method, "BlockEcho", as a more comprehensive solution. The method integrates Matrix Factorization (MF) within Generative Adversarial Networks (GANs) to explicitly retain long-distance inter-element relationships in the original matrix. In addition, we incorporate a second discriminator in the GAN that compares the generator's intermediate progress with pre-trained MF results to constrain high-order feature distributions. We evaluate BlockEcho on public datasets across three domains. Results demonstrate superior performance over both traditional and state-of-the-art methods when imputing block-wise missing data, especially at higher missing rates. The advantage also holds for scattered missing data at high missing rates. We also provide theoretical justification for the optimality and convergence of fusing MF and GANs for block-wise missing data.
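The distinction the paper targets, contiguous missing blocks versus scattered missing entries, is easy to picture: an entry in the interior of a missing block has no observed neighbors at all, which is what degrades neighbor-based interpolation. A small sketch of the two missingness patterns (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 10))

scattered = X.copy()
scattered[rng.random(X.shape) < 0.25] = np.nan  # ~25% scattered missing entries

blockwise = X.copy()
blockwise[2:7, 3:8] = np.nan                    # one contiguous 5x5 missing block

# an interior entry of the block, e.g. (4, 5), has every neighbor missing too,
# so local interpolation has nothing to work with inside the block
print(np.isnan(blockwise).sum(), "missing entries, all in one block")
```

Under scattered missingness most missing entries still have several observed neighbors; under block-wise missingness only the block's border does, which is why long-range structure (as retained by MF) becomes essential.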
 [28] arXiv:2402.18805 (cross-list from cs.SI) [pdf, other]

Title: VEC-SBM: Optimal Community Detection with Vectorial Edges Covariates
Subjects: Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Social networks are often associated with rich side information, such as texts and images. While numerous methods have been developed to identify communities from pairwise interactions, they usually ignore such side information. In this work, we study an extension of the Stochastic Block Model (SBM), a widely used statistical framework for community detection, that integrates vectorial edge covariates: the Vectorial Edges Covariates Stochastic Block Model (VEC-SBM). We propose a novel algorithm based on iterative refinement techniques and show that it optimally recovers the latent communities under the VEC-SBM. Furthermore, we rigorously assess the added value of leveraging edge side information in the community detection process. We complement our theoretical results with numerical experiments on synthetic and semi-synthetic data.
 [29] arXiv:2402.18851 (cross-list from cs.LG) [pdf, other]

Title: Applications of 0-1 Neural Networks in Prescription and Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
A key challenge in medical decision making is learning treatment policies for patients with limited observational data. This challenge is particularly evident in personalized healthcare decision making, where models need to take into account the intricate relationships between patient characteristics, treatment options, and health outcomes. To address this, we introduce prescriptive networks (PNNs), shallow 0-1 neural networks trained with mixed-integer programming that can be used with counterfactual estimation to optimize policies in medium-data settings. These models offer greater interpretability than deep neural networks and can encode more complex policies than common models such as decision trees. We show that PNNs can outperform existing methods both in synthetic data experiments and in a case study of assigning treatments for postpartum hypertension. In particular, PNNs are shown to produce policies that could reduce peak blood pressure by 5.47 mm Hg (p=0.02) over existing clinical practice, and by 2 mm Hg (p=0.01) over the next-best prescriptive modeling technique. Moreover, PNNs were more likely than all other models to correctly identify clinically significant features, while existing models relied on potentially dangerous features, such as patient insurance information and race, that could lead to bias in treatment.
 [30] arXiv:2402.18884 (cross-list from cs.LG) [pdf, ps, other]

Title: Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features
Comments: 10 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Recent findings reveal that overparameterized deep neural networks, trained beyond zero training error, exhibit a distinctive structural pattern at the final layer, termed neural collapse (NC). These results indicate that the final hidden-layer outputs of such networks display minimal within-class variation over the training set. While existing research extensively investigates this phenomenon under the cross-entropy loss, fewer studies focus on its contrastive counterpart, the supervised contrastive (SC) loss. Through the lens of NC, this paper employs an analytical approach to study the solutions derived from optimizing the SC loss. We adopt the unconstrained features model (UFM) as a representative proxy for unveiling NC-related phenomena in sufficiently overparameterized deep networks. We show that, despite the non-convexity of SC loss minimization, all local minima are global minima. Furthermore, the minimizer is unique (up to a rotation). We prove our results by formalizing a tight convex relaxation of the UFM. Finally, through this convex formulation, we delve deeper into characterizing the properties of global solutions under label-imbalanced training data.
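Minimal within-class variation can be probed with a simple scatter ratio on the final hidden-layer features: within-class variation divided by between-class variation, which tends to zero under neural collapse. A minimal sketch on synthetic features (our own simplified trace-ratio diagnostic, not the exact NC metric used in the literature, which typically involves a pseudo-inverse of the between-class scatter):

```python
import numpy as np

def nc1_ratio(H, y):
    """Within-class over between-class scatter; values near 0 indicate collapse."""
    mu = H.mean(axis=0)
    classes = np.unique(y)
    s_within = sum(((H[y == c] - H[y == c].mean(axis=0)) ** 2).sum()
                   for c in classes)
    s_between = sum((y == c).sum() * ((H[y == c].mean(axis=0) - mu) ** 2).sum()
                    for c in classes)
    return s_within / s_between

rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 50)            # 4 classes, 50 samples each
means = rng.normal(size=(4, 16))           # class-mean directions in feature space
collapsed = means[y] + 1e-3 * rng.normal(size=(200, 16))  # near-collapsed features
spread = means[y] + 1.0 * rng.normal(size=(200, 16))      # high within-class spread

print(nc1_ratio(collapsed, y), nc1_ratio(spread, y))
```

The collapsed features give a ratio orders of magnitude below one, while the spread features do not, mirroring the property the paper analyzes at SC-loss minimizers.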
 [31] arXiv:2402.18910 (cross-list from cs.LG) [pdf, other]

Title: DIGIC: Domain Generalizable Imitation Learning by Causal Discovery
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)
Causality has been combined with machine learning to produce robust representations for domain generalization. Most existing methods of this type require massive data from multiple domains to identify causal features via cross-domain variation, which can be expensive or even infeasible and may lead to misidentification in some cases. In this work, we take a different approach by leveraging the demonstration data distribution to discover the causal features for a domain-generalizable policy. We design a novel framework, called DIGIC, that identifies the causal features by finding the direct causes of the expert action in the demonstration data distribution via causal discovery. Our framework can achieve domain-generalizable imitation learning with only single-domain data and serves as a complement to cross-domain-variation-based methods under non-structural assumptions on the underlying causal models. Our empirical study on various control tasks shows that the proposed framework evidently improves domain generalization performance while simultaneously achieving performance comparable to the expert in the original domain.
 [32] arXiv:2402.18995 (cross-list from cs.LG) [pdf, other]

Title: Negative-Binomial Randomized Gamma Markov Processes for Heterogeneous Overdispersed Count Time Series
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Modeling count-valued time series has been receiving increasing attention, since count time series naturally arise in physical and social domains. Poisson-gamma dynamical systems (PGDSs) are newly developed methods that can well capture the expressive latent transition structure and bursty dynamics behind count sequences. In particular, PGDSs demonstrate superior performance in data imputation and prediction compared with canonical linear dynamical system (LDS) based methods. Despite these advantages, PGDSs cannot capture the heterogeneous overdispersed behaviours of the underlying dynamic processes. To mitigate this defect, we propose a negative-binomial-randomized gamma Markov process, which not only significantly improves the predictive performance of the proposed dynamical system, but also facilitates fast convergence of the inference algorithm. Moreover, we develop methods to estimate both factor-structured and graph-structured transition dynamics, which enable us to infer more explainable latent structure than PGDSs. Finally, we demonstrate the explainable latent structure learned by the proposed method, and show its superior performance in imputing missing data and forecasting future observations compared with related models.
 [33] arXiv:2402.19442 (cross-list from cs.LG) [pdf, other]

Title: Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality
Comments: 141 pages, 7 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
We study the dynamics of gradient flow for training a multi-head softmax attention model for in-context learning (ICL) of multi-task linear regression. We establish the global convergence of gradient flow under suitable choices of initialization. In addition, we prove that an interesting "task allocation" phenomenon emerges during the gradient flow dynamics, where each attention head focuses on solving a single task of the multi-task model. Specifically, we prove that the gradient flow dynamics can be split into three phases: a warm-up phase, where the loss decreases rather slowly and the attention heads gradually build up their inclination towards individual tasks; an emergence phase, where each head selects a single task and the loss rapidly decreases; and a convergence phase, where the attention parameters converge to a limit. Furthermore, we prove the optimality of gradient flow in the sense that the limiting model learned by gradient flow is on par with the best possible multi-head softmax attention model up to a constant factor. Our analysis also delineates a strict separation in ICL prediction accuracy between single-head and multi-head attention models. The key technique in our convergence analysis is to map the gradient flow dynamics in parameter space to a set of ordinary differential equations in the spectral domain, where the relative magnitudes of the semi-singular values of the attention weights determine task allocation. To the best of our knowledge, our work provides the first convergence result for the multi-head softmax attention model.
 [34] arXiv:2402.19449 (cross-list from cs.LG) [pdf, other]

Title: Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Optimization and Control (math.OC); Machine Learning (stat.ML)
Adam has been shown empirically to outperform gradient descent in optimizing large language transformers, and by a larger margin than on other tasks, but it is unclear why this happens. We show that the heavy-tailed class imbalance found in language modeling tasks leads to difficulties in the optimization dynamics. When training with gradient descent, the loss associated with infrequent words decreases more slowly than the loss associated with frequent ones. As most samples come from relatively infrequent words, the average loss decreases slowly with gradient descent. On the other hand, Adam and sign-based methods do not suffer from this problem and improve predictions for all classes. To establish that this behavior is indeed caused by class imbalance, we show empirically that it persists across different architectures and data types, on language transformers, vision CNNs, and linear models. We further study this phenomenon in linear classification with cross-entropy loss, showing that heavy-tailed class imbalance leads to ill-conditioning, and that the normalization used by Adam can counteract it.
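The mechanism can be caricatured with a toy surrogate in which each class k contributes a frequency-weighted logistic term p_k * log(1 + exp(-w_k)): the gradient-descent update for a class is then scaled by its frequency p_k, while sign descent (an Adam-like method) moves every class at the same rate. This is our own illustration under these simplifying assumptions, not the paper's experimental setup:

```python
import numpy as np

K = 100
p = 1.0 / np.arange(1, K + 1)
p /= p.sum()                      # Zipf-like (heavy-tailed) class frequencies

# toy surrogate: class k contributes p_k * log(1 + exp(-w_k)) to the loss,
# so the full-batch gradient for class k is scaled by its frequency p_k
def grad(w):
    return -p / (1.0 + np.exp(w))

loss = lambda w: np.log1p(np.exp(-w))  # per-class loss, before frequency weighting

w_gd, w_sign = np.zeros(K), np.zeros(K)
for _ in range(300):
    w_gd -= 1.0 * grad(w_gd)                # gradient descent: rare classes barely move
    w_sign -= 0.02 * np.sign(grad(w_sign))  # sign descent (Adam-like): uniform progress

print("rare-class loss, GD:  ", loss(w_gd)[-1])
print("rare-class loss, sign:", loss(w_sign)[-1])
```

After the same number of steps, the rarest class still has a large loss under gradient descent but is essentially fit under sign descent, mirroring the frequent-vs-infrequent-word gap described above.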
 [35] arXiv:2402.19456 (cross-list from quant-ph) [pdf, other]

Title: Statistical Estimation in the Spiked Tensor Model via the Quantum Approximate Optimization Algorithm
Comments: 51 pages, 4 figures, 1 table
Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS); Probability (math.PR); Statistics Theory (math.ST)
The quantum approximate optimization algorithm (QAOA) is a general-purpose algorithm for combinatorial optimization. In this paper, we analyze the performance of the QAOA on a statistical estimation problem, namely the spiked tensor model, which exhibits a statistical-computational gap classically. We prove that the weak recovery threshold of $1$-step QAOA matches that of $1$-step tensor power iteration. Additional heuristic calculations suggest that the weak recovery threshold of $p$-step QAOA matches that of $p$-step tensor power iteration when $p$ is a fixed constant. This further implies that multi-step QAOA with tensor unfolding could achieve, but not surpass, the classical computation threshold $\Theta(n^{(q-2)/4})$ for spiked $q$-tensors.
Meanwhile, we characterize the asymptotic overlap distribution for $p$-step QAOA, finding an intriguing sine-Gaussian law verified through simulations. For some $p$ and $q$, the QAOA attains an overlap that is larger by a constant factor than the tensor power iteration overlap. Of independent interest, our proof techniques employ the Fourier transform to handle difficult combinatorial sums, a novel approach differing from prior QAOA analyses on spin-glass models without planted structure.
 [36] arXiv:2402.19460 (cross-list from cs.LG) [pdf, other]

Title: Benchmarking Uncertainty Disentanglement: Specialized Uncertainties for Specialized Tasks
Comments: 43 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Uncertainty quantification, once a singular task, has evolved into a spectrum of tasks, including abstained prediction, out-of-distribution detection, and aleatoric uncertainty quantification. The latest goal is disentanglement: the construction of multiple estimators that are each tailored to one and only one task. Hence, there is a plethora of recent advances with different intentions that often entirely deviate from practical behavior. This paper conducts a comprehensive evaluation of numerous uncertainty estimators across diverse tasks on ImageNet. We find that, despite promising theoretical endeavors, disentanglement is not yet achieved in practice. Additionally, we reveal which uncertainty estimators excel at which specific tasks, providing insights for practitioners and guiding future research toward task-centric and disentangled uncertainty estimation methods. Our code is available at https://github.com/bmucsanyi/bud.
Replacements for Fri, 1 Mar 24
 [37] arXiv:1701.07078 (replaced) [pdf, ps, other]

Title: Measurement-to-Track Association and Finite-Set Statistics
Authors: Ronald Mahler
Comments: 7 pages, no figures
Subjects: Methodology (stat.ME)
 [38] arXiv:1806.05451 (replaced) [pdf, other]

Title: The committee machine: Computational to statistical gaps in learning a two-layers neural network
Authors: Benjamin Aubin, Antoine Maillard, Jean Barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová
Comments: 18 pages + supplementary material, 3 figures. (v2: update to match the published version; v3: clarification of the caption of Fig. 3)
Journal-ref: J. Stat. Mech. (2019) 124023 & NeurIPS 2018
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)
 [39] arXiv:1903.11198 (replaced) [pdf, other]

Title: Parallel Experimentation and Competitive Interference on Online Advertising Platforms
Subjects: General Economics (econ.GN); Applications (stat.AP)
 [40] arXiv:2006.10628 (replaced) [pdf, other]

Title: Offline detection of change-points in the mean for stationary graph signals
Comments: 16 pages, 2 figures, 1 table, 1 annex; 9 pages of main text
Subjects: Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
 [41] arXiv:2203.01360 (replaced) [pdf, other]

Title: Neural Galerkin Schemes with Active Learning for High-Dimensional Evolution Equations
Journal-ref: Journal of Computational Physics, Volume 496, 2024
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [42] arXiv:2204.07672 (replaced) [pdf, other]

Title: Abadie's Kappa and Weighting Estimators of the Local Average Treatment Effect
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
 [43] arXiv:2210.14054 (replaced) [pdf, ps, other]

Title: Reduced-Dimension Surrogate Modeling to Characterize the Damage Tolerance of Composite/Metal Structures
Comments: 32 pages, 15 figures, 12 tables
Journal-ref: Modelling 2023, 4, 485-514
Subjects: Applications (stat.AP)
 [44] arXiv:2210.14484 (replaced) [pdf, other]

Title: Imputation of missing values in multi-view data
Authors: Wouter van Loon, Marjolein Fokkema, Frank de Vos, Marisa Koini, Reinhold Schmidt, Mark de Rooij
Comments: 48 pages, 15 figures. Major revisions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
 [45] arXiv:2212.06669 (replaced) [pdf, ps, other]

Title: A scale of interpretation for likelihood ratios and Bayes factors
Authors: Frank Dudbridge
Journal-ref: PLoS ONE 19(2): e0297874 (2024)
Subjects: Methodology (stat.ME)
 [46] arXiv:2301.06297 (replaced) [pdf, other]

Title: Inference via robust optimal transportation: theory and methods
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
 [47] arXiv:2303.12407 (replaced) [pdf, ps, other]

Title: Non-asymptotic analysis of Langevin-type Monte Carlo algorithms
Authors: Shogo Nakakita
Subjects: Statistics Theory (math.ST); Probability (math.PR); Machine Learning (stat.ML)
 [48] arXiv:2305.01849 (replaced) [pdf, other]

Title: Semiparametric Discovery and Estimation of Interaction in Mixed Exposures using Stochastic Interventions
Subjects: Methodology (stat.ME)
 [49] arXiv:2305.04634 (replaced) [pdf, other]

Title: Neural Likelihood Surfaces for Spatial Processes with Computationally Intensive or Intractable Likelihoods
Comments: 65 pages, 20 figures
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
 [50] arXiv:2305.15991 (replaced) [pdf, ps, other]

Title: Finite sample rates for logistic regression with small noise or few samples
Subjects: Statistics Theory (math.ST)
 [51] arXiv:2306.10405 (replaced) [pdf, other]

Title: A semiparametric estimation method for quantile coherence with an application to bivariate financial time series clustering
Comments: 39 pages, 11 figures
Subjects: Methodology (stat.ME); Computation (stat.CO)
 [52] arXiv:2306.15012 (replaced) [pdf, other]

Title: Statistical Component Separation for Targeted Signal Recovery in Noisy Mixtures
Comments: 13+17 pages, 6+8 figures, published in TMLR, code: this https URL
Subjects: Machine Learning (stat.ML); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Signal Processing (eess.SP)
 [53] arXiv:2309.12924 (replaced) [pdf, other]

Title: Automated grading workflows for providing personalized feedback to open-ended data science assignments
Comments: 24 pages, 3 figures
Subjects: Physics Education (physics.ed-ph); Computers and Society (cs.CY); Other Statistics (stat.OT)
 [54] arXiv:2309.16598 (replaced) [pdf, other]

Title: Cross-Prediction-Powered Inference
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
 [55] arXiv:2310.01236 (replaced) [pdf, other]

Title: Mirror Diffusion Models for Constrained and Watermarked Generation
Comments: submitted to NeurIPS on 5/18 but did not arXiv per NeurIPS policy, accepted on 9/22
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [56] arXiv:2310.11143 (replaced) [pdf, ps, other]

Title: Exploring a new machine learning based probabilistic model for high-resolution indoor radon mapping, using the German indoor radon survey data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
 [57] arXiv:2310.14720 (replaced) [pdf, other]

Title: Extended Deep Adaptive Input Normalization for Preprocessing Time Series Data for Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [58] arXiv:2310.17273 (replaced) [pdf, other]

Title: Looping in the Human: Collaborative and Explainable Bayesian Optimization
Authors: Masaki Adachi, Brady Planden, David A. Howey, Michael A. Osborne, Sebastian Orbell, Natalia Ares, Krikamol Muandet, Siu Lun Chau
Comments: Accepted at AISTATS 2024, 24 pages, 11 figures
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
 [59] arXiv:2311.08168 (replaced) [pdf, other]

Title: Time-Uniform Confidence Spheres for Means of Random Vectors
Comments: 46 pages, 1 figure
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Methodology (stat.ME); Machine Learning (stat.ML)
 [60] arXiv:2312.02959 (replaced) [pdf, other]

Title: Detecting algorithmic bias in medical AI-models
Comments: 26 pages, 9 figures
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG); Applications (stat.AP)
 [61] arXiv:2402.03726 (replaced) [pdf, other]

Title: Learning Granger Causality from Instance-wise Self-attentive Hawkes Processes
Authors: Dongxia Wu, Tsuyoshi Idé, Aurélie Lozano, Georgios Kollias, Jiří Navrátil, Naoki Abe, Yi-An Ma, Rose Yu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [62] arXiv:2402.16326 (replaced) [pdf, other]

Title: A Provably Accurate Randomized Sampling Algorithm for Logistic Regression
Comments: To appear in the proceedings of AAAI 2024
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
 [63] arXiv:2402.17886 (replaced) [pdf, other]

Title: Zeroth-Order Sampling Methods for Non-Log-Concave Distributions: Alleviating Metastability by Denoising Diffusion
Comments: Figure 4 on page 13 corrected. Comments are welcome
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST); Methodology (stat.ME)
 [64] arXiv:2402.18510 (replaced) [pdf, other]

Title: RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
Comments: 42 pages, 5 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
 [65] arXiv:2402.18571 (replaced) [pdf, other]

Title: Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
Authors: Haoxiang Wang, Yong Lin, Wei Xiong, Rui Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang
Comments: The code and model are released at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)