Data Science and Dependence 2023 Conference
Schedule
Sunday, July 9
Venue: IWH
19:00 - 21:00 | Get-Together at IWH (with some finger food)
Monday, July 10
Venue: IWH
9:20 | Welcome by the organizers
9:30 - 10:00 | Richard Samworth (Cambridge)
Isotonic subgroup selection
Abstract: Given a sample of covariate-response pairs, we consider the subgroup selection problem of identifying a subset of the covariate domain where the regression function exceeds a pre-determined threshold. We introduce a computationally feasible approach for subgroup selection in the context of multivariate isotonic regression, based on martingale tests and multiple testing procedures for logically structured hypotheses. Our proposed procedure satisfies a non-asymptotic, uniform Type I error rate guarantee with power that attains the minimax optimal rate up to poly-logarithmic factors. Extensions cover classification, isotonic quantile regression and heterogeneous treatment effect settings. Numerical studies on both simulated and real data confirm the practical effectiveness of our proposal. Link to Paper
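To make the problem concrete, here is a minimal, hypothetical Python illustration (all data and names are ours, not the speaker's): in one dimension, fit an isotonic regression and read off the naive plug-in subgroup where the fitted function exceeds a threshold tau. Unlike the procedure in the talk, this sketch carries no Type I error guarantee.

```python
# Naive plug-in subgroup selection under monotonicity (illustration only).
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n, tau = 500, 0.5
x = rng.uniform(0, 1, n)
y = x + 0.3 * rng.standard_normal(n)                 # monotone signal plus noise

f_hat = IsotonicRegression(increasing=True).fit_transform(x, y)

# By monotonicity the plug-in subgroup is an interval [x_hat, 1].
above = x[f_hat >= tau]
print("estimated subgroup:", (above.min(), 1.0) if above.size else "empty")
```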
10:00 - 10:30 | Nicole Mücke (Braunschweig)
Learning linear operators: Infinite-dimensional regression as a well-behaved non-compact inverse problem
Abstract: We consider the problem of learning a linear operator θ between two Hilbert spaces from empirical observations, which we interpret as least squares regression in infinite dimensions. We show that this goal can be reformulated as an inverse problem for θ with the undesirable feature that its forward operator is generally non-compact (even if θ is assumed to be compact or of p-Schatten class). However, we prove that, in terms of spectral properties and regularisation theory, this inverse problem is equivalent to the known compact inverse problem associated with scalar response regression. Our framework allows for the elegant derivation of dimension-free rates for generic learning algorithms under Hölder-type source conditions. The proofs rely on the combination of techniques from kernel regression with recent results on concentration of measure for sub-exponential Hilbertian random variables. The obtained rates hold for a variety of practically relevant scenarios in functional regression as well as nonlinear regression with operator-valued kernels, and match those of classical kernel regression with scalar response. This is joint work with Mattes Mollenhauer (FU Berlin) and T. J. Sullivan (University of Warwick). Link to Paper
Coffee
11:00 - 11:30 | Nicolas Verzelen (INRAE Montpellier)
Optimal Permutation Estimation in Crowd-Sourcing problems
Abstract: Motivated by crowd-sourcing applications, we consider a model where we have partial observations from a bivariate isotonic n×d matrix with an unknown permutation π* acting on its rows. Focusing on the twin problems of recovering the permutation π* and estimating the unknown matrix, we introduce a polynomial-time procedure achieving the minimax risk for these two problems, for all possible values of n and d and all possible sampling efforts. Along the way, we establish that, in some regimes, recovering the unknown permutation π* is considerably simpler than estimating the matrix. Link to Paper
11:30 - 12:00 | Sofia Olhede (Lausanne)
On Graph Limits as Models for Interaction Data
Abstract: Network data has become a staple in many different applications, ranging from ecology to neuroscience and systems biology. Its inference will of course depend on the application where we collect the network data, but I will discuss some general principles based on probabilistic symmetries such as permutation invariance. Just like other probabilistic invariances, the distributional invariance to permuting the indices of a matrix of interactions implies a representation theorem (the Aldous-Hoover theorem). This representation is in terms of a graph limit function, or graphon. I will discuss the representation, how to make inferences based on this representation, what to do if distributional permutation invariance does not hold, and what to do if we have additional information such as time stamps of interactions, multiple interactions or additional covariate data.
12:00 - 12:30 | Alexander Kreiß (Leipzig)
Testing For Global Covariate Effects in Dynamic Interaction Event Networks
Abstract: In statistical network analysis it is common to observe so-called interaction data. Such data is characterized by actors forming the vertices and interacting along the edges of the network, where edges are randomly formed and dissolved over the observation horizon. In addition, covariates are observed and the goal is to model the impact of the covariates on the interactions. We distinguish two types of covariates: global, system-wide covariates (i.e. covariates taking the same value for all individuals, such as seasonality) and local, dyadic covariates modeling interactions between two individuals in the network. Existing continuous-time network models are extended to allow for comparing a completely parametric model and a model that is parametric only in the local covariates but has a global non-parametric time component. This allows us, for instance, to test whether global time dynamics can be explained by simple global covariates like weather, seasonality etc. The procedure is applied to a bike-sharing network, using weather and weekdays as global covariates and distances between the bike stations as local covariates. Link to Paper
12:45 | Lunch at IWH
15:00 - 15:30 | Wei Biao Wu (Chicago)
Change point analysis with irregular signals: When did the COVID-19 pandemic start?
Abstract: I will discuss the problem of testing and estimation of change points where signals after the change point can be highly irregular. Our setting substantially differs from the existing literature, which assumes signals to be piecewise constant or to vary smoothly. A two-step approach is proposed to effectively estimate the location of the change point. The first step consists of a preliminary estimation of the change point that allows us to obtain unknown parameters for the second step. In the second step we use a new procedure to determine the position of the change point. We show that the optimal O_P(1) rate of convergence of the estimated change point to the true change point can be obtained. We apply our method to analyze health time series data and estimate 7 December 2019 as the starting date of the COVID-19 pandemic. This work is joint with Tobias Kley, Yuhan Philip Liu and Hongyuan Cao.
15:30 - 16:00 | Moritz Jirak (Wien)
Weak dependence and optimal quantitative self-normalized central limit theorems
Abstract: Consider a stationary, weakly dependent sequence of random variables. Given that a CLT holds, how should we estimate the long-run variance? This problem has been studied for decades; prominent solutions were proposed, for instance, by Andrews (1991) and Newey and West (1994). Using the proximity of the corresponding normal distribution as quality measure, we discuss optimal solutions and why previous proposals are not optimal in this context. The setup contains many prominent dynamical systems and time series models, including random walks on the general linear group, products of positive random matrices, functionals of GARCH models of any order, functionals of dynamical systems arising from SDEs, iterated random functions and many more.
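For readers who want a baseline, here is a minimal Python sketch of the classical Bartlett-kernel (Newey-West) long-run variance estimator that the talk takes as its point of departure; the bandwidth rule and all names are illustrative, and the optimal estimators discussed in the talk are not reproduced here.

```python
# Classical Newey-West (Bartlett-kernel) long-run variance estimator (sketch).
import numpy as np

def long_run_variance(x, bandwidth=None):
    """Estimate sum_k Cov(X_0, X_k) for a stationary sequence x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if bandwidth is None:
        bandwidth = int(4 * (n / 100) ** (2 / 9))    # Newey-West rule of thumb
    xc = x - x.mean()
    lrv = xc @ xc / n                                # lag-0 autocovariance
    for k in range(1, bandwidth + 1):
        gamma_k = xc[k:] @ xc[:-k] / n
        lrv += 2 * (1 - k / (bandwidth + 1)) * gamma_k   # Bartlett weights
    return lrv

# AR(1) with coefficient 0.5 and unit innovations: true long-run variance is 4.
rng = np.random.default_rng(1)
eps = rng.standard_normal(100_000)
y = np.empty_like(eps)
y[0] = eps[0]
for t in range(1, eps.size):
    y[t] = 0.5 * y[t - 1] + eps[t]
print(long_run_variance(y))
```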
16:00 - 16:30 | Suhasini Subba Rao (Texas A&M)
Learning graphical models for nonstationary time series
Abstract: We propose NonStGM, a general nonparametric graphical modeling framework for studying dynamic associations among the components of a nonstationary multivariate time series. It builds on the framework of Gaussian graphical models (GGM) and stationary time series graphical models (StGM), and complements existing work on parametric graphical models based on change point vector autoregressions (VAR). Analogous to StGM, the proposed framework captures conditional noncorrelations (both intertemporal and contemporaneous) in the form of an undirected graph. In addition, to describe the more nuanced nonstationary relationships among the components of the time series, we introduce the new notion of conditional nonstationarity/stationarity and incorporate it within the graph. This can be used to search for small subnetworks that serve as the 'source' of nonstationarity in a large system. We explicitly connect conditional noncorrelation and stationarity between and within components of the multivariate time series to zero and Toeplitz embeddings of an infinite-dimensional inverse covariance operator. In the Fourier domain, conditional stationarity and noncorrelation relationships in the inverse covariance operator are encoded in a specific sparsity structure of its integral kernel operator. In the last part of the talk we describe how the graphical model can be estimated from the observed time series and discuss some of the sampling properties of the proposed estimation scheme.
Coffee
17:00 - 17:30 | George Michailidis (Florida U.)
Optimal Estimation of High-Dimensional Heavy-Tailed Time Series
Abstract: High-dimensional vector autoregressive models have attracted considerable attention due to their adoption in different application domains. In this presentation, we examine their regularized estimation for heavy-tailed data. We establish optimal consistency rates and corresponding finite sample bounds for the underlying model parameters. Further, we also incorporate more general regularizers (that are not decomposable) to induce general sparsity patterns. The key technical tool employed is a novel, easy-to-use concentration bound for heavy-tailed linear processes that does not rely on "mixing" notions and gives tighter bounds. Extensions to other time series models are also briefly discussed.
17:30 - 18:00 | Mathias Trabs (Karlsruhe)
Statistical guarantees for stochastic Metropolis-Hastings
Abstract: Uncertainty estimation is a key issue when considering the application of deep neural network methods in science and engineering. To this end, numerous Bayesian neural network approaches have been introduced, where the main challenge is to construct an algorithm which on the one hand is applicable to the large sample sizes and parameter dimensions of modern applications and which on the other hand admits statistical guarantees. We introduce a version of a stochastic Metropolis-Hastings MCMC sampler which allows for large sample sizes due to the stochastic Metropolis-Hastings step and which can be applied to large parameter dimensions due to a directional noise in the proposal distribution. The statistical properties of the resulting Gibbs posterior distribution are studied in a nonparametric regression setting. We prove a PAC-Bayes oracle inequality which yields optimal contraction rates. Moreover, we analyze the diameter and show high coverage probability of the resulting credible sets. The talk is based on joint work with Sebastian Bieringer, Gregor Kasieczka and Maximilian F. Steffen.
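As a rough illustration of the idea (not the speakers' algorithm: the corrected acceptance rule and the directional proposal noise are omitted, and all names and parameters are ours), here is a naive mini-batch Metropolis-Hastings sampler for a Gibbs posterior proportional to exp(-λ × empirical loss):

```python
# Naive mini-batch Metropolis-Hastings for a Gibbs posterior (illustration only).
import numpy as np

rng = np.random.default_rng(2)

def minibatch_mh(theta0, loss_fn, data, n_steps=1_000, batch=128, lam=1.0, step=0.05):
    theta = np.asarray(theta0, dtype=float)
    samples = []
    for _ in range(n_steps):
        idx = rng.choice(len(data), size=batch, replace=False)   # fresh mini-batch
        proposal = theta + step * rng.standard_normal(theta.shape)
        # Accept/reject using the Gibbs-posterior log-ratio on the batch only.
        log_ratio = -lam * batch * (loss_fn(proposal, data[idx]) - loss_fn(theta, data[idx]))
        if np.log(rng.uniform()) < log_ratio:
            theta = proposal
        samples.append(theta.copy())
    return np.array(samples)

# Toy example: Gibbs posterior over the mean of Gaussian data under squared loss.
data = rng.standard_normal(10_000) + 3.0
sq_loss = lambda th, xb: float(np.mean((xb - th) ** 2))
draws = minibatch_mh(np.zeros(1), sq_loss, data, lam=5.0)
print(draws[len(draws) // 2:].mean())   # posterior mean estimate, near 3.0
```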
18:10 - 19:30 | Dinner at IWH
19:45 - | Guided walk through the Heidelberg old town |
Tuesday, July 11
Venue: IWH
9:00 - 9:30 | Alexandra Carpentier (Potsdam)
Tight concentration inequalities for weakly dependent fields, and applications to the mixing bandit problem
Abstract: We study the stochastic multi-armed bandit problem in the case when the arm samples are dependent over time and generated from so-called weak C-mixing processes. We establish a C-Mix Improved UCB algorithm and provide both problem-dependent and problem-independent regret analyses in two different scenarios. In the first, so-called fast-mixing scenario, we show that the pseudo-regret enjoys the same upper bound (up to a factor) as for i.i.d. observations; whereas in the second, slow-mixing scenario, we discover a surprising effect: the regret upper bound is similar to the independent case, with an incremental additive term which does not depend on the number of arms. The analysis of the slow-mixing scenario is supported by a minimax lower bound, which (up to a log(T) factor) matches the obtained upper bound. Link to Paper
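As a point of reference, the sketch below implements the standard UCB index policy for independent rewards (all names and parameters are illustrative); the C-Mix Improved UCB algorithm of the talk enlarges the confidence widths to account for the mixing of the arm samples, which is not reproduced here.

```python
# Standard UCB for independent rewards (reference implementation, sketch).
import numpy as np

rng = np.random.default_rng(3)

def ucb_pseudo_regret(arm_means, horizon=10_000):
    K = len(arm_means)
    counts = np.zeros(K, dtype=int)
    sums = np.zeros(K)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            a = t - 1                                    # pull each arm once
        else:
            bonus = np.sqrt(2 * np.log(t) / counts)      # i.i.d. confidence width
            a = int(np.argmax(sums / counts + bonus))
        sums[a] += arm_means[a] + rng.standard_normal()  # noisy reward draw
        counts[a] += 1
        regret += max(arm_means) - arm_means[a]
    return regret

print(ucb_pseudo_regret([0.1, 0.5, 0.9]))
```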
9:30 - 10:00 | Steffen Dereich (Münster)
On the existence of optimal shallow networks
Abstract: In this talk we discuss the existence of global minima in optimisation problems over shallow neural networks. More explicitly, the function class over which we minimise is the family of all functions that can be expressed as artificial residual or feedforward neural networks with one hidden layer featuring a specified number of neurons with ReLU (or Leaky ReLU) activation. We give existence results. Moreover, we provide counterexamples that illustrate the relevance of the assumptions imposed in the theorems. The talk is based on joint work with Sebastian Kassing and Arnulf Jentzen. Link to Paper 1 - Link to Paper 2
10:00 - 10:30 | Likai Chen (St. Louis)
Stochastic gradient descent for quantiles
Abstract: This talk considers the recursive estimation of quantiles using the stochastic gradient descent (SGD) algorithm with Polyak-Ruppert averaging. The algorithm offers a computationally and memory-efficient alternative to the usual empirical estimator. Our focus is on studying the non-asymptotic behavior by providing exponentially decreasing tail probability bounds under mild assumptions on the smoothness of the density functions. This novel non-asymptotic result is based on a bound of the moment generating function of the SGD estimate. We apply our result to the problem of best arm identification in a multi-armed stochastic bandit setting under quantile preference. Link to Paper
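The recursion itself is one line; here is a minimal Python sketch (step sizes and names are illustrative) of recursive quantile estimation with Polyak-Ruppert averaging, using that the pinball-loss subgradient at q is 1{x ≤ q} − τ:

```python
# Streaming quantile estimation: one SGD pass with Polyak-Ruppert averaging.
import numpy as np

rng = np.random.default_rng(4)
tau = 0.9                      # target quantile level
q, q_bar = 0.0, 0.0            # SGD iterate and its running average
for i in range(1, 1_000_001):
    x = rng.standard_normal()            # one streaming observation
    grad = float(x <= q) - tau           # pinball-loss subgradient at q
    q -= i ** -0.75 * grad               # Robbins-Monro step gamma_i = i^(-3/4)
    q_bar += (q - q_bar) / i             # Polyak-Ruppert running average
print(q_bar, "vs. true N(0,1) 0.9-quantile", 1.2816)
```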
Coffee
11:00 - 11:30 | Florentina Bunea (Cornell)
Inference for the Wasserstein Distance between mixing measures in topic models
Abstract: The Wasserstein distance between mixing measures has come to occupy a central place in the statistical analysis of mixture models. We give the first axiomatic justification of its usage as a canonical measure of discrepancy between any two mixture distributions. Inference for the Wasserstein distance between mixing measures is generally difficult in high dimensions. Specializing to discrete mixtures arising from topic models, we offer the first minimax lower bound on estimating the distance between pairs of mixing measures in this model class. This reveals regimes under which fast estimation of the distance between mixing measures can be expected, even if the ambient dimension of the mixture distributions is large. In such regimes, we develop fully data-driven inferential tools that allow us to obtain the first asymptotically valid confidence intervals for the Wasserstein distance between mixing measures in topic models. Our results apply to potentially sparse mixtures of potentially sparse high-dimensional discrete probability distributions, and are illustrated via an example on movie reviews from the IMDb data set. Link to Paper
11:30 - 12:00 | Ingo Steinwart (Stuttgart)
Learning from data with low intrinsic dimension
Abstract: We derive improved regression and classification rates for support vector machines using Gaussian kernels, under the assumption that the data has some low-dimensional intrinsic structure that is described by the box-counting dimension. Under some standard regularity assumptions for regression and classification we prove learning rates in which the dimension of the ambient space is replaced by the box-counting dimension of the support of the data-generating distribution. In the regression case our rates are in some cases minimax optimal up to logarithmic factors, whereas in the classification case our rates are minimax optimal up to logarithmic factors in a certain range of our assumptions and otherwise of the form of the best known rates. Furthermore, we show that a training-validation approach for choosing the hyperparameters of an SVM in a data-dependent way achieves the same rates adaptively, that is, without any knowledge of the data-generating distribution. In addition, we show that the same results hold true for localized versions of these kernel methods, which substantially reduce the computational resources for training and testing. Finally, we conduct extensive experiments. The talk is based upon the papers 'Adaptive learning rates for support vector machines working on data with low intrinsic dimension' (2021) and 'Intrinsic dimension adaptive partitioning for kernel methods' (2022). Link to Paper 1 - Link to Paper 2
12:00 - 12:30 | Clement Berenfeld (Potsdam)
Estimating a density near an unknown manifold: a Bayesian nonparametric approach
Abstract: We study the Bayesian density estimation of data living in the offset of an unknown submanifold of the Euclidean space. In this perspective, we introduce a new notion of anisotropic Hölder regularity for the underlying density and obtain posterior rates that are minimax optimal and adaptive to the regularity of the density, to the intrinsic dimension of the manifold, and to the size of the offset, provided that the latter is not too small, while still being allowed to go to zero. Our Bayesian procedure, based on location-scale mixtures of Gaussians, appears to be convenient to implement and yields good practical results, even for quite singular data. Link to Paper
12:45 | Lunch at IWH
15:00 - 15:30 | Christophe Giraud (Paris Saclay)
Improving fairness in online learning
Abstract: Online learning and recommendation systems are ubiquitous in decision making, including in some sensitive settings such as job hiring, judicial decisions and college admissions. Blindly applying standard ML techniques can lead to unfair and discriminatory decisions. We will discuss some approaches for improving fairness in online learning and recommendation systems. Link to Paper 1 - Link to Paper 2 - Link to Paper 3
15:30 - 16:00 | Luiz Chamon (Stuttgart)
Probably Approximately Correct Constrained Learning
Abstract: Today, widely used learning methods do not incorporate requirements organically, which has led to data-driven solutions prone to tampering, unsafe behavior, and biased, prejudiced actions. In this talk, I will show when and how it is possible to learn under statistical requirements by developing the theoretical underpinnings of constrained learning. I will define constrained learning by extending the classical PAC framework and show that despite appearances, constrained learning is not harder than unconstrained learning. In fact, they have essentially the same sample complexity. I will also show a practical learning rule that under mild conditions can tackle constrained learning tasks by solving only unconstrained ERM problems, a duality that holds despite the lack of convexity. I will illustrate how these advances can be used to directly tackle challenging problems in robust learning. These contributions suggest how we can go beyond the current objective-centric learning paradigm towards a constraint-driven one. Link to Paper 1 (NeurIPS'21) - Link to Paper 2 (ICML'22)
16:00 - 16:30 | Azadeh Khaleghi (ENSAE Paris)
Inferring the mixing properties of an ergodic process
Abstract: In this talk I will introduce some of our recent results on the estimation of mixing coefficients from stationary ergodic sample paths. Specifically, we have proposed strongly consistent estimators of the L1 norm of the sequence of α-mixing (respectively β-mixing) coefficients of a stationary ergodic process. We have also provided strongly consistent estimators of individual α-mixing (respectively β-mixing) coefficients for a subclass of stationary α-mixing (respectively β-mixing) processes with summable sequences of mixing coefficients. The estimators are in turn used in the development of hypothesis tests for mixing rates. I will end with a discussion on the potential application of these results to the approximation of the optimal strategy in restless multi-armed bandits. Link to Paper 1 (arXiv) - Link to Paper 1 (IEEE)
Coffee
17:00 - 17:30 | Zhou Zhou (Toronto)
Auto-regressive Approximations to Non-stationary Time Series
Abstract: Understanding the time-varying structure of complex temporal systems is one of the main challenges of modern time series analysis. In this paper, we show that every uniformly-positive-definite-in-covariance and sufficiently short-range dependent non-stationary and nonlinear time series can be well approximated globally by a white-noise-driven auto-regressive (AR) process of slowly diverging order. To the best of our knowledge, this is the first time such a structural approximation result has been established for general classes of non-stationary time series. A high-dimensional L2 test and an associated multiplier bootstrap procedure are proposed for the inference of the AR approximation coefficients. In particular, an adaptive stability test is proposed to check whether the AR approximation coefficients are time-varying, a frequently encountered question for practitioners and researchers of time series. As an application, globally optimal short-term forecasting theory and methodology for a wide class of locally stationary time series are established via the method of sieves. Link to Paper
17:30 - 18:00 | Jiaqi Li (St. Louis)
Asymptotic Theory for Constant Step Size Stochastic Gradient Descent
Abstract: In this talk I present a novel approach to understanding the behavior of Stochastic Gradient Descent (SGD) with constant step size by interpreting its evolution as a Markov chain. Unlike previous studies that rely on the Wasserstein distance, our approach leverages the functional dependence measure and explores the Geometric-Moment Contraction (GMC) property to capture the general asymptotic behavior of SGD in a more refined way. In particular, our approach allows SGD iterates to be non-stationary but asymptotically stationary over time, providing quenched versions of the central limit theorem and invariance principle valid for averaged SGD with any given starting point. These asymptotic results allow for the initialization of SGD with multiple distinct step sizes, which is a widespread practice in the discipline. We subsequently show a Richardson-Romberg extrapolation with an improved bias representation to bring the estimates closer to the global optimum. We establish the existence of a stationary solution for the derivative SGD process under mild conditions, enhancing our understanding of the entire SGD procedure across varied step sizes. Lastly, we propose an efficient online method for estimating the long-run variance of SGD solutions. This aligns with the recursive nature of SGD, thereby facilitating fast and efficient computations.
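A minimal Python sketch of the basic Richardson-Romberg idea (the toy objective and all names are illustrative, and the improved bias representation of the talk is not reproduced): run averaged constant-step SGD at step sizes γ and 2γ, whose stationary biases scale with the step size for non-quadratic objectives, and combine the two averages to cancel the leading bias term.

```python
# Richardson-Romberg extrapolation for constant-step averaged SGD (sketch).
import numpy as np

rng = np.random.default_rng(5)

def averaged_sgd(gamma, n_steps=400_000):
    """Averaged SGD for f(x) = E[(x - Z)^2]/2 + 0.1*x^4, Z ~ N(1,1); x* ~= 0.80."""
    x, x_bar = 0.0, 0.0
    for i in range(1, n_steps + 1):
        z = 1.0 + rng.standard_normal()
        x -= gamma * (x - z + 0.4 * x ** 3)   # stochastic gradient step
        x_bar += (x - x_bar) / i              # Polyak-Ruppert running average
    return x_bar

g = 0.1
theta_g, theta_2g = averaged_sgd(g), averaged_sgd(2 * g)
theta_rr = 2 * theta_g - theta_2g             # cancels the O(gamma) bias term
print(theta_g, theta_2g, theta_rr)
```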
18:15 - | Dinner at IWH |
Wednesday, July 12
Venue: IWH
9:00 - 9:30 | Angelika Rohde (Freiburg)
Bootstrapping high-dimensional sample covariance matrices
Abstract: Bootstrapping is the classical approach for distributional approximation of estimators and test statistics when an asymptotic distribution contains unknown quantities or provides a poor approximation quality. For the analysis of massive data, however, the bootstrap is computationally intractable in its basic sampling-with-replacement version. Moreover, it is not even valid in some important high-dimensional applications. Combining subsampling of observations with suitable selection of their coordinates, we introduce a new "(m, mp/n) out of (n, p)"-sampling-with-replacement bootstrap for eigenvalue statistics of high-dimensional sample covariance matrices based on n independent p-dimensional random vectors. In the high-dimensional scenario p/n → c ∈ [0,∞), this fully nonparametric bootstrap is shown to consistently reproduce the underlying spectral measure if m/n → 0. If m²/n → 0, it approximates correctly the distribution of linear spectral statistics. The crucial component is a suitably defined representative subpopulation condition, which is shown to be verified in a large variety of situations. The proofs incorporate several delicate technical results which may be of independent interest.
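A schematic Python sketch of the resampling scheme (the data, sizes and choice of eigenvalue statistic are placeholders of ours): draw m of the n observations with replacement and, jointly, about mp/n of the p coordinates, so that the dimension-to-sample-size ratio is preserved, then recompute the statistic on the reduced sample covariance matrix.

```python
# "(m, mp/n) out of (n, p)" bootstrap for eigenvalue statistics (sketch).
import numpy as np

rng = np.random.default_rng(6)
n, p, m = 2_000, 400, 200           # in the theory m/n -> 0; fixed here
X = rng.standard_normal((n, p))     # placeholder data, identity covariance

def boot_eigenstat(X, m, n_boot=200):
    n, p = X.shape
    q = max(1, round(m * p / n))    # mp/n coordinates keeps q/m equal to p/n
    stats = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=m)              # observations, with replacement
        cols = rng.choice(p, size=q, replace=False)    # coordinate selection
        S = np.cov(X[np.ix_(rows, cols)], rowvar=False)
        stats[b] = np.linalg.eigvalsh(S).max()         # e.g. largest eigenvalue
    return stats

print(boot_eigenstat(X, m).mean())
```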
9:30 - 10:00 | Sumanta Basu (Cornell)
Regularized Estimation of Sparse Spectral Precision Matrices
Abstract: The spectral precision matrix, the inverse of a spectral density matrix, is an object of central interest in frequency-domain multivariate time series analysis. Estimation of the spectral precision matrix is a key step in calculating partial coherency and in graphical model selection for stationary time series. When the dimension of a multivariate time series is moderate to large, traditional estimators of spectral density matrices such as averaged periodograms tend to be severely ill-conditioned, and one needs to resort to suitable regularization strategies. In this work, we propose complex graphical lasso (CGLASSO), an l1-penalized estimator of the spectral precision matrix based on local Whittle likelihood maximization. We develop fast pathwise coordinate descent algorithms to implement CGLASSO for large-dimensional time series. We also present a complete non-asymptotic theory for our proposed estimator which shows that consistent estimation is possible in a high-dimensional regime as long as the underlying spectral precision matrix is suitably sparse. We illustrate the advantage of CGLASSO over competing alternatives using extensive numerical experiments on simulated data sets, and demonstrate its performance on real fMRI data.
10:00 - 10:30 | Johannes Lederer (Bochum)
Sparsity in Data Science
Abstract: Sparsity has become popular in some parts of data science because it can avoid overfitting, speed up computations, and facilitate interpretations. In other parts of data science, however, the full potential of sparsity still needs to be explored. This presentation highlights recent advances in various areas, such as extreme-value theory, graphical models, time series, and deep learning.
Coffee
11:00 - 11:30 | Sara van de Geer (Zürich)
Noisy basis pursuit
Abstract: Basis Pursuit estimates the vector of regression coefficients by choosing the interpolator of the data that has the smallest l1-norm. For the case of i.i.d. Gaussian design independent of the noise, Wang et al. [2021] show the intriguing result that noisy Basis Pursuit is consistent. By cleverly exploiting the Gordon Min-Max Theorem, they derive tight bounds. We will address the question of whether consistency can be established in a more direct way using geometric insights and standard empirical process theory. Material: G. Wang, K. Donhauser, and F. Yang. Tight bounds for minimum l1-norm interpolation of noisy data, 2021. arXiv:2111.05987.
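Basis Pursuit is a linear program; here is a minimal Python sketch (assuming scipy, with illustrative sizes) of the minimum-l1-norm interpolator via the standard split β = u − v with u, v ≥ 0:

```python
# Basis Pursuit: minimise ||beta||_1 subject to X beta = y (interpolation).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(7)
n, p, s = 50, 200, 5                        # overparametrised regime p > n
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)   # noisy data are interpolated exactly

c = np.ones(2 * p)                          # sum(u) + sum(v) equals ||beta||_1
A_eq = np.hstack([X, -X])                   # X (u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p), method="highs")
beta_hat = res.x[:p] - res.x[p:]
print("estimation error:", np.linalg.norm(beta_hat - beta_true))
```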
11:30 - 12:00 | Bernhard Stankewitz (Bocconi)
Early stopping for L2-boosting in high-dimensional linear models
Abstract: Increasingly high-dimensional data sets require that estimation methods do not only satisfy statistical guarantees but also remain computationally feasible. In this context, we consider L2-boosting via orthogonal matching pursuit in a high-dimensional linear model and analyze a data-driven early stopping time τ of the algorithm, which is sequential in the sense that its computation is based on the first τ iterations only. This approach is much less costly than established model selection criteria, which require the computation of the full boosting path. We prove that sequential early stopping preserves statistical optimality in this setting, in terms of a fully general oracle inequality for the empirical risk and recently established optimal convergence rates for the population risk. Finally, an extensive simulation study shows that, at an immensely reduced computational cost, the performance of these types of methods is on par with other state-of-the-art algorithms such as the cross-validated Lasso or model selection via a high-dimensional Akaike criterion based on the full boosting path. Link to Paper
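A minimal Python sketch of the sequential idea (the stopping threshold below is a generic discrepancy-type rule of ours, not the calibrated τ of the paper): run orthogonal matching pursuit and stop the first time the squared residual norm falls below a noise-level proxy, so the remainder of the boosting path is never computed.

```python
# Orthogonal matching pursuit with a sequential discrepancy-type stop (sketch).
import numpy as np

rng = np.random.default_rng(8)
n, p, sigma = 200, 1_000, 1.0
X = rng.standard_normal((n, p)) / np.sqrt(n)    # columns of roughly unit norm
beta = np.zeros(p)
beta[:5] = 5.0
y = X @ beta + sigma * rng.standard_normal(n)

active, resid = [], y.copy()
for t in range(n):
    if resid @ resid <= n * sigma ** 2:          # stop once the residual looks like noise
        break
    j = int(np.argmax(np.abs(X.T @ resid)))      # column most correlated with residual
    active.append(j)
    coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)   # orthogonal projection
    resid = y - X[:, active] @ coef
print("stopped after", t, "iterations; selected columns", sorted(set(active)))
```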
12:00 - 12:30 | Timothy Cannings (Edinburgh)
Nonparametric classification with missing data
Abstract: We introduce a new nonparametric framework for classification problems in the presence of missing data. The key aspect of our framework is that the regression function decomposes into an ANOVA-type sum of orthogonal functions, of which some (or even many) may be zero. Working under a general missingness setting, which allows features to be missing not at random, our main goal is to derive the minimax rate for the excess risk in this problem. In addition to the decomposition property, the rate depends on parameters that control the tail behaviour of the marginal feature distributions, the smoothness of the regression function and a margin condition. The ambient data dimension does not appear in the minimax rate, which can therefore be faster than in the classical nonparametric setting. We further propose a new method, called the Hard-thresholding Anova Missing data (HAM) classifier, based on a careful combination of a k-nearest neighbour algorithm and a thresholding step. The HAM classifier attains the minimax rate up to polylogarithmic factors, and numerical experiments further illustrate its utility. This is joint work with Torben Sell and Thomas Berrett. Link to Paper
12:45 | Lunch at IWH
14:00 - 14:30 | Richard Davis (Columbia)
Kernel PCA for Multivariate Extremes
Abstract: We propose kernel PCA as a method for analyzing the dependence structure of multivariate extremes and demonstrate that it can be a powerful tool for clustering and dimension reduction. Our work provides some theoretical insight into the preimages obtained by kernel PCA, demonstrating that under certain conditions they can effectively identify clusters in the data. We build on these new insights to characterize rigorously the performance of kernel PCA based on an extremal sample, i.e., the angular part of random vectors for which the radius exceeds a large threshold. More specifically, we focus on the asymptotic dependence of multivariate extremes characterized by the angular or spectral measure in extreme value theory and provide a careful analysis in the case where the extremes are generated from a linear factor model. We give theoretical guarantees on the performance of kernel PCA preimages of such extremes by leveraging their asymptotic distribution together with Davis-Kahan perturbation bounds. Our theoretical findings are complemented with numerical experiments illustrating the finite sample performance of our methods. Link to Paper
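A minimal Python sketch of the pipeline, assuming scikit-learn is available (the factor model, threshold and kernel parameters are illustrative placeholders): form the extremal sample of angular parts and run kernel PCA on it.

```python
# Kernel PCA on the angular parts of threshold exceedances (sketch).
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(9)
Z = rng.standard_t(df=2.5, size=(20_000, 2))    # heavy-tailed factors
X = Z @ rng.standard_normal((2, 10))            # linear factor model, dimension 10

r = np.linalg.norm(X, axis=1)
threshold = np.quantile(r, 0.99)                # keep the top 1% by radius
angles = X[r > threshold] / r[r > threshold, None]   # extremal (angular) sample

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1.0, fit_inverse_transform=True)
scores = kpca.fit_transform(angles)             # low-dimensional extremal structure
preimages = kpca.inverse_transform(scores)      # the preimages studied in the talk
print(scores.shape, preimages.shape)
```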
14:30 - 15:00 | Marten Wegkamp (Cornell)
Discriminant Analysis in High-Dimensional Gaussian Latent Mixtures
Abstract: We consider binary classification of high-dimensional features under a postulated model with a low-dimensional latent Gaussian mixture structure and non-vanishing noise. We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. We derive explicit rates of convergence of the excess risk of the proposed PC-based classifier and we prove that the obtained rates are optimal, up to some logarithmic factor, in the minimax sense. In the second part of the talk, we retain all PCs to estimate the direction of the optimal separating hyperplane. The estimated hyperplane is shown to interpolate on the training data. While the direction vector can be consistently estimated as could be expected from recent results in linear regression, a naive plug-in estimate fails to consistently estimate the intercept. A simple correction, that requires an independent hold-out sample, renders the procedure consistent and even minimax optimal in many scenarios. The interpolation property of the latter procedure can be retained, but surprisingly depends on the way we encode the labels. This is joint work with Xin Bing, University of Toronto. Link to Paper 1 - Link to Paper 2
15:00 - 15:30 | Joseph Meyer (Heidelberg)
Random Planted Forest: A Directly Interpretable Tree Ensemble
Abstract: We introduce a novel interpretable tree-based algorithm for prediction in a regression setting. Our motivation is to estimate the unknown regression function from a functional decomposition perspective in which the functional components correspond to lower-order interaction terms. The idea is to modify the random forest algorithm by keeping certain leaves after they are split instead of deleting them. This leads to non-binary trees, which we refer to as planted trees. An extension to a forest leads to our random planted forest algorithm. Additionally, the maximum number of covariates which can interact within a leaf can be bounded. In a simulation study we find encouraging prediction and visualisation properties of our random planted forest method. We also develop theory for an idealized version of random planted forests in cases where the interaction bound is low. We show that if it is smaller than three, the idealized version achieves asymptotically optimal convergence rates up to a logarithmic factor.
15:30 - 16:00 | Enno Mammen (Heidelberg)
Using cluster analysis for the identification of panel models
Abstract: In the econometric literature, models have recently been studied which are not identifiable because of unobserved individual effects. It has been proposed to use cluster analysis of the individual effects to achieve identification. In this talk we consider a fixed effects panel model where the parameters on time-constant covariates are not identified. This talk presents a new approach to clustering in this model to ensure identifiability. By using unsupervised nonparametric density-based clustering, cluster patterns, including their location and number, are chosen data-adaptively. The approach works with large data structures. Our approach differs in two respects from the current literature. We allow for atoms, i.e. individuals not belonging to any cluster, and we consider an asymptotic framework where the clusters are not consistently estimated in the limit. The performance of our method for large data sets is illustrated by an application to labour market data with 250,000 individuals and 2,000,000 person-year observations. The talk reports on joint work with Ralf Wilke (Copenhagen) and Kristina Zapp (Mannheim).
Closing
16:30 - | Visit to Heidelberg Castle