Abstracts of invited talks and tutorials: Algebraic Statistics 2015, 8-12 June, University of Genoa.


INVITED TALKS


Elvira Di Nardo (University of Basilicata)

Title: Symbolic methods in statistics: elegance towards efficiency

Abstract: In the last ten years, the use of symbolic methods has substantially extended both the theory and the applications of statistics. By symbolic methods we mean the set of manipulation techniques that perform algebraic calculations through an algorithmic approach. The goal is to find efficient mechanical processes that can be passed to a computer. Typically these algebraic expressions are encountered within statistical inference or parameter estimation. Recent connections with free probability and its applications, within random matrices and other satellite areas, have extended the boundaries of applicability of these methods. Finding efficient symbolic algorithms raises new problems involving both computational and conceptual issues. There are many packages devoted to numerical and graphical statistical tool sets that do not perform algebraic or symbolic computations. The packages filling this gap are not open source. R is much stronger as a numeric programming environment, and the procedures that include symbolic software are not yet specifically oriented towards statistical calculations. So the availability of a widespread open source symbolic platform would be of great interest, especially if it can interface with external programs. The conceptual aspects related to symbolic methods involve more strictly mathematical issues. In this picture, combinatorics undoubtedly has a preeminent role. But what we regard as symbolic computation is now evolving towards a universal algebraic language which aims to combine syntactic elegance and computational efficiency. Experience has shown that syntactic elegance often requires the acquisition of innovative techniques, and climbing this steep learning curve can be a deterrent to pursuing the goal. But once a different and deeper viewpoint has been gained, efficiency is obtained as a by-product, and the result can be surprisingly better than expected.
Working examples will be polykays for random vectors or random matrices, with special reference to non-central Wishart distributions.


Thomas Kahle (OvGU Magdeburg)

Title: Algebraic geometry of Poisson regression

Abstract: Designing experiments for generalized linear models is tricky because the optimal design depends on unknown parameters. Here we investigate local optimality. We try to understand, for each design, its region of optimality in parameter space. In some cases these regions are semi-algebraic and feature interesting symmetries. We demonstrate this with the Rasch Poisson counts model.
This is joint work with Rainer Schwabe.


Sonja Petrović (Illinois Institute of Technology)

Title: What are shell structures of random networks telling us?

Abstract: In the network (random graphs) literature, network analyses are often concerned - either directly or indirectly - with the degrees of the nodes in the network. Familiar statistical frameworks, such as the beta or p1 models, associate probabilities to networks in terms of their degree distributions. However, this approach may fail to capture certain vital connectivity information about the network. Often, it matters not just to how many other nodes a particular node in the network is connected, but also to which other nodes it is connected. Degree-centric analyses are not well-suited to model such situations. This talk introduces a model family for one such connectivity structure motivated by examples of social networks, and discusses the relevant algebraic/geometric problems, simulations and sampling algorithms.
(Joint work with Karwa, Pelsmajer, Stasi, Wilburne)


Jim Q. Smith (The University of Warwick)

Title: The Geometry of Chain Event Graphs

Abstract: The class of chain event graphs (CEGs) - which contains the class of discrete Bayes Nets as a special case - has now been established as a widely applicable modeling tool. But the family also enjoys some interesting associated mathematical structure. A CEG is specified through an event tree with some of its edge probabilities being equated. So in particular each of its atoms - its root to leaf paths - has a monomial associated to it corresponding to a product of edge probabilities. It therefore follows that the class of probability measures associated with each given CEG can be mapped onto a family of polynomials. This gives a new area of statistics where techniques of algebraic geometry can be usefully applied. In this talk I will illustrate how we have recently used this algebraic description to come to a better understanding of the statistical equivalence classes of CEGs. The potential uses of this classification for causal discovery will then be explored. This is joint work with one of my PhD students: Christiane Görgen.
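
As an informal illustration of the atom-to-monomial correspondence described in this abstract (not the speaker's own code), the sketch below walks a small event tree and collects, for each root-to-leaf path, the multiset of edge labels on that path. The tree, its labels and the numerical values are hypothetical; edges sharing a label stand for equated edge probabilities.

```python
from collections import Counter

# Hypothetical event tree: each non-leaf node maps to a list of
# (edge label, child) pairs. Edges sharing a label are "equated".
TREE = {
    "root": [("a", "v1"), ("b", "v2")],
    "v1": [("c", "leaf1"), ("d", "leaf2")],
    "v2": [("c", "leaf3"), ("d", "leaf4")],
}


def atom_monomials(tree, node="root", prefix=()):
    """Yield (leaf, Counter of edge labels) for every root-to-leaf path."""
    if node not in tree:  # a leaf: the path collected so far is one atom
        yield node, Counter(prefix)
        return
    for label, child in tree[node]:
        yield from atom_monomials(tree, child, prefix + (label,))


def evaluate(monomial, values):
    """Evaluate a monomial (label -> exponent) at given edge probabilities."""
    result = 1.0
    for label, exponent in monomial.items():
        result *= values[label] ** exponent
    return result


monomials = dict(atom_monomials(TREE))

# Assumed edge probabilities: a + b = 1 and c + d = 1.
values = {"a": 0.3, "b": 0.7, "c": 0.4, "d": 0.6}
total = sum(evaluate(m, values) for m in monomials.values())
```

Here each atom receives a monomial such as a*c, and with edge probabilities summing to one at every node the atom probabilities in `total` sum to one.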


Bernd Sturmfels (University of California Berkeley)

Title: Exponential Varieties

Abstract: Exponential varieties arise from exponential families in statistics. These real algebraic varieties have strong positivity and convexity properties, generalizing those of toric varieties and their moment maps. Another special class, including Gaussian graphical models, are varieties of inverses of symmetric matrices satisfying linear constraints. We present a general theory of exponential varieties, with focus on those defined by hyperbolic polynomials. This is joint work with Mateusz Michałek, Caroline Uhler, and Piotr Zwiernik.


TUTORIALS


Luis García-Puente (Sam Houston State University)

Title: R package for algebraic statistics

Abstract: In this tutorial we will introduce the R package ‘algstat’. R is a free software environment for statistical computing and graphics. The package algstat provides functionality for algebraic statistics in R. We will discuss some of its features such as exact inference in log-linear models for contingency table data, analysis of ranked and partially ranked data, and basic multivariate polynomial manipulation through its interface with computer algebra systems such as Macaulay2 and Bertini. The tutorial will include a large practical, hands-on component. No previous experience with R or with computer algebra systems is required.

Giovanni Pistone (de Castro Statistics, Collegio Carlo Alberto, Moncalieri), Luigi Malagò (Shinshu University, Japan, and INRIA Saclay - Île-de-France)

Title: Information Geometry and Algebraic Statistics on a finite state space and on Gaussian models

Abstract: It was shown by C. R. Rao in a paper published in 1945 that the set of positive probabilities on a finite state space {0, 1, …, n} is a Riemannian manifold in a way which is of interest for Statistics. It was later pointed out by Shun-ichi Amari that it is actually possible to define two other affine geometries of Hessian type on top of the classical Riemannian geometry. Amari named this new topic Information Geometry. Information Geometry and Algebraic Statistics are deeply connected because of the central place occupied by exponential families in both fields. The present course is focused mainly on Differential Geometry, but arguments from the theory of Toric Models will be important.

Lecture 1 (Pistone) The Differential Geometry of the Simplex
Lecture 2 (Pistone) The Differential Geometry of Statistical Models
Lecture 3 (Malagò) Applications to Optimization and Machine Learning
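
As an informal illustration of the Riemannian structure on the simplex mentioned in the abstract (not taken from the lectures), the sketch below evaluates the Fisher metric g_p(u, v) = sum_i u_i v_i / p_i on tangent vectors at a positive probability vector p, and the associated geodesic distance 2 arccos(sum_i sqrt(p_i q_i)). The function names are hypothetical.

```python
import math


def fisher_metric(p, u, v):
    """Fisher metric at p for tangent vectors u, v (entries of each sum to 0)."""
    return sum(ui * vi / pi for pi, ui, vi in zip(p, u, v))


def fisher_rao_distance(p, q):
    """Geodesic distance between two probability vectors p and q."""
    # Bhattacharyya coefficient, clipped to guard against rounding above 1.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * math.acos(min(1.0, bc))


p = [0.5, 0.5]
q = [0.2, 0.8]
squared_length = fisher_metric(p, [1.0, -1.0], [1.0, -1.0])
distance = fisher_rao_distance(p, q)
```

The distance formula reflects the fact that the map p -> 2 sqrt(p) sends the simplex isometrically onto part of a sphere of radius 2.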


Piotr Zwiernik (University of Genoa)

Title: Latent tree graphical models

Abstract: The aim of this short lecture course is to introduce various mathematical and statistical aspects of latent tree graphical models. The latent tree graphical model is a special type of statistical graphical model. The associated graph is a tree, which gives a tractable model with a rich combinatorial structure. What makes this model more complicated and also more interesting is that some variables in the system are assumed to be latent (not observed). This adds modeling power but also leads to various statistical issues. For example, the associated likelihood function is multimodal and its maxima often lie on the boundary of the parameter space (and hence they are not critical points of the likelihood function). Another important statistical problem is that these models may not be identifiable.

I will discuss the following related topics:
1. Trees, tree metrics and spaces of trees: basic graph-theoretic tree concepts, tree metrics and other tree spaces that arise naturally in the study of latent tree graphical models.
2. Latent tree graphical models: model definition, links to Bayesian networks and undirected graphical models on trees; identifiability and moment structure.
3. Tree inference and parameter estimation: overview of methods for learning the underlying tree structure, which is of interest in many applications; the structural EM algorithm for maximum likelihood estimation and other approximate methods.
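
As an informal illustration of the tree metrics listed under topic 1 (not part of the course material), the sketch below tests the classical four-point condition: for any four points, the two largest of the three pairwise sums d(i,j) + d(k,l), d(i,k) + d(j,l), d(i,l) + d(j,k) coincide. It assumes the input is already a metric given as a symmetric matrix with zero diagonal, and checks distinct quadruples only; the function name and the example matrices are hypothetical.

```python
from itertools import combinations


def is_tree_metric(d, tol=1e-9):
    """Check the four-point condition on all distinct quadruples of points."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([
            d[i][j] + d[k][l],
            d[i][k] + d[j][l],
            d[i][l] + d[j][k],
        ])
        # The two largest sums must agree (up to a numerical tolerance).
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Leaf distances of a quartet tree 01|23 with all edge lengths equal to 1.
quartet = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

# A distance matrix whose three pairwise sums are 2, 4 and 6.
not_a_tree = [
    [0, 1, 2, 3],
    [1, 0, 3, 2],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
]
```

For the quartet the three sums are 4, 6 and 6, so the condition holds; for the second matrix the two largest sums differ, so no tree realizes those distances.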