Research Overview

Main Areas of Research:

Our research activities are centered around the development of statistical methodology and its application in contemporary and challenging data analyses. One of our main goals is to provide statistical models that are easy to interpret and can thus convey the results found in the data in a clear and concise way.

The focus of our research is on:

  • Analysis of categorical and discrete data structures
  • Feature selection
  • Regularization and structured regression
  • Mixed models

Project Overview:

Research Projects:

DFG Project "Regularization for Discrete Data Structures"
Principal investigator: Gerhard Tutz
Staff: Margret Oelker, Wolfgang Pößnecker
Statistical modelling of discrete structures requires vast amounts of parameters. Even treating only a handful of discrete features yields high-dimensional models if these features have many levels. A possible way of dealing with these high-dimensional situations are modern regularization techniques which allow for estimable and interpretable models. Existing regularization approaches, however, focus almost exclusively on continuous variables.

Therefore, the goal of this project is to develop regularization techniques that are tailored specifically to models with a discrete structure. A major goal at this is that the resulting models are easy to interpret, which is crucial for real world applications. Interpretability is achieved by a regularization that yields a parsimonious parameterization and, at the same time, accounts for the special structure of the respective model.

One of the objects of study are classical regression models with categorical predictors and models for discrete dependent variables. Another major area of research are models with categorical effect-modifying variables and finite mixture models, which account for heterogeneity in the data by the interaction of the other covariates with an observed or latent discrete feature. The focus in these models is similar and lies in a variable selection that is suitable for discrete features, the clustering of similar categories and the distinction between category-specifc and global parameterizations.

CEST "Center for Empirical Studies (CEST): Data Analysis, Modelling & Knowledge
Discovery in Social Sciences, Economics and Humanities"
Coordination: Gerhard Tutz
The Center of Empirical Studies is a research initiative linking empirical and methodological research groups from several different faculties. Methodological challenges include modeling unobservable heterogeneity or measuring latent traits, that recur in similar form in, for example, economic, sociological and psychological tasks. The aim of the Center for Empirical Studies is thus to enhance the explanatory power of empirical studies by means of new methodological developments. The initiative is organized in three interacting areas: Statistical Learning, Data Mining & Knowledge Discovery, Measurement & Evaluation and Dynamic Modeling.

CEST Project"Dynamical Modelling: Forecasting Models using Business Surveys"
Kai Carstensen, Gerhard Tutz
Forschung - Projekt
The identification of the current state in the business cycle and the forecast of the next quarters do not only receive a lot of attention in the public, they are also of prime importance for the plans of firms and the government. The most important leading indicator for the German economy is the ifo Business Climate Index that is based on a monthly business survey with more than 7,000 respondents. Due to the large number of firms, the results can be analysed at a disaggregate level for the different sectors, firms and response categories. Thus, it is possible to use a panel of sector-specific survey indices in order to forecast the sectoral gross value added or to examine whether they are leading important aggregate variables like total gross value added or gross domestic product (GDP).

A number of questions are of interest:
          • Are there sectors or firms that have stable leading properties to the target series?
          • Are there interactions between sectors or firms that attenuate or intensify the original signal?
          • What is the delay with which firms react to macroeconomic impulses? Which explanations can be found?
Due to the large number of possible lead-lag relationships and the additional difficulty of time-varying structures it seems necessary to pre-select factors that are relevant for the question of interest. Modern selection methods such as parametric and nonparametric boosting or random forests will be adapted to the problem at hand.

CEST Project"Dynamical Modelling: Modelling of Sojourn Time"
Kai Carstensen, Gerhard Tutz
Forschung - Projekt
The modelling of sojourn time is crucial for grasping the dynamics of response behavior in panel surveys. Particularly the Ifo Business Survey, which is conducted monthly among 7,000 participating firms, is an excellent basis for the analysis of nonresponse behaviour in business surveys because it can build on an enormous data set. The main research tasks that can be answered by means of business surveys concern nonresponse behavior and expectations at the firm level. Nonresponse considerably influences the stability of the data and can create a bias in the results. While severals analyses of the issue are available for individual and houshold surveys, there is little research on processes and sources of compliance in business surveys. The main factors responsible for "panel fatigue", i.e. a decreasing compliance over time, can be used to improve the quality and uncover present selectivity. At the firm level, the analysis of expectations is of special interest, particularly because current macroeconomic models emphasize the importance of expectations to explain business cycle dynamics. However, it often turns out empirically that the standard assumption of rational expectations does not hold. Since the surveys of the Ifo Institute also contain expectational categories, they allow a deeper analysis of the process of expectation formation. An interesting question is in how far current expectations correlate and interact with previous expectations and other response categories, especially with future realizations.

Methodological problems for the Ifo data arise from the fact that the responses of the enterprises are given in categorical form (e.g., “better”, “unchanged”, “worse”). In the corresponding competing risk approaches, the question needs to be evaluated whether the monthly survey allows for discrete or continuous modelling. In general, the empirical analysis of sojourn time data often shows that the effect of influencing variables varies over time. Ignoring these effects often results in artificial effect sizes and reduced prediction accuracy. In order to model these variations adequately, it is necessary to incorporate time-varying effects in nonparametric form. This leads to severe selection problems: which variables should be modelled parametrically to be time-constant and which nonparametrically to be time-varying? The selection problems are to be solved by means of modern selection techniques like the Lasso and Boosting. Especially the generalization to multi-state models and the modelling of heterogeneity are of substantial interest for the intended analysis of the Ifo business survey as well as other econometric and sociological surveys.

CEST Project"Measurement and Evaluation: New Methods for Item Response Theory"
Strobl, Lindmeier, Reiss, Gerhard Tutz
Forschung - Projekt
Item Response Theory (IRT) is one of last century's most important achievements of psychological diagnosis and empirical educational research: It allows for an objective measurement of latent person characteristics by means of separating item and person parameters. Moreover, as opposed to the classical psychological measurement theory, IRT provides the opportunity to statistically test the model assumptions. The Rasch model, which has been used in the PISA-study, is the most well-known IRT model.

In an intervention study on the effectiveness of competence-supporting learning environments the Rasch model was emplyed to measure mathematical comptetency for using diagrams and models in statistical contexts. In this study, both the objective measurement of the students' competencies and the modelling of the treatment effects is of major interest. Methodological challenges arise from the heterogeneity of the sample: The data have a hierarchical structure, because students are grouped in classes, classes in schools and so forth. Moreover it is necessary to check whether the measures are comparable for students from different groups, or if differential item functioning occurs.

The aim of the project is to develop and apply new IRT Methods to account for the sample heterogeneity: For the diagnosis of differential item functioning methods from machine learning will be used in combination with latent-class-approaches. The hierarchical data structure will be accounted for by means of mixed models.

LMUinnovativ Project "Analysis and Modelling of Complex Systems in Biology and Medicine" (Biomed-S)
Coordinator: Torsten Hothorn
Principal investigator: Gerhard Tutz
The general aim of this project is the modelling and analysis of complex biological and biomedical systems with methods from bioinformatics, mathematics, physics and statistics in cooperation with partnern in biology and medicine. The main focus is on pioneering areas of post genomics including systems biology and their applications in medicine and pharmaceutics, but also goes beyond to population biology. The project consists of three incorporated clusters:
  • Cluster A: Quantitative biology and biostatistics
  • Cluster B: Complex Systems in molecular medicine
  • Cluster C: Structures and dynamics of functional modules in model organisms

Former Research Projects:

DFG Project: "Model Based Feature Extraction and Regularisation in High-dimensional Structures"
Principal investigator: Gerhard Tutz
Staff: Jan Gertheiss
Forschung -
Feature extraction aims at detecting influential structures or patterns in data. The focus of the project is on model based feature selection methods where the predictor space is linked to the target criterion by parametric or semiparametric models and features are extracted with reference to the modelling approaches. The supervised learning techniques that are considered explicitly use the target criterion in the feature selection process in contrast to widely used two-step approaches where in the first step unsupervised learning is applied to extract features and only in the second step the features are linked to the target.

The type of model used depends on the data structure and the objective of modelling. One area of investigation is functional data where predictors are given as signals. Feature extraction then makes use of the information in the underlying metric space. These spatial methods tend to show better performance than equivariant methods where no ordering of predictors is used. For predictors without ordering, the focus is on the selection of variables when groups of highly correlated variables and different types of variables are present.