HiTEc Spring Course
The Spring Course will consist of a series of tutorials on representative HiTEc topics.

Dates: 3-5 April 2023
Venue: Poseidonia Beach Hotel, Limassol, Cyprus.
Room: Triton 3, mezzanine.

Speakers:
Karel Hron, Palacký University, Czech Republic.
Dan Vilenchik, Ben-Gurion University, Israel.
Peter Winker, University of Giessen, Germany.
David Suda, University of Malta, Malta.
Vladimir Batagelj, University of Ljubljana, Slovenia.

Programme

Monday, 3 April 2023

  • 09:00 – 10:00 Session 1.1 - Module I
  • 10:00 – 10:30 Coffee break
  • 10:30 – 13:00 Session 1.2 - Module I
  • 13:00 – 14:30 Lunch break
  • 14:30 – 16:10 Session 2.1 - Module II
  • 16:10 – 16:40 Coffee break
  • 16:40 – 18:30 Session 2.2 - Module II

Tuesday, 4 April 2023

  • 09:00 – 10:30 Session 3.1 - Module III
  • 10:30 – 11:00 Coffee break
  • 11:00 – 13:00 Session 3.2 - Module III
  • 13:00 – 14:30 Lunch break
  • 14:30 – 15:30 Session 4.1 - Module IV
  • 15:30 – 16:00 Coffee break
  • 16:00 – 18:00 Session 4.2 - Module IV

Wednesday, 5 April 2023

  • 09:00 – 11:00 Session 5.1 - Module V
  • 11:00 – 11:30 Coffee break
  • 11:30 – 13:30 Session 5.2 - Module V
  • 13:30 – 15:00 Lunch break
  • 15:00 – 17:00 Session 5.3 - Module V

Module I: Statistical processing of probability density functions

Karel Hron, Palacký University, Czech Republic.

The analysis of distributional data (probability density functions or histogram data) has recently gained increasing attention in applications. Distributional data are often observed directly, or arise as a result of aggregating large streams of data. The course will provide an introduction to the analysis of these data using a Functional Data Analysis (FDA) approach, grounded in the perspective of Bayes spaces. These are mathematical spaces whose points are densities (or, more generally, measures), and they generalize the Aitchison simplex for multivariate compositional data to the FDA setting. The course will give a brief overview of the theory of Bayes spaces, as well as of the statistical methods developed in this setting. All the methods will be illustrated through examples from real case studies.

Topics:

  • FDA and the geometry of Bayes spaces
  • Exploratory FDA and dimensionality reduction
  • Density-on-scalar, scalar-on-density and density-on-density functional regression
  • Bivariate densities and their orthogonal decomposition
  • Introduction to multivariate Bayes spaces
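
As a rough illustration of the Bayes space perspective, the sketch below applies a discretised centred log-ratio (clr) transformation, a map commonly used in this literature to move densities into an ordinary function space, to a density sampled on an equally spaced grid. The grid, the example density and the function names are assumptions made for illustration only; they are not taken from the course material.

```python
import math

def clr(density):
    """Discrete centred log-ratio: log values minus their mean (equally spaced grid assumed)."""
    logs = [math.log(v) for v in density]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def clr_inverse(z, step):
    """Map clr values back to a density that sums to 1 under a Riemann sum with spacing `step`."""
    raw = [math.exp(v) for v in z]
    total = sum(raw) * step
    return [v / total for v in raw]

# Midpoint grid on [0, 1] (illustrative choice).
n = 100
step = 1.0 / n
grid = [(i + 0.5) * step for i in range(n)]

# Hypothetical example density, proportional to exp(-2t) on [0, 1].
raw = [math.exp(-2.0 * t) for t in grid]
total = sum(raw) * step
f = [v / total for v in raw]

z = clr(f)                  # clr values sum to zero by construction
f_back = clr_inverse(z, step)  # recovers f up to rounding error
```

In this toy case the clr values are linear in t, which is why an exponential density is a convenient first example: standard FDA tools can then be applied to the transformed curves.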

Module II: Feature selection for high-dimensional data

Dan Vilenchik, Ben-Gurion University, Israel.

High-dimensional (HD) data are characterized by a large number of features compared to a much smaller number of samples. Such data are prevalent in biology, economics, psychology, etc. The “curse of dimensionality” refers to a host of phenomena that concern consistency issues when applying classical statistical tools to HD data. One way to reduce the dimensionality of the data is to perform feature selection. In this talk we focus on unsupervised feature selection methods. We present some of the main methods used by practitioners and study key questions that arise, both rigorously and through hands-on, data-driven examples. The first question is what happens when the data have a low signal-to-noise ratio (SNR): how does low SNR in the HD setting affect feature selection, and what remedies may be offered? We then ask how prevalent truly hard HD datasets are in real-world applications (we define what we mean by “truly hard”), or whether such problems are mainly a theoretical curiosity.

Module III: Text data in econometrics

Peter Winker, University of Giessen, Germany.

There is a growing interest in and use of textual information in different fields of economics, ranging from financial markets (analysts’ statements, communication of central banks) and innovation activities (patent abstracts, firm websites) to the development of economic science (journal articles, conference abstracts). Using such textual information for quantitative analysis involves several steps including, e.g., 1) the selection of appropriate sources (corpora) and establishing access, 2) the preparation of the text data for further analysis, 3) the identification of themes within documents, 4) the quantification of the relevance of themes in different documents, 5) the aggregation of relevant information, e.g. across sectors or over time, and 6) the application of the generated indicators. The tutorial will provide some first insights and recommendations concerning these steps of the analysis and address open issues regarding, e.g., computational complexity and statistical robustness of the methods. All steps will be illustrated with empirical examples.
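
Steps 2) to 5) can be sketched in a few lines. The example below is a deliberately simple stand-in, assuming a hand-made theme dictionary rather than whatever theme identification method the tutorial uses; the mini-corpus, the term list and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: (year, text) pairs standing in for, e.g., central bank statements.
corpus = [
    (2021, "Inflation remains low and inflation expectations are stable."),
    (2022, "Rising inflation and energy prices weigh on growth."),
    (2022, "Growth slows while unemployment stays low."),
]

# Hypothetical theme dictionary (assumption: the theme is defined by a fixed word list).
theme_terms = {"inflation", "prices"}

def tokenize(text):
    """Step 2: lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def theme_share(text, terms):
    """Steps 3 and 4: share of tokens in a document that belong to the theme."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[t] for t in terms) / len(tokens)

def yearly_indicator(corpus, terms):
    """Step 5: average the document-level shares within each year."""
    by_year = {}
    for year, text in corpus:
        by_year.setdefault(year, []).append(theme_share(text, terms))
    return {year: sum(v) / len(v) for year, v in by_year.items()}

indicator = yearly_indicator(corpus, theme_terms)
```

The resulting dictionary maps each year to an average theme share, which could then serve as an indicator in step 6).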

Module IV: Handling higher dimensions: Regularisation, sparsity and metaheuristics

David Suda, University of Malta, Malta.

Regularisation and sparsity in statistics refer to techniques which are not solely applicable to the high-dimensionality problem, but can certainly be beneficial in solving such problems. In this course, we start by going through techniques in regularised regression (namely penalised regression and partial least squares) and then dimension reduction, and show how they can be applied in the high-dimensional context. When it comes to dimension reduction, we mainly consider the PCA class of techniques, but we also look into techniques within this class that are more applicable to the time series context. Apart from regularisation and sparsity, this course also looks into metaheuristic algorithms which can be used in the variable selection problem, such as the genetic algorithm and the firefly algorithm. We shall go through a number of hands-on examples, as well as examples from the literature where the aforementioned techniques are used. Installing R/RStudio on one’s device prior to the course is recommended.
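
To give a flavour of how a penalty produces sparsity, the sketch below fits the lasso, one common form of penalised regression, by cyclic coordinate descent. It is written in plain Python for illustration (the course itself uses R), and the toy data, the penalty value and the function names are assumptions rather than course material. The objective assumed here is (1/(2n))·||y − Xb||² + λ·||b||₁, with no intercept and no all-zero columns.

```python
def soft_threshold(value, lam):
    """Shrink `value` towards zero by `lam`, returning exactly 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))*||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # squared norm of column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta

# Hypothetical toy data with orthogonal columns: y = 2 * x1 + 0.1 * x2.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]

beta = lasso_cd(X, y, lam=0.5)
```

With this penalty the weak second coefficient is set exactly to zero while the strong first one is shrunk from 2 to about 1.5, which is the variable selection effect that makes such penalties attractive in high dimensions.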

Module V: Introduction to the analysis of multiway networks

Vladimir Batagelj, University of Ljubljana, Slovenia.

In the social network analysis (SNA) literature, we can find some well-known 3-way networks such as CKM physicians' innovation (1957), Kapferer tailor shop (1972), Krackhardt office CSS (1987), Lazega law firm (2001), etc. Recently, physicists working on complex networks, for example Manlio De Domenico (2015), became interested in multiplex (multi-relational) networks, a subclass of multiway networks. In 1992 Borgatti and Everett, following Baker (1986), extended blockmodeling to general k-way binary networks. In Genova (2022) a 4-way network about Italian student mobility, based on V = (provinces, universities, programs, years), was analyzed. The world trade network (exporters, importers, categories, years) is similar.

A weighted multiway network N = (V, L, w) is based on nodes from k finite sets (ways or dimensions) V = (V_1, V_2, ..., V_k), the set of links L ⊆ V_1 × V_2 × ... × V_k, and the weight w : L → R. In a general multiway network, different additional data (node properties, link weights) can be known.

The course starts with some examples of multiway networks and a format for their description. The participants will learn about the basic notions and different transformations of multiway networks such as slicing, reordering of ways, joining the ways, flattening of a way, projection to a selected way, aggregation by a way partition (blockmodeling), normalization, recoding (binarization), 3D visualization (based on X3D), connectivity, cores, and others. They will be illustrated by their application in the analysis of different multiway networks.
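
The definition N = (V, L, w) maps naturally onto a dictionary from k-tuples to weights. The sketch below uses that representation to illustrate three simple operations in the spirit of the transformations listed above: fixing a node in one way, summing weights over one way, and binarization. It is an independent Python illustration, not the MWnets API; the toy data and the function names are hypothetical, and the course's exact definitions of slicing, flattening and projection may differ.

```python
from collections import defaultdict

# Hypothetical toy 3-way network; the ways are (exporter, importer, year).
links = {
    ("SI", "IT", 2020): 5.0,
    ("SI", "DE", 2020): 3.0,
    ("SI", "IT", 2021): 2.0,
    ("CY", "IT", 2021): 4.0,
}

def slice_way(links, way, node):
    """Keep links whose coordinate in position `way` equals `node`, dropping that coordinate."""
    return {
        key[:way] + key[way + 1:]: w
        for key, w in links.items()
        if key[way] == node
    }

def sum_out_way(links, way):
    """Remove one way by summing the weights of links that agree on all other ways."""
    out = defaultdict(float)
    for key, w in links.items():
        out[key[:way] + key[way + 1:]] += w
    return dict(out)

def binarize(links, threshold=0.0):
    """Recode weights to 1 for links whose weight exceeds `threshold`; drop the rest."""
    return {key: 1 for key, w in links.items() if w > threshold}

slice_2021 = slice_way(links, 2, 2021)   # 2-way network for the year 2021
totals = sum_out_way(links, 2)           # exporter-importer totals over all years
strong = binarize(links, threshold=2.5)  # binary network of links heavier than 2.5
```

Each operation returns a network in the same dictionary form, so the operations can be chained, for example by slicing first and binarizing afterwards.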

To support the analysis of multiway networks in R, the R package MWnets is being developed. The current version of the package is available at https://github.com/bavla/ibm3m/tree/master/multiway .

Grants
PhD students and young researchers (under 40 years, according to the COST definition) from eligible COST countries* can apply for a limited number of grants. The granted participants will be reimbursed a daily allowance of 170 euros (on average, for 3 days) plus travel expenses of up to 350 euros.
  • In order to apply for the grants, candidates should submit their CV by e-mail to hiteccostaction@gmail.com.
  • Deadline for applications: 25th January 2023.
  • Granted candidates will be informed by e-mail after the deadline and must register within seven days of the notification to secure their grants. Otherwise, their grants will be revoked and assigned to another candidate.
  • The granted candidates must attend all the sessions of the Spring Course and sign the attendance list in order to obtain their grants.
*Eligible COST countries: Albania, Armenia, Austria, Belgium, Bosnia and Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Georgia, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, the Republic of Moldova, Montenegro, the Netherlands, the Republic of North Macedonia, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, Ukraine, United Kingdom and Israel.