Data Sciences, Geometry and Combinatorics 2023/24

The “mathematics and computer science track” is a master’s (M2) track aiming for a dual formation in mathematics and computer science with courses at the border of the two disciplines (almost each course is taught jointly by at least a mathematician and a computer scientist). A significant fraction of the attendees is non-French speaking, and the lectures are delivered in English.

This track is simultaneously a track of the M2 “Computer Science” master’s program, which leads to a diploma in Computer Science and of the M2 “Mathematics” master’s program, which leads to a diploma in Mathematics. It is also possible to get both diplomas by registrating in each formation. It is jointly supported by the Bézout Labex, the UFR of Mathematics and the Institut Gaspard Monge. The persons in charge are Laurent Hauswirth (LAMA) and Cyril Nicaud (LIGM). This track is open to students, supported or not by the Bézout scholarship program.

The information below concerns the academic year 2022-2024. The archives on previous years are available here: 2018-2019, 2019-2020, 2020-2021, 2021-2022, 2022-2023. Before 2018, the Bézout Labex supported individual courses at the interface of mathematics and computer science.

REGISTRATION 2023-2024: Registration are possible on the following pages from the mathematics master’s program and the computer science master’s program (depending on the degree aimed for). These pages are not completely up-to-date concerning the pedagogical aspects… see below for the newest information. The schedule of the second semester is available: http://labex-bezout.fr/wp-content/uploads/2023/12/edt-M2-maths-info-s2-2023-24.pdf

Schedule

(Note: UE=”Unité d’enseignement”=indivisible piece of lectures; HETD=”Heure équivalent TD”: 1 HETD corresponds to 40 minutes of lecture.)

4 weeks on basics: complements in mathematics and complements in computer science. Each UE has 6 ECTS and 48 HETD. (In 2023-2024: from September 11 to October 6.)
10 weeks for a general large background: data sciences, probabilistic methods, discrete maths and geometric calculus. Each UE has 6 ECTS and 60 HETD, split into 2 courses of 3 ECTS and 30 HETD each. (In 2023-2024: from October 9 to December 15. Exams the week of January 9.)
8 weeks for two UE of specialization chosen by students among four UE, having each 6 ECTS and 40 HETD. (In 2023-2024: from January 15 to March 16, with one week break. Exams the week of March 26.)
A research memoir/internship of 18 ECTS. (In 2023-2024: from April 2.)

Preliminary list of courses

Semester	Name	ECTS	Hours
1	UE Basics Mathematics (analysis, algebra, probability, geometry)	6	48
1	UE Basics Computer Science (complexity, algorithms, programming, graphs)	6	48

1	UE Discrete and continuous optimisation	6	60
	Discrete optimisation	3	30
	Continuous optimisation	3	30
1	UE Geometry and Combinatorics	6	60
	Geometry	3	30
	Combinatorics	3	30
1	UE Data Sciences	6	60
	Introduction to data sciences	3	30
	Computational aspects of data sciences	3	30

2	UE Advanced Data Sciences	6	40
	Deep learning methods	3	20
	Computational aspects of deep learning methods	3	20
2	UE Maths specialization: Advanced Geometry	6	40
		3	20
		3	20
2	UE CS specialization: Algebraic Combinatorics and formal calculus	6	40
	Combinatorics Hopf algebra	3	20
	Operads	3	20
2	UE Large random matrices and applications	6	40
		3	20
		3	20

Detailed description of the courses of the first semester

Basics in Mathematics.

Algebra and linear algebra (Fradelizi): Groups: order, quotient group, cyclic groups, finite groups, finite abelian groups, group actions; Rings, polynomials and fields: ideals, principal ideal domains, finite fields; Linear algebra: endomorphisms, eigenvectors, spectral theorem.
Analysis (Sester): Normed vector spaces : equivalent norms, topology, continuous functions, compactness, the finite dimensional case; Examples of metric spaces; Differential calculus. Extremum problems; Convex functions, convexity inequalities. Asymptotic analysis.
Probability (Martinez): Random experiment and probability spaces. Law, mean value, moments,… of a random variable. Applications to combinatorics on graphs; Deviation’s inequality, concentration inequalities (Markov, Tchebychev, Hoeffding inequality…); Martingales, inequalities with Martingales; Markov chains. References : The Probabilistic Method (N. Alon, J. H. Spencer).
Geometry (Fanoni and Sabourau): We will give an introduction to graph theory: connectedness, degree, trees/forests, adjacency matrix etc.

Basics in Computer sciences.

Algorithmic: data structures (Nicaud): The data structures studied include: array, lists, stack, …; dynamical arrays; trees, well-balanced trees- heap, priority queues; hashtables; minimal range query; suffix array; suffix trees.
Complexity (Thapper): The course is an introduction to computational complexity theory. We will cover the following notions: Turing machines, the Church-Turing thesis, (un)decidability, the halting problem; P, NP, polynomial-time reductions, NP-completeness, the Cook-Levin theorem, co-NP; PSPACE; the time and space hierarchy theorems, Ladner’s theorem; the polynomial hierarchy and collapses; approximation of NP-hard problems. References: Introduction to the Theory of Computation (Michael Sipser).
Programmation (Borie): Programmation : Python and Sage.
Quick review of the basics of programming. Solve simple mathematical-algorithmic problems with Python (gcd, f(x) = 0, numerical integration algorithm, knapsack, backtracking, …). Getting started with Sage, a computer algebra system. Programming project at the interface mathematics and computer science.
Graph theory (Bulteau and Weller). Fundamentals; Connectivity- Planar graphs- Flow/Cut- Examples of graph classes- Examples of problems- Matchings; P/NP, Reductions; parameterized algorithms; examples of parameters; kernels; minors

Discrete and continuous optimization.

Discrete optimization (Thapper): Min-max results in combinatorial optimization provide elegant mathematical statements, are often related to the existence of efficient algorithms, and illustrate well the power of duality in optimization. The course aims at being a gentle introduction to the richness of this type of results, and especially those that belong to the theory of perfect graphs. It will make connections with the course of continuous optimization, in particular in what concerns linear programming and polyhedra, and will rely on concrete examples taken from industry that illustrate the relevance of tools from combinatorial optimization for real-world applications.
The preliminary plan of the course is as follows:- Discrete optimization in bipartite graphs: Hall’s marriage theorem, König’s theorems, algorithms; chains and antichains in posets: theorems of Dilworth and Mirsky; chordal graphs: interval graphs, coloring, duality, decomposition; perfect graphs: definition, weak and strong theorems; perfect graphs: polyhedra, algorithms; Lovász’ theta function: definition, computation, sandwich theorem, Shannon capacity.
Continuous optimization (Sandier): The course will cover the theory and main examples in convex optimization. The tentative list of topics covered is as follows: Convex sets and functions, convex optimization problems. Duality and optimality conditions. Among examples we will see Linear programming, Quadratic programming, Second order cone programming. Additional topics will include sparse solutions via L1 penalization, and notions on algorithms, including the simplex algorithm and interior point methods.

Geometry and combinatorics.

Geometry (De Mesmay): Algorithms and combinatorics of embedded graphs. This course will provide an introduction to the study of graphs arising in geometric settings, with a focus on planar graphs and graphs embedded on surfaces. The main objective of the course is to explore the interactions between the geometry and topology of low-dimensional spaces on the one hand, and the combinatorics of their discrete structures on the other hand, as well as to showcase algorithmic techniques tailored for these objects. Topics that will be investigated include: Basics of planar graphs: Jordan’s curve theorem, combinatorial representations, duality, Euler’s formula, Kuratowski-Wagner theorem, Planarity testing, Tutte embedding, Efficient algorithms for planar graphs, Classification of surfaces, basics of topological graph theory, Topological algorithms: homotopy testing and shortest loops.

Combinatorics (Novelli): The lectures on enumerative combinatorics will consist in the study of classical objects: permutations, trees, partitions, parking functions; classical sequences: factorial, Catalan, Schroder; classical methods: bijections, group actions, induction, generating series.The lectures will be heavily based on the study of various examples, some very easy and others trickier.

Data Sciences.

Theoretical aspects of Data Sciences (Bonis) :
This course will provide the necessary tools to understand data sciences from the theoretical perspective. The goal of this course is to introduce notions of statistical estimation with a focus on parametrics statistics and cast machine learning problems as a statistical estimation problem. In particular, the students will be familiar with the following concepts:
– The basics of statistical estimation (estimator, loss function, …)
– Classical estimators (moment method, maximum likelihood, …)
– Properties of statistical estimators (mean squared error, bias-variance tradeoff, …)
– Linear regression
– General machine learning problems (regression and classification)
Computational aspects of Data Sciences (Lacombe):
This course will present the basics of data sciences from the practitioner perspective. The goal is to understand the typical machinery (from theory and numerics) used by data scientists when they design machine learning models given a set of data. At the end of the course, the students will be familiar with most notions any data scientist should know, including:
– Standard terminology (supervised / unsupervised learning, regression / classification, etc.);
– Optimization through gradient descent
– Basics of supervised learning (linear regression, classification…)
– Basics of unsupervised learning (k-means, PCA…).
– Software: Python with pandas, numpy, scikit-learn, and jax.

Detailed description of the courses of the second semester (choose two courses)

Advanced Geometry and graph theory:

Advanced Graph theory (Fanoni and Sabourau):This course will focus on families of expander graphs. These are sequences of graphs, with growing number of vertices, which are at the same time sparse and highly connected. For their interesting properties, they have many applications in mathematics and computer science. We will talk about constructions of examples of expanders, the different viewpoints which can be used to define them and some of their properties. We will also present two applications, one in computer science (error correcting codes) and one in mathematics (embeddings in Euclidean spaces).

Advanced Data Sciences :

Computational aspects of deep learning methods (Lacombe)

This course will be in continuation with the course of the first semester, and will be dedicated to modern machine learning models, mainly (deep) neural networks and their different flavors. At the end of the course, the students will have notions on:
– Feedforward fully-connected networks (the simplest deep learning model) and their training trough back-propagation
– Recurrent neural networks
– Graph Neural Networks.
– Residual neural networks and Neural ODEs
– Software: Python with Tensorflow and/or PyTorch.
More importantly, they will be able to quickly understand modern machine learning problems and adapt to new models when they encountering them in academia or industry.

Theoretical aspects of deep learning methods (Hebiri)

In this lecture, we will provide statistical controls (bound on generalization error) for general supervised learning algorithms. First, we will derive a bound for the Empirical Risk Minimizer (ERM) using tools from the Vapnik–Chervonenkis theory. Then, we will consider several algorithms based on the convexification of the risk in the context of binary classification. The last part of the course will explore modern multi-class classification problems and present some techniques for addressing the problem based for instance on set-valued approaches.

By the end of the course, students will be familiar with classic ML algorithms such as trees, random forests, SVMs, boosting, and neural network. In addition, the methods introduced in the course will be compared on real data.

Algebraic combinatorics and formal calculus (Borie and Novelli)

Operads in combinatorics (Borie): Informally, an operad is a space of operations having one output and several inputs that can be composed. Each operad leads to the definition of category of algebras. This theory offers a tool to study situations wherein several operations interact with each others. This lecture begins by presenting some elementary objects of algebraic combinatorics: combinatorial classes and combinatorial algebras. We introduce then (non-symmetric) operads and study some tools allowing to establish presentations by generators and relations of operads. Koszul duality in non-symmetric operads is an important part of this theory which shall be presented. We end this lecture by reviewing some generalizations: colored operads, symmetric operads, and pros. We shall also explain how the theory of operads offers a tool to obtain enumerative results.

Algebraic combinatorics (Novelli): The lectures on algebraic combinatorics will consist in the study of: classical symmetric functions and a short discussion about representation theory; noncommutative symmetric functions (NCSF); the definition of Hopf algebras; the dual algebra of NCSF, quasi-symmetric functions; the modern generalizations of those algebras; and the use of all these algebraic properties (transition matrices, expressions in various bases, morphisms of Hopf algebras) to solve (classical) combinatorial questions. As in the lectures in combinatorics of the first semester, the lectures will be heavily based on the study of examples.

Large random matrices and applications (Loubaton, Najim and Guédon)

The purpose of large Random Matrix Theory (RMT) is to describe the eigenstructure (eigenvectors and eigenvalues) of matrices whose entries are random variables and whose dimensions go to infinity. The first results go back to Wigner (1948) for random symmetric matrices and Marchenko and Pastur (1967) for large covariance matrices. Both results have been motivated by questions in theoretical physics which still provides open problems. In the eighties, Voiculescu used RMT as a tool to address open problems in operator theory. This point of view turned to be extremely successful as many such open problems received a solution with the help of RMT. A whole theory known as free probability has been developed which tightly relies on RMT. In the nineties, RMT turned to be very successful to address problems in wireless communications, to analyze the performances of multi-antenna telecommunication networks, and to provide usefull results in statistical signal processing. For 20 years, the theory of large random matrices is very active as can be seen by the publication of five major monographs on the subject. The goal of the course is to present the most classical and prominent results in the field together with some statistical applications: Basic techniques in RMT and Stieltjes transform; Marchenko-Pastur’s theorem which describes the limiting spectral measure of large covariance matrices; other models of interest: large covariance matrices with general population matrix, signal + noise matrices, etc.; Small perturbations and spiked models. In terms of applications, we will describe the problem of statistical test in large dimension and other statistical problems.