Invited speakers
|
We will welcome seven invited speakers:
|
Plenary lectures
|
Jean-Daniel Fekete, Michael Greenacre and Wouter Saeys
|
MIMS special session
|
Katrijn Van Deun, Zouhair El Hadri, Age K. Smilde and Mohamed Nadif.
|
|
This special session is part of the DIGIT-BIO metaprogramme: https://digitbio.hub.inrae.fr/thematiques/comprendre/consortium-mims-2022-2023
|
|
Jean-Daniel FEKETE
|
27-28 February 2024
|
|
Jean-Daniel Fekete is a senior research scientist (directeur de recherche) at Inria. He received his PhD from Université Paris-Sud in 1996 and joined Inria in 2002. He currently leads the Inria project team Aviz (www.aviz.fr). His research areas are visual analytics, information visualization and human-computer interaction. He has published more than 150 articles in a range of conferences and journals. He collaborates with international companies such as Microsoft and Google, as well as French companies such as EDF. His scientific goal is to make it possible to explore and understand complex data using interactive visual representations. He received the IEEE VGTC Visualization Technical Achievement Award in 2020, and he is a member of the IEEE VGTC Visualization Academy and the ACM SIGCHI Academy. He was president of the Association Francophone d'Interaction Homme-Machine (AFIHM) from 2009 to 2013.
|
Data visualization: some recent advances
|
Data visualization is an indispensable component of the data science toolbox. Over the past ten years or so, it has gained considerable legitimacy thanks to many systems that are highly expressive and relatively easy to use, such as ggplot2 in the R environment, vega-lite on the web, and Tableau as a commercial product. These systems are based on the "Grammar of Graphics", which defines a rigorous organization that makes it possible to specify practically any visualization. I will discuss this newly available richness and give an overview of how it works. Visualization nevertheless continues to evolve, making it possible to explore more complex and larger data structures, in particular by means of multidimensional projections. I will also give a few examples to invite the chemometrics community to take full advantage of recent advances in visualization.
|
|
Michael GREENACRE
|
27-28 February 2024
|
|
Michael Greenacre is "Senior Talent Professor" at the Universitat Pompeu Fabra in Barcelona and affiliated professor of the Barcelona School of Management. His academic work centres around methods for analyzing multivariate data, having specialized in correspondence analysis since his doctoral studies with Jean-Paul Benzécri in the 1970s, and then in compositional data analysis after collaborations with both John Aitchison and Paul Lewi (Paul was one of the co-founders of Chemometrics in 1983). He has written or co-edited 12 books, including Theory and Applications of Correspondence Analysis (1984), three separate editions of Correspondence Analysis in Practice (1993, 2007 and 2016) and most recently, Compositional Data Analysis in Practice (2018).
|
Compositional data analysis made simple: unsupervised and supervised learning
|
The analysis of compositions, usually data observations with compositional parts summing to 1 or 100%, or any nonnegative data where the relative values are of primary interest, has been getting increasing attention, especially in the various "-omics" fields. Here I take my cue from John Aitchison's statement: "Compositional data analysis is simple", as opposed to many attempts in the literature to make it complicated and over-sophisticated. In fact, all that is involved is a suitable data transformation, for example logratio transformations of selected pairs of compositional parts. Then regular statistical analysis continues as always, both in unsupervised and supervised learning situations, with appropriate care taken in the interpretation of the results. In this talk I will present applications in the three main areas where compositional data are found: geochemistry, biochemistry and genomics, the last case being special since there can be hundreds or possibly thousands of compositional parts. After working in the area of correspondence analysis for 45 years and, additionally, in compositional data analysis during the last 22 years, I have recently come full circle to discover that the chi-square standardization inherent in correspondence analysis, combined with a power transformation of the data, provides a valid alternative transformation for compositional data, which is conveniently interpretable in terms of the compositional parts, not their ratios.
|
|
Wouter SAEYS
|
27-28 February 2024
|
|
Wouter Saeys is a full professor in the KU Leuven Department of Biosystems in Leuven, Belgium, where he leads the Biophotonics group, with a focus on applications in the AgroFood chain, and is head of the MeBioS division. His main research interests include light transport modelling and optical characterisation of biological materials, chemometrics and digital agriculture. In 2013, he received the ‘Young Statistician Award’ from the European Network of Business and Industrial Statistics (ENBIS) for his work on multivariate calibration of spectroscopic sensors in the agrofood industry. In 2022, he received the Thomas Hirschfeld Award from the International Council of Near Infrared Spectroscopy (ICNIRS). He is a full member of the Club of Bologna and a member of the Chairman Advisory Committee and the Education Committee of ICNIRS. He also serves on the editorial boards of the Journal of Near Infrared Spectroscopy and Biosystems Engineering.
|
Unsupervised monitoring of multivariate calibration models.
|
Multivariate calibration models are widely used for estimating quality traits of interest from a wide range of sensors which provide a multivariate fingerprint (e.g. a spectrum) of the measured process or product. Once a model has been built on a set of training data and properly validated on an independent test set, it can be deployed for routine use. However, the value of these models largely depends on their validity in the long term. Ensuring such validity requires model maintenance, which involves monitoring the performance of the model and adapting it accordingly when degradation or model drift is detected. While traditional approaches for multivariate calibration monitoring focus on distance quantification of new samples, we present a strategy where the model drift is quantified based on model comparison. Once the model has been established on calibration and test data with reference values, only a separate set of spectral samples without reference values from a future time point is required for the calibration monitoring. The strategy particularly focuses on bias drift of the calibration model and spectral variability drift. In conjunction with the proposed strategy for model monitoring, a framework for model updating has been developed in which different categories of methods are mapped to the detected model drift. The performance of the proposed strategies is demonstrated on case studies from the agrofood industry involving samples acquired over an extensive time period.
|
|
Katrijn VAN DEUN
|
27-28 February 2024
|
|
Katrijn Van Deun is an associate professor at the Department of Methodology and Statistics, Tilburg University, and a guest lecturer in data science at the Technical University of Eindhoven. She leads a research team, funded by several research grants, that develops statistical methods for big data in the social sciences, in particular high-dimensional multi-block data. She is co-editor of the Methodology journal.
|
Finding the hidden link: Statistical methods for multi-view high-dimensional data
|
Research in many disciplines relies more and more on intensive collections of data representing several points of view. For example, in studying obesity or depression as the outcome of environmental and genetic influences, researchers increasingly collect survey, dietary, biomarker and genetic data from the same individuals. Revealing the variables that are linked throughout these different types of data gives crucial insight into the complex interplay between the multiple factors that determine human behavior, e.g., the concerted action of genes and environment in the emergence of obesity or depression. Although linked high-dimensional multiview data form an extremely rich resource for research, extracting meaningful and integrated information is challenging and not appropriately addressed by current statistical methods. The challenge is to select those variables that are linked throughout the different blocks, and this eludes currently available methods for data analysis. A first problem is that relevant information is hidden in a bulk of irrelevant variables, with a high risk of finding incidental associations. Second, the sources are often very heterogeneous, which may obscure apparent links between the shared mechanisms. In this presentation we will discuss the challenges associated with the analysis of large-scale multiview data and present a sparse common and distinctive components approach to address them.
|
|
Zouhair EL HADRI
|
27-28 February 2024
|
|
Zouhair El Hadri is a full professor of mathematics at the Faculty of Sciences, Mohammed V University in Rabat, Morocco. He is a member of the Laboratory of Mathematics, Statistics and Applications. His research activities cover themes related to multivariate statistics, in particular SEM with latent variables (SEM PLS, SEM CB), Path Analysis, Factor Analysis and Multiblock Data Analysis.
|
Structural Equation Modelling in chemometrics.
|
Structural Equation Modelling (SEM) (Bollen 1989) is a set of multivariate statistical methods developed to analyze and evaluate interactions and complex causal relationships between constructs (composites) called Latent Variables (LVs). Each LV is directly measured by a set of indicators on the same set of individuals. These indicators are known as Manifest Variables (MVs). SEM is an accumulation of developments in both Path Analysis (PA) models (Duncan 1966) and Factor Analysis (FA) models (Lawley 1940). The main objective of the talk is to study the potential of SEM in chemometrics studies.
|
|
Age K. SMILDE
|
27-28 February 2024
|
|
Age K. Smilde is a full professor of Biosystems Data Analysis at the Swammerdam Institute for Life Sciences at the University of Amsterdam and, as of June 1, 2020, a part-time research professor at Simula Metropolitan Center for Digital Engineering, Oslo, Norway. He has published more than 300 peer-reviewed papers and has written several books. He has received several awards, among them the Eastern Analytical Award and the Herman Wold Gold Medal. He was Editor-in-Chief of the Journal of Chemometrics, and his research interests are machine learning and statistical methods for data fusion in the life sciences.
|
Relationships between multidimensional latent path models and common/distinct components in data fusion
|
Latent path models are used in diverse scientific fields and are starting to be used in the life sciences as well. A salient feature of such models is that they should be able to describe complex systems, which therefore requires multiple latent variables per data block. Multidimensional extensions of unidimensional latent path models are not trivial, and the complexities of some of these multidimensional extensions will be discussed. In the data fusion literature there is increasing attention to methods that can separate distinct (or unique) information of a block from common information between several blocks [1]. This framework of separating distinct, local and common information of several blocks of data related to the same substantive system appears to be very fruitful in exploring the relationships between those blocks and in specifying a multidimensional latent path model for the system under study. Preliminary results will be shown, and many open questions remain. [1] Multiblock Data Fusion in Statistics and Machine Learning. Applications in the Natural and Life Sciences; Age K. Smilde, Tormod Naes and Kristian Hovde Liland, John Wiley & Sons, 2022.
|
|
Mohamed NADIF
|
27-28 February 2024
|
|
Mohamed Nadif is a full professor of machine learning at Université Paris Cité and a member of the Centre Borelli CNRS UMR9010 at the same university. He leads the research activities of the "Artificial Intelligence for Data Science and Cybersecurity" team. He is also director of the master's degree "Machine Learning for Data Science" and teaches courses covering various topics, including machine learning methods, multivariate data analysis and Natural Language Processing (NLP). He is associate editor of the "Advances in Data Analysis and Classification" journal. His current research interests include cluster analysis, deep learning, NLP, missing data, representation learning, factorization and latent block models.
|
Multi-view clustering: models, algorithms and applications
|
Data clustering plays an indispensable role in data science. The methods/algorithms obtained from different approaches are useful for data collected from multiple sources or represented by multiple views, where each describes a perspective of the data. Thus, to deal with this kind of data in the context of unsupervised learning, we can rely on a factorization approach, a probabilistic approach or even deep neural networks. Indeed, as the success of clustering algorithms generally depends on data representation, and learning a good data representation is crucial for clustering algorithms, combining the two tasks is a common way of exploring this type of data. In this talk, we will review, discuss and illustrate the different approaches used, from the most classical to the most recent.
|
|
|