ENQUIRE PROJECT DETAILS BY GENERAL PUBLIC

Project Details
Funding Scheme : Early Career Scheme
Project Number : 22200720
Project Title(English) : Trustworthy Deep Learning from Open-set Corrupted Data 
Project Title(Chinese) : 從開放式污損數據中進行可信賴深度學習 
Principal Investigator(English) : Prof Han, Bo 
Principal Investigator(Chinese) :  
Department : Department of Computer Science
Institution : Hong Kong Baptist University
E-mail Address : bhanml@hkbu.edu.hk 
Tel :  
Co-Investigator(s) :
Panel : Engineering
Subject Area : Computing Science & Information Technology
Exercise Year : 2020 / 21
Fund Approved : 484,288
Project Status : Completed
Completion Date : 28-2-2023
Project Objectives :
Academic task 1: Developing a dual-scored methodology to model open-set instance-dependent noisy labels robustly; Designing instance-level learning algorithms with theoretical guarantees to solve the proposed model.
Academic task 2: Exploiting generalized unlabeled data as an auxiliary medium to robustly handle open-set adversarial examples; Leveraging an adversarially robust loss to jointly train on the original training set and unlabeled data with pseudo-labels; Investigating the Rademacher complexity of the proposed approach for adversarially robust generalization.
Academic task 3: Designing an adversarial dual checking methodology to robustly adapt from a corrupted source domain to an open-set unlabeled target domain; Deriving the generalization bound for wildly unsupervised open-set domain adaptation.
Academic task 4: Automating and integrating the above orthogonal techniques into an Automated Trustworthy Deep Learning (AutoTDL) system; Testing this system using real-world corrupted data (e.g., financial and healthcare data).
Measurable result 1: At least 12 high-quality papers (3 per task) are planned for publication in top-tier machine learning, deep learning and artificial intelligence journals, e.g., JMLR, IEEE T-PAMI, MLJ and AIJ, and conferences, e.g., NeurIPS, ICML, ICLR, AAAI and IJCAI. The ECS candidate will collaborate with practitioners in finance and healthcare, and deploy this system into financial companies and public hospitals reliably.
Measurable result 2: The ECS candidate and researchers in this field will jointly organize one conference workshop and one special issue of a prestigious journal on trustworthy deep learning from open-set corrupted data. The candidate will gather the systematic findings and present the potential developments in a monograph for publication by Springer or Elsevier.
Measurable result 3: The project will have several research and teaching training outcomes, including graduate (PhD and MPhil) and undergraduate students. Graduates play a vital role in Hong Kong research and local industry. The unique opportunities for research afforded by this project will give the students involved tremendous opportunity to develop trustworthy deep learning systems in Hong Kong, and become the backbone of infrastructure improvement in Hong Kong.
Abstract as per original application
(English/Chinese):

Realisation of objectives: For academic tasks 1-4, I have developed a dual-scored methodology to model open-set instance-dependent noisy labels robustly and designed instance-level learning algorithms with theoretical guarantees to solve the proposed model (see the ICML’21 paper “Confidence Scores Make Instance-dependent Label-noise Learning Possible”). I have exploited generalized unlabeled data as an auxiliary medium to robustly handle open-set adversarial examples and leveraged an adversarially robust loss to jointly train on the original training set and unlabeled data with pseudo-labels (see the ICML’22 paper “Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin-of-Probability Attack”). I have designed an adversarial dual checking methodology to robustly adapt from a corrupted source domain to an open-set unlabeled target domain and derived the generalization bound for wildly unsupervised open-set domain adaptation (see the NeurIPS’21 paper “TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation”). I have automated some of the trustworthy ML techniques above and tested them using real-world corrupted data (see the TPAMI’24 paper “Searching to Exploit Memorization Effect in Deep Learning with Noisy Labels”, minor revision). For measurable results 1-3, I have published 18 high-quality papers in top-tier machine learning, deep learning and artificial intelligence journals, e.g., JMLR and IEEE TPAMI, and conferences, e.g., NeurIPS, ICML, ICLR and AAAI. I have collaborated with practitioners in the Alibaba advertisement team and deployed my trustworthy algorithms on the Alibaba advertisement platform reliably. Researchers in this field and I have jointly organized a series of conference workshops and several special issues on trustworthy deep learning from open-set corrupted data (https://bhanml.github.io/service.html). I have gathered the systematic findings and presented the potential developments in two monographs for publication by MIT Press and Springer Nature.
The project has several research and teaching training outcomes, including PhD and undergraduate students. The unique opportunities for research afforded by this project have given the students involved tremendous opportunity to develop trustworthy deep learning algorithms and systems in Hong Kong.
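For readers unfamiliar with the memorization effect exploited in the noisy-label works cited above (e.g., the TPAMI’24 search paper): deep networks tend to fit clean samples before corrupted ones, so samples with small training loss are more likely to carry clean labels. The sketch below illustrates this standard small-loss selection idea; the function name and interface are illustrative only and are not taken from the project's code.

```python
import numpy as np

def small_loss_selection(losses, forget_rate):
    """Return indices of the (1 - forget_rate) fraction of samples with
    the smallest loss, treating them as likely clean.

    losses      : per-sample training losses (one value per sample)
    forget_rate : fraction of samples to discard as likely noisy
    """
    losses = np.asarray(losses, dtype=float)
    num_keep = int((1.0 - forget_rate) * len(losses))
    # Deep networks memorize noisy labels late in training, so the
    # low-loss samples are kept as the (approximately) clean subset.
    order = np.argsort(losses)
    return np.sort(order[:num_keep])
```

In methods such as Co-teaching, two networks each perform this selection and feed their selected small-loss samples to the peer network for the next update.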
Summary of objectives addressed:
Objective | Addressed | Percentage achieved
1. Academic task 1: Developing a dual-scored methodology to model open-set instance-dependent noisy labels robustly; designing instance-level learning algorithms with theoretical guarantees to solve the proposed model. | Yes | 100%
2. Academic task 2: Exploiting generalized unlabeled data as an auxiliary medium to robustly handle open-set adversarial examples; leveraging an adversarially robust loss to jointly train on the original training set and unlabeled data with pseudo-labels; investigating the Rademacher complexity of the proposed approach for adversarially robust generalization. | Yes | 100%
3. Academic task 3: Designing an adversarial dual checking methodology to robustly adapt from a corrupted source domain to an open-set unlabeled target domain; deriving the generalization bound for wildly unsupervised open-set domain adaptation. | Yes | 100%
4. Academic task 4: Automating and integrating the above orthogonal techniques into an Automated Trustworthy Deep Learning (AutoTDL) system; testing this system using real-world corrupted data (e.g., financial and healthcare data). | Yes | 100%
5. Measurable result 1: At least 12 high-quality papers (3 per task) are planned for publication in top-tier machine learning, deep learning and artificial intelligence journals, e.g., JMLR, IEEE T-PAMI, MLJ and AIJ, and conferences, e.g., NeurIPS, ICML, ICLR, AAAI and IJCAI. The ECS candidate will collaborate with practitioners in finance and healthcare, and deploy this system into financial companies and public hospitals reliably. | Yes | 100%
6. Measurable result 2: The ECS candidate and researchers in this field will jointly organize one conference workshop and one special issue of a prestigious journal on trustworthy deep learning from open-set corrupted data. The candidate will gather the systematic findings and present the potential developments in a monograph for publication by Springer or Elsevier. | Yes | 100%
7. Measurable result 3: The project will have several research and teaching training outcomes, including graduate (PhD and MPhil) and undergraduate students. Graduates play a vital role in Hong Kong research and local industry. The unique opportunities for research afforded by this project will give the students involved tremendous opportunity to develop trustworthy deep learning systems in Hong Kong, and become the backbone of infrastructure improvement in Hong Kong. | Yes | 100%
Research Outcome
Major findings and research outcome: I have achieved the academic tasks and delivered the measurable results supported by the ECS grant. For academic tasks 1-4, I have developed a dual-scored methodology to model open-set instance-dependent noisy labels robustly and designed instance-level learning algorithms with theoretical guarantees to solve the proposed model (see the ICML’21 paper “Confidence Scores Make Instance-dependent Label-noise Learning Possible”). I have exploited generalized unlabeled data as an auxiliary medium to robustly handle open-set adversarial examples and leveraged an adversarially robust loss to jointly train on the original training set and unlabeled data with pseudo-labels (see the ICML’22 paper “Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin-of-Probability Attack”). I have designed an adversarial dual checking methodology to robustly adapt from a corrupted source domain to an open-set unlabeled target domain and derived the generalization bound for wildly unsupervised open-set domain adaptation (see the NeurIPS’21 paper “TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation”). I have automated some of the trustworthy ML techniques above and tested them using real-world corrupted data (see the TPAMI’24 paper “Searching to Exploit Memorization Effect in Deep Learning with Noisy Labels”, minor revision). For measurable results 1-3, I have published 18 high-quality papers in top-tier machine learning, deep learning and artificial intelligence journals, e.g., JMLR and IEEE TPAMI, and conferences, e.g., NeurIPS, ICML, ICLR and AAAI. One of them won the NeurIPS’22 Outstanding Paper Award (see “Is Out-of-distribution Detection Learnable?”, https://blog.neurips.cc/2022/11/21/announcing-the-neurips-2022-awards/). I have collaborated with practitioners in the Alibaba advertisement team and deployed my trustworthy algorithms on the Alibaba advertisement platform reliably.
Researchers in this field and I have jointly organized a series of conference workshops and several special issues on trustworthy deep learning from open-set corrupted data (https://bhanml.github.io/service.html). I have gathered the systematic findings and presented the potential developments in two monographs for publication by MIT Press and Springer Nature (accepted). The project has several research and teaching training outcomes, including PhD and undergraduate students. The unique opportunities for research afforded by this project have given the students involved tremendous opportunity to develop trustworthy deep learning algorithms and systems in Hong Kong. Detailed results can be found on the project summary page: https://bhanml.github.io/research_ecs.html
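The pseudo-labeling step mentioned in the findings above (assigning labels to unlabeled data before joint adversarial training) can be illustrated with a generic, hedged sketch; this is a standard confidence-thresholded scheme, not the project's actual method, and all names here are illustrative.

```python
import numpy as np

def pseudo_label(unlabeled_logits, confidence_threshold=0.9):
    """Assign hard pseudo-labels to unlabeled samples whose predicted
    class probability exceeds a confidence threshold.

    Returns (labels, indices): pseudo-labels for the confident samples
    and their positions in the unlabeled batch; low-confidence samples
    are left out of joint training.
    """
    logits = np.asarray(unlabeled_logits, dtype=float)
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    confidence = probs.max(axis=-1)
    labels = probs.argmax(axis=-1)
    mask = confidence >= confidence_threshold
    return labels[mask], np.nonzero(mask)[0]
```

The confident pseudo-labeled samples can then be merged with the original labeled set and trained under an adversarially robust loss.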
Potential for further development of the research
and the proposed course of action:
I am pushing the research paradigm from “trustworthy deep learning” to “trustworthy foundation models”, and have produced several new works (e.g., ICLR’24 spotlight papers). Meanwhile, I plan to leverage the lens of causality for trustworthy foundation models.
Layman's Summary of
Completion Report:
Trustworthy learning from corrupted data is a vital research topic in modern machine learning (i.e., deep learning), since most real-world data, such as financial, healthcare and social-network data, are imperfect and corrupted. However, existing works in trustworthy deep learning (TDL) tend to implicitly assume that corrupted data are closed-set: samples with corrupted labels have true classes that are known in the training data; only samples from the set of known classes are crafted as adversarial examples in the testing phase; and samples in the source domain share the same classes as samples in the target domain. This closed-set assumption underpins existing TDL methods, but it is too restrictive for many real-world applications. This project aims to address this conundrum by developing models, algorithms, and a prototype system for trustworthy deep learning from open-set corrupted data. The outcome of the research could significantly robustify TDL techniques in open-world knowledge discovery and decision-making processes, such as those in personalized medicine, financial engineering, and scientific discovery.
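To make the open-set issue above concrete: a common baseline for flagging test samples from unknown (open-set) classes is to score each sample by its maximum softmax probability (MSP) and treat low-scoring samples as possibly out-of-distribution. The sketch below is this standard baseline, shown only for illustration; it is not the project's detection method.

```python
import numpy as np

def msp_ood_score(logits):
    """Maximum softmax probability (MSP) score per sample.

    A high score means the model is confident the sample belongs to one
    of the known classes; a low score suggests the sample may come from
    an unknown (open-set) class.
    """
    logits = np.asarray(logits, dtype=float)
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs.max(axis=-1)
```

A confident prediction (one dominant logit) scores near 1, while a uniform prediction over K classes scores 1/K, so thresholding the score separates likely known-class samples from suspicious open-set ones.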
Research Output
Peer-reviewed journal publication(s)
arising directly from this research project :
(* denotes the corresponding author)
Year of Publication | Author(s) | Title and Journal/Book | Accessible from Institution Repository
2023 | Jiangchao Yao, Bo Han, Zhihan Zhou, Ya Zhang*, Ivor W. Tsang | Latent Class-Conditional Noise Model | No
 | Xiaobo Xia, Pengqian Lu, Chen Gong, Bo Han, Jun Yu*, Jun Yu, Tongliang Liu | Regularly Truncated M-estimators for Learning with Noisy Labels | No
2022 | Chen Gong, Qizhou Wang, Tongliang Liu, Bo Han, Jane You, Jian Yang*, Dacheng Tao | Instance-Dependent Positive and Unlabeled Learning with Labeling Bias Estimation | No
2023 | Chen Gong, Yongliang Ding, Bo Han, Gang Niu, Jian Yang*, Jane You, Dacheng Tao, Masashi Sugiyama | Class-Wise Denoising for Robust Learning under Label Noise | No
2022 | Quanming Yao, Yaqing Wang*, Bo Han, James Kwok | Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm Regularization | No
2022 | Songhua Wu, Tongliang Liu*, Bo Han, Jun Yu, Gang Niu, Masashi Sugiyama | Learning from Noisy Pairwise Similarity and Unlabeled Data | No
 | Hansi Yang, Quanming Yao*, Bo Han, James T. Kwok | Searching to Exploit Memorization Effect in Deep Learning with Noisy Labels | No
Recognized international conference(s)
in which paper(s) related to this research
project was/were delivered :
Month/Year/City | Title | Conference Name
Online | SIGUA: Forgetting May Make Learning with Noisy Labels More Robust | ICML
Online | Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model | AAAI
Online | Learning with Group Noise | AAAI
Online | Confidence Scores Make Instance-dependent Label-noise Learning Possible | ICML
Online | Maximum Mean Discrepancy Test is Aware of Adversarial Attacks | ICML
Online | Learning Diverse-Structured Networks for Adversarial Robustness | ICML
Baltimore | Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack | ICML
New Orleans | Is Out-of-Distribution Detection Learnable? | NeurIPS
Kigali, Rwanda | A Holistic View of Label Noise Transition Matrix in Deep Learning and Beyond | ICLR
Kigali, Rwanda | Combating Exacerbated Heterogeneity for Robust Models in Federated Learning | ICLR
New Orleans | Watermarking for Out-of-distribution Detection | NeurIPS
Other impact
(e.g. award of patents or prizes,
collaboration with other research institutions,
technology transfer, etc.):
Realisation of the education plan:
