ENQUIRE PROJECT DETAILS BY GENERAL PUBLIC |
| Project Details |
| Funding Scheme : | Early Career Scheme |
| Project Number : | 22200720 |
| Project Title(English) : | Trustworthy Deep Learning from Open-set Corrupted Data |
| Project Title(Chinese) : | 從開放式污損數據中進行可信賴深度學習 |
| Principal Investigator(English) : | Prof Han, Bo |
| Principal Investigator(Chinese) : | |
| Department : | Department of Computer Science |
| Institution : | Hong Kong Baptist University |
| E-mail Address : | bhanml@hkbu.edu.hk |
| Tel : | |
| Co-Investigator(s) : | |
| Panel : | Engineering |
| Subject Area : | Computing Science & Information Technology |
| Exercise Year : | 2020 / 21 |
| Fund Approved : | 484,288 |
| Project Status : | Completed |
| Completion Date : | 28-2-2023 |
| Project Objectives : | |
| Abstract as per original application (English/Chinese): | |
| Realisation of objectives: | For academic tasks 1-4, I have developed a dual-scored methodology to robustly model open-set instance-dependent noisy labels and designed instance-level learning algorithms with theoretical guarantees to solve the proposed model (see the ICML’21 paper “Confidence Scores Make Instance-dependent Label-noise Learning Possible”). I have exploited generalized unlabeled data as an auxiliary medium to robustly handle open-set adversarial examples and leveraged an adversarially robust loss to train jointly on the original training set and on unlabeled data with pseudo-labels (see the ICML’22 paper “Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin-of-Probability Attack”). I have designed an adversarial dual-checking methodology to robustly adapt from a corrupted source domain to an open-set unlabeled target domain and derived the generalization bound for wildly unsupervised open-set domain adaptation (see the NeurIPS’21 paper “TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation”). I have automated some of the trustworthy ML techniques above and tested them on real-world corrupted data (see the TPAMI’24 paper “Searching to Exploit Memorization Effect in Deep Learning with Noisy Labels”, minor revision). For measurable results 1-3, I have published 18 high-quality papers in top-tier machine learning, deep learning and artificial intelligence journals (e.g., JMLR and IEEE TPAMI) and conferences (e.g., NeurIPS, ICML, ICLR and AAAI). I have collaborated with practitioners on the Alibaba advertisement team and deployed my trustworthy algorithms reliably on the Alibaba advertisement platform. Together with researchers in this field, I have organized a series of conference workshops and several special issues on trustworthy deep learning from open-set corrupted data (https://bhanml.github.io/service.html). I have gathered the systematic findings and presented the potential developments in two monographs for publication by MIT Press and Springer Nature.
The project has produced several research and training outcomes, including the training of PhD and undergraduate students. The unique research opportunities afforded by this project have given the students involved a tremendous opportunity to develop trustworthy deep learning algorithms and systems in Hong Kong. |
| Summary of objectives addressed: | |
| Research Outcome |
| Major findings and research outcome: | Supported by the ECS grant, I have achieved the academic tasks and delivered the measurable results. For academic tasks 1-4, I have developed a dual-scored methodology to robustly model open-set instance-dependent noisy labels and designed instance-level learning algorithms with theoretical guarantees to solve the proposed model (see the ICML’21 paper “Confidence Scores Make Instance-dependent Label-noise Learning Possible”). I have exploited generalized unlabeled data as an auxiliary medium to robustly handle open-set adversarial examples and leveraged an adversarially robust loss to train jointly on the original training set and on unlabeled data with pseudo-labels (see the ICML’22 paper “Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin-of-Probability Attack”). I have designed an adversarial dual-checking methodology to robustly adapt from a corrupted source domain to an open-set unlabeled target domain and derived the generalization bound for wildly unsupervised open-set domain adaptation (see the NeurIPS’21 paper “TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation”). I have automated some of the trustworthy ML techniques above and tested them on real-world corrupted data (see the TPAMI’24 paper “Searching to Exploit Memorization Effect in Deep Learning with Noisy Labels”, minor revision). For measurable results 1-3, I have published 18 high-quality papers in top-tier machine learning, deep learning and artificial intelligence journals (e.g., JMLR and IEEE TPAMI) and conferences (e.g., NeurIPS, ICML, ICLR and AAAI). One of them won the NeurIPS’22 Outstanding Paper Award (see “Is Out-of-distribution Detection Learnable?”, https://blog.neurips.cc/2022/11/21/announcing-the-neurips-2022-awards/). I have collaborated with practitioners on the Alibaba advertisement team and deployed my trustworthy algorithms reliably on the Alibaba advertisement platform.
Together with researchers in this field, I have organized a series of conference workshops and several special issues on trustworthy deep learning from open-set corrupted data (https://bhanml.github.io/service.html). I have gathered the systematic findings and presented the potential developments in two monographs for publication by MIT Press and Springer Nature (accepted). The project has produced several research and training outcomes, including the training of PhD and undergraduate students. The unique research opportunities afforded by this project have given the students involved a tremendous opportunity to develop trustworthy deep learning algorithms and systems in Hong Kong. The detailed results can be found on the project summary page: https://bhanml.github.io/research_ecs.html |
| Potential for further development of the research and the proposed course of action: | I am extending the research paradigm from “trustworthy deep learning” to “trustworthy foundation models”, and have produced several new works (e.g., ICLR’24 spotlight papers). I also plan to study trustworthy foundation models through the lens of causality. |
| Layman's Summary of Completion Report: | Trustworthy learning from corrupted data is a vital research topic in modern machine learning (in particular, deep learning), since most real-world data, such as financial, healthcare and social-network data, are imperfect and easily corrupted. However, existing works in trustworthy deep learning (TDL) tend to implicitly assume that corrupted data are closed-set: samples with corrupted labels have true classes that are known in the training data; samples from the set of known classes can be crafted into adversarial examples in the testing phase; and samples in the source domain share the same classes as samples in the target domain. This closed-set assumption is the crux of existing TDL methods, yet it is too restrictive for many real-world applications. This project aims to address this conundrum by developing models, algorithms, and a prototype system for trustworthy deep learning from open-set corrupted data. The outcome of the research could make TDL techniques significantly more robust in open-world knowledge discovery and decision-making processes such as those in personalized medicine, financial engineering, and scientific discovery. |
| Research Output |
| Peer-reviewed journal publication(s) arising directly from this research project : (* denotes the corresponding author) | |
| Recognized international conference(s) in which paper(s) related to this research project was/were delivered : | |
| Other impact (e.g. award of patents or prizes, collaboration with other research institutions, technology transfer, etc.): | |
| Realisation of the education plan: | |