ENQUIRE PROJECT DETAILS BY GENERAL PUBLIC

Project Details
Funding Scheme : General Research Fund
Project Number : 12200725
Project Title(English) : Towards Dynamic Knowledge-aware Federated Learning for Foundation Models 
Project Title(Chinese) : 面向基礎模型的動態知識感知聯邦學習 
Principal Investigator(English) : Prof Han, Bo 
Principal Investigator(Chinese) :  
Department : Department of Computer Science
Institution : Hong Kong Baptist University
Co-Investigator(s) :
Prof Cheung, Yiu-ming
Prof Zhang, Chengqi
Panel : Engineering
Subject Area : Computing Science & Information Technology
Exercise Year : 2025 / 26
Fund Approved : 854,554
Project Status : On-going
Completion Date : 31-8-2028
Abstract as per original application
(English/Chinese):
Foundation Models (FMs) have become pivotal in advancing artificial intelligence, offering a flexible framework for applications across industry sectors. Take healthcare as an example: it cost Hong Kong HK$243.2 billion (8.5% of GDP) in 2021-22, and applying FMs can improve the efficiency of healthcare systems, e.g., by providing informative assistance in analyzing patient records. However, training FMs usually relies on a central server that aggregates massive amounts of data, which increases the risk of privacy leakage and data monopolization. In this context, Federated Learning (FL) emerges as a promising approach that can collaboratively train FMs while protecting privacy, by enabling clients to contribute to FM training without sending their data. Recent advancements, e.g., TogetherAI (https://www.together.ai/), have shown promising outcomes in this field. However, directly applying existing FL frameworks to train FMs is impeded by the following challenges:

Challenge 1: How to align the dynamic data requirements of FMs with the static data assumptions of FL? FMs necessitate continuous updates, e.g., clients often receive fresh data that should be incorporated to keep the model's knowledge current. However, existing FL paradigms neglect this alignment challenge.

Challenge 2: How to deal with data leakage when pre-training FMs in an FL manner? FL fundamentally relies on gradient exchange, but exchanged gradients can be exploited by adversaries to infer sensitive information, creating a risk of data leakage.

Challenge 3: How to deal with imperfect data when fine-tuning FMs in an FL scheme? Data quality plays a crucial role in fine-tuning FMs, yet the data collected by clients may be noisy or even maliciously altered, posing threats to FM performance.

In this project, we aim to develop a new FL paradigm for FMs in dynamic knowledge environments.
Specifically, we propose four tasks to address the above three challenges.

Task 1: Federated Foundation Model Learning with Diverse Knowledge Availability: develop a novel paradigm, called FedKoLa, for federated FM learning with dynamic knowledge (for Challenge 1).

Task 2: Federated Foundation Model Pre-training under an Honest-but-Curious Eavesdropping Adversary: develop federated FM pre-training algorithms that withstand honest-but-curious adversaries (for Challenge 2).

Task 3: Federated Foundation Model Fine-tuning with Imperfect Data: develop knowledge-updating algorithms for federated FM fine-tuning against imperfect data (for Challenge 3).

Task 4: Performance Evaluation and Prototype System Development: rigorously evaluate our models and integrate them into a prototype system.
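The collaborative training scheme the abstract relies on, in which clients compute updates on their private data and share only model parameters for a server to average, can be sketched as a minimal FedAvg-style loop. This is an illustrative toy on synthetic linear-regression data, not the project's FedKoLa paradigm; the function names (`local_step`, `fed_avg_round`) are hypothetical.

```python
import numpy as np

def local_step(w, X, y, lr=0.1):
    """One local gradient step on a client's private linear-regression shard."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fed_avg_round(w_global, clients, lr=0.1):
    """Each client refines the global model on its own data; the server
    averages the returned weights. Raw data never leaves the clients."""
    local_models = [local_step(w_global.copy(), X, y, lr) for X, y in clients]
    return np.mean(local_models, axis=0)

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])

# Three clients, each holding a private shard drawn from the same task.
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ w_true))

w = np.zeros(2)
for _ in range(100):
    w = fed_avg_round(w, clients)
# After enough rounds, the averaged model approaches w_true even though
# the server only ever saw weight vectors, never the clients' data.
```

The averaging step is the simplest aggregation rule; the challenges listed above arise precisely because this baseline assumes static client data and treats shared updates as harmless.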
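Challenge 2's leakage risk can be made concrete with a minimal sketch (an assumed toy setting, not the project's threat model): for a linear model with squared loss on a single sample, the exchanged gradient is a scalar multiple of the private input, so an honest-but-curious eavesdropper who observes it recovers the input's direction exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=8)   # model weights, known to all parties
x = rng.normal(size=8)   # a client's private record
y = 3.0                  # its private label

# Gradient of squared loss on one sample:
# d/dw (w.x - y)^2 = 2 * (w.x - y) * x, i.e. a scalar multiple of x.
residual = w @ x - y
grad = 2 * residual * x  # this is what the client would transmit

# An eavesdropper normalizes the observed gradient and recovers the
# direction of the private input x (up to sign).
x_hat = grad / np.linalg.norm(grad)
cosine = abs(x_hat @ (x / np.linalg.norm(x)))
```

Real FM pre-training exchanges gradients of deep networks over data batches, where reconstruction is harder but, as the abstract notes, still exploitable by adversaries; this is what motivates Task 2.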
Research Outcome
Layman's Summary of
Completion Report:
Not yet submitted
