ENQUIRE PROJECT DETAILS BY GENERAL PUBLIC

Project Details
Funding Scheme : General Research Fund
Project Number : 17200817
Project Title(English) : Spectral Analysis and Optimization Problems in Directed Hypergraph Diffusion Processes 
Project Title(Chinese) : 有向超圖擴散過程的譜分析和優化問題 
Principal Investigator(English) : Dr Chan, Hubert Tsz Hong 
Principal Investigator(Chinese) :  
Department : School of Computing and Data Science
Institution : The University of Hong Kong
E-mail Address : hubert@cs.hku.hk  
Tel : 28578461 
Co-Investigator(s) :
Panel : Engineering
Subject Area : Computing Science & Information Technology
Exercise Year : 2017 / 18
Fund Approved : 700,000
Project Status : Completed
Completion Date : 31-12-2021
Project Objectives :
To investigate the mathematical properties of more general diffusion models. For instance, how the Laplacian of directed hypergraphs can capture the underlying combinatorial properties.
To develop related approximation and optimization algorithms. For example, minimizing the quadratic form to approximate the Laplacian eigenvalues or to predict labels in semi-supervised learning.
To employ our framework for practical applications. We will test our models and optimization algorithms on real-world datasets, and compare them with existing methods.
Abstract as per original application
(English/Chinese):
This project will investigate diffusion processes on hypergraphs, in which each vertex has some measure and an arbitrary number of vertices can be contained in a hyperedge. Among vertices in the same hyperedge, measure will flow from vertices with maximum measure to those with minimum measure, at a rate proportional to the difference in measure. We will study new diffusion models for more general classes of hypergraphs, which will not only bring theoretical advances in spectral (hyper)graph theory and convex optimization, but also make practical impacts on applications such as semi-supervised learning and market modeling. Classical spectral graph theory relates the edge expansion properties of a normal graph, such as the convergence rates of random walks, to the spectral properties of its corresponding Laplacian operator. Our preliminary work completes the framework in a recent groundbreaking STOC 2015 paper that studies a diffusion process on (undirected) hypergraphs, from which a hypergraph Laplacian can be defined. Analogous to the classical results, the spectral properties of this Laplacian can be related to hypergraph edge expansion and the convergence rate of the diffusion process. We will explore diffusion processes for more general models such as directed hypergraphs, in which each hyperedge consists of a tail subset of vertices pointing to a head subset of vertices. A directed hyperedge can therefore capture a causal relationship between multiple agents. For instance, in a coauthorship network, each hyperedge contains the coauthors of a paper and its direction models how senior coauthors can influence the research direction of junior coauthors. With our recent advances in hypergraph spectral techniques, we are in an excellent position to pursue the following directions. (1) Investigate the mathematical properties of more general diffusion models. For instance, a formal framework for the directed hypergraph diffusion process will shed light on how the directed version of the Laplacian can capture the underlying combinatorial properties. (2) Develop related optimization and approximation algorithms. A hypergraph induces a convex quadratic form, which is the objective to be minimized in mathematical programs for approximating eigenvalues of the Laplacian operator or assigning labels in semi-supervised learning. (3) Employ our framework for practical applications. We will test our models and optimization algorithms on real-world datasets, and compare them with existing methods.
本項目將對超圖(hypergraph)中的擴散過程進行研究,其中每個頂點有權值,且超邊(hyperedge)可以包含任意數量的頂點。在包含於同一個超邊的頂點中,權值將從最大權頂點流向最小權頂點,其速率與權值的差值成正比。我們將研究更通用的超圖擴散模型。這不僅將在譜(超)圖論和凸優化中帶來理論進展,而且還將對半監督學習和市場建模等應用產生實際影響。經典譜圖論把圖的邊擴展性,例如隨機遊走的收斂速度,與拉普拉斯算子的譜特性相關聯。我們的初步研究完善了一篇開創性的STOC 2015論文。該論文研究了(無向)超圖上的擴散過程並定義了超圖拉普拉斯算子。類似於經典結果,拉普拉斯算子的譜特性與超圖的邊擴展性以及擴散過程的收斂速度有關。我們將探索更通用的模型(例如有向超圖)的擴散過程,其中有向超邊由頂點的尾子集指向頂點的頭子集組成。因此,有向超邊可以表達多個元素之間的因果影響關係。例如,在共同作者網絡中,每個有向超邊包含論文的作者們,而超邊的方向表達了資深作者如何影響初級作者的研究方向。憑藉我們在超圖譜方面的最新進展,我們非常適合探求以下方向。 (1)研究更通用的擴散模型的數學性質。例如,關於有向超圖擴散過程的理論框架將揭示有向拉普拉斯算子如何刻劃超圖的潛在組合屬性。 (2)開發相關的優化和近似算法。超圖的凸二次公式作為數學規劃問題中的最小化目標被用於近似拉普拉斯算符的特徵值以及半監督學習中的標籤分配問題。 (3)在實際中應用我們的框架。我們將用現實世界的數據集對我們的模型和優化算法進行測試,並將其與現有方法進行比較。
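For concreteness, the quadratic forms alluded to in the abstract can be summarized as follows; this is an illustrative sketch using standard notation (hyperedge weights w_e, a real-valued potential f on the vertices, and vertex degrees d_v), with the exact normalization and constants as given in the project's publications.

Q_H(f) = \sum_{e \in E} w_e \Big( \max_{u \in e} f(u) - \min_{v \in e} f(v) \Big)^2    (undirected hypergraph)

Q_{\vec{H}}(f) = \sum_{e = (T_e, H_e) \in E} w_e \Big[ \max_{u \in T_e} f(u) - \min_{v \in H_e} f(v) \Big]_+^2 , \qquad [x]_+ := \max\{x, 0\}    (directed hypergraph)

Minimizing the Rayleigh quotient Q(f) / \sum_{v \in V} d_v f(v)^2 over suitable f approximates the second eigenvalue \lambda_2 of the corresponding hypergraph Laplacian, which satisfies a Cheeger-type inequality of the form \Omega(\lambda_2) \le \phi(H) \le O(\sqrt{\lambda_2}) for the hyperedge expansion \phi(H). In semi-supervised learning, the same Q(f) is instead minimized subject to f agreeing with the given labels on the labelled vertices.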
Realisation of objectives:

Objective 1
In a previous work [STOC 2015: Louis], a continuous diffusion process was considered on hypergraphs to define a Laplacian operator whose spectral properties satisfy the celebrated Cheeger inequality. However, one peculiar aspect of this diffusion process is that each hyperedge directs flow only from vertices with the maximum density to those with the minimum density, while ignoring vertices with strictly in-between densities. The previous work [STOC 2015] overlooked that, for the diffusion process to be well-defined, the flow has to be determined by a hypergraph densest subset problem. As a first step, we fixed this issue and established the relationship between the hypergraph Laplacian and the densest subset problem. The result has appeared in JACM 2018.

We have generalized the diffusion process so that vertices in a hyperedge can act as mediators, which receive flow from vertices with maximum density and deliver flow to those with minimum density. We show that the resulting Laplacian operator still has a second eigenvalue satisfying Cheeger's inequality. Our generalized diffusion model shows that there is a family of operators whose spectral properties are related to hypergraph conductance, and it provides a powerful tool for the further development of spectral hypergraph theory. Moreover, since every vertex can participate in the new diffusion model at every instant, this can potentially have wider practical applications. This result has appeared in COCOON 2018 and TCS 2020.

In addition to mediators, we have refined and developed a diffusion operator on directed hypergraphs in which some vertices are stationary. This unified framework is general enough for the following two applications: (i) Cheeger's inequality for directed hyperedge expansion, and (ii) quadratic optimization with stationary vertices in the context of semi-supervised learning. Despite the crucial role of the diffusion process in spectral analysis, previous works had not formally established the existence of the corresponding diffusion processes; we have given a proof framework showing that such diffusion processes are indeed well-defined. In the first application, we use the spectral properties of the diffusion operator to obtain Cheeger's inequality for directed hyperedge expansion. In the second application, the diffusion operator can be interpreted as a continuous analog of the subgradient method, which moves a feasible solution in discrete steps towards an optimal solution. The result has appeared in TCS 2020.
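As an illustration only, the following toy sketch (in Python, with all names and parameters chosen purely for exposition) simulates a naive Euler discretization of the basic diffusion rule described above: within each hyperedge, measure flows from a vertex of maximum density to a vertex of minimum density at a rate proportional to their difference. It deliberately ignores the tie-breaking subtleties that require solving a densest subset problem, which is precisely the issue addressed by the JACM 2018 result, and it is not the well-defined continuous process studied in the project.

# Toy Euler discretization of the basic hypergraph diffusion rule.
# When several vertices tie for the maximum/minimum density, the true
# process must split the flow via a densest subset computation; this
# sketch simply picks one arbitrary maximizer/minimizer per hyperedge.
def diffuse(hyperedges, weights, degrees, densities, step=0.01, iterations=1000):
    """hyperedges: list of vertex lists; weights: one weight per hyperedge;
    degrees: weighted degree of each vertex; densities: initial density per vertex."""
    f = dict(densities)
    for _ in range(iterations):
        net = {v: 0.0 for v in f}                # net flow into each vertex
        for e, w in zip(hyperedges, weights):
            u_max = max(e, key=lambda v: f[v])   # a vertex of maximum density in e
            u_min = min(e, key=lambda v: f[v])   # a vertex of minimum density in e
            rate = w * (f[u_max] - f[u_min])     # flow rate proportional to the gap
            net[u_max] -= rate
            net[u_min] += rate
        for v in f:
            f[v] += step * net[v] / degrees[v]   # density update, normalized by degree
    return f

For a connected hypergraph and a sufficiently small step size, the densities computed by this sketch approach a common value; the project's spectral analysis quantifies the rate of this convergence for the properly defined continuous process.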
Objective 2
We have designed approximation algorithms for the edge expansion and sparsest cut with product demands problems on directed hypergraphs, which subsume previous graph models such as undirected hypergraphs and directed normal graphs. Using an SDP formulation adapted to directed hypergraphs, we apply the SDP primal-dual framework of Arora and Kale (JACM 2016) to design polynomial-time algorithms whose approximation ratios match those of algorithms previously designed for more restricted graph models. Moreover, we have deconstructed their framework and simplified the notation to give a much cleaner presentation of the algorithms. This result has appeared in COCOON 2018.

As mentioned above, the diffusion operator can be interpreted as a continuous analog of the subgradient method, which moves a feasible solution in discrete steps towards an optimal solution. We have used this intuition to consider semi-supervised learning on hypergraphs, where our method uses a convex program whose objective function is not everywhere differentiable. Based on the subgradient method, we have given a simple approach for solving the convex program. The results have appeared in TKDE 2020.

The aforementioned subgradient method relies on a convex potential function. We have extended our work to consider an opinion diffusion process in which the objective function may not be convex with respect to the input parameters. Specifically, we have revisited the opinion susceptibility problem, in which agents influence one another's opinions through an iterative process. Each agent has some fixed innate opinion. In each step, the opinion of an agent is updated to a convex combination of its innate opinion and the weighted average of its neighbors' opinions in the previous step (an illustrative sketch of this update rule appears at the end of this section). The resistance of an agent measures the importance it places on its innate opinion in this convex combination. The goal is to select the resistance of each agent (from some given range) such that the sum of the equilibrium opinions is minimized. Even though the objective function is neither convex nor concave, we have analyzed its structure and developed a local search algorithm that solves the problem optimally. The result has appeared in WWW 2019.

Objective 3
We have also evaluated our methods using real-world datasets. For the semi-supervised learning problem on hypergraphs, our experiments on real-world datasets confirm that our confidence interval approach on hypergraphs outperforms existing methods, and our subgradient method gives faster running times when the number of vertices is much larger than the number of edges. Our experiments also support that using directed hypergraphs to capture causal relationships can improve prediction accuracy. Furthermore, our model can be readily extended to capture multiclass learning. For the opinion susceptibility problem, to investigate the practical impact of our approaches, we have combined the iterative process and the local search paradigm to design very efficient algorithms that solve the problem optimally on large-scale graphs containing millions of nodes.

As we have established the relationship between hypergraph Laplacians and the densest subset problem, we have developed practical algorithms for the k-clique densest subgraph problem, which can be viewed as a special case of the densest subset problem in hypergraphs and also has applications in community detection. Our algorithms can handle graphs with millions of nodes and more than a billion edges. The result has appeared in VLDB 2020. We have also considered practical fully dynamic algorithms for various clustering and partitioning problems. For the approximate maintenance of k-core decomposition, our algorithms can run efficiently on hypergraphs with millions of nodes and edges; the result has appeared in TKDD 2020. For maintaining an approximate k-center clustering, we have developed a low-memory variant that can handle millions of updates efficiently; the result has appeared in TKDE 2022.
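As a minimal sketch of the iterative opinion-update process described above (the update rule only, not the project's non-convex local search over resistances), assuming row-stochastic influence weights and illustrative names throughout:

# Each agent repeatedly sets its opinion to a convex combination of its
# fixed innate opinion (weighted by its resistance) and the weighted
# average of its neighbours' opinions from the previous step.
def equilibrium_opinions(neighbors, innate, resistance, iterations=1000):
    """neighbors[i]: list of (j, w_ij) pairs with the w_ij summing to 1
    (every agent is assumed to have at least one neighbour);
    innate[i]: fixed innate opinion of agent i; resistance[i]: weight in
    [0, 1] that agent i places on its innate opinion."""
    z = list(innate)                              # start from the innate opinions
    for _ in range(iterations):
        z = [resistance[i] * innate[i]
             + (1 - resistance[i]) * sum(w * z[j] for j, w in nbrs)
             for i, nbrs in enumerate(neighbors)]
    return z                                      # approximate equilibrium opinions

The opinion susceptibility problem then asks for the choice of each resistance[i] within its allowed interval that minimizes the sum of the equilibrium opinions; although this objective is neither convex nor concave, the WWW 2019 result solves it optimally via local search.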
Summary of objectives addressed:
Objectives | Addressed | Percentage achieved
1. To investigate the mathematical properties of more general diffusion models. For instance, how the Laplacian of directed hypergraphs can capture the underlying combinatorial properties. | Yes | 100%
2. To develop related approximation and optimization algorithms. For example, minimizing the quadratic form to approximate the Laplacian eigenvalues or to predict labels in semi-supervised learning. | Yes | 100%
3. To employ our framework for practical applications. We will test our models and optimization algorithms on real-world datasets, and compare them with existing methods. | Yes | 100%
Research Outcome
Major findings and research outcome:
For Objective 1, we have laid down a solid theoretical foundation that formalizes a continuous diffusion process on hypergraphs and defines a Laplacian operator via a densest subset problem; the results appeared in JACM 2018. Moreover, we have generalized the diffusion process so that mediators can divert flow between vertices with maximum and minimum densities; the results have appeared in COCOON 2018 and TCS 2020. Furthermore, we have extended the framework to directed hypergraphs, which has also appeared in TCS 2020.
For Objective 2, we have designed approximation algorithms for the edge expansion and sparsest cut with product demands problems on directed hypergraphs; the result has appeared in COCOON 2018. Moreover, interpreting the diffusion operator as a continuous analog of the subgradient method, we have designed semi-supervised learning algorithms on hypergraphs; the results have appeared in TKDE 2020. In addition, we have studied the opinion susceptibility problem, in which agents influence one another's opinions through an iterative process. Even though the underlying objective function is neither convex nor concave, we have developed a local search algorithm that solves the problem optimally; the result has appeared in WWW 2019.
For Objective 3, both the aforementioned TKDE 2020 and WWW 2019 papers contain experiments on real-world datasets, which show that our methods are scalable to graphs with millions of nodes. Moreover, we have considered practical algorithms for the k-clique densest subgraph problem, which can be viewed as a special case of the densest subset problem in hypergraphs; our algorithms can handle graphs with millions of nodes and more than a billion edges, and the result has appeared in VLDB 2020. Furthermore, we have also considered practical fully dynamic algorithms for various clustering and partitioning problems: for the approximate maintenance of k-core decomposition, our result has appeared in TKDD 2020; for maintaining an approximate k-center clustering, we have developed a low-memory variant that has appeared in TKDE 2022.
Potential for further development of the research
and the proposed course of action:
Now that this project has developed a theoretical foundation for spectral hypergraph theory that recovers the basic Cheeger inequality, the next question is whether higher-order eigenvalues and eigenvectors have any significance for multi-way hypergraph partitions. Moreover, hypergraph cuts have recently been generalized to submodular functions, and it will be interesting to explore whether the higher-order spectral properties are also relevant to these generalized functions. Furthermore, as cut and spectral sparsifiers have been investigated to preserve the cut value or the quadratic form using a subgraph with a small number of edges, it will also be interesting to see whether sparsifiers for submodular cut functions have any connections with the spectral properties of hypergraphs. Finally, another interesting direction is to introduce noise into the hypergraph diffusion process. This can be relevant to applications where the communication channels are noisy, or where vertices would like to protect the privacy of their sensitive data by adding noise to the communication. Spectral techniques can be further developed to analyze the convergence behavior of such processes.
Layman's Summary of
Completion Report:
In this project, we have investigated various diffusion processes on hypergraphs, which have brought significant theoretical advances in spectral hypergraph theory and convex optimization, and have made practical impacts on applications such as semi-supervised learning. We have generalized the model to directed hypergraphs, which can capture causal relationships between multiple agents. In particular, we have used the diffusion processes to define Laplacian operators, from which we have developed approximation algorithms for theoretical problems such as sparsest cut in hypergraphs, and subgradient methods for practical problems such as semi-supervised learning on hypergraphs. Moreover, we have applied our framework to practical scenarios such as social network analysis and community detection. We have tested our models and optimization algorithms on large real-world datasets containing millions of nodes. This project has made significant impacts not only on theoretical research but also on other areas in the wider computer science community, such as machine learning and data mining.
Research Output
Peer-reviewed journal publication(s)
arising directly from this research project :
(* denotes the corresponding author)
Year of Publication | Author(s) | Title and Journal/Book | Accessible from Institution Repository
2018 | T.-H. Hubert Chan*, Anand Louis, Zhihao Gavin Tang, Chenzi Zhang | Spectral Properties of Hypergraph Laplacian and Approximation Algorithms | No
2019 | T.-H. Hubert Chan*, Zhihao Gavin Tang, Xiaowei Wu, Chenzi Zhang | Diffusion operator and spectral analysis for directed hypergraph Laplacian | No
2020 | T.-H. Hubert Chan*, Zhibin Liang | Generalizing the hypergraph Laplacian via a diffusion process with mediators | No
2020 | Chenzi Zhang, Shuguang Hu, Zhihao Gavin Tang, T.-H. Hubert Chan* | Re-Revisiting Learning on Hypergraphs: Confidence Interval, Subgradient Method, and Extension to Multiclass | No
2020 | Bintao Sun, Maximilien Danisch, T.-H. Hubert Chan, Mauro Sozio* | KClist++: a simple algorithm for finding k-clique densest subgraphs in large graphs | No
2020 | Bintao Sun, T.-H. Hubert Chan*, Mauro Sozio | Fully Dynamic Approximate k-Core Decomposition in Hypergraphs | No
2022 | T.-H. Hubert Chan*, Arnaud Guerquin, Shuguang Hu, Mauro Sozio | Fully Dynamic k-Center Clustering With Improved Memory Efficiency | No
Recognized international conference(s)
in which paper(s) related to this research
project was/were delivered :
Month/Year/City | Title | Conference Name
Qingdao | Generalizing the Hypergraph Laplacian via a Diffusion Process with Mediators | Computing and Combinatorics - 24th International Conference, COCOON
Qingdao | SDP Primal-Dual Approximation Algorithms for Directed Hypergraph Expansion and Sparsest Cut with Product Demands | Computing and Combinatorics - 24th International Conference, COCOON
San Francisco | Revisiting Opinion Dynamics with Varying Susceptibility to Persuasion via Non-Convex Local Search | The World Wide Web Conference
Other impact
(e.g. award of patents or prizes,
collaboration with other research institutions,
technology transfer, etc.):
