Project Details
Funding Scheme : General Research Fund
Project Number : 16210722
Project Title(English) : An Integrated Framework for Extracting and Utilizing Information from Data Visualizations in Digital Documents 
Project Title(Chinese) :  
Principal Investigator(English) : Dr Qu, Huamin 
Principal Investigator(Chinese) :  
Department : Dept of Computer Science & Engineering
Institution : The Hong Kong University of Science and Technology
Co-Investigator(s) :
Panel : Engineering
Subject Area :
Exercise Year : 2022 / 23
Fund Approved : 1,039,978
Project Status : On-going
Completion Date :
Abstract as per original application
Data visualizations, such as charts and graphs, are increasingly created and shared on the Web, appearing in articles, documents, scientific literature, and social media. Google Trends data show that search interest in “chart” has been increasing rapidly in recent years, surpassing that of “image” in February 2020. Data visualizations convey vast amounts of quantitative information, from which people can easily interpret data trends and differences. Therefore, visualizations often serve as the main entry point for the general public to access data, especially in high-impact domains, such as politics, public health, and climate. However, unlike humans, machines have no direct access to the data inside visualizations. Massive amounts of numerical information and knowledge remain locked in visualizations and are inaccessible via search engines. Addressing this problem would make it possible to access and utilize numerical information in visualizations in ways similar to textual information in documents.
The broad goal of the proposed project is to make visualizations a “first-class citizen” on the Web and to develop techniques for automatically interpreting, retrieving, and analyzing visualizations at Internet scale. If successful, the tools arising from the proposed project will benefit a wide range of visualization users, such as data scientists, journalists, and designers.
We will first develop techniques for extracting data and visual encodings from visualizations. We will focus on SVG-based visualizations, which are becoming increasingly popular on the Web, and propose an end-to-end framework for translating SVG-based visualizations into visualization specifications with the underlying data. Next, we will propose methods for measuring the relationships, such as similarity and relevance, between visualizations. Such measurements will serve as building blocks for processing and analyzing visualization collections, such as grouping visualizations by their content. Finally, we will integrate the proposed techniques into an interactive interface for pilot users to analyze visualization collections using real-world datasets. We will conduct user studies to test the viability and effectiveness of the approach and propose at least two usage scenarios, namely search engines for visualizations and mining design knowledge for recommending visualizations. The core content of our proposed project will be disseminated in academic publications, and the dataset and code will be made open source to benefit future research.
Research Outcome
Layman's Summary of Completion Report:
Not yet submitted