Publications
( ___: equal contribution, *: corresponding author)
We propose a simple (10 lines of code), efficient (requiring a single pass over the examples of a target task), yet effective (validated on 26 pre-trained models and 16 downstream tasks) transferability measure named TransRate for fine-tuning pre-trained models. TransRate measures the mutual information between the features of target examples extracted by a pre-trained model and their labels, which we estimate using the coding rate.
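As a rough illustration of the idea above, the coding-rate estimate can be sketched in a few lines of NumPy. This is a minimal sketch rather than the paper's reference implementation: the `eps` value and the exact scaling inside the log-determinant are assumptions here.

```python
import numpy as np

def coding_rate(Z, eps=1e-4):
    # Z: (n, d) feature matrix; rate-distortion style estimate of entropy
    n, d = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (1.0 / (n * eps)) * Z.T @ Z)
    return 0.5 * logdet

def transrate(Z, y, eps=1e-4):
    Z = Z - Z.mean(axis=0)              # center the features
    rate_all = coding_rate(Z, eps)      # rate of all features
    rate_cond = 0.0
    for c in np.unique(y):              # class-conditional rate, weighted by class frequency
        rate_cond += coding_rate(Z[y == c], eps) * (y == c).mean()
    return rate_all - rate_cond         # mutual-information style score
```

A larger score suggests that the pre-trained features separate the target labels better, i.e., higher transferability.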
The Role of Deconfounding in Meta-learning
Yinjie Jiang,
Zhengyu Chen,
Kun Kuang*, Luotian Yuan, Xinhai Ye, Zhihua Wang,
Fei Wu*,
Ying Wei*
39th International Conference on Machine Learning (ICML), 2022
pdf
This work offers a novel causal perspective of meta-learning, through which we explain the memorization effect as a confounder and frame previous anti-memorization methods as different deconfounder approaches. Derived from the causal inference principle of front-door adjustment, we propose two frustratingly easy but effective deconfounder algorithms.
This work pursues an adaptive task scheduler for meta-learning, where, given a limited number of meta-training tasks, some tasks are likely detrimental due to noise or imbalance. We design, for the first time, a neural scheduler that decides which meta-training tasks to use next, and train the scheduler to optimize the generalization of the meta-knowledge to unseen tasks. We show that such a scheduler theoretically improves the optimization landscape and empirically outperforms conventional schedulers, including the commonly adopted random sampling.
This paper seeks to remedy the scarcity of labeled compounds with activities (ADMET properties) in virtual screening (lead optimization) of drugs, by transferring knowledge from previous assays, namely in-vivo experiments collected by different laboratories and against various target proteins. We propose a functionally regionalized meta-learning algorithm with architectural compositional capability, to accommodate wildly different assays while capturing the relationships between assays.
MetaTS: Meta Teacher-Student Network for Multilingual Sequence Labeling with Minimal Supervision
Zheng Li,
Danqing Zhang,
Tianyu Cao,
Ying Wei,
Yiwei Song,
Bing Yin
2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
pdf
We explore multilingual sequence labeling with a single unified model for multiple languages and minimal supervision. Specifically, we resort to the teacher-student framework to leverage large multilingual unlabeled data. We propose a meta teacher-student (MetaTS) network that allows the teacher to dynamically adapt its pseudo-annotation strategies by the student's feedback on the generated pseudo-labeled data of each language.
Meta-learning Hyperparameter Performance Prediction with Neural Processes
Ying Wei,
Peilin Zhao,
Junzhou Huang
38th International Conference on Machine Learning (ICML), 2021
pdf /
code
We transfer knowledge from historical hyperparameter optimization (HPO) trials on other datasets to speed up HPO on a huge dataset where even a single trial is costly. The proposed meta-learning algorithm is the first to introduce neural processes (NPs) as the surrogate model, which enables the simultaneous transfer of trial observations, NP parameters, and initial hyperparameter configurations.
This work addresses the meta-overfitting problem by augmenting as many tasks as possible. Concretely, we propose two criteria for valid task augmentation, together with two task augmentation methods that satisfy them. Both theoretical studies and empirical results demonstrate that the proposed task augmentation strategies significantly mitigate meta-overfitting, while remaining compatible with any advanced meta-learning algorithm.
Learn to Cross-lingual Transfer with Meta Graph Learning Across Heterogeneous Languages
Zheng Li, Mukul Kumar, William Headden, Bing Yin, Ying Wei,
Yu Zhang,
Qiang Yang
2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
pdf
This work focuses on the problem of cross-lingual transfer (CLT). For each CLT task, we formulate the transfer process as information propagation over a dynamic graph. More importantly, we improve the transfer effectiveness by extracting meta-knowledge such as propagation strategies from previous CLT experiences.
Self-Supervised Graph Transformer on Large-Scale Molecular Data
Yu Rong,
Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei,
Wenbing Huang,
Junzhou Huang
34th Annual Conference on Neural Information Processing Systems (NeurIPS), 2020
pdf
We propose a novel framework, GROVER, for effective molecular representation, a crucial prerequisite in AI-driven drug design and discovery. GROVER learns to characterize molecules with rich and semantic features from enormous unlabeled molecular data, via carefully designed self-supervised tasks at the node, edge, and graph levels. Besides, GROVER itself is more expressive, integrating message passing networks into a Transformer-style architecture.
Adversarial Sparse Transformer for Time Series Forecasting
Sifan Wu, Xi Xiao, Qianggang Ding, Peilin Zhao,
Ying Wei, Junzhou Huang
34th Annual Conference on Neural Information Processing Systems (NeurIPS), 2020
pdf
Existing time series forecasting methods fail either to capture the stochasticity of data or to forecast over a long horizon due to error accumulation. In this work, we address both issues with a novel time series forecasting model. The model, Adversarial Sparse Transformer (AST), is based on GANs: it adopts a sparse Transformer as the generator to learn a sparse attention map for forecasting, and employs a discriminator to improve prediction performance at the sequence level.
We are strongly motivated to improve unsupervised domain adaptation for medical image diagnosis, from the perspectives of denoising annotations corrupted by limited expertise and differentiating the adaptation difficulty of images with significant discrepancies. The proposed method, harnessing the collective intelligence of two peer networks, achieves these goals via a noise co-adaptation layer and a transferability-aware weight for each image.
TranSlider: Transfer Ensemble Learning from Exploitation to Exploration
Kuo Zhong,
Ying Wei*,
Chun Yuan,
Haoli Bai,
Junzhou Huang
Twenty-sixth ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2020
pdf
Learning a strategy that dictates what and where to transfer is key to avoiding negative transfer, yet such a strategy is prone to overfitting given limited annotations in the target domain. For the first time, we propose transfer ensemble learning to solve this problem: we generate a spectrum of models with decreasing transferability, ranging from pure exploitation of the source model to unconstrained exploration of the target domain.
For the first time, we attack the problem of semi-supervised node classification by transferring knowledge learned from historical graphs. We propose a novel meta-learning algorithm on graphs instead of i.i.d. data: we learn a transferable metric space for node similarity, where two embedding functions encoding both local and global structures are learned from previous graphs.
Conventional hyperparameter optimization (HPO) algorithms require considerable hyperparameter evaluation trials, which impedes their success in wider applications where a single trial on a huge dataset is often costly. We are therefore inspired to speed up HPO by transferring knowledge from historical HPO trials on other datasets. The proposed meta-learning algorithm introduces dataset-aware attention to identify the most similar datasets, and is the first to transfer trial observations, neural process parameters, and initial hyperparameter configurations collectively from these datasets.
Jointly extracting aspects and sentiments for sentiment classification requires considerable labeled sentences, which are highly labor-intensive to obtain. We alleviate this problem via unsupervised domain adaptation from a sufficiently labeled domain, proposing a novel selective adversarial learning method that learns correlation vectors between aspects and sentiments and attentively transfers them across domains.
From Whole Slide Imaging to Microscopy: Deep Microscopy Adaptation Network for Histopathology Cancer Image Classification
Yifan Zhang,
Hanbo Chen,
Ying Wei,
Peilin Zhao,
Jiezhang Cao, Xinjuan Fan, Xiaoying Lou, Hailing Liu, Jinlong Hou, Xiao Han, Jianhua Yao, Qingyao Wu,
Mingkui Tan,
Junzhou Huang
22nd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019
pdf
This work is the first to empower digital pathology image classification directly on microscopy images. Specifically, we resort to unsupervised domain adaptation from whole slide images to remedy the lack of annotated microscopy images. Beyond inter-domain discrepancy, the proposed method resolves intra-domain discrepancy and class imbalance via entropy minimization and sample re-weighting, respectively.
We tackle a critical challenge in meta-learning, namely task uncertainty and heterogeneity, where tasks may originate from wildly different distributions. We propose a meta-learning algorithm with hierarchical task clustering, which not only alleviates task heterogeneity via knowledge customization to different clusters of tasks, but also preserves knowledge generalization within a cluster of similar tasks.
This work improves spatial-temporal prediction tasks like traffic prediction for those cities with only limited training data in a short period. The improvement is attributed to the knowledge transferred from other cities with sufficient data covering long periods. We first introduce the meta-learning paradigm into spatial-temporal prediction, and formulate the transferable knowledge as both short-term and long-term spatial-temporal patterns which are represented as model parameters and an explicit memory, respectively.
We aim at identifying sentiment towards aspect terms in a sentence, while annotating sentences in this case is prohibitively expensive. Innovatively, we leverage knowledge from more easily accessible sentences whose sentiment is annotated to aspect categories. We propose a multi-granularity alignment network to achieve domain adaptation, which resolves both aspect granularity inconsistency and feature discrepancy between domains.
This work is the pioneer in automatically identifying an effective multitask model for a multitask problem, empowered by a groundbreaking learning to multitask framework.
This work opens a new door to improve transfer learning effectiveness. We propose a groundbreaking learning to transfer framework to automatically optimize what and how to transfer across domains, by taking advantage of previous transfer learning experiences.
We are dedicated to improving cross-domain sentiment classification, from the perspectives of discovering higher-quality domain-invariant emotion words for knowledge transfer and capturing domain-specific emotion words for sentiment classification. The proposed hierarchical attention transfer network achieves these two goals with a hierarchical attention mechanism and a non-pivots network, respectively.
Though contextual bandits effectively solve the exploitation-exploration dilemma in recommendation systems, they suffer from over-exploration in the cold-start scenario. This work is the first to alleviate this problem by transferring knowledge from other domains. We propose a transferable contextual bandit policy that transfers observations to improve user-interest estimation for exploitation and thus accelerates exploration.
Highly motivated by human beings' capabilities to reflect on transfer learning experiences, we propose a novel transfer learning framework to learn meta-knowledge from historical transfer learning experiences and apply the meta-knowledge to automatically optimize what to transfer in the future.
This work focuses on cross-domain sentiment classification, e.g., sentiment classification of book reviews by transferring knowledge from electronics product reviews. The key here is to identify domain-invariant emotion words as the transferable knowledge. We are the first to automatically learn domain-invariant emotion words by introducing an end-to-end adversarial memory network and offer a direct visualization of them.
We address the problems of overfitting and high-variance gradients when training deep neural networks on high-dimension, low-sample-size data, such as genetic data for phenotype prediction in bioinformatics. We propose a deep neural pursuit network, which alleviates overfitting by selecting a subset of features and reduces variance by averaging gradients over multiple dropouts.
This work provides a theoretical analysis and guarantee for the scalable heterogeneous translated hashing method which is proposed to build the correspondence between heterogeneous domains.
We propose the first principled approach to transfer knowledge between domains, each of which comprises multiple modalities of datasets. We conduct a case study of air quality prediction -- borrowing knowledge from the cities with sufficient annotations and data to the cities with either scarce annotations or insufficient data in any modality. The proposed method formulates the transferable knowledge as semantically related dictionaries for multiple modalities learned from a source domain and labeled examples.
This work first transfers knowledge from posts in the social media side to sensors in the physical world to improve ubiquitous computing tasks such as activity recognition. We propose a co-regularized heterogeneous transfer learning model to discover the transferable feature representations that bridge two domains in heterogeneous representation structures, co-regularized by both correspondence and labels.
Knowledge transfer between domains that lie in heterogeneous feature spaces but have no access to explicit correspondence is almost impossible. This work is the pioneer in using hashing to build the correspondence between such domains. The proposed method simultaneously learns hash functions embedding heterogeneous domains into different Hamming spaces, and a translator aligning these spaces.
The source code of this website is adapted from both this and this page.