Publications
( ___: equal contribution, *: corresponding author)
Time-Varying LoRA: Towards Effective Cross-Domain Fine-Tuning of Diffusion Models
Zhan Zhuang, Yulong Zhang, Xuehao Wang, Jiangang Lu, Ying Wei*, Yu Zhang*
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
pdf / code coming soon
This paper introduces Terra, a novel Time-varying low-rank adapter that offers a fine-tuning framework for domain flow generation. Terra constructs a continuous parameter manifold via a time variable, with its expressive power theoretically analyzed. This domain flow generation framework flexibly supports both unsupervised domain adaptation and domain generalization, achieving state-of-the-art performance by generating interpolated domains with varying styles to bridge the gap between source and target domains.
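For intuition only, the sketch below shows how a LoRA update can be made a continuous function of a time variable t, so that sweeping t traces a path of interpolated adapters between two endpoints. This is a hypothetical, simplified parameterization for illustration, not Terra's actual construction.

```python
import torch
import torch.nn as nn

class TimeVaryingLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank update whose factors depend on a scalar
    time variable t in [0, 1]. Illustrative sketch only; Terra's parameterization
    of the time-dependent manifold may differ."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # keep the pre-trained weight frozen
        in_f, out_f = base.in_features, base.out_features
        # Two sets of LoRA factors; interpolating over t traces a parameter path.
        self.A0 = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.A1 = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B0 = nn.Parameter(torch.zeros(out_f, rank))
        self.B1 = nn.Parameter(torch.zeros(out_f, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor, t: float) -> torch.Tensor:
        A = (1.0 - t) * self.A0 + t * self.A1            # time-dependent low-rank factors
        B = (1.0 - t) * self.B0 + t * self.B1
        return self.base(x) + self.scaling * (x @ A.t()) @ B.t()

layer = TimeVaryingLoRALinear(nn.Linear(64, 64))
x = torch.randn(4, 64)
print(layer(x, t=0.0).shape, layer(x, t=0.5).shape)      # t=0.5 plays the role of an intermediate domain
```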
Learning Where to Edit Vision Transformers
Yunqiao Yang, Long-Kai Huang, Shengzhuang Chen, Kede Ma, Ying Wei*
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
pdf / code coming soon
This work addresses the gap in model editing for vision models by (1) curating two benchmarks on which existing pre-trained ViTs struggle to predict correctly, and (2) correcting the predictive errors of ViTs, particularly those arising from subpopulation shifts. We propose a learning-to-learn approach that identifies a small set of critical parameters for editing in response to erroneous samples, with the locations of these parameters determined by a hypernetwork. By simulating the edit process and explicitly optimizing for edit success, the hypernetwork is trained to output reliable and generalizable editing locations. Additionally, the sparsity constraint imposed on the hypernetwork ensures that edits are localized and do not distort irrelevant parameters.
DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs
Haokun Lin, Haobo Xu, Yichen Wu, Jingzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun*, Ying Wei*
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
pdf / code (oral)
This work tackles Massive Outliers (outlier activations) in LLMs that lead to significant performance degradation in low-bit quantization. We introduce DuQuant, a novel approach that utilizes rotation and permutation transformations to more effectively mitigate both massive and normal outliers. DuQuant outperforms state-of-the-art baselines across various sizes and types of LLMs on multiple tasks, even with 4-bit weight-activation quantization.
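The toy experiment below illustrates why such transformations help, under the simplifying assumptions of symmetric per-tensor fake quantization and a random orthogonal rotation; DuQuant's actual block-wise rotations and permutations are learned and structured, which this sketch does not reproduce.

```python
import torch

def fake_quantize(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor fake quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max() / qmax
    return torch.clamp((x / scale).round(), -qmax, qmax) * scale

torch.manual_seed(0)
x = torch.randn(128, 64)
x[:, 7] *= 50.0                        # inject a massive outlier channel
w = torch.randn(64, 64) * 0.1

# Random orthogonal rotation (illustrative stand-in for DuQuant's learned, blocked rotations).
q, _ = torch.linalg.qr(torch.randn(64, 64))
x_rot, w_rot = x @ q, q.t() @ w        # (xQ)(Q^T W) == xW, so the transform is mathematically lossless

plain = fake_quantize(x) @ fake_quantize(w)
rotated = fake_quantize(x_rot) @ fake_quantize(w_rot)
ref = x @ w
print("error w/o rotation:", (plain - ref).norm().item())
print("error w/ rotation: ", (rotated - ref).norm().item())
```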
Mixture of Adversarial LoRAs: Boosting Robust Generalization in Meta-tuning
Xu Yang, Chen Liu, Ying Wei*
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
pdf / code coming soon
This work introduces AMT, an Adversarial Meta-Tuning methodology designed to enhance the robust generalization of pre-trained models for out-of-domain (OOD) few-shot learning. The core innovation of AMT is a robust LoRAPool, which consists of LoRAs meta-tuned with dual perturbations on both inputs and singular values/vectors across varying robustness levels. Extensive evaluations demonstrate that AMT significantly outperforms previous state-of-the-art methods across a range of OOD few-shot image classification tasks.
Mitigating the Language Mismatch and Repetition Issues in LLM-based Machine Translation via Model Editing
Weichuan Wang, Zhaoyi Li, Defu Lian, Chen Ma, Linqi Song*, Ying Wei*
The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
pdf / code coming soon
This work targets two major translation errors encountered by current LLMs — language mismatch and repetition — through model editing methods. We find that direct application of localization-based edits either yields limited impact or negatively affects general translation quality. To address this, we refine the identified components by intersecting localization results across languages, filtering out irrelevant information. Experiments show our approach effectively reduces these errors while preserving or improving overall translation quality.
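As a purely illustrative sketch of the intersection step (the component identifiers and data layout below are hypothetical, not the paper's), per-language localization results can be intersected so that only shared, error-relevant components are kept for editing.

```python
from functools import reduce

# Hypothetical localization output: for each language, the set of (layer, unit)
# components identified as responsible for the translation error.
located = {
    "de": {(12, 3), (12, 7), (18, 1), (21, 4)},
    "fr": {(12, 3), (15, 2), (18, 1), (21, 4)},
    "zh": {(12, 3), (18, 1), (19, 5), (21, 4)},
}

# Keep only components located in every language, filtering out
# language-specific (likely irrelevant) information before editing.
shared = reduce(set.intersection, located.values())
print(sorted(shared))  # -> [(12, 3), (18, 1), (21, 4)]
```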
Understanding and Patching Compositional Reasoning in LLMs
Zhaoyi Li, Gangwei Jiang, Hong Xie, Linqi Song, Defu Lian*, Ying Wei*
Sixty-second Annual Meeting of the Association for Computational Linguistics (ACL) Findings, 2024
pdf / code
This paper is among the first to reveal that, in LLMs, implicit reasoning results indeed surface within the middle layers and play a causative role in shaping the final explicit reasoning results. These findings motivate CREME, a lightweight method that patches errors in compositional reasoning by editing the located MHSA modules. Our empirical evidence stands testament to CREME's effectiveness, paving the way for autonomously and continuously enhancing the compositional reasoning capabilities of LLMs.
Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation
Tianqi Zhong, Zhaoyi Li, Quan Wang, Linqi Song, Ying Wei, Defu Lian, Zhendong Mao*
Sixty-second Annual Meeting of the Association for Computational Linguistics (ACL), 2024
pdf / code
This paper proposes CompMCTG, a benchmark encompassing diverse multi-aspect labeled datasets and a crafted three-dimensional evaluation protocol to holistically evaluate the compositional generalization of multi-aspect controllable text generation (MCTG) approaches, together with Meta-MCTG, a meta-learning-inspired framework that mitigates the noticeable performance drop of existing MCTG approaches in compositional generalization.
Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts
Shengzhuang Chen, Jihoon Tack, Yunqiao Yang, Yee Whye Teh, Jonathan Richard Schwarz, Ying Wei*
Forty-first International Conference on Machine Learning (ICML), 2024
pdf / code
This paper addresses the so-far limited success of meta-tuning, especially on out-of-domain (OOD) tasks, where meta-tuning is a subsequent optimization stage for foundation models that attempts to harness the best of both parameter-efficient fine-tuning and meta-learning. The proposed approach, Sparse MetA-Tuning (SMAT), trained to automatically isolate subsets of pre-trained parameters for meta-tuning on each task, successfully overcomes OOD sensitivity and delivers on the promise of enhancing the transfer abilities of vision foundation models beyond parameter-efficient fine-tuning.
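A minimal sketch of the underlying idea, assuming a sparse mask over a frozen pre-trained weight selects which entries receive the tuned update; the top-k score-based mask below is an illustrative simplification rather than SMAT's actual expert parameterization.

```python
import torch
import torch.nn as nn

class MaskedDelta(nn.Module):
    """Adds a sparse delta on top of a frozen pre-trained weight."""

    def __init__(self, weight: torch.Tensor, density: float = 0.05):
        super().__init__()
        self.register_buffer("w_frozen", weight)
        self.delta = nn.Parameter(torch.zeros_like(weight))          # tunable update
        self.score = nn.Parameter(torch.randn_like(weight) * 0.01)   # learned selection scores
        self.k = max(1, int(density * weight.numel()))

    def effective_weight(self) -> torch.Tensor:
        flat = self.score.flatten()
        idx = flat.topk(self.k).indices          # hard top-k selection; training the scores
        mask = torch.zeros_like(flat)            # end to end would need a straight-through
        mask[idx] = 1.0                          # or relaxed estimator
        return self.w_frozen + mask.view_as(self.delta) * self.delta

m = MaskedDelta(torch.randn(256, 256))
print(m.effective_weight().shape)
```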
One Meta-tuned Transformer is What You Need for Few-shot Learning
Xu Yang, Huaxiu Yao, Ying Wei*
Forty-first International Conference on Machine Learning (ICML), 2024
pdf / code (spotlight)
This paper introduces MetaFormer, a new meta-tuning framework exclusively based on attention. MetaFormer enhances the few-shot learning capacity of vision transformers by integrating both sample and task relationships into the model, which includes Masked Sample Attention for embedding sample relationships and Patch-grained Task Attention for encapsulating task relationships. MetaFormer demonstrates coherence and compatibility with off-the-shelf pre-trained vision transformers and shows significant improvements in both inductive and transductive few-shot learning scenarios.
Mitigating Catastrophic Forgetting in Online Continual Learning by Modeling Previous Task Interrelations
Yichen Wu, Hong Wang, Peilin Zhao, Yefeng Zheng, Ying Wei*, Long-Kai Huang*
Forty-first International Conference on Machine Learning (ICML), 2024
pdf / code
This work reformulates replay-based continual learning methods as a unified framework, upon which we design a Pareto-Optimized CL algorithm (POCL) that leverages Pareto optimization to capture the interrelationship among previously learned tasks. POCL thus effectively enhances the overall performance of past tasks while ensuring the performance of the current task, further alleviating catastrophic forgetting.
Federated Continual Learning via Prompt-based Dual Knowledge Transfer
Hongming Piao, Yichen Wu, Dapeng Wu, Ying Wei*
Forty-first International Conference on Machine Learning (ICML), 2024
pdf / code
This paper introduces the Prompt-based Knowledge Transfer FCL (PKT-FCL) algorithm, which prompts positive knowledge transfer across tasks and clients, an aspect previously overlooked in federated continual learning. PKT-FCL not only reduces communication costs but also addresses privacy concerns through a novel approach for prompt generation and aggregation, showing superior performance in comprehensive experimental evaluations.
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin, Haoli Bai, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying Wei*, Zhenan Sun*
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR), 2024
pdf / code
This paper addresses the challenge of deploying large vision-language pre-trained models on platforms with limited computational resources by introducing a new metric, Module-wise Pruning Error (MoPE), which quantifies the impact of module removal on cross-modal task performance. Utilizing the MoPE metric, we propose a unified pruning framework that applies to both pre-training and fine-tuning stages, effectively compressing vision-language models while preserving their performance capabilities.
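Read literally, the metric admits a simple recipe: the pruning error of a module is the drop in cross-modal task performance when that module is ablated. The sketch below captures that recipe with hypothetical stand-ins (`ablate`, `evaluate`, and the dummy model) in place of real model surgery and a real task metric.

```python
from copy import deepcopy

def module_wise_pruning_error(model, module_names, ablate_module, evaluate):
    """For each candidate module, measure how much task performance drops when that
    module is removed; a larger drop means a more important module. Sketch of a
    MoPE-style metric: ablate_module(model, name) and evaluate(model) are assumed."""
    baseline = evaluate(model)
    return {name: baseline - evaluate(ablate_module(deepcopy(model), name))
            for name in module_names}

# Dummy stand-ins so the sketch runs end to end; replace with a real VL model,
# real module surgery, and a real cross-modal metric (e.g., retrieval recall).
model = {"vision.block0": 0.02, "vision.block1": 0.10, "text.block0": 0.01}
evaluate = lambda m: 0.50 + sum(m.values())           # hypothetical task score
ablate = lambda m, name: (m.pop(name), m)[1]          # "remove" one module

errors = module_wise_pruning_error(model, list(model), ablate, evaluate)
print(sorted(errors, key=errors.get))                 # prune least-important modules first
```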
Meta Continual Learning Revisited: Implicitly Enhancing Online Hessian Approximation via Variance Reduction
Yichen Wu, Long-Kai Huang, Renzhen Wang, Deyu Meng, Ying Wei*
Twelfth International Conference on Learning Representations (ICLR), 2024 (Outstanding Honorable Mention / oral)
pdf / code
This study revisits Meta-Continual Learning (Meta-CL) and for the first time bridges Meta-CL with regularization-based methods. Concretely, Meta-CL implicitly approximates the Hessian in an online manner, which enjoys the benefit of timely adaptation but meanwhile suffers from high variance induced by random memory buffer sampling. We are thus motivated to combine the best of both worlds by proposing Variance Reduced Meta-CL (VR-MCL), which achieves both timely and accurate Hessian approximation.
Gradual Domain Adaptation via Gradient Flow
Zhan Zhuang, Yu Zhang*, Ying Wei*
Twelfth International Conference on Learning Representations (ICLR), 2024 (spotlight)
pdf / code
To address the challenge of ineffective intermediate domains in gradual domain adaptation (GDA), this work explores gradient flow to generate intermediate domains while preserving labels, thereby enabling a fine-tuning method for GDA. We employ the Wasserstein gradient flow of the Kullback–Leibler divergence to transport samples from the source to the target domain, and simulate its dynamics with the Langevin algorithm. Since the Langevin algorithm disregards label information and introduces diffusion noise, we further introduce classifier-based and sample-based potentials to avoid label switching and dramatic deviations during sampling.
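A toy sketch of the simulation step, assuming direct access to the score (gradient of log-density) of a Gaussian target: unadjusted Langevin updates transport source samples toward the target distribution. The classifier-based and sample-based potentials described above would enter as additional gradient terms.

```python
import torch

def langevin_step(x: torch.Tensor, score_fn, step: float = 1e-2) -> torch.Tensor:
    """One unadjusted Langevin update: x <- x + step * grad log p(x) + sqrt(2*step) * noise."""
    return x + step * score_fn(x) + (2 * step) ** 0.5 * torch.randn_like(x)

# Toy target: a standard Gaussian centered at (3, 3); its score is -(x - mu).
mu = torch.tensor([3.0, 3.0])
score = lambda x: -(x - mu)

x = torch.randn(512, 2) - 3.0          # "source" samples far from the target
for _ in range(500):
    x = langevin_step(x, score)
print(x.mean(dim=0))                    # drifts toward the target mean ~ (3, 3)
```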
Active Retrosynthetic Planning Aware of Route Quality
Luotian Yuan, Yemin Yu, Ying Wei*, Yongwei Wang, Zhihua Wang, Fei Wu*
Twelfth International Conference on Learning Representations (ICLR), 2024
pdf / code
This study addresses the long-standing challenge of route quality evaluation in retrosynthetic planning through an Active Retrosynthetic Planning (ARP) framework that requires only minimal annotation from chemists. The proposed ARP remains compatible with established retrosynthetic planners; it trains an actor that decides whether to query the quality of a reaction, and resorts to a critic to estimate the value of a molecule with its preceding reaction quality as input. On both the benchmark and an expert dataset, ARP outperforms the existing state-of-the-art approach by 6.2% in route quality while reducing the query cost by 12.8%.
RetroOOD: Understanding Out-of-Distribution Generalization in Retrosynthesis Prediction
Yemin Yu, Luotian Yuan, Ying Wei*, Hanyu Gao, Xinhai Ye, Zhihua Wang, Fei Wu
Thirty-eighth Annual AAAI Conference on Artificial Intelligence (AAAI), 2024
pdf / code
Despite the steady progress of existing retrosynthesis methods on standard benchmarks, our understanding of their behavior under distribution shifts remains stagnant. This study fills the gap by (1) formally sorting out two types of distribution shifts in retrosynthesis prediction, (2) constructing two groups of benchmark datasets, (3) conducting comprehensive experiments to reveal the limitations of previous in-distribution evaluation and state-of-the-art methods, and (4) proposing two model-agnostic techniques that can improve the OOD generalization of arbitrary off-the-shelf retrosynthesis prediction algorithms.
Towards Anytime Fine-tuning: Continually Pre-trained Language Models with Hypernetwork Prompts
Gangwei Jiang, Caigao Jiang, Siqiao Xue, James Y. Zhang, Jun Zhou, Defu Lian*, Ying Wei*
2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
pdf
This study first investigates the "anytime fine-tuning" effectiveness of existing continual learning approaches, finding consistently decreased performance on unseen domains. To this end, we propose a prompt-guided continual pre-training method, in which we train a hypernetwork to generate domain-specific prompts with both agreement and disagreement losses. Our method achieves improvements of 3.57% and 3.4% on two real-world datasets (including domain shift and temporal shift), respectively.
In this work, we propose a single coherent framework named Energy-Based Meta-Learning (EBML) that supports both detection and adaptation of OOD tasks, while remaining compatible with off-the-shelf meta-learning backbones. EBML learns to characterize any arbitrary meta-training task distribution with the composition of two expressive neural-network-based energy functions. We deploy the sum of the two energy functions, being proportional to the joint distribution of a task, as a reliable score for detecting OOD tasks; during meta-testing, we adapt the OOD task to in-distribution tasks by energy minimization.
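To make the detection rule concrete, here is a hedged sketch in which two small energy networks score a task's support features and feature-label pairs, and their sum is thresholded to flag OOD tasks; the architectures, feature dimensions, and threshold below are placeholders rather than the paper's choices.

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z).squeeze(-1)

e_x = EnergyNet(dim=32)       # energy over task inputs (e.g., pooled features)
e_xy = EnergyNet(dim=32 + 5)  # energy over inputs paired with labels

def task_energy(feats: torch.Tensor, labels_onehot: torch.Tensor) -> torch.Tensor:
    """Sum of the two energies, averaged over the task's support set."""
    return (e_x(feats) + e_xy(torch.cat([feats, labels_onehot], dim=-1))).mean()

feats = torch.randn(25, 32)                        # 5-way 5-shot support features
labels = torch.eye(5).repeat_interleave(5, dim=0)  # one-hot labels
score = task_energy(feats, labels)
is_ood = score.item() > 0.0    # placeholder threshold; calibrate on meta-training tasks
print(score.item(), is_ood)
```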
Does Continual Learning Meet Compositionality? New Benchmarks and An Evaluation Framework
Weiduo Liao, Ying Wei*, Mingchen Jiang, Qingfu Zhang*, Hisao Ishibuchi*
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), Track on Datasets and Benchmarks, 2023
pdf / code
We present two vision benchmarks, namely Compositional GQA (CGQA) and Compositional OBJects365 (COBJ), along with a novel evaluation framework called Compositional Few-Shot Testing (CFST). Comprehensive empirical results on systematicity, productivity, and substitutivity aspects of compositional generalization demonstrate that current continual learning techniques do exhibit somewhat favorable compositionality in their learned feature extractors, while future research on modularity is urgently needed.
Concept-wise Fine-tuning Matters in Preventing Negative Transfer
Yunqiao Yang, Long-Kai Huang, Ying Wei*
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
pdf / code
We propose a Concept-wise fine-Tuning (Concept-Tuning) approach that refines feature representations at the level of patches, with each patch encoding a concept. Concept-Tuning minimizes the negative impacts of rare features and spuriously correlated features in a pre-trained model by (1) maximizing the mutual information between examples in the same category with regard to a slice of rare features (a patch) and (2) applying front-door adjustment via attention neural networks over channels and feature slices (patches).
Learning to Substitute Spans towards Improving Compositional Generalization
Zhaoyi Li, Ying Wei*, Defu Lian*
Sixty-first Annual Meeting of the Association for Computational Linguistics (ACL), 2023 (oral)
pdf / code
This work introduces a compositional data augmentation approach that imparts additional compositional inductive bias to pre-trained models. We first propose a novel compositional augmentation strategy dubbed Span Substitution (SpanSub) that enables multi-grained composition of substantial substructures in the whole training set. Over and above that, we introduce the Learning to Substitute Span (L2S2) framework, which empowers the learning of span substitution probabilities in SpanSub in an end-to-end manner by maximizing the loss of neural sequence models, so as to outweigh those challenging compositions with elusive concepts and novel surroundings.
Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
Weixia Zhang, Guangtao Zhai, Ying Wei, Xiaokang Yang, Kede Ma
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023
pdf
We develop a general and automated multitask learning scheme for blind image quality assessment to exploit auxiliary knowledge from other tasks, in which model parameter sharing and loss weighting are determined automatically. Specifically, we first describe all candidate label combinations (from multiple tasks) using a textual template and compute the joint probability from the cosine similarities of the visual-textual embeddings in CLIP. Predictions for each task can be inferred from the joint distribution and optimized by carefully designed loss functions.
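The scoring step can be illustrated as follows, with made-up textual templates, made-up auxiliary task labels, and random embeddings standing in for CLIP's image and text encoders: a softmax over scaled cosine similarities gives a joint distribution over label combinations, from which each task's prediction is a marginal.

```python
import itertools
import torch

# Candidate labels for two tasks: quality level and an auxiliary scene task (made-up examples).
quality = ["bad", "poor", "fair", "good", "perfect"]
scene = ["indoor", "outdoor"]
combos = [f"a photo of a {s} scene of {q} quality" for q, s in itertools.product(quality, scene)]

# Placeholders: in practice, image_emb and text_embs would come from CLIP's encoders.
d = 512
image_emb = torch.nn.functional.normalize(torch.randn(1, d), dim=-1)
text_embs = torch.nn.functional.normalize(torch.randn(len(combos), d), dim=-1)

logits = 100.0 * image_emb @ text_embs.t()                       # scaled cosine similarities
joint = logits.softmax(dim=-1).view(len(quality), len(scene))    # joint over (quality, scene)

quality_marginal = joint.sum(dim=1)   # prediction for the quality-assessment task
scene_marginal = joint.sum(dim=0)     # prediction for the auxiliary scene task
print(quality_marginal, scene_marginal)
```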
Learning Chemical Rules of Retrosynthesis with Pre-training
Yinjie Jiang, Ying Wei*, Fei Wu*, Zhengxing Huang, Kun Kuang, Zhihua Wang
Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), 2023
pdf
In the burgeoning research area of AI-aided retrosynthesis, we propose a pre-training solution to address a pronounced remaining issue of template-free methods, i.e., failing to conform to chemical rules. Concretely, we enforce the atom conservation rule via a molecule reconstruction pre-training task, and the reaction rule that dictates reaction centers via a reaction-type-guided contrastive pre-training task. Our empirical results show that the pre-training solution significantly boosts single-step retrosynthesis accuracies.
Adversarial Task Up-sampling for Meta-learning
Yichen Wu, Long-Kai Huang*, Ying Wei*
36th Conference on Neural Information Processing Systems (NeurIPS), 2022 (spotlight)
pdf / code
This work, named Adversarial Task Up-sampling (ATU), augments meta-training with imaginary tasks that carry a task-correctness guarantee, by up-sampling meta-training tasks from the task manifold via a task up-sampling network. ATU also generates tasks that maximally contribute to the latest meta-learner by maximizing an adversarial loss.
Improving Task-Specific Generalization in Few-Shot Learning via Adaptive Vicinal Risk Minimization
Long-Kai Huang, Ying Wei*
36th Conference on Neural Information Processing Systems (NeurIPS), 2022 (spotlight)
pdf
This work focuses on improving task-specific generalization in the meta-testing stage, where we derive a vicinal loss function that approximates the true task distribution with an aggregation of per-sample Gaussian-like vicinal distributions. We estimate the statistical parameters of the vicinal distribution for each training sample by 1) initiating a random walk from the sample and 2) computing the weighted mean and variance of the unlabeled data passed by the walk. The proposed method outperforms state-of-the-art few-shot learning baselines on four benchmarks.
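Under simplifying assumptions (a similarity-driven walk and a geometric weighting scheme chosen purely for illustration), the statistics-estimation step might look like the following sketch.

```python
import torch

def vicinal_stats(anchor: torch.Tensor, unlabeled: torch.Tensor, steps: int = 10, temp: float = 0.1):
    """Random walk over unlabeled features starting at `anchor`; returns the weighted
    mean and variance of visited points. Illustrative simplification only."""
    visited, weights = [], []
    current = anchor
    for t in range(steps):
        sims = -torch.cdist(current.unsqueeze(0), unlabeled).squeeze(0) / temp
        probs = sims.softmax(dim=0)
        idx = torch.multinomial(probs, 1).item()   # hop to a nearby unlabeled point
        current = unlabeled[idx]
        visited.append(current)
        weights.append(0.9 ** t)                   # earlier hops weighted more heavily (assumption)
    pts = torch.stack(visited)
    w = torch.tensor(weights)
    w = w / w.sum()
    mean = (w.unsqueeze(1) * pts).sum(dim=0)
    var = (w.unsqueeze(1) * (pts - mean) ** 2).sum(dim=0)
    return mean, var

anchor = torch.zeros(16)
unlabeled = torch.randn(200, 16)
print(vicinal_stats(anchor, unlabeled))
```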
Retrosynthetic planning occupies a crucial position in synthetic chemistry and, accordingly, drug discovery; it aims to find synthetic pathways of a target molecule through a sequential decision-making process over a set of feasible reactions. This work, named the Goal-dRiven Actor-critic retroSynthetic Planning (GRASP) framework, (1) formulates retrosynthetic planning as a reinforcement learning problem, which enjoys more efficient and accurate value estimation of a molecule, and (2) achieves goal-driven retrosynthesis navigation toward a user-demand objective.
We propose a simple (10 lines of code), efficient (requiring only a single pass over the examples of a target task), yet effective (on 26 pre-trained models and 16 downstream tasks) transferability measure named TransRate for fine-tuning pre-trained models. TransRate measures the mutual information between the features of target examples extracted by a pre-trained model and their labels, which we estimate using the coding rate.
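For readers who want to see the quantity concretely, below is a sketch of a coding-rate-based transferability score in the spirit of TransRate, where the coding rate of all features minus the label-conditional coding rates estimates the feature-label mutual information; the exact distortion constant and normalization used in the paper may differ.

```python
import numpy as np

def coding_rate(Z: np.ndarray, eps: float = 1e-4) -> float:
    """R(Z) = 1/2 * logdet(I + d / (n * eps) * Z^T Z) for an n x d feature matrix Z."""
    n, d = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + d / (n * eps) * Z.T @ Z)
    return 0.5 * logdet

def transrate(Z: np.ndarray, y: np.ndarray, eps: float = 1e-4) -> float:
    """Coding rate of all features minus the class-weighted conditional coding rates,
    a proxy for the mutual information between features and labels."""
    Z = Z - Z.mean(axis=0, keepdims=True)
    rz = coding_rate(Z, eps)
    rzc = sum(coding_rate(Z[y == c], eps) * (y == c).mean() for c in np.unique(y))
    return rz - rzc

Z = np.random.randn(500, 64)            # features of target examples from a pre-trained model
y = np.random.randint(0, 10, size=500)  # target labels
print(transrate(Z, y))
```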
The Role of Deconfounding in Meta-learning
Yinjie Jiang, Zhengyu Chen, Kun Kuang*, Luotian Yuan, Xinhai Ye, Zhihua Wang, Fei Wu*, Ying Wei*
39th International Conference on Machine Learning (ICML), 2022
pdf
This work offers a novel causal perspective of meta-learning, through which we explain the memorization effect as a confounder and frame previous anti-memorization methods as different deconfounder approaches. Derived from the causal inference principle of front-door adjustment, we propose two frustratingly easy but effective deconfounder algorithms.
Artificial Intelligence for Retrosynthesis Prediction
Yinjie Jiang, Yemin Yu, Ming Kong, Yu Mei, Luotian Yuan, Zhengxing Huang, Kun Kuang, Zhihua Wang, Huaxiu Yao, James Zou, Connor W. Coley, Ying Wei*
Engineering, 2022
pdf
In recent years, there has been a dramatic rise in interest in retrosynthesis prediction with AI techniques. This survey describes the current landscape of AI-driven retrosynthesis prediction, including (1) formal definitions of the retrosynthesis problem, (2) the outstanding research challenges therein, (3) related AI techniques and recent progress that enable retrosynthesis prediction, (4) a novel landscape that provides a comprehensive categorization of different retrosynthesis prediction components, (5) how AI reshapes each component, and (6) promising areas for future research.
Disentangling Task Relations for Few-shot Text Classification via Self-Supervised Hierarchical Task Clustering
Juan Zha, Zheng Li, Ying Wei, Yu Zhang
2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
pdf
This work, named Self-Supervised Hierarchical Task Clustering (SS-HTC), improves few-shot text classification (FSTC). SS-HTC customizes cluster-specific knowledge by dynamically organizing heterogeneous tasks into different clusters at hierarchical levels and also disentangles underlying relations between tasks to improve interpretability. Extensive experiments on five public FSTC benchmark datasets demonstrate the effectiveness of SS-HTC.
Self-supervised Text Erasing with Controllable Image Synthesis
Gangwei Jiang, Shiyao Wang, Tiezheng Ge, Yuning Jiang, Ying Wei, Defu Lian
30th ACM International Conference on Multimedia (MM), 2022
pdf
This work studies a novel self-supervised text erasing framework to alleviate the heavy reliance on costly annotations. Specifically, we propose a style-aware image synthesis function that generates synthetic images with diverse style texts and a policy network that controls the synthetic mechanisms to bridge the text style gap between synthetic and real-world data. We have also constructed a new dataset called PosterErase.
This work pursues an adaptive task scheduler for meta-learning, where some meta-training tasks are likely detrimental, e.g., noisy or imbalanced, given a limited number of meta-training tasks. For the first time, we design a neural scheduler to decide which meta-training tasks to use next, and train the scheduler to optimize the generalization capacity of the meta-knowledge to unseen tasks. We show that such a scheduler theoretically improves the optimization landscape and empirically outperforms conventional schedulers, including the commonly adopted random sampling.
This paper seeks to remedy the lack of labeled compounds with activities (ADMET properties) in virtual screening (lead optimization) of drugs by transferring knowledge from previous assays, namely in-vivo experiments, collected by different laboratories and against various target proteins. We propose a functionally regionalized meta-learning algorithm, with an architectural compositional capability, to accommodate wildly different assays and meanwhile capture the relationships between assays.
MetaTS: Meta Teacher-Student Network for Multilingual Sequence Labeling with Minimal Supervision
Zheng Li, Danqing Zhang, Tianyu Cao, Ying Wei, Yiwei Song, Bing Yin
2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
pdf
We explore multilingual sequence labeling with a single unified model for multiple languages and minimal supervision. Specifically, we resort to the teacher-student framework to leverage large multilingual unlabeled data. We propose a meta teacher-student (MetaTS) network that allows the teacher to dynamically adapt its pseudo-annotation strategies by the student's feedback on the generated pseudo-labeled data of each language.
Meta-learning Hyperparameter Performance Prediction with Neural Processes
Ying Wei, Peilin Zhao, Junzhou Huang
38th International Conference on Machine Learning (ICML), 2021
pdf / code
We transfer knowledge from historical hyperparameter optimization (HPO) trials on other datasets to speed up HPO of a huge dataset where even a single trial is costly. The proposed meta-learning algorithm first introduces neural processes (NPs) as a surrogate model which empowers the simultaneous transfer of trial observations, parameters of NPs, and initial hyperparameter configurations.
This work addresses the meta-overfitting problem by augmenting as many tasks as possible. Concretely, we propose two criteria for valid task augmentation and two task augmentation methods that satisfy them. Theoretical studies and empirical results both demonstrate that the proposed task augmentation strategies significantly mitigate meta-overfitting. Moreover, the task augmentation strategies remain compatible with any advanced meta-learning algorithm.
Learn to Cross-lingual Transfer with Meta Graph Learning Across Heterogeneous Languages
Zheng Li, Mukul Kumar, William Headden, Bing Yin, Ying Wei, Yu Zhang, Qiang Yang
2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
pdf
This work focuses on the problem of cross-lingual transfer (CLT). For each CLT task, we formulate the transfer process as information propagation over a dynamic graph. More importantly, we improve the transfer effectiveness by extracting meta-knowledge such as propagation strategies from previous CLT experiences.
Self-Supervised Graph Transformer on Large-Scale Molecular Data
Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, Junzhou Huang
34th Annual Conference on Neural Information Processing Systems (NeurIPS), 2020
pdf
We propose a novel framework, GROVER, for effective molecular representation, a crucial prerequisite in AI-driven drug design and discovery. GROVER learns to characterize molecules with rich semantic features from enormous unlabeled molecular data, using carefully designed self-supervised tasks at the node, edge, and graph levels. Moreover, GROVER itself is more expressive, integrating Message Passing Networks into a Transformer-style architecture.
Adversarial Sparse Transformer for Time Series Forecasting
Sifan Wu, Xi Xiao, Qianggang Ding, Peilin Zhao, Ying Wei, Junzhou Huang
34th Annual Conference on Neural Information Processing Systems (NeurIPS), 2020
pdf
Existing time series forecasting methods fail to either capture stochasticity of data or forecast for a long time horizon due to error accumulation. In this work, we are motivated to address the two issues with a novel time series forecasting model. The model, Adversarial Sparse Transformer (AST), based on GAN, adopts a sparse Transformer as the generator to learn a sparse attention map for forecasting and meanwhile takes a discriminator to improve the prediction performance at a sequence level.
We are strongly motivated to improve unsupervised domain adaptation for medical image diagnosis from the perspectives of denoising annotations corrupted by limited expertise and differentiating the adaptation difficulty of images that have significant discrepancies. The proposed method, harnessing the collective intelligence of two peer networks, achieves these goals via a noise co-adaptation layer and a transferability-aware weight for each image.
TranSlider: Transfer Ensemble Learning from Exploitation to Exploration
Kuo Zhong, Ying Wei*, Chun Yuan, Haoli Bai, Junzhou Huang
Twenty-sixth ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2020
pdf
Learning a strategy dictating what and where to transfer is key to avoiding negative transfer, yet such a strategy often suffers from overfitting given the limited annotations in a target domain. For the first time, we propose transfer ensemble learning to solve this problem, generating a spectrum of models of decreasing transferability that ranges from pure exploitation of the source model to unconstrained exploration for the target domain.
For the first time, we attack the problem of semi-supervised node classification by transferring knowledge learned from historical graphs. We propose a novel meta-learning algorithm on graphs rather than on i.i.d. data. We learn a transferable metric space for node similarity, where two embedding functions encoding both local and global structures are learned from previous graphs.
Conventional hyperparameter optimization (HPO) algorithms require considerable hyperparameter evaluation trials, which impedes their success in wider applications where even a single trial on a huge dataset is often costly. We are therefore inspired to speed up HPO by transferring knowledge from historical HPO trials on other datasets. The proposed meta-learning algorithm introduces a dataset-aware attention to identify the most similar datasets, and for the first time transfers trial observations, neural process parameters, and initial hyperparameter configurations collectively from these datasets.
Jointly extracting aspects and sentiments for sentiment classification requires considerable labeled sentences, which are highly labor-intensive to obtain. We innovatively alleviate this problem via unsupervised domain adaptation from a sufficiently labeled domain. We propose a novel selective adversarial learning method to learn correlation vectors between aspects and sentiments and attentively transfer them across domains.
From Whole Slide Imaging to Microscopy: Deep Microscopy Adaptation Network for Histopathology Cancer Image Classification
Yifan Zhang, Hanbo Chen, Ying Wei, Peilin Zhao, Jiezhang Cao, Xinjuan Fan, Xiaoying Lou, Hailing Liu, Jinlong Hou, Xiao Han, Jianhua Yao, Qingyao Wu, Mingkui Tan, Junzhou Huang
22nd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019
pdf
This work is the first to empower digital pathology image classification directly on microscopy images. Specifically, we resort to unsupervised domain adaptation from whole slide images to remedy the lack of annotated microscopy images. The proposed method resolves intra-domain discrepancy and class imbalance via entropy minimization and sample re-weighting, respectively, in addition to inter-domain discrepancy.
We are devoted to conquering a critical challenge in meta-learning, namely task uncertainty and heterogeneity, where tasks may originate from wildly different distributions. We propose a highly motivated meta-learning algorithm with hierarchical task clustering, which not only alleviates task heterogeneity via knowledge customization to different clusters of tasks, but also preserves knowledge generalization among a cluster of similar tasks.
This work improves spatial-temporal prediction tasks like traffic prediction for those cities with only limited training data in a short period. The improvement is attributed to the knowledge transferred from other cities with sufficient data covering long periods. We first introduce the meta-learning paradigm into spatial-temporal prediction, and formulate the transferable knowledge as both short-term and long-term spatial-temporal patterns which are represented as model parameters and an explicit memory, respectively.
We aim at identifying sentiment towards aspect terms in a sentence, while annotating sentences in this case is prohibitively expensive. Innovatively, we leverage knowledge from more easily accessible sentences whose sentiment is annotated to aspect categories. We propose a multi-granularity alignment network to achieve domain adaptation, which resolves both aspect granularity inconsistency and feature discrepancy between domains.
This work is the pioneer in automatically identifying an effective multitask model for a multitask problem, empowered by a groundbreaking learning to multitask framework.
This work opens a new door to improve transfer learning effectiveness. We propose a groundbreaking learning to transfer framework to automatically optimize what and how to transfer across domains, by taking advantage of previous transfer learning experiences.
We are dedicated to improving cross-domain sentiment classification from the perspectives of discovering higher-quality domain-invariant emotion words for knowledge transfer and capturing domain-specific emotion words for sentiment classification. The proposed hierarchical attention transfer network achieves the two goals with a hierarchical attention mechanism and a non-pivots network, respectively.
Though contextual bandits effectively solve the exploitation-exploration dilemma in recommendation systems, they suffer from over-exploration in the cold-start scenario. This work is the first to alleviate the problem by transferring knowledge from other domains. We propose a transferable contextual bandit policy that transfers observations to improve user interest estimation for exploitation and thus accelerates exploration.
Highly motivated by human beings' capabilities to reflect on transfer learning experiences, we propose a novel transfer learning framework to learn meta-knowledge from historical transfer learning experiences and apply the meta-knowledge to automatically optimize what to transfer in the future.
This work focuses on cross-domain sentiment classification, e.g., sentiment classification of book reviews by transferring knowledge from electronics product reviews. The key here is to identify domain-invariant emotion words as the transferable knowledge. We are the first to automatically learn domain-invariant emotion words by introducing an end-to-end adversarial memory network and offer a direct visualization of them.
We are devoted to addressing the problems of overfitting and high-variance gradients when training deep neural networks on high-dimension but low-sample-size data, such as genetic data for phenotype prediction in bioinformatics. We propose a deep neural pursuit network that alleviates overfitting by selecting a subset of features and reduces variance by averaging the gradients over multiple dropouts.
This work provides a theoretical analysis and guarantee for the scalable heterogeneous translated hashing method which is proposed to build the correspondence between heterogeneous domains.
We propose the first principled approach to transfer knowledge between domains, each of which comprises multiple modalities of datasets. We conduct a case study of air quality prediction -- borrowing knowledge from the cities with sufficient annotations and data to the cities with either scarce annotations or insufficient data in any modality. The proposed method formulates the transferable knowledge as semantically related dictionaries for multiple modalities learned from a source domain and labeled examples.
This work is the first to transfer knowledge from posts on social media to sensors in the physical world to improve ubiquitous computing tasks such as activity recognition. We propose a co-regularized heterogeneous transfer learning model to discover transferable feature representations that bridge two domains with heterogeneous representation structures, co-regularized by both correspondences and labels.
Knowledge transfer between domains that lie in heterogeneous feature spaces but have no access to explicit correspondence is almost impossible. This work is the pioneer in using hashing to build the correspondence between such domains. The proposed method simultaneously learns hash functions embedding heterogeneous domains into different Hamming spaces, and a translator aligning these spaces.