Open access publication

Article, 2024

Separating hard clean samples from noisy samples with samples’ learning risk for DNN when learning with noisy labels

Complex & Intelligent Systems, ISSN 2199-4536, 2198-6053, Volume 10, Issue 3, Pages 4033-4054, DOI: 10.1007/s40747-024-01360-z

Contributors

  • Deng, Lihui [1]
  • Yang, Bin (ORCID: 0000-0003-0805-7928) (Corresponding author) [1]
  • Kang, Zhongfeng (ORCID: 0000-0001-9025-0748) [2]
  • Wu, Jiajin (ORCID: 0009-0007-1069-9387) [1]
  • Li, Shaosong [1]
  • Xiang, Yanping (ORCID: 0000-0001-9622-7661) [1]

Affiliations

  1. [1] University of Electronic Science and Technology of China [NORA names: China; Asia, East]
  2. [2] University of Copenhagen [NORA names: KU University of Copenhagen; University; Denmark; Europe, EU; Nordic; OECD]

Abstract

Learning with Noisy Labels (LNL) methods aim to improve the accuracy of Deep Neural Networks (DNNs) when the training set contains samples with noisy or incorrect labels, and they have become popular in recent years. Existing popular LNL methods frequently regard samples with high learning difficulty (high loss and low prediction probability) as noisy samples; however, the irregular feature patterns of hard clean samples can also cause high learning difficulty, which can lead to hard clean samples being misclassified as noisy samples. To address this limitation, we propose the Samples' Learning Risk-based Learning with Noisy Labels (SLRLNL) method. Specifically, we propose to separate noisy samples from hard clean samples using samples' learning risk, which represents a sample's influence on the DNN's accuracy. We show that a sample's learning risk is jointly determined by its learning difficulty and its feature similarity to other samples; thus, compared to existing LNL methods that rely solely on learning difficulty, our method can better separate hard clean samples from noisy samples, since the former frequently possess irregular feature patterns. Moreover, to extract more useful information from samples with irregular feature patterns (i.e., hard samples), we further propose the Relabeling-based Label Augmentation (RLA) process to prevent the memorization of hard noisy samples and to better learn hard clean samples, thereby enhancing learning on hard samples. Empirical studies show that samples' learning risk identifies noisy samples more accurately and that the RLA process enhances learning on hard samples. To evaluate the effectiveness of our method, we compare it with popular existing LNL methods on CIFAR-10, CIFAR-100, Animal-10N, Clothing1M, and DocRED. The experimental results indicate that our method outperforms the existing methods. The source code for SLRLNL is available at https://github.com/yangbo1973/SLRLNL.
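The abstract describes the method only at a high level; the exact definitions of learning risk and the RLA process are given in the paper itself. As a rough illustration of the two ideas, the Python sketch below scores each sample by combining its training loss (a proxy for learning difficulty) with the label agreement of its nearest feature-space neighbors (a proxy for feature similarity), and then blends risky labels with model predictions in an RLA-like step. The function names, the cosine-kNN agreement measure, and the blending weight tau are all illustrative assumptions, not the paper's actual formulas.

    import numpy as np

    def learning_risk_scores(losses, features, labels, k=10):
        # Illustrative per-sample "risk" combining the two signals named
        # in the abstract (NOT the paper's actual definition):
        #   (1) learning difficulty, proxied by the training loss, and
        #   (2) feature similarity, proxied by label agreement among the
        #       k nearest neighbors in feature space.
        # High loss alone does not mark a sample noisy: if its neighbors
        # share its label, the score stays low (hard but clean); high
        # loss plus neighbor disagreement yields a high score (noisy).
        f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
        sim = f @ f.T                       # pairwise cosine similarities
        np.fill_diagonal(sim, -np.inf)      # exclude self-matches
        nn = np.argsort(-sim, axis=1)[:, :k]
        agree = (labels[nn] == labels[:, None]).mean(axis=1)
        difficulty = (losses - losses.min()) / (np.ptp(losses) + 1e-12)
        return difficulty * (1.0 - agree)   # high = likely noisy

    def rla_soft_labels(one_hot, model_probs, risk, tau=0.5):
        # Hypothetical RLA-like step: blend the given label with the
        # model's prediction in proportion to the sample's risk, so hard
        # noisy labels are not memorized verbatim while hard clean
        # labels are mostly preserved.
        w = np.clip(risk / tau, 0.0, 1.0)[:, None]
        return (1.0 - w) * one_hot + w * model_probs

    # Toy usage: flag the top-decile risk scores as likely noisy.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(100, 16))
    losses = rng.random(100)
    labels = rng.integers(0, 5, size=100)
    risk = learning_risk_scores(losses, feats, labels, k=5)
    likely_noisy = risk > np.quantile(risk, 0.9)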

Keywords

Learning with Noisy Labels (LNL), deep neural networks (DNNs), noisy samples, hard clean samples, learning difficulty, learning risk, feature patterns, feature similarity, memorization, Relabeling-based Label Augmentation (RLA), CIFAR-10, CIFAR-100, Animal-10N, Clothing1M, DocRED

Funders

  • National Natural Science Foundation of China

Data Provider: Digital Science