
Drug-target interaction prediction with collaborative contrastive learning and adaptive self-paced sampling strategy

Abstract

Background

Drug-target interaction (DTI) prediction plays a pivotal role in drug discovery and drug repositioning, enabling the identification of potential drug candidates. However, most previous approaches do not fully utilize the complementary relationships among multiple biological networks, which limits their ability to learn consistent representations. Additionally, the selection strategy for negative samples significantly affects the performance of contrastive learning methods.

Results

In this study, we propose CCL-ASPS, a novel deep learning model that incorporates Collaborative Contrastive Learning (CCL) and an Adaptive Self-Paced Sampling strategy (ASPS) for drug-target interaction prediction. CCL-ASPS leverages multiple networks to learn the fused embeddings of drugs and targets, ensuring their consistency with the representations learned from individual networks. Furthermore, ASPS dynamically selects more informative negative sample pairs for contrastive learning. Experimental results on the established dataset demonstrate that CCL-ASPS achieves significant improvements compared to current state-of-the-art methods. Moreover, ablation experiments confirm the contributions of the proposed CCL and ASPS strategies.

Conclusions

By integrating Collaborative Contrastive Learning and Adaptive Self-Paced Sampling, the proposed CCL-ASPS effectively addresses the limitations of previous methods. This study demonstrates that CCL-ASPS achieves notable improvements in DTI predictive performance compared to current state-of-the-art approaches. The case study and cold start experiments further illustrate the capability of CCL-ASPS to effectively predict previously unknown DTIs, potentially facilitating the identification of new drug-target interactions.

Background

The discovery of novel drug-target interactions is at the heart of pharmaceutical research, driving the development of innovative therapies and the realization of personalized medicine [1]. Traditional wet-lab methods are expensive and time-consuming, whereas computational methods significantly improve the efficiency of drug candidate identification [2]. Consequently, computational methods have received increasing research interest. A key challenge lies in leveraging the complementary relationships encoded within multiple accessible interaction and association networks related to drugs and targets (proteins).

The embedding learning of drugs and targets is crucial for DTI prediction. Current methods typically learn representations from each network independently, followed by fusion via concatenation [3,4,5,6] or averaging with attention mechanisms [7, 8], as displayed in Fig. 1A. Although these approaches utilize multiple networks, they learn embeddings separately, neglecting the complementary relationships among the networks. This hinders the learning of consistent drug and target representations. Recent work has demonstrated the effectiveness of contrastive learning (CL) in learning consistent representations across different networks [9]. However, traditional contrastive learning approaches are primarily designed for two networks (Fig. 1B), limiting their applicability in scenarios involving multiple networks. This study addresses the challenge by leveraging contrastive learning to collaboratively obtain embeddings across multiple networks, as briefly illustrated in Fig. 1C. The proposed model enables the learning of fused embeddings that are consistent with their individual network-derived representations. Meanwhile, traditional contrastive learning methods typically treat identical nodes across views as positive samples, while treating all other node pairs as negative samples [10]. This strategy fails to fully exploit the structural information within individual networks and the complementary relationships between multiple networks. To overcome this drawback, we propose the adaptive self-paced sampling strategy.

Fig. 1
figure 1

A Representation learning by the traditional concatenation operation. B Representation learning by the typical contrastive learning strategy. C Representation learning by our proposed collaborative contrastive learning

In this study, we present Collaborative Contrastive Learning with Adaptive Self-Paced Sampling strategy to predict drug-target interactions, as displayed in Fig. 2A. Initially, CCL-ASPS learns the embeddings of drugs and targets from their corresponding 2D graph structures. Subsequently, these learned embeddings are leveraged as inputs for the collaborative contrastive learning module. Furthermore, CCL-ASPS incorporates the adaptive self-paced sampling strategy to select more informative negative samples for contrastive learning. Finally, a multilayer perceptron (MLP) decoder is employed to predict potential DTIs. Extensive experimental results demonstrate that CCL-ASPS outperforms established baselines.

Fig. 2
figure 2

The workflow of CCL-ASPS. A Overall framework. First, CCL-ASPS applies graph neural networks to extract drug and target representations from molecule and amino acid residue graphs, respectively. Afterward, these representations and similarity networks are fed into the core CCL-ASPS module. Finally, the learned representations from the CCL-ASPS module are utilized by an MLP decoder to predict potential drug-target interactions. B Collaborative Contrastive Learning. This module employs multi-layer GATs to learn drug and target embeddings from each similarity network. The learned embeddings are then aggregated to generate fused representations. To ensure consistency of the fused representations, the collaborative contrastive loss function is applied to guide the training process. C Adaptive Self-Paced Sampling. This strategy dynamically selects more informative negative samples for contrastive learning. Initially, drug (target) feature similarities are measured based on the fused representations. In the subsequent step, reliability scores are calculated for each negative sample pair based on individual network similarity (ns) and fused representation similarity (fs). Ultimately, samples with high reliabilities are selected in a self-paced manner

In summary, the main contributions of CCL-ASPS are as follows:

  • CCL-ASPS applies the collaborative contrastive learning strategy to learn more consistent representations of drugs and targets. To the best of our knowledge, this is the first attempt to learn consistent features from multiple networks collaboratively.

  • CCL-ASPS employs the adaptive self-paced sampling strategy to select more informative negative samples for contrastive learning.

  • Extensive experimental results demonstrate that CCL-ASPS achieves state-of-the-art performance compared to established baselines.

Related work

DTI prediction

Computational approaches for predicting drug-target (protein) interactions can be broadly classified into three categories based on the utilized data type: structure-based approaches, network-based approaches, and hybrid approaches. Structure-based approaches typically extract drug representations from SMILES strings and protein representations from amino acid sequences [11,12,13,14]. For instance, DrugBAN [13] leverages a structure-based representation learning scheme, capturing individual atom and amino acid features to derive joint drug-protein representations. Ru [15] first extracts self-associated features (SAFs) of drugs and targets from similarity and sharing networks, then obtains adjacent-associated features (AAFs) based on their neighbors. Modality-DTA utilizes six modalities of drug SMILES and protein sequences to capture informative drug and protein representations [16]. Network-based approaches focus on learning representations by constructing similarity networks from interaction and association data related to drugs and proteins [3, 7, 17]. Luo proposes DTINet [17], which constructs multiple drug and protein similarity networks based on association and interaction networks, then utilizes a compact representation learning algorithm to extract low-dimensional embeddings. Hybrid approaches aim to combine the strengths of both structure and network information. Some studies learn representations from structure and network data in parallel, followed by integration operations to combine the learned features [4, 18, 19]. Alternatively, sequential learning strategies first extract representations from structure data, then feed them into a network data-based encoder for further refinement [20,21,22]. Although structure-based and network-based approaches have made significant advancements, the former often overlook the extensive network data present in biological information, while the latter fail to fully exploit detailed biological structure information. These limitations can restrict their predictive capabilities. While hybrid methods leverage both types of data, existing approaches often fail to fully consider the complementary relationships across multiple networks.

Contrastive learning view generation

Inspired by successful applications in computer vision [23,24,25], contrastive learning has attracted widespread interest in graph-level representation learning and biological entity interaction prediction [9, 10, 26]. Typically, contrastive learning constructs two augmented views and aims to bring positive sample representations closer while pushing negative sample representations further apart. Many studies employ techniques such as node dropping, edge perturbation, attribute masking, or subgraph sampling on a single data source to create CL views [27,28,29]. Several approaches attempt to obtain CL views directly from structure or interaction networks. For instance, SMGCL [9] utilizes drug structure to construct a drug similarity network as one CL view and applies the drug-disease interaction network as another. MCHNLDA [30] builds a representation structure graph and an lncRNA-gene-disease interaction network as two views. MOVE [31] takes sequence information as one view and treats multiple interaction networks as another. However, these methods either rely on a single biological data source or combine multiple networks into a single CL view. As a result, they fail to fully exploit the complementary relationships across multiple networks.

Contrastive learning sampling

Traditional contrastive learning approaches typically rely on a simple strategy where identical nodes across the two augmented views are considered positive pairs, while all other node pairs are treated as negative samples [10, 26, 31,32,33]. However, this strategy overlooks potentially informative similar node pairs, leading to suboptimal performance. To address this limitation, several studies have proposed more sophisticated sampling strategies. For instance, HeCo [34] constructs meta-paths through interaction networks and selects the top-k nodes with the highest number of meta-paths as positive samples. SGCL-DTI [35] constructs a semantic network based on node class information and selects first-order neighbors in this network as positive samples. SMGCL [9] calculates scores between nodes based on their learned representations and selects the top-k nodes with the highest scores as positive samples. Recent advancements incorporate curriculum learning into the sampling process. Curriculum learning imitates human learning by progressively introducing the model to more complex samples [36]. For instance, CuCo [29] introduces a score function that ranks negative samples from easiest to hardest, and a pace function that gradually increases the number of negative samples presented to the model during training.

In contrast to these existing methods, the proposed CCL-ASPS method adopts a two-stage approach to sample pair selection. Initially, it adaptively selects sample pairs based on node similarities within each individual network. Subsequently, it obtains additional informative samples by calculating similarities between the fused representations.

Results

This section presents a comprehensive evaluation of the proposed CCL-ASPS method. All experiments in this study were conducted with 10 replicates, and the figures present the averaged results from these replicates. The experimental setup is first established, detailing the parameter settings and evaluation metrics employed. Subsequently, CCL-ASPS is benchmarked against baseline methods. To further validate its effectiveness, an ablation study is conducted, and the impact of different feature combination strategies is evaluated. Additionally, the learned representations are visualized at various training stages. Furthermore, this study performs a case study and cold start experiments, demonstrating the ability of CCL-ASPS to discover new drug-target interactions. Finally, the time complexity of CCL-ASPS is analyzed.

Parameter setting and metrics

Key parameters of CCL-ASPS are set for optimal performance. The feature dimensions for drugs and proteins are set to 64, and the learning rate is set to 0.001. For the drug and protein graph structure feature extraction modules, the number of GCN layers is set to 1 and the number of training epochs to 2000 for both. For the collaborative contrastive learning module, the negative sample rate \(\beta\) is set to 0.8, the contrastive loss weight \(\gamma\) to 0.3, the dropout rate to 0.2, the number of GAT layers to 2, and the number of training epochs to 5000.

The performance of CCL-ASPS and baseline methods is evaluated with five metrics: Accuracy (ACC), Area Under the Receiver Operating Characteristic Curve (AUROC), Area Under the Precision-Recall Curve (AUPR), Matthews Correlation Coefficient (MCC), and F1-Score (F1).
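For concreteness, the sketch below shows how these five metrics can be computed with scikit-learn; the variable names and the 0.5 decision threshold are illustrative assumptions rather than details taken from the CCL-ASPS implementation.

```python
# Minimal sketch: computing the five evaluation metrics with scikit-learn.
import numpy as np
from sklearn.metrics import (accuracy_score, roc_auc_score,
                             average_precision_score, matthews_corrcoef, f1_score)

def evaluate(y_true: np.ndarray, y_score: np.ndarray, threshold: float = 0.5) -> dict:
    y_pred = (y_score >= threshold).astype(int)  # binarize predicted probabilities
    return {
        "ACC": accuracy_score(y_true, y_pred),
        "AUROC": roc_auc_score(y_true, y_score),
        "AUPR": average_precision_score(y_true, y_score),  # common AUPR estimate
        "MCC": matthews_corrcoef(y_true, y_pred),
        "F1": f1_score(y_true, y_pred),
    }
```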

Specifically, our implementation utilizes Python 3.10, with PyTorch 2.0 as the deep learning framework and PyG (PyTorch Geometric) for graph processing. Data handling and analysis were performed using NumPy and Pandas. Visualization was conducted using Matplotlib and Seaborn. The experiments were conducted on an NVIDIA GeForce RTX 2070 GPU within a Windows operating system environment. More specific implementation details with code and dataset can be obtained from https://doi.org/10.5281/zenodo.13329691.

Comparison with baseline methods

To demonstrate the effectiveness of CCL-ASPS, this study conducts comparative analyses with a set of established baseline methods.

GCN [37] leverages the inherent relationships within drug-target interaction networks, which makes it a valuable baseline for our task.

GAT [38] is a graph-based method that excels in capturing important relationships in graph data.

SVM [39] is a classic machine learning algorithm widely used in binary classification tasks.

RF [40] is an ensemble learning method that forms the final prediction by aggregating the predictions of multiple decision trees.

DTI-CNN [3] utilizes Random Walk with Restart and a denoising autoencoder to obtain low-dimensional vector representations of drugs and proteins.

IMCHGAN [41] leverages GAT to learn embeddings of drugs and targets from DTI heterogeneous networks, then fuses the embeddings through an attention mechanism.

DrugBAN [13] learns the joint representations of drug-protein pairs by calculating the weights between drug atoms and amino acid subsequences.

GraphDTA [42] learns drug representations from drug molecular graphs and protein representations from protein sequences.

HyperAttentionDTI [14] learns the drug and protein subsequence embeddings from drug SMILES strings and protein sequences, respectively, and updates these representations by calculating the attention weights between them.

GraphCDR [28] applies Graph Neural Network (GNN) to extract biochemical features of drugs and cancers, and then utilizes contrastive learning to improve the generalization ability.

MSGCL [43] constructs an anchor view and a learner view based on multiple interactive networks, then applies contrastive loss to increase the consistency of features learned by the two views.

SPVec-SGCN-CPI [44] applies SPVec to learn compound and protein features from compound SMILES strings and protein sequences, respectively.

DTI-CDF [45] constructs one fused similarity matrix to build a heterogeneous DTIs graph, then extracts path-category-based multi-similarities features of drug-target pairs.

DTI-MLCD [46] extracts drug features based on molecular descriptors, molecular fingerprints and Word2vec. Meanwhile, DTI-MLCD combines three sequence-derived features to obtain protein features.

The performance of CCL-ASPS is evaluated using a five-fold cross-validation strategy. Negative samples are randomly selected from the unlabeled sample set at a positive-negative sample ratio of 1:1, and both positive and negative samples are divided into training and testing sets; a sketch of this protocol is given below. All compared baseline methods are evaluated on the same dataset using their optimized hyperparameters. Figure 3 presents the ROC and PR curves for CCL-ASPS and the baseline methods. The AUROC results are split into two subfigures, each containing the ROC curves of CCL-ASPS and seven baselines; the AUPR results are divided into two subfigures following the same scheme.
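The following sketch illustrates this evaluation protocol under stated assumptions: `pos_pairs` holds the known drug-target index pairs, negatives are drawn uniformly from unlabeled pairs at the given ratio, and scikit-learn's StratifiedKFold produces the five folds. Names and details are illustrative, not taken from the released code.

```python
# Sketch: 1:1 negative sampling from unlabeled pairs + five-fold CV splits.
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)

def build_cv_splits(pos_pairs, n_drugs, n_prots, ratio=1, n_folds=5):
    pos_set = set(map(tuple, pos_pairs))
    negs = []
    while len(negs) < len(pos_pairs) * ratio:   # sample unlabeled pairs as negatives
        i, j = int(rng.integers(n_drugs)), int(rng.integers(n_prots))
        if (i, j) not in pos_set:
            negs.append((i, j))
    pairs = np.array(list(pos_set) + negs)
    labels = np.array([1] * len(pos_set) + [0] * len(negs))
    # Stratified splitting keeps the positive-negative ratio in every fold.
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    return [(pairs[tr], labels[tr], pairs[te], labels[te])
            for tr, te in skf.split(pairs, labels)]
```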

Fig. 3
figure 3

The ROC and PR Curves of the proposed model and all baselines

As presented in Fig. 3, CCL-ASPS consistently outperforms all comparison baselines. Specifically, Fig. 3A and B display the AUROC metrics for CCL-ASPS and other baselines. CCL-ASPS achieves the highest AUROC value of 0.9548, followed by IMCHGAN with an AUROC of 0.9436, and DTI-CNN with an AUROC of 0.9400. Figure 3C and D illustrate the AUPR metrics for CCL-ASPS and other baselines. Among all models, CCL-ASPS achieves the best AUPR result of 0.9644, while IMCHGAN attains the second-best result with a value of 0.9518.

Comparison with baseline methods under different ratios

This experiment investigates the influence of positive-negative sample ratios on CCL-ASPS performance. Three ratios are evaluated: 1:1, 1:5, and 1:10. Negative samples are randomly selected from the unlabeled sample set at these predefined ratios.

Tables 1, 2, and 3 present the drug-target interaction prediction performance of CCL-ASPS and baseline methods under different ratios. For clarity, the best results are shown in bold, and the second-best results are underlined. This formatting scheme is applied across all tables. CCL-ASPS consistently outperforms all baselines across all five metrics at a 1:1 ratio. The results are displayed in Table 1. Meanwhile, DTI-CDF achieves the second-best results on three metrics (ACC, MCC, and F1), and IMCHGAN achieves the second-best results on the AUROC and AUPR metrics. Under a positive-negative sample ratio of 1:5 (Table 2), CCL-ASPS maintains superior performance in ACC, AUROC and AUPR metrics, while achieving the second-best results in the MCC and F1 metrics. The best MCC and F1 results are achieved by DTI-CDF and DrugBAN, respectively. For the 1:10 positive-negative sample ratio (Table 3), CCL-ASPS maintains the same trend as the 1:5 ratio, with the best results in the ACC, AUROC and AUPR metrics, and the second best results in the MCC and F1 metrics. Similarly, the best MCC and F1 results are still achieved by DTI-CDF and DrugBAN.

Table 1 The performance of CCL-ASPS and baselines under ratio 1:1
Table 2 The performance of CCL-ASPS and baselines under ratio 1:5
Table 3 The performance of CCL-ASPS and baselines under ratio 1:10

Ablation experiments

This section examines the contributions of each component within CCL-ASPS through ablation experiments. CCL-ASPS consists of three key components: (1) drug and protein graph structure-based representation extraction, (2) collaborative contrastive learning, and (3) adaptive self-paced sampling. To assess the impact of each component, three ablation experiments are conducted: CCL-ASPS w/o GF (disables graph structure-based representation extraction), CCL-ASPS w/o CCL (disables collaborative contrastive learning), and CCL-ASPS w/o ASPS (disables adaptive self-paced sampling). Since the adaptive self-paced sampling strategy is part of the collaborative contrastive learning framework, the CCL-ASPS w/o CCL experiment is conducted on the variant that already disables adaptive self-paced sampling, so that the contributions of CCL and ASPS can be assessed separately.

As shown in Table 4, disabling any module leads to a decrease in performance compared to the full CCL-ASPS model. Specifically, CCL-ASPS w/o GF exhibits a 1.4% and 1.3% decrease in AUROC and AUPR, respectively. Similarly, CCL-ASPS w/o CCL shows a decrease of 0.4% and 0.3% in AUROC and AUPR, respectively. Finally, CCL-ASPS w/o ASPS leads to a decrease of 0.4% and 0.3% in AUROC and AUPR, respectively. These ablation results demonstrate that each component of CCL-ASPS contributes to its superior drug-target prediction performance.

Table 4 The ablation experiment results

The effectiveness of collaborative contrastive learning under different feature combination strategies

A critical component of CCL-ASPS is collaborative contrastive learning. This strategy aims to learn more consistent features from the drug and protein similarity networks. The hypothesis is that CCL improves model performance regardless of the chosen feature combination strategy.

To validate the hypothesis, this study evaluates the impact of CCL across various feature combination strategies: summation (sum), averaging (avg), weighted aggregation (w-agg), concatenation (concat), max pooling (max-pool), min pooling (min-pool), and convolutional neural network (CNN). The AUROC and AUPR results, presented in Fig. 4, depict the effectiveness of CCL across these strategies. The results consistently demonstrate that models employing CCL outperform those without it under every feature combination strategy, with the best AUROC and AUPR results achieved by the CNN strategy combined with CCL. This strongly supports the hypothesis and highlights the robustness of CCL in enhancing the predictive capabilities of CCL-ASPS, irrespective of the chosen feature combination strategy.

Fig. 4
figure 4

The effectiveness of collaborative contrastive learning under different feature combination strategies

Visualization and interpretation

This section employs t-distributed Stochastic Neighbor Embedding (t-SNE) to visualize the learning process of CCL-ASPS. t-SNE projects high-dimensional feature representations into a 2D space for easier visualization. In this experiment, we showcase the ability of CCL-ASPS to distinguish features between positive and negative drug-target interaction pairs.

The learned drug and protein features are combined to form joint representations of drug-target pairs. Subsequently, these joint representations are compressed to two dimensions using t-SNE and visualized in a scatter plot, as displayed in Fig. 5. Blue dots represent positive drug-target pairs (labeled), while orange dots represent negative pairs (unlabeled). The figure depicts the distribution of joint representations at three training epochs: 1, 50, and 500. At epoch 1 (initial training stage), a substantial overlap between positive and negative samples is observed, indicating indistinguishable representations during this phase. As training progresses to epoch 50, a clear separation begins to emerge in the 2D space. By epoch 500, although some overlap remains, the distinction between positive and negative sample pairs is much clearer. The t-SNE visualization demonstrates the capability of CCL-ASPS to progressively learn and distinguish positive and negative sample representations during training.
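A minimal sketch of this visualization, assuming `joint` holds the joint pair representations as a NumPy array and `labels` their 0/1 interaction labels:

```python
# Sketch: project joint drug-target pair representations to 2D with t-SNE.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_pairs(joint, labels, epoch):
    xy = TSNE(n_components=2, random_state=0).fit_transform(joint)
    plt.scatter(xy[labels == 1, 0], xy[labels == 1, 1], s=5, label="positive")
    plt.scatter(xy[labels == 0, 0], xy[labels == 0, 1], s=5, label="negative")
    plt.title(f"Joint representations at epoch {epoch}")
    plt.legend()
    plt.show()
```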

Fig. 5
figure 5

The visualization of positive and negative sample pairs under epoch 1, 50, 500

Case study

The case study evaluates the ability of CCL-ASPS to predict novel drug-target interactions. Due to limitations in directly validating all predictions through wet-lab experiments, a two-step approach is employed for initial validation.

This case study focuses on two drugs, pravastatin and amphetamine, because the prediction probabilities for their associated proteins were notably high. Specifically, the prediction probabilities for the top 20 proteins related to amphetamine all exceeded 0.96. For pravastatin, the prediction probabilities for the top 20 related proteins are all above 0.83, with the top 10 predictions exceeding 0.92. These high prediction probabilities indicate the model's strong confidence in the predicted interactions between these drugs and their related proteins.

First, known positive samples are removed from the predictions. Subsequently, the top 20 predicted interacting proteins for pravastatin and amphetamine are selected. To validate these predictions, a comprehensive search of published research articles is conducted. The validation results are presented in Tables 5 and 6, respectively. Each table details the specific protein, its corresponding prediction rank, and the PMID of the validating research paper. The case study demonstrates the effectiveness of CCL-ASPS in identifying potential drug-target interactions. Notably, 19 of the top 20 predicted interacting proteins for amphetamine are validated through the literature review. Similarly, 13 of the top 20 predictions for pravastatin are supported by existing publications. These results highlight the capability of CCL-ASPS to screen and identify promising candidates for further investigation.

Table 5 The top 20 predicted amphetamine-associated proteins by CCL-ASPS
Table 6 The top 20 predicted pravastatin-associated proteins by CCL-ASPS

Time complexity analysis

This section analyzes the time complexity of the proposed model, focusing on three key modules: graph feature extraction, collaborative contrastive learning with adaptive self-paced sampling, and predictor.

Both the drug and protein graph structure feature extractions utilize a single-layer GCN, resulting in a time complexity of \(O((V_d n_d + V_p n_p) \times F)\), where \(V_d\) and \(V_p\) are the numbers of drugs and proteins, respectively, \(n_d\) is the number of atoms per drug, \(n_p\) is the number of residues per protein, and F is the feature dimension. Since \(n_d\), \(n_p\), and F are within a fixed range, they can be treated as constants, simplifying the time complexity of this module to \(O(V_d+V_p)\).

In the collaborative contrastive learning module, the similarity network learning step has a time complexity of \(O((V_d + V_p) \times F \times F') + O((E_d + E_p) \times F')\), where F and \(F'\) are the input and output dimensions, and \(E_d\) and \(E_p\) are the numbers of edges in the drug and protein similarity networks, respectively. The number of drug or protein similarity networks is constant and can be omitted. One convolution layer has a time complexity of \(O((V_d + V_p) \times (F-k)^2 \times k^2 \times \mathcal {C}_{in} \times \mathcal {C}_{out})\), where k is the kernel size and \(\mathcal {C}_{in}\) and \(\mathcal {C}_{out}\) are the input and output channel numbers; since these are all constants, the convolution complexity can be abbreviated as \(O((V_d + V_p) \times F^2)\). The drug and protein contrastive learning losses require computing the feature similarity between every pair of nodes, hence the time complexity is \(O(({V_d}^2 + {V_p}^2) \times F)\). For the selection of contrastive learning sample pairs, the feature similarities between all nodes are first calculated and sorted, after which \({num}_t\) negative samples are selected; the resulting time complexity is \(O(({V_d}^2 + {V_p}^2) \times F) + O(V_d \log V_d) + O(V_p \log V_p) = O(({V_d}^2 + {V_p}^2) \times F)\). Considering only the effect of the node numbers, the time complexity of the collaborative contrastive learning module is \(O((V_d + V_p) \times F \times F') + O((E_d + E_p) \times F') + O((V_d + V_p) \times F^2) + O(({V_d}^2 + {V_p}^2) \times F) = O({V_d}^2 + {V_p}^2)\).

The time complexity of the predictor is \(O(E_{dp} \times F \times F')\), where \(E_{dp}\) is the number of positive drug-target interaction pairs.

Combining the complexities of each module, the overall time complexity of CCL-ASPS is \(O({V_d}^2 + {V_p}^2) + O(E_{dp}) = O({V_d}^2 + {V_p}^2)\).

Additionally, we have documented the running times for CCL-ASPS and all baseline methods, which are presented in Table 7. The running time of CCL-ASPS is 1293 seconds. The running times for GCN, SVM, RF, and DTI-CNN are all under 100 seconds. The running times for GAT, GraphCDR, SPVec-SGCN-CPI, DTI-CDF, and DTI-MLCD range between 100 and 1000 seconds. The running times for IMCHGAN and DrugBAN exceed 1000 seconds but are still shorter than that of CCL-ASPS. Models such as GraphDTA, HyperAttentionDTI, and MSGCL require running times longer than CCL-ASPS, with HyperAttentionDTI having the longest running time at 3217 seconds.

The running times reveal that while CCL-ASPS requires a relatively longer training time compared to some baseline models, it is still within a reasonable and manageable range. The majority of baseline models (e.g., GCN, SVM, RF, and DTI-CNN) with significantly shorter running times often exhibit limitations in scalability and accuracy.

Furthermore, the time complexity analysis demonstrates that the running time of CCL-ASPS increases quadratically as the dataset grows. Despite this quadratic growth, the running time remains within a predictable and manageable range, suggesting that CCL-ASPS can handle larger datasets efficiently.

Table 7 The running times of CCL-ASPS and baselines

Discussion

Following the detailed experimental analysis presented in the Results section, this Discussion section further evaluates the robustness and practical utility of the proposed CCL-ASPS, focusing on three key aspects: parameter sensitivity, cold start performance, and statistical significance.

Parameter sensitivity analysis

The selection of hyperparameters is crucial for ensuring accurate and reliable prediction performance, and this section discusses the key parameters investigated along with their optimization process.

This experiment investigates the influence of three key hyperparameters: embedding size, negative sample ratio \(\beta\), and contrastive loss weight \(\gamma\). The results are presented in Fig. 6. To begin with, the proposed CCL-ASPS is evaluated with embedding sizes of 16, 32, 64, 128, and 256. The AUROC value increases from 0.931 at an embedding size of 16 to a maximum of 0.955 at 64. When the embedding size is further increased to 128 and 256, the AUROC value remains around 0.955. Similarly, the ACC and AUPR values also reach their highest at an embedding size of 64 (0.893 and 0.9641, respectively). Consequently, an embedding size of 64 is chosen for this study. Furthermore, the effect of \(\beta\) on contrastive learning is analyzed. CCL-ASPS achieves the highest AUROC value when \(\beta =0.8\), as both excessively small and large values lead to a decrease in prediction performance. Additionally, the contrastive loss weight \(\gamma\) also impacts overall performance. Based on the experimental results, the optimal value for \(\gamma\) is set to 0.3.

Fig. 6
figure 6

The parameter sensitivity analysis

The number of GCN layers in both the drug and protein graph structure extraction modules is set to 1, determined by grid search. Similarly, the number of GAT layers in the collaborative contrastive learning module is set to 2. A dropout rate of 0.2 is chosen for the GAT layers in the collaborative contrastive learning module. The training epochs are set to 2000, 2000, and 5000 for the drug graph structure feature extraction module, the protein graph structure feature extraction module, and the collaborative contrastive learning module, respectively. The learning rates of these three modules are all set to 0.001, selected from [0.1, 0.01, 0.001, 0.0001].

The parameter sensitivity analysis highlights that careful tuning of hyperparameters is essential to maximize the performance of CCL-ASPS. Future research could further enhance model effectiveness by employing adaptive or automated hyperparameter optimization techniques.

Cold start

This section discusses the performance of CCL-ASPS under a cold start scenario, where the test set comprises drugs or proteins absent from the training data.

K-fold cross-validation is employed to assess model performance. In the context of drug cold start evaluation, drugs are first divided into k folds. During each iteration (fold), one subset of drugs with their associated interactions is designated as the test set. The remaining drugs in the k-1 subsets with their associated interactions constitute the positive samples for training. The negative samples are then randomly selected from all possible drug-target pairs between the drugs in the k-1 subsets and all targets.

This ensures the test set excludes drugs present in the training set, enabling a systematic evaluation of the model’s response to unseen drugs. An identical k-fold cross-validation strategy is employed for protein cold start evaluation, ensuring the generalizability of CCL-ASPS to unseen proteins.
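A sketch of the drug cold-start split is shown below; it partitions drug indices (rather than pairs) into k folds so that test drugs never appear during training. The function and variable names are illustrative assumptions.

```python
# Sketch: drug cold-start splitting, holding out whole drugs per fold.
import numpy as np
from sklearn.model_selection import KFold

def drug_cold_start_folds(pos_pairs, n_drugs, k=10, seed=0):
    drug_ids = np.arange(n_drugs)
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    for train_idx, _ in kf.split(drug_ids):
        train_drugs = set(drug_ids[train_idx])
        train_pos = [(d, p) for d, p in pos_pairs if d in train_drugs]
        test_pos = [(d, p) for d, p in pos_pairs if d not in train_drugs]
        # Training negatives are sampled only among pairs involving train drugs.
        yield train_pos, test_pos
```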

Figure 7 presents the prediction performance of CCL-ASPS for drug and protein cold start with varying k values. As observed, both AUROC and AUPR scores exhibit a positive correlation with increasing k. For drug cold start, the AUROC score rises from 0.8939 (k=2) to 0.9242 (k=20), and the AUPR score increases from 0.9118 (k=2) to 0.9261 (k=20). A similar trend is observed in protein cold start. These observed trends are likely attributable to the increasing size and diversity of the training set with larger k values. A larger training dataset provides the model with a broader range of examples to learn from, consequently enhancing its capability for generalizing to unseen drugs and proteins.

Fig. 7
figure 7

The AUROC and AUPR results of drug and protein cold start experiments

The cold start analysis demonstrates the strong generalization capability of CCL-ASPS on unseen drugs or proteins. Future work could further improve performance by incorporating additional biological knowledge, such as drug-target binding affinities or 3D structures of drugs, to provide a more comprehensive understanding of the interactions.

Statistical analysis

The experimental results demonstrate that our model, CCL-ASPS, significantly outperforms all baseline methods in predicting drug-target interactions. In this discussion, we conduct a comprehensive statistical analysis to compare the performance of CCL-ASPS against all baseline methods. The objective of this experiment is to determine whether the observed improvements in performance are statistically significant.

The performance of each baseline is assessed through ten independent runs. To evaluate the statistical significance of the performance improvements, we employ paired-samples t-tests on the AUROC metric values. We use a significance level of 0.05 to determine statistical significance. A p-value less than 0.05 indicates strong evidence against the null hypothesis, suggesting that the observed improvements in model performance are not due to random chance. The results of the paired t-tests are detailed in Fig. 8. CCL-ASPS outperforms all baseline methods with p-values less than 0.05, indicating that the improvements in model performance are statistically significant.
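A minimal sketch of this test with SciPy, assuming each argument is a sequence of per-run AUROC values (ten runs in this study):

```python
# Sketch: paired-samples t-test on per-run AUROC values.
from scipy import stats

def paired_auroc_test(model_auroc, baseline_auroc, alpha=0.05):
    t_stat, p_value = stats.ttest_rel(model_auroc, baseline_auroc)
    return t_stat, p_value, p_value < alpha  # significant if p < alpha
```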

Fig. 8
figure 8

The statistical test results between CCL-ASPS and all baselines on the AUROC metric

The statistical significance analysis provides strong evidence that the performance improvements achieved by CCL-ASPS over baseline methods are substantial, demonstrating its superior predictive capabilities in drug-target interaction tasks.

In summary, this discussion focuses on three key aspects: parameter sensitivity, cold start performance, and statistical significance of CCL-ASPS. First, the parameter sensitivity analysis highlights the importance of fine-tuning hyperparameters such as embedding size, negative sample ratio, and contrastive loss weight to optimize the performance of CCL-ASPS. Second, the cold start analysis demonstrates the effectiveness of CCL-ASPS in handling unseen drugs or proteins. Third, the statistical significance analysis confirms that the performance gains of CCL-ASPS over other baseline models are statistically significant and unlikely due to random chance. Despite these strengths, certain limitations remain. For instance, integrating adaptive hyperparameter optimization techniques could further enhance model robustness and reduce manual effort. Furthermore, while the model performs well on the cold start experiment, its generalizability to other drug-target interaction datasets has yet to be explored.

Conclusions

This study presents CCL-ASPS, a novel approach that leverages collaborative contrastive learning and an adaptive self-paced sampling strategy to learn consistent representations from multiple networks.

Experimental results demonstrate the superior performance of CCL-ASPS in predicting drug-target interactions compared to baseline methods. Furthermore, the ablation experiments validate the effectiveness of each model component, while the exploration of various feature combination strategies confirms the efficacy of CCL. Moreover, the case study and cold start evaluation showcase the capability of CCL-ASPS to predict potential drug-target interactions. Finally, the statistical analysis further confirms the superior performance of CCL-ASPS.

Despite its contributions, this work presents opportunities for further exploration. First, investigating more sophisticated learning strategies within the structure-based feature extraction component holds promise for further performance improvements. Second, the proposed CCL-ASPS could be applied to predict associations among other diverse biological entities, such as drug-disease and protein-protein interactions.

Methods

This section details the components of the proposed CCL-ASPS. The utilized dataset and data preprocessing are presented first. Subsequently, the components of CCL-ASPS are introduced: graph structure-based feature extraction, collaborative contrastive learning with adaptive self-paced sampling, and the final predictor. The source code and dataset for CCL-ASPS are publicly available at https://doi.org/10.5281/zenodo.13329691.

Dataset and preprocessing

This study validates all innovations on a drug-target interaction dataset collected from multiple data sources. Interaction networks related to drugs and proteins are obtained from Luo [17]. SMILES strings of drugs are collected from DrugBank [47]. Protein amino acid sequences are retrieved from UniProt [48]. Protein 3D structures are acquired from the RCSB Protein Data Bank [49]. For proteins lacking PDB structures, predicted structures from AlphaFold [50] are utilized. As shown in Table 8, the final dataset contains 12015 nodes, including 708 drugs, 1512 proteins, 5603 diseases, and 4192 side effects. Furthermore, six types of interactions are collected: 1923 drug-protein interactions (Dr-Pr), 199214 drug-disease associations (Dr-Di), 10036 drug-drug interactions (Dr-Dr), 80164 drug-side-effect associations (Dr-Se), 1596745 protein-disease associations (Pr-Di), and 7363 protein-protein interactions (Pr-Pr).

Table 8 Statistics of the collected data

Afterward, the drug SMILES strings are converted into atom interaction graphs. Each atom feature is initialized based on its chemical properties [51]. For proteins, the coordinates of amino acid residues are represented by their \(C_{\alpha }\) atoms. Pairwise distances between residues are calculated, and residues are considered in contact if their distance falls below a specified threshold, which forms the amino acid residue interaction graphs. Additionally, seven attributes of amino acid residues are utilized for initial representations [52].
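The sketch below illustrates this contact-graph construction; the 8 Å cutoff is an assumed example value, as the paper specifies only that a threshold is used.

```python
# Sketch: build a residue contact graph from C-alpha coordinates.
import numpy as np
import torch

def contact_graph(ca_coords: np.ndarray, threshold: float = 8.0) -> torch.Tensor:
    # Pairwise C-alpha distances; connect residues closer than the threshold.
    dists = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    src, dst = np.where((dists < threshold) & (dists > 0))  # drop self-loops
    return torch.tensor(np.stack([src, dst]), dtype=torch.long)  # PyG edge_index
```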

Jaccard similarity [53] is employed to construct similarity networks based on each interaction and association network. Additionally, structure similarity networks are built based on SMILES strings and amino acid sequences using methods described in Luo [17].
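As an illustration, Jaccard similarity between drugs can be computed directly from a binary association matrix; the sketch below assumes rows index drugs and columns index the associated entity (e.g., diseases).

```python
# Sketch: Jaccard similarity network from a binary association matrix.
import numpy as np

def jaccard_network(assoc: np.ndarray) -> np.ndarray:
    inter = (assoc @ assoc.T).astype(float)        # |A ∩ B| for every row pair
    row_sums = assoc.sum(axis=1)
    union = row_sums[:, None] + row_sums[None, :] - inter
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
```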

Drug graph structure-based feature

This section describes the extraction of drug representations from atom interaction graphs using Graph Convolutional Network (GCN) [37].

For each drug \(d_i\), its initial atom representations are denoted as \(X_{d_i} \in \mathbb {R}^{n_{d_{i}}\times k_{d}}\), where \(n_{d_{i}}\) and \(k_{d}\) represent the number of atoms in drug \(d_{i}\) and the feature dimension of atoms, respectively. Additionally, the matrix \(A_{d_{i}} \in \mathbb {R}^{n_{d_{i}}\times n_{d_{i}}}\) denotes the adjacency matrix between atoms, where an entry of 1 indicates an edge between the corresponding atoms and 0 otherwise.

GCN updates the atom features by aggregating information from neighboring atoms, as defined by the following equation:

$$\begin{aligned} X_{d_i}^\prime =\sigma \left( D_{d_{i}}^{-\frac{1}{2}} \check{A}_{d_{i}} D_{d_{i}}^{-\frac{1}{2}} X_{d_i} W_{d_{1}}\right) , \end{aligned}$$
(1)

where \(\check{A}_{d_{i}} = A_{d_{i}} + I\) is the adjacency matrix with self-loop, \(I\in \mathbb {R}^{n_{d_{i}}\times n_{d_{i}}}\) is the identity matrix, \(D_{d_{i}} \in \mathbb {R}^{n_{d_{i}}\times n_{d_{i}}}\) is the degree matrix with the value of each diagonal element equal to the degree of the corresponding atom node, \(W_{d_{1}} \in \mathbb {R}^{k_{d} \times f_{d}}\) is a learnable weight parameter and \(f_{d}\) is the output feature dimension of atoms, \(\sigma \left( \cdot \right)\) is the nonlinear activation function.

Following the update of features, a readout function is employed to combine all the atom features and generate the drug representation.

$$\begin{aligned} h_{d_i} ={Readout}\left( X_{d_i}^\prime \right) , \end{aligned}$$
(2)

where \(h_{d_i} \in \mathbb {R}^{f_{d}}\) is the representation of drug \(d_i\). In this study, all \({Readout\left( \cdot \right) }\) functions are implemented using the global mean pooling operation.
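A minimal PyTorch Geometric sketch of Eqs. (1)-(2), a single GCN layer over the atom graph followed by a global mean pooling readout (hyperparameter values are assumptions):

```python
# Sketch: drug encoder, one GCN layer + mean-pool readout (Eqs. 1-2).
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class DrugEncoder(torch.nn.Module):
    def __init__(self, k_d: int, f_d: int = 64):
        super().__init__()
        self.gcn = GCNConv(k_d, f_d)  # symmetric-normalized aggregation, Eq. (1)

    def forward(self, x, edge_index, batch):
        x = torch.relu(self.gcn(x, edge_index))
        return global_mean_pool(x, batch)  # Readout, Eq. (2)
```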

To pre-train the GCN for drug feature extraction, a binary cross-entropy loss (BCELoss) is employed as the objective function, aiming to predict drug-drug interactions. This involves learning the joint representations of the drug-drug pairs with a CNN layer:

$$\begin{aligned} {h_{(d_i, d_j)}} = {Pooling}\left( {Conv}\left( h_{d_i}\Vert h_{d_j}\right) \right) , \end{aligned}$$
(3)

where \(h_{d_i}\) and \(h_{d_j}\) are the representations of drug \(d_{i}\) and \(d_{j}\), respectively. The symbol \(\Vert\) denotes the concatenation operation. \({Conv\left( \cdot \right) }\) represents the CNN layer with a kernel size of 3, 4 output channels, and padding and stride of 1. \({Pooling\left( \cdot \right) }\) represents the max pooling operation. The resulting \({ h_{(d_i, d_j)}\in \mathbb {R}^{4f_{d}}}\) is the joint representation of drugs \(d_{i}\) and \(d_{j}\).

A multilayer perceptron is applied to predict the association probability of drug pairs:

$$\begin{aligned} {P_{(d_i, d_j)}} =\sigma \left( W_{d_{3}} \left( {ReLu} \left( W_{d_{2}} {h_{(d_i, d_j)}}\right) \right) \right) , \end{aligned}$$
(4)

where \(W_{d_{2}} \in \mathbb {R}^{4f_{d} \times f_{d}}\) and \(W_{d_{3}} \in \mathbb {R}^{f_{d} \times 1}\) are learnable parameters, \(\sigma \left( \cdot \right)\) and \({ReLu\left( \cdot \right) }\) are nonlinear activation functions, and \({P_{(d_i, d_j)}}\) is the predicted association probability between drug \(d_i\) and \(d_j\).

The BCELoss serves as the objective function for drug feature extraction:

$$\begin{aligned} \mathcal {L}_{d} =-\sum \limits _{(d_i,d_j) \in ddp} \left[ {\widehat{P}_{(d_i, d_j)}} \log {{P}_{(d_i, d_j)}} + (1-{\widehat{P}_{(d_i, d_j)}}) \log (1-{P_{(d_i, d_j)}}) \right] , \end{aligned}$$
(5)

where ddp denotes a set of drug-drug pairs. \((d_i,d_j)\) represents a single pair of drug \(d_i\) and \(d_j\). \({P_{(d_i, d_j)}}\) represents the predicted association probability of drug pair \((d_i,d_j)\). \({\widehat{P}_{(d_i, d_j)}}\) signifies the ground truth for drug-drug association, where \({\widehat{P}_{(d_i, d_j)}=1}\) for true drug-drug association and \({\widehat{P}_{(d_i, d_j)}=0}\) otherwise.
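The pre-training head of Eqs. (3)-(5) can be sketched as follows; the Conv1d settings follow the stated kernel size, channel count, padding, and stride, while the MaxPool1d kernel of 2 is an assumption chosen so the joint representation has dimension \(4f_d\):

```python
# Sketch: drug-drug pre-training head (Eqs. 3-5).
import torch
import torch.nn as nn

class PairHead(nn.Module):
    def __init__(self, f_d: int = 64):
        super().__init__()
        self.conv = nn.Conv1d(1, 4, kernel_size=3, padding=1, stride=1)
        self.pool = nn.MaxPool1d(2)  # halves the length: (4, 2f_d) -> (4, f_d)
        self.mlp = nn.Sequential(nn.Linear(4 * f_d, f_d), nn.ReLU(),
                                 nn.Linear(f_d, 1), nn.Sigmoid())

    def forward(self, h_i, h_j):
        pair = torch.cat([h_i, h_j], dim=-1).unsqueeze(1)  # (B, 1, 2f_d)
        feat = self.pool(self.conv(pair)).flatten(1)       # (B, 4f_d), Eq. (3)
        return self.mlp(feat).squeeze(-1)                  # probability, Eq. (4)

# Training objective (Eq. 5): nn.BCELoss()(head(h_i, h_j), labels.float())
```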

Protein graph structure-based feature

Similar to drug feature extraction, a GCN-based approach is employed to extract protein features from the amino acid residue interaction graphs. The amino acid residue interaction graph for protein \(p_i\) is represented by an adjacency matrix \(A_{p_i} \in \mathbb {R}^{n_{p_i}\times n_{p_i}}\), where \(n_{p_i}\) denotes the number of residues in the protein \(p_i\). Each row in the initial feature matrix \(X_{p_i} \in \mathbb {R}^{n_{p_i}\times k_{p}}\) represents the initial features of an amino acid residue. \(k_{p}\) is the initial feature dimension of proteins.

A single-layer GCN is applied to update the residue representations, aggregating information from neighboring residues:

$$\begin{aligned} X_{p_i}^\prime ={ReLu}\left( \ W_{p_2} \left( D^{-\frac{1}{2}}_{p_i} (A_{p_i} + I_{p_i}) D^{-\frac{1}{2}}_{p_i} X_{p_i} W_{p_{1}}\right) \right) , \end{aligned}$$
(6)

where \(I_{p_i} \in \mathbb {R}^{n_{p_i}\times n_{p_i}}\) is the identity matrix. \(D_{p_i} \in \mathbb {R}^{n_{p_i}\times n_{p_i}}\) is the diagonal degree matrix whose diagonal entries are the node degrees. \(W_{p_{1}}\in \mathbb {R}^{k_{p} \times f_{p}}\) and \(W_{p_{2}}\in \mathbb {R}^{f_{p} \times f_{p}}\) are learnable parameters. \(f_{p}\) is the output feature dimension of amino acid residues. \(ReLu\left( \cdot \right)\) is an activation function. \(X_{p_i}^\prime \in \mathbb {R}^{n_{p_i}\times f_{p}}\) is the updated amino acid residue feature matrix.

Following the update, a self-attention pooling layer is employed to extract feature representations of key residues. Subsequently, a readout function is applied to obtain the protein representation for downstream tasks:

$$\begin{aligned} h_{p_i} ={Readout}\left( {SAGPooling}\left( X_{p_i}^\prime \right) \right) , \end{aligned}$$
(7)

where \(SAGPooling\left( \cdot \right)\) is the self-attention pooling. \(h_{p_i}\in \mathbb {R}^{f_{p}}\) is the representation of protein \(p_i\).
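A PyTorch Geometric sketch of the protein encoder in Eqs. (6)-(7); the SAGPooling ratio is an assumed value not specified in the text:

```python
# Sketch: protein encoder with GCN + self-attention pooling (Eqs. 6-7).
import torch
from torch_geometric.nn import GCNConv, SAGPooling, global_mean_pool

class ProteinEncoder(torch.nn.Module):
    def __init__(self, k_p: int, f_p: int = 64):
        super().__init__()
        self.gcn = GCNConv(k_p, f_p)
        self.lin = torch.nn.Linear(f_p, f_p)   # the outer W_{p_2} in Eq. (6)
        self.sag = SAGPooling(f_p, ratio=0.5)  # self-attention pooling (assumed ratio)

    def forward(self, x, edge_index, batch):
        x = torch.relu(self.lin(self.gcn(x, edge_index)))           # Eq. (6)
        x, edge_index, _, batch, _, _ = self.sag(x, edge_index, batch=batch)
        return global_mean_pool(x, batch)                           # Readout, Eq. (7)
```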

To pre-train the GCN for protein feature extraction, the binary cross-entropy loss is applied as the objective function, aiming to predict protein-protein interactions (PPIs). The pre-training process is analogous to drug feature extraction. The implementation is the same as in Eqs. (3, 4, 5):

$$\begin{aligned} {h_{(p_i, p_j)}} ={Pooling}\left( {Conv}\left( h_{p_i}\Vert h_{p_j}\right) \right) , \end{aligned}$$
(8)
$$\begin{aligned} {P_{(p_i, p_j)}} =\sigma \left( W_{p_{3}} {ReLu} \left( W_{p_{2}} {h_{(p_i, p_j)}}\right) \right) , \end{aligned}$$
(9)
$$\begin{aligned} \mathcal {L}_{p} =-\sum \limits _{(p_i,p_j) \in ppi} \left[ {\widehat{P}_{(p_i,p_j)}} \log {{P}_{(p_i, p_j)}} + (1-{\widehat{P}_{(p_i, p_j)}}) \log (1-{P_{(p_i, p_j)}}) \right] , \end{aligned}$$
(10)

where \(h_{p_i}\) and \(h_{p_j}\) are the representations of protein \(p_i\) and \(p_j\), respectively. \(h_{(p_i, p_j)}\in \mathbb {R}^{4f_{p}}\) is the joint representation of proteins \(p_i\) and \(p_j\). \(W_{p_{2}} \in \mathbb {R}^{4f_{p} \times f_{p}}\) and \(W_{p_{3}} \in \mathbb {R}^{f_{p} \times 1}\) are learnable parameters. \({P_{(p_i, p_j)}}\) is the predicted interaction probability between protein \(p_i\) and \(p_j\). \({\widehat{P}_{(p_i, p_j)}}\) is the ground truth, with \({\widehat{P}_{(p_i, p_j)}=1}\) indicating an interaction between the two proteins and \({\widehat{P}_{(p_i, p_j)}=0}\) otherwise.

Collaborative contrastive learning with adaptive self-paced sampling

This section describes the proposed collaborative contrastive learning and adaptive self-paced sampling strategy, illustrated in Fig. 2B and C. The strategy consists of three key components: multiple network learning, collaborative contrastive learning, and adaptive self-paced sampling. The overall workflow is outlined in Algorithm 1.

figure a

Algorithm 1 CCL-ASPS

Multiple network learning

This section introduces the first component: multiple network learning. GAT [38] is employed to leverage information hidden within each drug and protein similarity network.

For each similarity network \(S^m,\left( m=1,2,\ldots , n_{m_d}\right)\), and for each drug \(d_i\) with its pre-trained representation \(h_{d_i}\), GAT computes normalized attention scores to its neighbors. The attention score \(\mathcal {\alpha }^{l}_{ij}\) for neighbor \(d_j\) at layer l is calculated as:

$$\begin{aligned} \mathcal {\alpha }^{l}_{ij} ={softmax}\left( e^{l}_{ij}\right) = \frac{\exp \left( e^{l}_{ij}\right) }{\sum \nolimits _{{{d_k}} \in \mathcal {N}(i)} \exp \left( e^{l}_{ik}\right) }, \end{aligned}$$
(11)

where \(e^{l}_{ij}\) is the unnormalized attention score between drugs \(d_i\) and \(d_j\) at layer l. \(\mathcal {N}(i)\) denotes the set of neighbors of drug \(d_i\).

The unnormalized attention score \(e^{l}_{ij}\) is defined as:

$$\begin{aligned} e^{l}_{ij} = {a}^{l}\left( {LeakyReLu} ( W^{l}_1 h^{l-1}_{d_i} \Vert W^{l}_2 h^{l-1}_{d_j}) \right) , \end{aligned}$$
(12)

where \(W^{l}_1 \in \mathbb {R}^{f_{d}\times f_{d}}\) and \(W^{l}_2 \in \mathbb {R}^{f_{d}\times f_{d}}\) are learnable weight matrices at layer l. \({a^{l}} \in \mathbb {R}^{f_{d}}\) is a learnable parameter vector at layer l.

After computing the normalized attention scores \(\mathcal {\alpha }^{l}_{ij}\) for all neighbors, the GAT layer aggregates the features of neighboring drugs in a weighted manner to update the representation of the target drug \(d_i\):

$$\begin{aligned} h^{l}_{d_i} =\sigma \left( \sum \limits _{d_j \in \mathcal {N}(i)}{ \alpha ^{l}_{ij} W^{l}_3 h^{l-1}_{d_j}} \right) , \end{aligned}$$
(13)

where \(h^{l}_{d_i}\) denotes the output representation at layer l, with \(h^{0}_{d_i} = h_{d_i}\). \(W^{l}_3 \in \mathbb {R}^{f_{d}\times f_{d}}\) is the weight matrix in layer l.

The output of the final layer is considered the final representation learned from the similarity network \(S^m\), denoted as \(h^{S^m}_{d_i}\).

Similar to drug representation learning, protein features can be extracted from each protein similarity network. The learned feature of protein \(p_i\) under the protein similarity network \(S^m\) is denoted as \(h^{S^m}_{p_i}\).

Collaborative contrastive learning loss

After extracting features from all similarity networks, a convolutional neural network is employed to fuse the features of both drugs and proteins. The equation for drug feature fusion is as follows:

$$\begin{aligned} H_{d_i} ={Conv} \left( \prod _{m=1}^{n_{m_d}} {h^{S^m}_{d_i}} \right) , \end{aligned}$$
(14)

where \(\prod\) is the concatenation operation. \({Conv\left( \cdot \right) }\) is a 1D convolution operation with 16 output channels and a kernel size of 3. \(h^{S^m}_{d_i}\) is the feature of drug \(d_i\) learned from the m-th similarity network. \(H_{d_i}\) is the fused representation of drug \(d_i\). An equivalent process is employed to obtain the fused representation of protein \(p_i\), denoted as \(H_{p_i}\).
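A sketch of this fusion step under stated assumptions: the per-network embeddings are concatenated and fused with the described Conv1d (16 output channels, kernel size 3); the final linear projection back to the embedding dimension is an assumption added so that fused and per-network embeddings share a size for the contrastive loss.

```python
# Sketch: fusing per-network embeddings with a 1D convolution (Eq. 14).
import torch
import torch.nn as nn

class FuseNetworks(nn.Module):
    def __init__(self, n_networks: int, f_d: int = 64):
        super().__init__()
        self.conv = nn.Conv1d(1, 16, kernel_size=3, padding=1)
        self.proj = nn.Linear(16 * n_networks * f_d, f_d)  # assumed projection

    def forward(self, per_net):
        # per_net: (B, n_networks, f_d) embeddings from the per-network GATs.
        x = per_net.flatten(1).unsqueeze(1)        # concatenate: (B, 1, n*f_d)
        x = torch.relu(self.conv(x)).flatten(1)    # Conv fusion, Eq. (14)
        return self.proj(x)                        # fused H_{d_i}: (B, f_d)
```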

Instead of calculating the contrastive loss for every pair of network outputs, this approach contrasts the features learned from individual networks with the fused features. This strategy encourages the representations learned from individual networks to be more consistent with the fused representations. For drugs, the contrastive learning loss aims to minimize the distance between each \(h^{S^m}_{d_i},(m=1,2,\ldots ,{n_{m_d}})\) and the fused feature \(H_{d_i}\). The loss function is defined as follows:

$$\begin{aligned} \mathcal {L}^{m}_{d} =\sum \limits _{i=1}^{n_d} -\log \frac{\sum \nolimits _{d_j \in P^{m}_{d_i}}\exp ({sim}( h^{S^m}_{d_i}, H_{d_j}) / \tau )}{\sum \nolimits _{d_j \in (P^{m}_{d_i} \cup N^{m}_{d_i})} \exp ({sim} (h^{S^m}_{d_i}, H_{d_j}) / \tau )} , \end{aligned}$$
(15)

where \(n_d\) is the number of drugs. \(P^{m}_{d_i}\) and \(N^{m}_{d_i}\) are the positive and negative sample sets for drug \(d_i\) in drug similarity network \({S^m}\), respectively. \(h^{S^m}_{d_i}\) is the feature of drug \(d_i\) learned from \({S^m}\). \(H_{d_i}\) is the fused feature of drug \(d_i\). \({sim}\left( \cdot \right)\) is the cosine similarity and \(\tau\) is the temperature parameter.

An identical loss function is applied for protein \(p_i\) to minimize the distance between the feature \(h^{S^m}_{p_i}\) learned from m-th similarity network and the fused feature \(H_{p_i}\). The equation for the protein contrastive loss is defined as follows:

$$\begin{aligned} \mathcal {L}^{m}_{p} =\sum \limits _{i=1}^{n_p} -\log {\frac{\sum \nolimits _{p_j \in P^{m}_{p_i}}\exp ({sim}( h^{S^m}_{p_i}, H_{p_j}) / \tau )}{\sum \nolimits _{p_j \in (P^{m}_{p_i} \cup N^{m}_{p_i})} \exp ({sim} (h^{S^m}_{p_i}, H_{p_j}) / \tau ) }}, \end{aligned}$$
(16)

where \(n_p\) is the proteins number, \(P^{m}_{p_i}\) and \(N^{m}_{p_i}\) are the positive and negative sample sets of protein \(p_i\), respectively.

The contrastive losses from each individual similarity network are combined to compute the overall contrastive loss \(\mathcal {L}^{c}\), as shown in the following equation:

$$\begin{aligned} \mathcal {L}^{c} =\sum \limits _{m=1}^{n_{m_d}} (\mathcal {L}^{m}_{d}) + \sum \limits _{m=1}^{n_{m_p}} (\mathcal {L}^{m}_{p}), \end{aligned}$$
(17)

where \({n_{m_d}}\) and \({n_{m_p}}\) represent the drug and protein similarity network numbers, respectively.
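The loss of Eqs. (15)-(17) for one network can be sketched as follows, where the positive set contains only the node itself (as described in the next subsection) and `neg_mask` marks the selected negative set \(N^m\); the temperature value is an assumption:

```python
# Sketch: collaborative contrastive loss for one similarity network (Eq. 15/16).
import torch
import torch.nn.functional as F

def ccl_loss(h_net, h_fused, neg_mask, tau: float = 0.5):
    # h_net: (N, F) per-network embeddings; h_fused: (N, F) fused embeddings.
    sim = F.cosine_similarity(h_net.unsqueeze(1), h_fused.unsqueeze(0), dim=-1)
    sim = torch.exp(sim / tau)
    pos = sim.diagonal()                       # positive pair: the same node
    denom = pos + (sim * neg_mask).sum(dim=1)  # positives plus selected negatives
    return -torch.log(pos / denom).sum()
```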

Adaptive self-paced sampling

The selection of contrastive learning samples plays a pivotal role in the effectiveness of contrastive learning. This work follows a common approach where features of the same node across different views are considered positive samples. For instance, the positive sample set \(P^{m}_{p_i}\) for protein \(p_i\) in network \(S^m\) only includes \(p_i\) itself.

Unlike previous works that treat all non-positive node pairs as negative samples, this study introduces the adaptive self-paced sampling strategy to identify more informative negative samples. The details of this strategy are illustrated in Fig. 2C.

After removing positive samples, each protein \(p_i\) has \(n_p-1\) candidate negative samples. To select more informative negative samples, this work employs two scoring functions to assess their reliabilities:

$$\begin{aligned} {R}^{m}_{ij} = 1-S^{m}_{ij}, \end{aligned}$$
(18)
$$\begin{aligned} {R}^{G}_{ij} = 1-{sim}(p_i,p_j), \end{aligned}$$
(19)

where \({R}^{m}_{ij}\) is the network-specific reliability, reflecting the dissimilarity between proteins \(p_i\) and \(p_j\) within similarity network \(S^m\). \({R}^{G}_{ij}\) is the global feature reliability, calculated based on the cosine similarity between the fused feature representations of \(p_i\) and \(p_j\). \({sim}(p_i,p_j)\) is the cosine similarity between the fused features of protein \(p_i\) and \(p_j\), which is defined as follows:

$$\begin{aligned} {sim}(p_i,p_j) = \frac{ H_{p_i} \cdot H_{p_j} }{\Vert H_{p_i}\Vert \Vert H_{p_j}\Vert } , \end{aligned}$$
(20)

where \(\cdot\) denotes the dot product and \(\Vert \cdot \Vert\) the Euclidean norm.

A self-paced sampling strategy is employed to dynamically select informative negative samples during training. In each iteration (epoch) t, the number of negative samples \(num_t\) is determined using the following equation:

$$\begin{aligned} {num}_t = \lfloor (n_p-1) \beta \frac{t}{T} \rfloor , \end{aligned}$$
(21)

where T is the maximum number of training epochs, t is the current training epoch, \(\beta\) is the hyperparameter controlling the ratio of the negative sample size to the candidates, \(n_p\) is the number of proteins, and \(\lfloor \cdot \rfloor\) is the floor (round-down) operation.

The self-paced sampling strategy prioritizes highly informative negative samples throughout training. Initially, the selection focuses on the most reliable negative samples within each similarity network \(S^m\). The candidate samples are sorted in descending order of their network-specific reliability scores \({R}^{m}_{ij}\), and the \({num}_t\) most reliable candidates are chosen to form the network-specific negative sample set \(N^{ns}\).

In parallel, the strategy constructs the feature-based negative sample set \(N^{fs}\) by leveraging the global feature reliability scores \({R}^{G}_{ij}\). Finally, the negative sample set for each contrastive learning task is formed by taking the union of the two sets: \(N^{m}=N^{ns} \cup N^{fs}\).
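The selection step can then be sketched as follows for a single anchor protein; the candidate score vectors and helper names are illustrative, and num_t comes from Eq. (21).

```python
import torch

def select_negatives(r_network, r_global, num_t):
    """Form N^m = N^ns ∪ N^fs for one anchor protein.

    r_network: (n_c,) network-specific reliabilities R^m_ij over candidates.
    r_global:  (n_c,) global feature reliabilities R^G_ij over candidates.
    """
    # Take the num_t most reliable candidates under each score
    # (torch.topk returns the largest values, i.e., descending order).
    n_ns = torch.topk(r_network, k=num_t).indices
    n_fs = torch.topk(r_global, k=num_t).indices
    # The union of the two index sets is the final negative sample set.
    return torch.unique(torch.cat([n_ns, n_fs]))
```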


Prediction

This section describes the prediction of drug-target interactions. First, CCL-ASPS excludes the known positive drug-target interactions from the set of all possible drug-target pairs. It then generates negative samples by randomly sampling from the remaining unlabeled pairs, and combines these negatives with the positive samples to form the training set. Afterward, a convolutional neural network is employed to extract the joint representation of each drug-target pair:

$$\begin{aligned} {H_{{(d_i, p_j)}}} ={Pooling}\left( {Conv}\left( H_{d_i}\Vert H_{p_j}\right) \right) , \end{aligned}$$
(22)

where \(H_{d_i}\) and \(H_{p_j}\) are the fused representations of drug \(d_i\) and protein \(p_j\), respectively, \(\Vert\) denotes vector concatenation, and \({H_{{(d_i, p_j)}}}\in \mathbb {R}^{f_{dp}}\) is the joint representation of the pair.
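A hedged sketch of the pair encoder in Eq. (22) follows; the kernel size, channel count, and pooling operator are our assumptions, since the text does not fix them.

```python
import torch
import torch.nn as nn

class PairEncoder(nn.Module):
    """Conv + Pooling over the concatenated drug and protein features (Eq. 22)."""

    def __init__(self, f_dp=128, channels=8, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(1, channels, kernel_size=kernel, padding=kernel // 2)
        self.pool = nn.AdaptiveMaxPool1d(f_dp // channels)

    def forward(self, h_d, h_p):
        x = torch.cat([h_d, h_p], dim=-1).unsqueeze(1)  # (batch, 1, 2f)
        x = self.pool(self.conv(x))                     # (batch, channels, f_dp/channels)
        return x.flatten(start_dim=1)                   # joint representation in R^{f_dp}
```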

The extracted joint representation is then fed into a multilayer perceptron to estimate the interaction probability between drug \(d_i\) and protein \(p_j\):

$$\begin{aligned} p_{(d_i, p_j)} =\sigma \left( W_{{dp}_{2}} {ReLU} \left( W_{{dp}_{1}} {H_{{(d_i, p_j)}}}\right) \right) , \end{aligned}$$
(23)

where \(W_{{dp}_{1}}\in \mathbb {R}^{4f_{dp}\times f_{dp}}\) and \(W_{{dp}_{2}}\in \mathbb {R}^{1\times 4f_{dp}}\) are the learnable weight matrices, and \(\sigma \left( \cdot \right)\) is the sigmoid activation function.
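The prediction head of Eq. (23) is a two-layer perceptron; below is a sketch using the weight shapes above, with the hidden width \(4f_{dp}\) following those dimensions.

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """sigmoid(W_dp2 ReLU(W_dp1 H)) from Eq. (23)."""

    def __init__(self, f_dp=128):
        super().__init__()
        self.fc1 = nn.Linear(f_dp, 4 * f_dp, bias=False)  # W_dp1
        self.fc2 = nn.Linear(4 * f_dp, 1, bias=False)     # W_dp2

    def forward(self, h_pair):
        # Returns the interaction probability p_(d_i, p_j) in (0, 1).
        return torch.sigmoid(self.fc2(torch.relu(self.fc1(h_pair)))).squeeze(-1)
```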

The binary cross-entropy loss is applied as the objective function:

$$\begin{aligned} \mathcal {L}_{dp} =-\sum \limits _{(d_i,p_j) \in dpp} {\widehat{p}_{{(d_i, p_j)}}} \log {p_{{(d_i, p_j)}}} + (1-{\widehat{p}_{{(d_i, p_j)}}}) \log (1-{p_{{(d_i, p_j)}}}), \end{aligned}$$
(24)

where \({p_{{(d_i, p_j)}}}\) is the predicted interaction probability between drug \(d_{i}\) and protein \(p_{j}\), \({\widehat{p}_{{(d_i, p_j)}}}\) is the ground-truth label, dpp is the training set of drug-protein pairs, and \(\mathcal {L}_{dp}\) is the prediction loss.

The final loss function incorporates both the contrastive learning loss \(\mathcal {L}^c\) introduced earlier and the DTI prediction loss \(\mathcal {L}_{dp}\):

$$\begin{aligned} \mathcal {L} = \gamma \mathcal {L}^c + (1-\gamma )\mathcal {L}_{dp} , \end{aligned}$$
(25)

where \(\gamma\) is a hyperparameter that controls the relative weight of each loss.
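Putting Eqs. (24) and (25) together, the training objective can be sketched as below; reduction="sum" mirrors the summation in Eq. (24), and the function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def training_loss(pred_probs, labels, contrastive_loss, gamma=0.5):
    """pred_probs, labels: (batch,) tensors over drug-protein training pairs."""
    # Eq. (24): binary cross-entropy summed over the training pairs.
    l_dp = F.binary_cross_entropy(pred_probs, labels, reduction="sum")
    # Eq. (25): gamma balances the contrastive and prediction losses.
    return gamma * contrastive_loss + (1.0 - gamma) * l_dp
```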

Availability of data and materials

All data generated or analyzed during this study are included in this published article, its supplementary information files, and publicly available repositories. In addition, all code and data can be obtained from https://doi.org/10.5281/zenodo.13329691 [54].

Abbreviations

DTI:

Drug-target interaction

CCL:

Collaborative contrastive learning

ASPS:

Adaptive self-paced sampling strategy

CL:

Contrastive learning

MLP:

Multilayer perceptron

SAFs:

Self-associated features

AAFs:

Adjacent-associated features

ACC:

Accuracy

AUROC:

Area under the receiver operating characteristic curve

AUPR:

Area under the precision-recall curve

MCC:

Matthews correlation coefficient

F1:

F1-score

GNN:

Graph neural network

CNN:

Convolutional neural network

t-SNE:

T-distributed stochastic neighbor embedding

BCELoss:

Binary cross-entropy loss

PPIs:

Protein-protein interactions

References

  1. Feng Y, Wang Q, Wang T, et al. Drug target protein-protein interaction networks: a systematic perspective. Biomed Res Int. 2017;2017(1):1289259.

  2. Chen X, Yan CC, Zhang X, Zhang X, Dai F, Yin J, et al. Drug-target interaction prediction: databases, web servers and computational models. Brief Bioinform. 2016;17(4):696–712.

  3. Peng J, Li J, Shang X. A learning-based method for drug-target interaction prediction based on feature representation learning and deep neural network. BMC Bioinformatics. 2020;21(13):1–13.

  4. Xia X, Zhu C, Zhong F, Liu L. MDTips: a multimodal-data-based drug-target interaction prediction system fusing knowledge, gene expression profile, and structural data. Bioinformatics. 2023;39(7):btad411.

  5. Lin S, Wang Y, Zhang L, Chu Y, Liu Y, Fang Y, et al. MDF-SA-DDI: predicting drug-drug interaction events based on multi-source drug fusion, multi-source feature fusion and transformer self-attention mechanism. Brief Bioinform. 2022;23(1):bbab421.

  6. Wei L, Ye X, Sakurai T. ToxinMI: improving peptide toxicity prediction by fusing multimodal information based on mutual information. In: Proc Conf Res Adapt Converg Syst. 2022. pp. 77–82.

  7. Zhang H, Cui H, Zhang T, Cao Y, Xuan P. Learning multi-scale heterogenous network topologies and various pairwise attributes for drug-disease association prediction. Brief Bioinform. 2022;23(2):bbac009.

  8. Yang Y, Sun Y, Li F, Guan B, Liu JX, Shang J. MGCNRF: Prediction of Disease-Related miRNAs Based on Multiple Graph Convolutional Networks and Random Forest. IEEE Trans Neural Netw Learn Syst. 2023.

  9. Gao Z, Ma H, Zhang X, Wang Y, Wu Z. Similarity measures-based graph co-contrastive learning for drug-disease association prediction. Bioinformatics. 2023;39(6):btad357.

  10. You Y, Chen T, Sui Y, Chen T, Wang Z, Shen Y. Graph contrastive learning with augmentations. Adv Neural Inf Process Syst. 2020;33:5812–23.

  11. Cheng Z, Zhao Q, Li Y, Wang J. IIFDTI: predicting drug-target interactions through interactive and independent features based on attention mechanism. Bioinformatics. 2022;38(17):4153–61.

  12. Song T, Zhang X, Ding M, Rodriguez-Paton A, Wang S, Wang G. DeepFusion: A deep learning based multi-scale feature fusion method for predicting drug-target interactions. Methods. 2022;204:269–77.

  13. Bai P, Miljković F, John B, Lu H. Interpretable bilinear attention network with domain adaptation improves drug-target prediction. Nat Mach Intell. 2023;5(2):126–36.

  14. Zhao Q, Zhao H, Zheng K, Wang J. HyperAttentionDTI: improving drug-protein interaction prediction by sequence-based deep learning with attention mechanism. Bioinformatics. 2022;38(3):655–62.

  15. Ru X, Zou Q, Lin C. Optimization of drug-target affinity prediction methods through feature processing schemes. Bioinformatics. 2023;39(11):btad615.

  16. Yang X, Niu Z, Liu Y, Song B, Lu W, Zeng L, et al. Modality-DTA: Multimodality Fusion Strategy for Drug-Target Affinity Prediction. IEEE/ACM Trans Comput Biol Bioinform. 2023;20(2):1200–10.

  17. Luo Y, Zhao X, Zhou J, Yang J, Zhang Y, Kuang W, et al. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun. 2017;8(1):573.

  18. Zhang S, Yang K, Liu Z, Lai X, Yang Z, Zeng J, et al. DrugAI: a multi-view deep learning model for predicting drug-target activating/inhibiting mechanisms. Brief Bioinform. 2023;24(1):bbac526.

  19. Ye Q, Hsieh CY, Yang Z, Kang Y, Chen J, Cao D, et al. A unified drug-target interaction prediction framework based on knowledge graph and recommendation system. Nat Commun. 2021;12(1):6775.

  20. Xiong Z, Liu S, Huang F, Wang Z, Liu X, Zhang Z, et al. Multi-relational contrastive learning graph neural network for drug-drug interaction event prediction. In: Proc AAAI Conf Artif Intell. vol. 37. 2023. pp. 5339–47.

  21. Long Y, Wu M, Liu Y, Fang Y, Kwoh CK, Chen J, et al. Pre-training graph neural networks for link prediction in biomedical networks. Bioinformatics. 2022;38(8):2254–62.

  22. Wang H, Huang F, Xiong Z, Zhang W. A heterogeneous network-based method with attentive meta-path extraction for predicting drug-target interactions. Brief Bioinform. 2022;23(4):bbac184.

  23. Wu Z, Xiong Y, Yu SX, Lin D. Unsupervised feature learning via non-parametric instance discrimination. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2018. pp. 3733–42.

  24. Ye M, Zhang X, Yuen PC, Chang SF. Unsupervised embedding learning via invariant and spreading instance feature. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2019. pp. 6210–9.

  25. Ji X, Henriques JF, Vedaldi A. Invariant information clustering for unsupervised image classification and segmentation. In: Proc IEEE/CVF Int Conf Comput Vis. 2019. pp. 9865–74.

  26. Wei C, Liang J, Liu D, Wang F. Contrastive Graph Structure Learning via Information Bottleneck for Recommendation. Adv Neural Inf Process Syst. 2022;35:20407–20.

  27. Wang Y, Wang J, Cao Z, Barati Farimani A. Molecular contrastive learning of representations via graph neural networks. Nat Mach Intell. 2022;4(3):279–87.

  28. Liu X, Song C, Huang F, Fu H, Xiao W, Zhang W. GraphCDR: a graph neural network method with contrastive learning for cancer drug response prediction. Brief Bioinform. 2022;23(1):bbab457.

  29. Chu G, Wang X, Shi C, Jiang X. CuCo: Graph Representation with Curriculum Contrastive Learning. In: IJCAI. 2021. pp. 2300–6.

  30. Zhao X, Wu J, Zhao X, Yin M. Multi-view contrastive heterogeneous graph attention network for lncRNA-disease association prediction. Brief Bioinform. 2023;24(1):bbac548.

  31. Qu Y, He C, Yin J, Zhao Z, Chen J, Duan L. MOVE: Integrating Multi-source Information for Predicting DTI via Cross-view Contrastive Learning. In: IEEE Int Conf Bioinformatics Biomed (BIBM). IEEE; 2022. pp. 535–40.

  32. Zhu Y, Xu Y, Yu F, Liu Q, Wu S, Wang L. Graph contrastive learning with adaptive augmentation. In: Proc Web Conf 2021. 2021. pp. 2069–80.

  33. Luo X, Ju W, Qu M, Chen C, Deng M, Hua XS, et al. Dualgraph: Improving semi-supervised graph classification via dual contrastive learning. In: IEEE 38th Int Conf Data Eng (ICDE). IEEE; 2022. pp. 699–712.

  34. Wang X, Liu N, Han H, Shi C. Self-supervised heterogeneous graph neural network with co-contrastive learning. In: Proc 27th ACM SIGKDD Conf Knowl Discov Data Min. 2021. pp. 1726–36.

  35. Li Y, Qiao G, Gao X, Wang G. Supervised graph co-contrastive learning for drug-target interaction prediction. Bioinformatics. 2022;38(10):2847–54.

  36. Wang X, Chen Y, Zhu W. A survey on curriculum learning. IEEE Trans Pattern Anal Mach Intell. 2021;44(9):4555–76.

  37. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:160902907. 2016.

  38. Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. arXiv preprint arXiv:171010903. 2017.

  39. Cao DS, Liu S, Xu QS, Lu HM, Huang JH, Hu QN, et al. Large-scale prediction of drug-target interactions using protein sequences and drug topological structures. Anal Chim Acta. 2012;752:1–10.

  40. Breiman L. Random forests. Mach Learn. 2001;45:5–32.

  41. Li J, Wang J, Lv H, Zhang Z, Wang Z. IMCHGAN: inductive matrix completion with heterogeneous graph attention networks for drug-target interactions prediction. IEEE/ACM Trans Comput Biol Bioinform. 2021;19(2):655–65.

  42. Nguyen T, Le H, Quinn TP, Nguyen T, Le TD, Venkatesh S. GraphDTA: Predicting drug-target binding affinity with graph neural networks. Bioinformatics. 2021;37(8):1140–7.

  43. Ruan X, Jiang C, Lin P, Lin Y, Liu J, Huang S, et al. MSGCL: inferring miRNA-disease associations based on multi-view self-supervised graph structure contrastive learning. Brief Bioinform. 2023;24(2):bbac623.

  44. Zhang Y, Li J, Lin S, Zhao J, Xiong Y, Wei DQ. An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model. J Cheminform. 2024;16(1):67.

  45. Chu Y, Kaushik AC, Wang X, Wang W, Zhang Y, Shan X, et al. DTI-CDF: a cascade deep forest model towards the prediction of drug-target interactions based on hybrid features. Brief Bioinform. 2019;22(1):451–62.

  46. Chu Y, Shan X, Chen T, Jiang M, Wang Y, Wang Q, et al. DTI-MLCD: predicting drug-target interactions using multi-label learning with community detection method. Brief Bioinform. 2020;22(3):bbaa205.

  47. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018;46(D1):D1074–82.

  48. Bateman A, Martin MJ, Orchard S, Magrane M, Ahmad S, Alpi E, et al. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res. 2022;51(D1).

  49. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The protein data bank. Nucleic Acids Res. 2000;28(1):235–42.

  50. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9.

  51. Li M, Zhou J, Hu J, Fan W, Zhang Y, Gu Y, et al. Dgl-lifesci: An open-source toolkit for deep learning on graphs in life science. ACS Omega. 2021;6(41):27233–8.

  52. Gao Z, Jiang C, Zhang J, Jiang X, Li L, Zhao P, et al. Hierarchical graph learning for protein-protein interaction. Nat Commun. 2023;14(1):1093.

  53. Niwattanakul S, Singthongchai J, Naenudorn E, Wanapu S. Using of Jaccard coefficient for keywords similarity. In: Proc Int Multiconf Eng Comput Sci. vol. 1. 2013. pp. 380–4.

  54. Yu Y. CCL-ASPS. Zenodo. 2024. https://doi.org/10.5281/zenodo.13329691.

Acknowledgements

Not applicable.

Funding

This work has been supported by the National Natural Science Foundation of China (Grant No. 62371423, 61802432), Key Scientific and Technological Project of Henan Province (Grant No. 232102211027).

Author information

Authors and Affiliations

Authors

Contributions

Z.T. and Y.Y. conceived and designed the experiment. Y.Y. performed the experiment. Z.T. and Y.Y. wrote and revised the manuscript. Z.T., Q.Z. and F.N. supervised the whole process. All authors provided feedback on the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Fengming Ni.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors read and approved the final manuscript.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Tian, Z., Yu, Y., Ni, F. et al. Drug-target interaction prediction with collaborative contrastive learning and adaptive self-paced sampling strategy. BMC Biol 22, 216 (2024). https://doi.org/10.1186/s12915-024-02012-x


Keywords