
CRBPSA: CircRNA-RBP interaction sites identification using sequence structural attention model

Abstract

Background

Due to the ability of circRNA to bind corresponding RBPs and play a critical role in gene regulation and disease prevention, numerous identification algorithms have been developed. Nevertheless, most current mainstream methods primarily capture one-dimensional sequence features through various descriptors while neglecting the effective extraction of secondary structure features. Moreover, as the number of introduced descriptors increases, sparsity and ineffective representation worsen, placing a significant burden on computational models and leaving room for improvement in predictive performance.

Results

Based on this, we focused on capturing the features of secondary structure in sequences and developed a new architecture called CRBPSA, which is based on a sequence-structure attention mechanism. Firstly, a base-pairing matrix is generated by calculating the matching probability between each base, with a Gaussian function introduced as a weight to construct the secondary structure. Then, a Structure_Transformer is employed to extract base-pairing information and spatial positional dependencies, enabling the identification of binding sites through deeper feature extraction. Experimental results using the same set of hyperparameters on 37 circRNA datasets, totaling 671,952 samples, show that the CRBPSA algorithm achieves an average AUC of 99.93%, surpassing all existing prediction methods.

Conclusions

CRBPSA is a lightweight and efficient prediction tool for circRNA-RBP, which can capture structural features of sequences with minimal computational resources and accurately predict protein-binding sites. This tool facilitates a deeper understanding of the biological processes and mechanisms underlying circRNA and protein interactions.

Background

Compared with traditional RNA, circRNA, a category of non-coding RNA produced from pre-mRNA [1, 2], features a unique covalently closed circular structure [3]. As we know, structure determines function. Owing to this special structure, mounting evidence [4,5,6,7] shows that circRNA is more stable than other RNAs and can bind with proteins, thereby regulating gene expression, cell proliferation, and apoptosis, and consequently plays a crucial role in disease. For example, circ-Foxo3 has been found to interact with cell cycle-related proteins such as CDK2 and p21, regulating the proliferation and apoptosis of cardiomyocytes. Studies indicate that circ-Foxo3 is up-regulated in aged cardiomyocytes and inhibits cell cycle progression by interacting with CDK2 and p21, thus affecting cardiomyocyte function [8]. CircRNAs also play a significant role in cancer. For instance, circNSUN2, whose cytoplasmic export is promoted by m6A modification, interacts with High Mobility Group A2 (HMGA2), stabilizing HMGA2 and thereby promoting liver metastasis of colorectal cancer; this reveals the crucial role of circRNAs in cancer metastasis [9]. These studies convincingly demonstrate the research value of circRNAs. In particular, analyzing their interactions with RBPs helps reveal the functional mechanisms of circRNAs in physiological and pathological processes, offers new insights for the diagnosis and treatment of diseases [10,11,12], and has potential clinical application value.

With the continuous advancement of sequencing technologies, next-generation sequencing techniques [13] have facilitated the creation of more databases containing circRNA annotations and interactions with proteins, such as CircBase [14] and CircInteractome [15]. However, traditional experimental methods require substantial manpower, material resources, and financial investment to identify even a few RBP binding sites, making it difficult to meet the data processing demands of the big data era. The emergence of predictive models has effectively addressed this problem and achieved promising results. For example, CRIP [16] uses a CNN and an RNN to capture abstract features and sequence dependencies within stacked codons, thereby accomplishing the task of predicting RBP binding sites. CircSLNN [17] employs word embedding to encode sequences and utilizes a CNN and a BiLSTM to sequentially extract local features and context dependencies; finally, it uses Conditional Random Fields (CRF) [18] to cast binding site prediction as a sequence labeling problem. CSCRSites [19] encodes sequences by the normalized frequency of k-mer occurrences, extracts features from the sequences using multiple CNNs, and classifies cancer-related RBP binding sites. The above algorithms are single-view, sequence-based methods, which suffer from the sparsity of the acquired one-dimensional sequence information and cannot provide comprehensive insights. Consequently, researchers have explored multi-view algorithms. PASSION [20] employs six types of feature encoding schemes, optimizes the features through incremental feature selection and the XGBoost algorithm, and performs classification using an ANN and hybrid deep neural networks. iCircRBP-DHN [21] uses the statistical frequency descriptor KNFP and the word embedding CircRNA2Vec to encode sequences and employs a deep multi-scale residual network, a BiGRU, and a self-attention mechanism for feature extraction and prediction.
CRBPDL [22] utilizes five different feature encoding schemes and employs AdaBoost to integrate MSRN, BiGRU, and other deep networks for feature extraction, enhancing the model's predictive performance and reliability. HCRNET [23] integrates multi-source biological information, including KNFP, CircRNA2Vec, and a fine-tuned DNABERT model for encoding, and uses a deep TCN to extract latent nucleotide dependencies, achieving excellent identification performance. CircSSNN [24] uses KNFP, CircRNA2Vec, and a fine-tuned DNABERT encoding scheme to represent statistical distributions as well as local and global sequence features, and extracts deep features with sequence self-attention to achieve better performance. JLCRB [23] utilizes multiple feature descriptors and constructs a joint representation network composed of a BiLSTM and a CNN to enhance the correlation and consistency among the various features, resulting in improved predictive performance. CircSI-SSL [25] employs three types of feature descriptors, namely KNFP, CircRNA2Vec, and EIIP [26], and proposes a cross-view self-supervised task to pre-train the network, thereby reducing the need for labeled data; subsequently, only a small number of labels are required to identify the corresponding RBPs. Despite this, the predictive performance of these methods still needs improvement.

In recent years, with the outstanding performance of AlphaFold [27] in protein structure prediction, interest in the structures of biomolecules has surged, and researchers have increasingly incorporated structural features of sequences into related predictive tasks. SSCRB [28] integrates both sequence descriptors and structural information of circRNA and uses an ensemble of multiple submodels, which combine multi-scale features through attention mechanisms, to predict circRNA-RBP interaction sites. DeepCRBP [29] captures the structural features of circRNA molecular graphs using graph neural networks and processes local and global sequence features with a BiGRU, integrating both for prediction. It is undeniable that prediction algorithms incorporating structural features have achieved great improvements in predictive performance. However, these methods treat structural features merely as a supplement to multi-view feature information, combining multiple one-dimensional sequence descriptors with structural information to form multi-view features without fully extracting the structural features of sequences. Behind this seemingly reasonable design lies a hidden danger: there are notable differences between sequence description features and structural features, and fusing them increases the burden of algorithm training. Moreover, the structural features already contain a wealth of information reflecting the function of circRNAs, especially their functional interactions with proteins (RBPs). Therefore, introducing sequence description features into the predictive model only exacerbates the sparsity and ineffective representation of features.

To address the aforementioned issues and through extensive research, we propose a novel prediction model for circRNA-RBP interaction sites using structural attention, named CRBPSA. Firstly, we generate an RNA base-pairing matrix by calculating the matching probability between each base within the circRNA, utilizing CDPfold [30] to construct the RNA secondary structure. A Gaussian function is introduced as a weight to account for the pairing of adjacent bases, ultimately producing the RNA base-pairing matrix M. To fully capture the structural features of sequences for more accurate binding site prediction, we develop a novel network architecture based on structural attention, named Structure_Transformer, which extracts base-pairing information and spatial positional dependencies to obtain deeper features. Additionally, we integrate ResNet and LayerNorm modules into the deep network to enhance robustness and reduce sensitivity to hyperparameters. Consequently, the CRBPSA algorithm achieves an average AUC of 99.93% across 37 recognized benchmark datasets using the same set of hyperparameters, demonstrating superior performance compared to existing models. Furthermore, ablation and stability analyses indicate that CRBPSA not only achieves higher predictive accuracy but also exhibits superior stability compared to other state-of-the-art methods. The model architecture and related experimental outcomes are illustrated in Fig. 1.

Fig. 1
figure 1

A CRBPSA model overall architecture diagram. B Boxplot of performance comparison with existing prediction algorithms. C Bar chart for performance evaluation of different feature descriptors. D t-SNE clustering visualization for the processed features of the circRNA-WTAP dataset. E Visualization of the stability evaluation of the top 3 prediction algorithms on the circRNA-WTAP dataset

Results and discussion

The performance of the proposed algorithm

The proposed CRBPSA model was tested on 37 recognized circRNA-RBP benchmark datasets. The algorithm’s performance was evaluated using four indicators commonly used in previous studies, namely AUC, ACC, precision, and recall. The results for each dataset are listed in detail in Table 1.

Table 1 Performance of the CRBPSA algorithm on 37 circRNA-RBP datasets

It can be clearly seen that the CRBPSA algorithm achieved outstanding results across all datasets, whether on large datasets such as AGO2 with 40,000 samples (see the Datasets pre-processing section for details) or on smaller datasets such as WTAP with only 892 samples. The overall AUC consistently reaches 99.9%, and the other metrics also exceed 90%, which is unprecedented. According to the latest literature reports in 2024, the best average AUC previously achieved on these 37 datasets is only 97.7% (see the next section for details). Achieving even a 1% increase in average AUC represents an overall improvement across all sub-datasets, which is no small feat. We also observed that the algorithm's performance is more stable, and the metrics better, on large-sample datasets. In contrast, there may be some deviation on small-sample datasets such as WTAP, where ACC is 92.97%, AUC is 99.32%, precision is 92.39%, and recall is 93.75%. This is because deep learning-based frameworks are generally more sensitive to data volume, a pattern that also appears in prediction algorithms of the same type. Even so, our algorithm still achieves impressive outcomes on small samples and outperforms other predictive algorithms in small-sample learning (see the next section for details). This reflects the advantages of our algorithm in quickly capturing the structural features of sequences, together with its efficient and stable network architecture. In small-sample scenarios, the structure-based attention model can quickly capture representative features, reducing the number of trial-and-error samples needed on the dataset and producing excellent results.

Performance comparison with existing prediction algorithms

To reflect the superiority of the CRBPSA algorithm over other algorithms on the test data, we select the comprehensive AUC performance of the five most advanced algorithms on the various datasets for comparison, as shown in Fig. 2. Clearly, CRBPSA outperforms these latest algorithms on every dataset, achieving the highest average AUC of 0.9993 and showing near-perfect stability across all datasets. Compared to the SSCRB, DeepCRBP, JLCRB, HCRNet, and CRBPDL algorithms, CRBPSA exceeds their performance by 2.23%, 3.93%, 6.25%, 6.73%, and 8.05%, respectively. We also observed that while the SSCRB algorithm achieves decent results, it did not perform as well on the TAF15, TNRC6, C17ORF85, FXR1, ALKBH5, and FOX2 datasets as it did on other datasets, and it even fell below 90% on WTAP. Part of the reason is the small amount of data (ranging from 892 to 2934 samples), but the fundamental issue is that the algorithm lacks the ability to capture useful features effectively, necessitating a substantial number of samples for trial and error. In contrast, our algorithm performs optimally across datasets with vastly different sample sizes (ranging from 892 to 40,000), demonstrating its strong capability in extracting representative features of circRNA. We also note that algorithms incorporating circRNA 2D structural features (CRBPSA, SSCRB, DeepCRBP) generally outperform those that do not (JLCRB, HCRNet, CRBPDL). This is because RNA structure reflects the spatial distribution and base interaction relationships of the molecule, which are crucial for revealing its function and interactions with other molecules. Although JLCRB, HCRNet, and CRBPDL use language models trained on extensive DNA and RNA sequences, such as CircRNA2Vec and DNA_Bert, which can effectively model common features of RNA sequences at the macro level, they lack the capability to capture RNA structure in specific contexts (see the next section for the relevant comparative experiments).

Fig. 2
figure 2

Performance comparison between CRBPSA and existing prediction algorithms

For algorithms that use structural information, SSCRB and DeepCRBP incorporate structural information, but only as a supplement to multi-view information, and do not adequately capture the rich guidance it inherently contains. Moreover, the sparsity and ineffective representation caused by introducing a large amount of sequence description information overwhelm the effective features and increase the burden of mining useful features. Specifically, SSCRB uses KNFP encoding and codon stacking to interpret the quantitative sequence information of trinucleotide components. However, KNFP is a sequence characterization method that counts nucleotide arrangement patterns, calculates the corresponding occurrence frequencies, and ultimately represents them using one-hot encoding. The one-hot representation of 21 functional codons (corresponding to 20 amino acids plus the stop codon) provides a diverse numerical representation of the sequence from different perspectives, but also introduces a large number of zero values, making it difficult to extract representative features from the fused data. Moreover, this algorithm primarily applies an attention mechanism to sequence information, which is then weighted with structural feature information to make classification decisions. This exposes the algorithm's insufficiency in capturing structural features and fails to exploit the rich information contained in the structure to support decision-making. DeepCRBP, on the one hand, takes each base as a vertex of a molecular graph, defines an edge between two vertices when their base-pairing probability exceeds 0.5, and uses a GCN to capture the spatial structure and adjacency of the graph. On the other hand, a BiGRU processes fused sequence description features (e.g., KNFP) to extract context information, and the two are finally combined to predict RBP binding sites.
First, the algorithm uses KNFP, which shares the problems above: it makes the data sparse, overwhelms useful features, and increases the difficulty of extracting meaningful features from the fused data. Second, edges are defined only between bases whose pairing probability exceeds 0.5. This one-size-fits-all threshold discards many potential base pairs with interaction relationships, and because most base pairs fall below the threshold, very little information remains for the graph neural network to exploit. Conversely, if the threshold is lowered (< 0.5), pairs of bases that are not in fact highly correlated are treated as evidence of correlation, degrading the training of the graph network. As a result, DeepCRBP does not achieve optimal performance.

Therefore, the advantage of the circRNA structure attention architecture CRBPSA lies in its use of Structure_Transformer combined with an underlying convolutional network to adaptively capture deep dependency relationships in structural information, together with residual connections and LayerNorm modules to mitigate vanishing and exploding gradients. This makes training of the deep network more stable and fully exploits structural information to support the algorithm's decisions.

Performance evaluation of different feature descriptors

In this section, we compare the effectiveness of the sequence feature descriptors KNFP, CircRNA2Vec, and DNABERT, which have been extensively used by previous researchers, with the structural feature extraction method used in this paper, as visualized in Fig. 3. Using the same feature extraction network to capture features and predict from each single view, it can be seen that, compared with structural features, one-dimensional sequence description features are generally insufficient for describing sequences, making it difficult for the algorithm to obtain discriminative features, which confirms the earlier statements in this paper. We also note that the algorithm using DNABERT achieves fair prediction results, but this success comes at the cost of substantial computational power and large amounts of corpus data for training, in sharp contrast to the lightweight model proposed in this paper. DNABERT [31] is a BERT model trained on a large number of unlabeled DNA sequences for syntactic understanding and dependency capture in nucleotide sequences. Since circRNA and DNA sequences have similar semantic and syntactic structures, it can be transferred to the task of encoding circRNA, thereby generating dynamic and rich global contextual information. Although DNABERT achieves a macroscopic understanding from the perspective of semantics and syntax, it fails to describe the base interaction relationships within specific structures, which negatively impacts performance. It may be surprising that the circRNA word vector model CircRNA2Vec, trained with the Doc2Vec algorithm, which should have outperformed KNFP, a shallow feature descriptor with only one-hot encoded nucleotide tuple frequencies, actually lags behind it by 8% in average AUC.
Our analysis suggests two reasons. First, the feature dimension of the circRNA sequence encoded by CircRNA2Vec is only 30; its capacity to store information is limited, making it difficult to fully express the global and local dependencies of the sequence. Second, the Structure_Transformer feature extraction architecture proposed in this paper can overcome the sparsity of KNFP data and capture its inherent dependencies and positional relationships with a stable and efficient network, enabling CRBP-KNFP to achieve superior performance. We also found that even though Structure_Transformer is designed to capture two-dimensional sequence structural features, it can derive deep, instructive features from one-dimensional sequence description information.

Fig. 3
figure 3

Comparison of CRBPSA performance with different feature descriptors

Algorithm stability analysis

In this section, in order to evaluate the stability of the algorithms in handling the dataset under random conditions, we test the top three performing models: CRBPSA (our model), SSCRB, and DeepCRBP. We did this by removing the original random seed specified in the code and running the experiments 10 times on the WTAP dataset, which has the fewest samples. The AUC results obtained from 10 consecutive runs were recorded and visualized in a line chart, as shown in Fig. 1E. It can be seen that our algorithm consistently yields nearly identical results across the 10 runs on this small dataset of only 892 samples, with minimal fluctuation, reflecting the efficiency and stability of our algorithm's architecture. DeepCRBP ranks second, with some fluctuation in performance, while SSCRB exhibits the poorest stability on this dataset. The reason may lie in the fact that, as discussed above, these two algorithms are less capable of extracting representative features: the sparse and non-representative data in one-dimensional feature descriptors hinders feature capture, so they show poor stability on random, small-sample datasets. In addition, the sample size is simply too small, which is not conducive to deep learning training and the corresponding feature learning.

Motif analysis and model interpretation

In this section, we extract a binding sequence from each of the 37 circRNA-RBP binding datasets and use the MEME suite [32] for motif analysis. Specifically, we select the classic mode, set the minimum motif width to 6 and the maximum to 50, extract motifs from any site distribution within a single sequence, and obtain multiple motifs along with their location information. In the legend, motifs are ordered by position P-value (the probability of observing the motif in a random background) from smallest to largest; the smaller the P-value, the more significant the motif. Taking the motif analysis of the binding fragment of circRNA and the WTAP protein as an example, as shown in Fig. 4A, the combined P-value of this motif is 3.31e−5, far less than 0.05, indicating that the motif found in this sequence is highly significant. The figure also shows the positions of the different motifs along the sequence; different colors represent different motifs (as shown in the legend), and the height of each color bar is proportional to the negative logarithm of the site's P-value, reflecting the importance of the site.

Fig. 4
figure 4

A motif analysis for the circRNA-WTAP dataset. B, C represents the visualization of the CRBPSA model before and after circRNA-WTAP processing

At the same time, we also feed this sequence to the model trained in this paper and calculate the gradient of the neural network output with respect to the input, in order to show which input regions have the greatest influence on the final result. We use the Saliency Maps method [33]. Let the network be \(f(\mathbf{x})\), with the base-pairing matrix \(\mathbf{x}\) as input and \({f}_{c}(\mathbf{x})\) the network output for a given class c. The saliency map S(x) is then calculated as \(S(\mathbf{x})=\left|\frac{\partial {f}_{c}(\mathbf{x})}{\partial \mathbf{x}}\right|\).
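As a sketch of this computation, the snippet below estimates S(x) with central finite differences on a toy differentiable "class score" (in the actual model the gradient is obtained by backpropagation; the toy network and its weights are purely illustrative, not the trained CRBPSA model).

```python
import numpy as np

rng = np.random.default_rng(0)

def saliency_map(f_c, x, h=1e-5):
    """S(x) = |d f_c / d x|, estimated by central finite differences.

    A stand-in for the autograd-based Saliency Maps computation; `f_c`
    is the scalar class-c output of a trained network.
    """
    S = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        e = np.zeros_like(x)
        e[idx] = h
        S[idx] = abs(f_c(x + e) - f_c(x - e)) / (2 * h)
    return S

# Toy class score over a small pairing matrix (weights are hypothetical).
W = rng.standard_normal((8, 8))
f_c = lambda x: np.tanh((W * x).sum())
x = rng.random((8, 8))
S = saliency_map(f_c, x)
print(S.shape)  # (8, 8)
```

For this toy score the analytic gradient is \((1-\tanh^2 u)\,W\) with \(u=\sum_{ij} W_{ij}x_{ij}\), so the finite-difference map can be checked against \(|W|\,(1-\tanh^2 u)\).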

The initial input of the model is visualized in Fig. 4B, and the Saliency Map of the model relative to the initial input is visualized in Fig. 4C. We observe that through the feature extraction by the 1D convolution neural network, BatchNorm, and the attention mechanism for capturing sequence representation information, as well as the final projection, our model effectively captures the linear dependence between adjacent bases of the sequence (from top to bottom). We also find that the presented banded distribution roughly matches the distribution of motif loci across the sequence, and the sparse distribution is based on significance differences (from left to right). This may be one of the reasons for the excellent performance of our algorithm: it identifies conserved sequence fragments with specific functions and adaptively computes weighted values according to the different protein binding scenarios in which they are located. The same visual analysis is performed on the remaining 36 datasets, which are combined together, as shown in Figs. 5 and 6. In addition, both positive and negative samples of the WTAP dataset are used to obtain the extracted features through the trained model, and then visualized by t-SNE (Fig. 1D). It can be seen that the distribution of positive and negative samples after processing is obviously distinguishable, which also shows that the algorithm architecture in this paper can learn powerful discriminative features from positive and negative samples for correct classification. It has strong robustness and can avoid interference from a few noise samples.

Fig. 5
figure 5

Motif analysis of the remaining 36 circRNA-RBP datasets. A, B motif location distribution and corresponding symbol obtained from motif analysis on 36 datasets

Fig. 6
figure 6

Model visualization of Saliency Map after processing 36 datasets

Conclusions

In this work, building on the structural attention mechanism, we propose CRBPSA, a circRNA-RBP binding site recognition algorithm. Unlike previous feature extraction algorithms, which used one-dimensional sequence description features to construct multi-view representations, with only a minority incorporating structural features as a supplement, our algorithm fully captures the interactions and dependencies between bases inherent in structural features. With its lightweight and efficient network architecture, it can extract representative structural features of sequences with minimal computational power and achieves a historically best average AUC of 99.93% across 37 public datasets. Multi-angle comparative evaluation, motif analysis, and model interpretation further reveal the reasons for the algorithm's excellent performance. We expect this work to deepen understanding of circRNA-protein interactions and help reveal the complex functional regulatory mechanisms in organisms.

Methods

Datasets pre-processing

To validate the effectiveness of the proposed algorithm and facilitate comparison with existing models, we select 37 standard circRNA-RBP datasets widely used by previous algorithms in this field. The datasets are derived from the protein-binding sequences in the official CircInteractome database (https://circinteractome.nia.nih.gov/). In accordance with previous work [16], we use CD-HIT (with a threshold of 0.8) [34] to eliminate similar sequences. For each RBP, the binding sites of circRNA and the corresponding protein were collected from CLIP-seq data; 101-nt sequence fragments, obtained by extending 50 nt on each side of a binding site, were taken as positive samples, and other equal-length sequences were randomly selected as negative samples. The total number of samples across the 37 datasets reaches 671,952, with detailed information provided in Table 2.
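As an illustration of the window-extraction step, the sketch below cuts a 101-nt fragment by extending 50 nt on each side of a binding-site position. Because circRNAs are covalently closed, we let indices wrap around the sequence end; the function name and the toy sequence are our own illustrations, not part of the original pipeline (which additionally applies CD-HIT filtering to the extracted fragments).

```python
def extract_window(seq, center, flank=50):
    """Extract a (2*flank + 1)-nt window centred on `center`.

    Indices wrap around the sequence end (modulo length) to mimic the
    circular topology of circRNA instead of clipping at the boundaries.
    """
    n = len(seq)
    return "".join(seq[(center + off) % n] for off in range(-flank, flank + 1))

# Hypothetical example: a short circular sequence and a binding-site centre.
seq = "AUGGCUACGUAGCUAGCGAU" * 10   # 200 nt, stand-in for a real circRNA
window = extract_window(seq, center=5, flank=50)
print(len(window))  # 101
```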

Table 2 Details of the 37 datasets

CRBPSA algorithm architecture

In this section, to introduce the proposed CRBPSA model in detail, we describe it as several specific tasks: structural feature extraction, an attention model based on structural information, feature projection, and binding site prediction. A simplified diagram of the CRBPSA network structure is shown in Fig. 1A.

Sequence structure feature extraction

The secondary structure of circRNA consists mainly of structures formed by base pairing within the sequence, such as stem-loops and hairpins. These structures play a crucial role in circRNA formation, protein binding, and other physiological activities. Additionally, the number of hydrogen bonds in a base pair affects the stability of the RNA. Therefore, if the base pairing within the sequence is determined, the overall structure of circRNA can be roughly determined, and its physiological function can be indirectly reflected. Inspired by this, we use the same secondary structure representation tool, CDPfold [30], as SSCRB [28], with the following settings from previous research. The number of hydrogen bonds in each base pairing is used as a weight: A-U or U-A is weighted 2, G-C or C-G is weighted 3, and potential G-U or U-G pairings are set to 0.8. Considering that the stem formed by base pairings is relatively stable in the middle and unstable at both ends, CDPfold borrows the idea of locally weighted linear regression and uses a Gaussian function as the weight to obtain the final circRNA base-pairing matrix M, fully reflecting the base-pairing capacity at different positions.
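The construction of M can be sketched as follows. This is a minimal re-implementation of our reading of the scheme described above (hydrogen-bond weights of 2/3/0.8 and a Gaussian weighting of neighbouring pairs along a putative stem); the window size, sigma, and function names are illustrative assumptions, not the exact CDPfold code.

```python
import numpy as np

# Hydrogen-bond weights from the paper: A-U = 2, G-C = 3, G-U = 0.8.
PAIR_SCORE = {("A", "U"): 2.0, ("U", "A"): 2.0,
              ("G", "C"): 3.0, ("C", "G"): 3.0,
              ("G", "U"): 0.8, ("U", "G"): 0.8}

def gaussian(t, sigma=1.0):
    return np.exp(-(t * t) / (2.0 * sigma * sigma))

def pairing_matrix(seq, span=3, sigma=1.0):
    """Gaussian-weighted base-pairing matrix M (len x len).

    Each entry M[i, j] sums the hydrogen-bond scores of the neighbouring
    pairs (i+t, j-t), weighted by a Gaussian in t, so pairs in the middle
    of a stem score higher than isolated pairs.
    """
    n = len(seq)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for t in range(-span, span + 1):
                a, b = i + t, j - t
                if 0 <= a < n and 0 <= b < n:
                    s += gaussian(t, sigma) * PAIR_SCORE.get((seq[a], seq[b]), 0.0)
            M[i, j] = s
    return M

M = pairing_matrix("GGGAAAUCCC")
print(M.shape)  # (10, 10)
```

Note that because the pair scores and the Gaussian are symmetric, the resulting M is symmetric as well.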

To fully explore the intrinsic feature information contained in the base-pairing matrix M, we use a 1D convolutional neural network to extract local patterns and features from the data. Specifically, we regard the matrix M (101 × 101) as a sequence in which each position holds the interactions of a single base with the whole sequence (1 × 101), set the kernel size to 1 and the number of output channels to 128, and thereby perform a linear transformation on the features at each position. We also apply batch normalization to accelerate training and improve the model's stability and performance. The mathematical formulation is as follows.

$$\mathbf{X}_{f} = \text{ReLU}\left(\text{BN}\left(\text{Conv1d}(\mathbf{M})\right)\right)$$
(1)
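A minimal NumPy sketch of Eq. (1): a 1 × 1 Conv1d is just a per-position linear map across channels, followed here by per-channel normalization and ReLU. The random weights and the single-sample normalization statistics are stand-ins for the trained layer and its batch statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1_bn_relu(M, W, b, eps=1e-5):
    """Eq. (1): X_f = ReLU(BN(Conv1d(M))) with a 1x1 kernel.

    A 1x1 Conv1d is a per-position linear map across channels; BatchNorm
    is approximated per output channel over the length dimension (a
    single-sample stand-in for the batch statistics used in training).
    """
    X = W @ M + b[:, None]                 # (C_out, L): linear map per position
    mean = X.mean(axis=1, keepdims=True)
    var = X.var(axis=1, keepdims=True)
    X = (X - mean) / np.sqrt(var + eps)    # normalise each channel
    return np.maximum(X, 0.0)              # ReLU

L, C_out = 101, 128
M = rng.random((L, L))                     # base-pairing matrix
W = rng.standard_normal((C_out, L)) * 0.1  # 1x1-conv weights (hypothetical init)
b = np.zeros(C_out)
X_f = conv1x1_bn_relu(M, W, b)
print(X_f.shape)  # (128, 101)
```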

Structure_Transformer network

The Attention Mechanism is a way to selectively focus on different parts of an input sequence by dynamically assigning attention weights. Due to its ability to selectively attend to parts of the input sequences, capture long-distance dependencies, and its distinction from traditional sequence models like Recurrent Neural networks (RNNS) [35], Gated Recurrent units (GRUs), Long Short-Term Memory network (LSTM) [36] and other sequential models rely on the results of the previous time step, which is difficult to perform parallel computation. It is widely used in models for natural language processing. Notable examples include Transformer [37] and BERT [38], both of which utilize the attention mechanism. In this research on circRNA-RBP binding site prediction, in order to meet the requirement for extracting structural information, we design an improved Transformer architecture based on structural information feature extraction [24, 39]. The model named Structure_Transformer utilizes the attention mechanism. We discard the bidirectional GRU used by previous methods such as DeepCRBP and iCircRBP-DHN, and the deep temporal convolutional network used by HCRNet to obtain the time dependency. Use only the attention mechanism implemented by the fully connected layer to make the computation of the model as flexible and efficient as possible with fewer resources required. Different from the general attention model used by SSCRB, iCircRBP-DHN, etc., firstly, we introduce residual connection and Pre-Layernorm to alleviate the gradient disappearance, explosion, and network degradation problems that are prone to occur in deep networks, and the model performance is relatively insensitive to the selection of hyperparameters. Impressive results can be obtained without the need of specific parameters. 
Secondly, instead of using the attention mechanism to fuse multi-view information obtained from multiple feature descriptors, we use it to directly capture circRNA structural information. This enables the model to focus on the representative structural features of circRNA at specific RBP sites, avoiding the computational burden of reconciling multiple types of sparse features. The formulas are expressed as follows:

$$Q = {\text{Linear}}_{q} \left( {{\text{LN}} \left( {{\mathbf{X}}_{e} } \right)} \right)$$
(2)
$$K = {\text{Linear}}_{k} \left( {{\text{LN}} \left( {{\mathbf{X}}_{e} } \right)} \right)$$
(3)
$$V = {\text{Linear}}_{v} \left( {{\text{LN}} \left( {{\mathbf{X}}_{e} } \right)} \right)$$
(4)
$${\mathbf{X}}_{a} = {\text{Softmax}} \left( {\frac{{QK^{T} }}{\sqrt d }} \right)V$$
(5)
$${\mathbf{X}}_{b} = {\text{Dropout}} \left( {{\text{Linear}} \left( {{\mathbf{X}}_{a} } \right)} \right) + {\mathbf{X}}_{e}$$
(6)

where \({X}_{e}\) represents the embedded information passed to the attention mechanism, and Q, K, and V represent, respectively, the content requiring dynamic attention allocation, the characteristics of the reference information available for selection, and the specific information itself. The similarity between Q and K is computed and used to weight the values, yielding the attention output \({X}_{a}\). The residual output \({X}_{b}\) is formed by combining the linear transformation and dropout of \({X}_{a}\) with the original \({X}_{e}\). For further feature extraction and nonlinear transformation to enhance the expressiveness of the model, a residual connection and Pre-LayerNorm are also applied after the attention output, followed by a feedforward network composed of a multilayer perceptron.
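Eqs. (2)-(6) can be sketched as a single-head, pre-normalized attention step. The following numpy sketch uses hypothetical random weight matrices in place of the learned Linear layers, and treats dropout as the identity (its inference-time behavior); it is an illustration of the structure, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(1)

def layer_norm(X, eps=1e-5):
    # LN over the feature dimension, as in the Pre-LayerNorm of Eqs. (2)-(4).
    mu = X.mean(axis=-1, keepdims=True)
    var = X.var(axis=-1, keepdims=True)
    return (X - mu) / np.sqrt(var + eps)

def softmax(Z, axis=-1):
    Z = Z - Z.max(axis=axis, keepdims=True)  # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=axis, keepdims=True)

def pre_ln_attention(X_e, Wq, Wk, Wv, Wo):
    """Pre-LayerNorm single-head attention with a residual connection.
    X_e: (L, d). Dropout is omitted (identity at inference)."""
    Xn = layer_norm(X_e)                      # Pre-LayerNorm
    Q, K, V = Xn @ Wq, Xn @ Wk, Xn @ Wv       # Eqs. (2)-(4)
    d = Q.shape[-1]
    X_a = softmax(Q @ K.T / np.sqrt(d)) @ V   # Eq. (5): scaled dot-product
    return X_a @ Wo + X_e                     # Eq. (6): linear + residual add

L, d = 101, 128
X_e = rng.standard_normal((L, d))
Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) * d ** -0.5 for _ in range(4))
X_b = pre_ln_attention(X_e, Wq, Wk, Wv, Wo)
print(X_b.shape)  # (101, 128)
```

Normalizing before the attention sub-layer (rather than after, as in the original post-LN Transformer) keeps the residual path clean, which is the property the text credits for stable training without careful hyperparameter tuning.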

Feature projection and prediction

After feature capture by the Structure_Transformer based on the attention mechanism, we obtain deep, high-dimensional representative features. However, these features may contain redundant information, and computation on high-dimensional data consumes substantial computing resources and storage for intermediate results. To eliminate redundancy and reduce complexity, we design a projection head module consisting of a 1D convolutional neural network, a dropout layer, linear layers, and ReLU activations. Specifically, we use the 1D convolutional network for channel fusion, merging data with 128 input channels into a single-channel output with a kernel size of 1 × 1. The output is then passed through a ReLU activation and a dropout layer (with a rate of 0.35) that randomly discards some units, preventing the overfitting caused by complex networks and improving the network's robustness. Subsequently, multiple linear layers combined with ReLU activations gradually reduce the data's dimensionality; by learning weights that further extract useful features and remove redundancy, the generalization ability of the model is improved and its nonlinear representation capacity is increased. Finally, the network is trained with the LogSoftmax function and the negative log-likelihood loss (NLLLoss). Standard Softmax followed by a logarithm, as in the cross-entropy loss, first computes the exponential exp(x) and then takes the log, which can cause numerical instability, especially when the input values are very large or very small. LogSoftmax combines these two steps, improving computational efficiency and reducing numerical instability.

$$Softmax\left( {z_{i} } \right) = \frac{{e^{{z_{i} }} }}{{\sum\nolimits_{j = 1}^{n} {e^{{z_{j} }} } }}$$
(7)
$${\text{LogSoftmax}} \left( {z_{i} } \right) = \log \left( {{\text{Softmax}} \left( {z_{i} } \right)} \right) = z_{i} - \log \left( {\sum\nolimits_{j = 1}^{n} {e^{{z_{j} }} } } \right)$$
(8)
$${\text{NLLLoss}} = - \frac{1}{n}\sum\limits_{i = 1}^{n} {{\text{y}}_{i} {\text{LogSoftmax}} \left( {z_{i} } \right)}$$
(9)
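The numerical-stability argument behind Eqs. (8)-(9) can be checked directly: for a large logit, naive log(Softmax(z)) overflows in exp() and yields NaN, whereas the fused form of Eq. (8), computed with the usual max-shift, stays finite. A small numpy sketch (toy logits, not the model's outputs):

```python
import numpy as np

def log_softmax(z):
    """Eq. (8): z_i - log(sum_j exp(z_j)), computed with the max-shift
    trick so that large logits do not overflow exp()."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def nll_loss(log_probs, y):
    """Eq. (9): mean negative log-likelihood of the true-class log-probs.
    log_probs: (n, classes); y: integer labels of shape (n,)."""
    return -log_probs[np.arange(len(y)), y].mean()

# Naive log(Softmax(z)) overflows for large logits; Eq. (8) does not.
z = np.array([1000.0, 0.0])
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.log(np.exp(z) / np.exp(z).sum())   # exp(1000) -> inf -> nan
stable = log_softmax(z)                           # finite: [0.0, -1000.0]

# Toy 2-class batch: two samples, labels 0 and 1.
logits = np.array([[2.0, 0.5], [0.1, 1.2]])
log_probs = np.vstack([log_softmax(r) for r in logits])
loss = nll_loss(log_probs, np.array([0, 1]))
```

This is why frameworks pair LogSoftmax with NLLLoss (or fuse both into a cross-entropy loss) rather than chaining Softmax and log.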

Experimental setting and evaluation index

In this experiment, following convention, 80% of the samples are randomly selected as the training set to participate in gradient backpropagation and parameter optimization of the network, while the remaining 20% are used to test network performance. To reflect the effectiveness of the model architecture itself rather than improvements brought by hyperparameter adjustment, we set up neither a validation set nor algorithm-specific hyperparameter tuning. We use a single set of hyperparameters to train and test CRBPSA across all 37 datasets on an Ubuntu 20.04 system with CUDA 11.6, PyTorch 1.13, Python 3.7.4, etc. The Adam optimizer is used to train the parameters of the deep learning model, with a learning rate of 3e−3, a first-order momentum decay rate (beta1) of 0.9, a second-order momentum decay rate (beta2) of 0.99, and weight_decay of 3e−4. We evaluate the model's overall performance using metrics commonly adopted in previous research, namely AUC, ACC, precision, and recall. AUC is an important metric for evaluating binary classification models, calculated as the area under the ROC curve, which is plotted from the FPR and TPR. The formulas are expressed as follows, where TP and FN respectively denote the numbers of positive samples classified correctly and incorrectly, and FP and TN respectively denote the numbers of negative samples classified incorrectly and correctly.

$$FPR = \frac{FP}{{TN + FP}}$$
(10)
$$TPR = \frac{TP}{{TP + FN}}$$
(11)
$$ACC = \frac{TP + TN}{{TP + TN + FP + FN}}$$
(12)
$${\text{Precision}} = \frac{TP}{{TP + FP}}$$
(13)
$${\text{Recall}} = \frac{TP}{{TP + FN}}$$
(14)
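Eqs. (10)-(14) can be sketched directly from confusion counts; for AUC, one equivalent formulation (not necessarily the paper's implementation) is the Mann-Whitney rank statistic: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counting one half. The labels and scores below are toy values for illustration.

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    """ACC, Precision, Recall, and FPR from Eqs. (10), (12)-(14)."""
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    fn = int(((y_true == 1) & (y_pred == 0)).sum())
    fp = int(((y_true == 0) & (y_pred == 1)).sum())
    tn = int(((y_true == 0) & (y_pred == 0)).sum())
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "Precision": tp / (tp + fp),
        "Recall": tp / (tp + fn),   # identical to TPR, Eq. (11)
        "FPR": fp / (tn + fp),
    }

def auc_score(y_true, scores):
    """AUC via the Mann-Whitney statistic: P(score_pos > score_neg),
    counting ties as 1/2. Equivalent to the area under the ROC curve."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: 3 positives, 3 negatives.
y_true = np.array([1, 1, 1, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1])
scores = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.6])
m = confusion_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
```

For class-balanced datasets such as those used here (equal numbers of positive and negative samples), ACC and AUC together give a reasonable picture of both thresholded and threshold-free performance.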

Data availability

 The datasets and codes are available at https://github.com/cc646201081/CRBPSA and https://doi.org/10.5281/zenodo.13943172.

Abbreviations

circRNA:

Circular RNA

RBPs:

RNA-binding proteins

ANN:

Artificial neural network

CNN:

Convolutional neural network

RNN:

Recurrent neural network

KNFP:

K-tuple nucleotide frequency pattern

BiLSTM:

Bidirectional long short-term memory

BiGRU:

Bidirectional gated recurrent unit

DTCN:

Deep temporal convolutional network

TPR:

True positive rate

FPR:

False positive rate

MSRN:

Multi-scale residual network

ReLU:

Rectified linear unit

GCN:

Graph convolutional network

LayerNorm:

Layer normalization

t-SNE:

T-Distributed stochastic neighbor embedding

References

  1. Qu S, Yang X, Li X, Wang J, Gao Y, Shang R, Sun W, Dou K, Li H. Circular RNA: a new star of noncoding RNAs. Cancer Lett. 2015;365(2):141–8.


  2. Ashwal-Fluss R, Meyer M, Pamudurti NR, Ivanov A, Bartok O, Hanan M, Evantal N, Memczak S, Rajewsky N, Kadener S. circRNA biogenesis competes with pre-mRNA splicing. Mol Cell. 2014;56(1):55–66.


  3. Nielsen AF, Bindereif A, Bozzoni I, Hanan M, Hansen TB, Irimia M, Kadener S, Kristensen LS, Legnini I, Morlando M. Best practice standards for circular RNA research. Nat Methods. 2022;19(10):1208–20.


  4. Dong Y, Gao Q, Chen Y, Zhang Z, Du Y, Liu Y, Zhang G, Li S, Wang G, Chen X. Identification of CircRNA signature associated with tumor immune infiltration to predict therapeutic efficacy of immunotherapy. Nat Commun. 2023;14(1):2540.


  5. Vo JN, Cieslik M, Zhang Y, Shukla S, Xiao L, Zhang Y, Wu Y-M, Dhanasekaran SM, Engelke CG, Cao X. The landscape of circular RNA in cancer. Cell. 2019;176(4):869–881.e13.


  6. Memczak S, Jens M, Elefsinioti A, Torti F, Krueger J, Rybak A, Maier L, Mackowiak SD, Gregersen LH, Munschauer M. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013;495(7441):333–8.


  7. Chen LL. The biogenesis and emerging roles of circular RNAs. Nat Rev Mol Cell Biol. 2016;17(4):205–11.


  8. Du WW, Yang W, Liu E, Yang Z, Dhaliwal P, Yang BB. Foxo3 circular RNA retards cell cycle progression via forming ternary complexes with p21 and CDK2. Nucleic Acids Res. 2016;44(6):2846–58.


  9. Chen RX, Chen X, Xia LP, Zhang JX, Pan ZZ, Ma XD, Han K, Chen JW, Judde JG, Deas O. N 6-methyladenosine modification of circNSUN2 facilitates cytoplasmic export and stabilizes HMGA2 to promote colorectal liver metastasis. Nat Commun. 2019;10(1):4695.


  10. Niu M, Wang C, Zhang Z, Zou Q. A computational model of circRNA-associated diseases based on a graph neural network: prediction and case studies for follow-up experimental validation. BMC Biol. 2024;22(1):24.


  11. Niu M, Zou Q, Wang C. GMNN2CD: identification of circRNA–disease associations based on variational inference and graph Markov neural networks. Bioinformatics. 2022;38(8):2246–53.


  12. Chen Y, Wang J, Wang C, Zou Q. AutoEdge-CCP: a novel approach for predicting cancer-associated circRNAs and drugs based on automated edge embedding. PLoS Comput Biol. 2024;20(1): e1011851.


  13. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol. 2008;26(10):1135–45.


  14. Glažar P, Papavasileiou P, Rajewsky N. circBase: a database for circular RNAs. RNA. 2014;20(11):1666–70.


  15. Dudekula DB, Panda AC, Grammatikakis I, De S, Abdelmohsen K, Gorospe M. CircInteractome: a web tool for exploring circular RNAs and their interacting proteins and microRNAs. RNA Biol. 2016;13(1):34–42.


  16. Zhang K, Pan X, Yang Y, Shen HB. CRIP: predicting circRNA–RBP-binding sites using a codon-based encoding and hybrid deep neural networks. RNA. 2019;25(12):1604–15.


  17. Ju Y, Yuan L, Yang Y, Zhao H. CircSLNN: identifying RBP-binding sites on circRNAs via sequence labeling neural networks. Front Genet. 2019;10:1184.


  18. Sutton C, McCallum A. An introduction to conditional random fields. Found Trends Mach Learn. 2012;4(4):267–373.


  19. Wang Z, Lei X, Wu FX. Identifying cancer-specific circRNA–RBP binding sites based on deep learning. Molecules. 2019;24(22):4035.


  20. Jia C, Bi Y, Chen J, Leier A, Li F, Song J. PASSION: an ensemble neural network approach for identifying the binding sites of RBPs on circRNAs. Bioinformatics. 2020;36(15):4276–82.


  21. Yang Y, Hou Z, Ma Z, Li X, Wong KC. iCircRBP-DHN: identification of circRNA-RBP interaction sites using deep hierarchical network. Brief Bioinform. 2021;22(4):bbaa274.


  22. Niu M, Zou Q, Lin C. CRBPDL: Identification of circRNA-RBP interaction sites using an ensemble neural network approach. PLoS Comput Biol. 2022;18(1): e1009798.


  23. Yang Y, Hou Z, Wang Y, Ma H, Sun P, Ma Z, Wong KC, Li X. HCRNet: high-throughput circRNA-binding event identification from CLIP-seq data using deep temporal convolutional network. Brief Bioinform. 2022;23(2):bbac027.


  24. Cao C, Yang S, Li M, Li C. CircSSNN: circRNA-binding site prediction via sequence self-attention neural networks with pre-normalization. BMC Bioinformatics. 2023;24(1):220.


  25. Cao C, Wang C, Yang S, Zou Q. CircSI-SSL: circRNA-binding site identification based on self-supervised learning. Bioinformatics. 2024;40(1):btae004.


  26. Lalović D, Veljković V. The global average DNA base composition of coding regions may be determined by the electron-ion interaction potential. Biosystems. 1990;23(4):311–6.


  27. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9.


  28. Liu L, Wei Y, Zhang Q, Zhao Q. SSCRB: Predicting circRNA-RBP interaction sites using a sequence and structural feature-based attention model. IEEE J Biomed Health Inform. 2024;28:1762–72.


  29. Xu Z, Song L, Liu S, Zhang W. DeepCRBP: improved predicting function of circRNA-RBP binding sites with deep feature learning. Front Comp Sci. 2024;18(2):182907.


  30. Zhang H, Zhang C, Li Z, Li C, Wei X, Zhang B, Liu Y. A new method of RNA secondary structure prediction based on convolutional neural network and dynamic programming. Front Genet. 2019;10:467.


  31. Ji Y, Zhou Z, Liu H, Davuluri RV. DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. Bioinformatics. 2021;37(15):2112–20.


  32. Bailey TL, Johnson J, Grant CE, Noble WS. The MEME suite. Nucleic Acids Res. 2015;43(W1):W39–49.


  33. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. 2014.

  34. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9.


  35. Medsker LR, Jain LC. Recurrent neural networks: design and applications. Boca Raton: CRC Press; 2001.


  36. Yu Y, Si X, Hu C, Zhang J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019;31(7):1235–70.


  37. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention is all you need. In: Advances in Neural Information Processing Systems. 2017. p. 5998–6008.

  38. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT. 2019. p. 4171–4186.

  39. Eldele E, Ragab M, Chen Z, Wu M, Kwoh CK, Li X, Guan C. Time-series representation learning via temporal and contextual contrasting. 2021. p. 2352–2359.


Acknowledgements

We would like to thank the three anonymous reviewers and the relevant journal staff, whose constructive feedback has been very helpful in enhancing the presentation of this paper.

Funding

The work was supported by the National Natural Science Foundation of China (No.62231013 and 62131004), the National Key R&D Program of China (2022ZD0117700), Zhejiang Provincial Natural Science Foundation of China (No. LD24F020004), Special Support Plan for High level Talents in Zhejiang Province (2021R52019), and the Municipal Government of Quzhou (No.2023D036).

Author information

Authors and Affiliations

Authors

Contributions

C.C. conceived and designed the experiment. C.C. and C.W. performed the experiment. Q.D. and C.C. analyzed the results. Q.Z. and T.W. revised the manuscript. T.W. and Q.Z. provided funding and resources and project administration. All authors provided feedback on the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tao Wang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Cao, C., Wang, C., Dai, Q. et al. CRBPSA: CircRNA-RBP interaction sites identification using sequence structural attention model. BMC Biol 22, 260 (2024). https://doi.org/10.1186/s12915-024-02055-0

