Characterization of a novel type of carbonic anhydrase that acts without metal cofactors

Background
Carbonic anhydrases (CAs) are universal metalloenzymes that catalyze the reversible interconversion of carbon dioxide (CO2) and bicarbonate (HCO3-). They are involved in various biological processes, including pH control, respiration, and photosynthesis. To date, eight evolutionarily unrelated classes of CA families (α, β, γ, δ, ζ, η, θ, and ι) have been identified. All are characterized by an active site accommodating the binding of a metal cofactor, which is assumed to play a central role in catalysis. This feature is thought to be the result of convergent evolution.
Results
Here, we report that a previously uncharacterized protein group, named “COG4337,” constitutes metal-independent CAs from the newly discovered ι-class. Genes coding for COG4337 proteins are found in various bacteria and photosynthetic eukaryotic algae. Biochemical assays demonstrated that recombinant COG4337 proteins from a cyanobacterium (Anabaena sp. PCC7120) and a chlorarachniophyte alga (Bigelowiella natans) accelerated CO2 hydration. Unexpectedly, these proteins exhibited their activity under metal-free conditions. Based on X-ray crystallography and point mutation analysis, we identified a metal-free active site within the cone-shaped α+β barrel structure. Furthermore, subcellular localization experiments revealed that COG4337 proteins are targeted into plastids and mitochondria of B. natans, implicating their involvement in CO2 metabolism in these organelles.
Conclusions
COG4337 proteins shared a short sequence motif and overall structure with ι-class CAs, whereas they were characterized by metal independence, unlike any known CAs. Therefore, COG4337 proteins could be treated as a variant type of ι-class CAs. Our findings suggested that this novel type of ι-CAs can function even in metal-poor environments (e.g., the open ocean) without competition with other metalloproteins for trace metals. Considering the widespread prevalence of ι-CAs across microalgae, this class of CAs may play a role in the global carbon cycle.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12915-021-01039-8.

In the present study, we report a novel type of CA that has been characterized in the eukaryotic microalga Bigelowiella natans and the cyanobacterium Anabaena sp. PCC7120. A previously uncharacterized protein, "COG4337," classified in clusters of orthologous groups (COGs), has been found in various microorganisms [19]. Our biochemical assays demonstrated that recombinant COG4337 proteins catalyze CO2 hydration. Surprisingly, they showed this activity under metal-depleted conditions, unlike other known CAs. We also propose a possible catalytic model for CO2 hydration based on their X-ray structures and point mutation analysis. To our knowledge, COG4337 proteins are the first example of CAs that can function in environments where trace metals are limiting.

Results and discussion
Frequent occurrence of COG4337 proteins
COG4337 proteins have been found in various prokaryotes and eukaryotic microalgae, including ecologically important species [19]. NCBI BLAST searches (June 13, 2020) detected this uncharacterized protein in thousands of prokaryotic genomes from proteobacteria (2954 hits), cyanobacteria (98 hits), firmicutes (375 hits), bacteroidetes (89 hits), and several Archaea. Additionally, phylogenetically diverse eukaryotic algae (e.g., dinoflagellates, haptophytes, ochrophytes, prasinophytes, rhodophytes, euglenophytes, and chlorarachniophytes) were found to possess COG4337 homologs. COG4337 proteins are characterized by a conserved domain composed of approximately 160 amino acids. Interestingly, prokaryotic genes encode only a single domain, whereas eukaryotic sequences often carry multiple repeat domains (up to five) (Additional file 1: Table S1). To understand the evolution of COG4337 proteins, we constructed phylogenetic trees using conserved domain sequences (Additional file 2: Figure S1). Owing to the short alignment, detailed phylogenetic relationships were poorly resolved. Eukaryotic and prokaryotic sequences were divided into two clades, and some eukaryotic sequences were found to be patchily distributed within the prokaryotic clade, probably due to multiple independent gene transfers from bacteria to eukaryotes. Domain sequences generally show moderate variation among repeats in most eukaryotes, and the tree suggests that domain duplication events have occurred several times before the diversification of species in each algal lineage.
To assess the requirement of metal cofactors for catalysis, we tested the effects of chelating agents and metal ions on COG4337 proteins. The recombinant Bn86287 and all2909 proteins were treated with either 50 mM EDTA and 6 M urea or 50 mM 2,6-pyridinedicarboxylic acid (PDA) for 5 h, followed by dialysis against a metal-free buffer (20 mM Tris-HCl, 100 mM NaCl, 1 mM EDTA, pH 8.0). Unexpectedly, these treatments caused no decrease in CA activity (Fig. 1c, d), whereas the active site zinc ions of BCA could be removed by the same treatments (Fig. 1e). Next, Bn86287 and all2909 were treated with 2 mM Mg2+, Ca2+, Mn2+, Fe2+, Co2+, Ni2+, Zn2+, or Cd2+ to check whether the CA activity would be affected by the addition of divalent metal ions. Several metals caused partial precipitation of proteins, which was removed by centrifugation. The CA activity did not increase in any metal treatment group, and the addition of zinc ions appeared to negatively affect the enzyme activity of both Bn86287 and all2909 (Fig. 1c, d). To further support the metal independence of the COG4337 proteins, we performed an inductively coupled plasma optical emission spectroscopy (ICP-OES) analysis of six metals (Mg, Ca, Mn, Co, Zn, and Cd). None of these metals were found to bind to Bn86287 or all2909, whereas zinc was detected in BCA at the predicted concentration (Additional file 3: Table S2). Taken together, these results suggested that the COG4337 proteins are metal-free enzymes that catalyze the hydration of CO2 to HCO3-, but not the reverse reaction.

Overall structure
To further analyze the metal-free catalytic mechanism of COG4337 proteins, crystal structures of Bn86287 and all2909 were determined (Fig. 2, Table 1).
Crystallographic analysis was performed using crystals soaked in solutions with bicarbonate and the anion inhibitor iodide; the addition of 1 mM KI led to the deactivation of the COG4337 proteins (Fig. 1c, d). Iodide ions have been reported to inhibit several classes of CAs [26]. The crystal structures showed that both Bn86287 and all2909 seemed to form a homodimer (Fig. 2a, b). Size exclusion chromatography with multi-angle static light scattering (SEC-MALS) showed that Bn86287 exists as dimers but all2909 exists as tetramers in solution (Additional file 2: Figure S4). Analysis with the PISA (Protein Interfaces, Surfaces and Assemblies) software [27] estimated that a tetramer of all2909 was assembled by a head-to-head interaction of two dimeric units (Additional file 2: Figure S4). Each domain formed a cone-shaped barrel structure comprising three α-helices and a four-stranded antiparallel β-sheet, which were almost identical across the COG4337 domains of Bn86287 and all2909 (Fig. 2c). This fold has no similarity with those of other CAs in the α-, β-, γ-, ζ-, and θ-classes. However, DALI server searches [28] revealed that an uncharacterized protein of the γ-proteobacterium Xanthomonas campestris (PDB ID: 3H51) shares a similar fold with the COG4337 domains (Fig. 2g, Additional file 2: Figure S5), with a z-score of 15.2. Intriguingly, this uncharacterized protein exhibited 38 and 42% sequence identity with ι-class CAs of the diatom T. pseudonana and the bacterium Burkholderia territorii, respectively [29]. Although CA activity has not been confirmed in the X. campestris protein and no crystal structures are available for the T. pseudonana and B. territorii proteins, ι-CAs are potentially structural homologs of COG4337 proteins.

Catalytic active site
The substrate (bicarbonate) and inhibitor (iodide ion) were found in a bent finger-like cavity of the cone-shaped barrel (Fig. 2d). As expected from the biochemical assays described above, electron densities corresponding to metals were not detected in the cavity. Residues lining the cavity surface around bicarbonate/iodide were mostly conserved in the COG4337 domains (Fig. 2e, f), as well as in the apparent ι-CA of X. campestris (Additional file 2: Figure S5). One part of the cavity was dominated by hydrophilic residues (Thr, Ser, and His) and another consisted of hydrophobic residues (Trp and Phe). Notably, Lys180 in all2909 was found at the

Fig. 1 CA activity of COG4337 proteins. a Schematic images of Bn86287 and all2909 proteins. b Sequence alignment of COG4337 domains extracted from Bn86287 and all2909. Amino acids conserved in all and three of the four domains are shaded in black and gray, respectively. Positions of α-helices and β-strands estimated by X-ray crystallography are shown above the alignment. c-e CO2 hydration activity of Bn86287, all2909, and α-class bovine CA (BCA) under various conditions: +EDTA, proteins were treated with 50 mM EDTA and 6 M urea; +PDA, proteins were treated with 50 mM 2,6-pyridinedicarboxylic acid (PDA); chemical symbols, metal ions were added to protein solution; I−, 1 mM KI was added to the reaction solution as an inhibitor. Bovine serum albumin (BSA) was used as a negative control. Significant differences compared to non-treated samples were determined by the two-tailed Student's t test (*P < 0.02, **P < 0.01). In the graphs, error bars represent the SD calculated from three individual experiments. f HCO3- dehydration activity of COG4337 proteins and BCA

To evaluate the functional importance of the cavity-forming residues, we performed point mutation analysis, wherein Thr106, Tyr124, Lys180, His197, and Ser199 were substituted by alanine in all2909 (Fig. 3a, c), and Thr486, Tyr503, Tyr552, His584, and Ser586 were replaced by alanine using a recombinant protein of the third COG4337 domain (amino acids 431-607) in Bn86287 (Fig. 3b, d). In all2909, the substitution of Lys180 did not affect the CA activity, while the other four mutations resulted in complete inactivation (Fig. 3c). In Bn86287, the Thr486 and Tyr503 mutants showed no CA activity, whereas the Tyr552 substitution had minimal effect on the activity (Fig. 3d). The mutations in residues His584 and Ser586 caused a 4- and 8-fold decrease in the CA activity compared to the wild type, respectively (Fig. 3d). These results suggested that an active site exists in the cavity, and that the Thr106/486 and Tyr124/503 residues (residue numbers in all2909/Bn86287) are necessary for enzyme catalysis. His197/584 and Ser199/586 were also determined to be important residues, whereas the non-conserved residues in the cavity (i.e., Lys180 in all2909 and Tyr552 in Bn86287) did not appear to be involved in the catalytic activity. Notably, the third domain of Bn86287 exhibited a similar CA activity (82.1 ± 5.3 WAU·mg−1 protein) to the recombinant protein containing all three domains (Fig. 3d). It seems that the multiple domains of Bn86287 do not work cooperatively, and that each domain of Bn86287 has a relatively high activity compared to the single domain of all2909.

Putative catalytic mechanism
Based on the crystal structures and the results of point mutation analysis, we propose a potential catalytic mechanism for CO2 hydration by COG4337 proteins. In other CAs, the initial step of the reaction involves the deprotonation of active site water to an OH− ion, which then acts as a nucleophile and attacks CO2 to generate HCO3- [2]. In an α-class human CA (hCAII), the Thr199-Glu106 network is assumed to accept a hydrogen bond from the zinc-bound water [10], and His64 mediates the proton transfer from the active site to bulk solvent [30]. Considering that the hydroxyl groups of Thr106/159/322/486 (residue numbers in all2909/1st/2nd/3rd domain of Bn86287) and Tyr124/176/339/503 were found at a distance of 2.5 to 3.5 Å from an oxygen of HCO3- and at a distance of 3.2 to 3.7 Å from an iodide ion (Fig. 3a, b, Additional file 2: Figure S5), these hydroxyl groups would most likely mediate the deprotonation of active site water. Similarly, an iodide inhibitor was reported to be positioned 3.6 Å from the hydroxyl group of Thr199 in hCAII [31]. The deprotonation process could also be assisted by the main chain nitrogen of Thr106/159/322/486 and the hydroxyl group of Ser199/258/422/586, which is positioned within hydrogen bond distance of Thr106/159/322/486. Although histidine is known to be a suitable proton-shuttle residue, His197/256/420/584 does not seem to serve this purpose, because it is located at the deep end of the cavity, far from the protein surface (Fig. 2d). The active site Tyr124/176/339/503 would likely act as a proton-shuttle residue, as it faces the cavity and is proximal to the protein surface (Fig. 2d). Indeed, according to a previous report, an active site tyrosine in the β-CA of Pisum sativum mediates the proton transfer [32]. It has been reported that CO2 is located in a hydrophobic pocket near a phenylalanine in hCAII [33].
Assuming the same conformation for COG4337 proteins, CO2 might be positioned toward the hydrophobic part near Phe138/193/357/521 (Fig. 3a, b, Additional file 2: Figure S5). However, further experiments are necessary to identify the CO2-binding site as well as the route of the proton from the active site to bulk solvent. Our analysis revealed that COG4337 proteins exhibited no obvious HCO3- dehydration activity (Fig. 1f), though other CAs are able to catalyze the reversible interconversion of CO2 and HCO3- [1,2]. This peculiar feature might be related to the absence of metal in their active sites, but further study is required to assess this possibility.
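
For reference, the two general steps of CO2 hydration described at the start of this section can be written out as below. This is only the textbook reaction scheme, not a determined mechanism for COG4337 proteins; which residues assist the first step (the Thr and Tyr hydroxyl groups are the candidates proposed here) remains a hypothesis.

```latex
\begin{align}
\mathrm{H_2O} &\rightleftharpoons \mathrm{OH^-} + \mathrm{H^+}
  && \text{(deprotonation of active site water)} \\
\mathrm{CO_2} + \mathrm{OH^-} &\longrightarrow \mathrm{HCO_3^-}
  && \text{(nucleophilic attack on CO$_2$)} \\
\mathrm{CO_2} + \mathrm{H_2O} &\longrightarrow \mathrm{HCO_3^-} + \mathrm{H^+}
  && \text{(net reaction)}
\end{align}
```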

Subcellular localization
Next, we analyzed the localization of COG4337 proteins to elucidate their cellular functions. Bn86287 carries an N-terminal bipartite plastid targeting signal consisting of a signal peptide and a transit peptide. Immunolocalization experiments demonstrated that Bn86287 is localized in the plastid stroma, where it accumulates at the periphery, but not in the pyrenoid (Fig. 4); the pyrenoid of B. natans projects from the plastid stroma [34]. This localization pattern suggests that Bn86287 is partly involved in biophysical CO2-concentrating mechanisms (CCMs), whereby it possibly serves to recapture, by CO2 hydration, the unfixed CO2 leaking out of the plastid stroma. Although several algal species are assumed to have a C4 photosynthesis pathway [35,36], there is no evidence for such a pathway in B. natans. Therefore, it remains unknown whether Bn86287 is involved in biochemical CCMs in which HCO3- is fixed into C4 compounds, such as oxaloacetate. We found that the B. natans genome encodes another COG4337 protein (Bn50950), consisting of a single COG4337 domain and a mitochondrial targeting signal (Fig. 4d). The mitochondrial COG4337 protein might be involved in buffering the matrix pH and providing HCO3- for anaplerotic reactions, as in mitochondrial β-CAs of Chlamydomonas reinhardtii [18]. We also performed in silico localization prediction for 62 eukaryotic COG4337 proteins and found that 14 sequences contain a putative plastid-targeting signal and 19 sequences carry a mitochondrial targeting peptide (Additional file 1: Table S1). Therefore, COG4337 proteins seem to commonly function in plastids and mitochondria in various algae. In contrast, most bacterial COG4337 proteins were predicted to carry a canonical N-terminal signal peptide, probably for their periplasmic or extracellular localization (Additional file 1: Table S1). They might play a role in CO2 hydration for pH homeostasis and metabolic needs, as has been speculated for periplasmic α-CAs [37].

Comparison between COG4337 and COG4875 proteins
As mentioned above, COG4337 proteins are potentially structural homologs of ι-class CAs carrying COG4875 domains, and both COG4337 and COG4875 domains share a conserved C-terminal sequence motif, "His-His-Ser-Ser." Unlike COG4337 proteins, however, the ι-CA of T. pseudonana has been reported as a metalloenzyme containing Mn2+, based on an experiment in which its metal-chelated protein was reactivated by the addition of Mn2+ [4], and the bacterial ι-CA of B. territorii has been speculated to bind Zn2+ [29,38]. The metal-binding sites of ι-CAs remain unclear, and their active sites and catalytic mechanisms are also unknown. Interestingly, COG4337 proteins and ι-CAs share several characteristics other than overall structure. The T. pseudonana ι-CA was reported to be localized at the periphery of the plastid stroma [4], which resembles the localization of Bn86287. The diatom ι-CAs carry two to four repeat domains, as do eukaryotic COG4337 proteins [4]. In contrast, prokaryotic COG4875 and COG4337 proteins consist of a single domain, and both were predicted to possess an N-terminal signal peptide, probably for periplasmic localization [29]. It was reported that transcription of the T. pseudonana ι-CA was strongly induced only under low CO2 conditions [20]. Although it remains unknown whether Bn86287 is regulated by CO2 conditions, the Bn86287 gene has been reported to be abundantly expressed with a diurnal rhythm [39]. Both COG4337 and COG4875 genes were widely found in bacteria and eukaryotic algae. However, these two genes very rarely coexist in an organism. For example, COG4337 and COG4875 genes were detected in thousands of genomes from diverse proteobacteria by BLAST searches, but surprisingly, only a few genomes carried both genes. These two genes are distributed in bacteria regardless of their phylogenetic positions, as even closely related species possessed either one or the other.
In cyanobacteria from the genus Synechococcus, strains CC9605 and WH7805 have a COG4337 gene, whereas strains CC9311 and WH8020 possess a COG4875 gene. Interestingly, CC9311 and WH8020 have been isolated from coastal environments, while CC9605 and WH7805 are open-ocean strains [40][41][42]. It seems reasonable to suppose that the COG4337-bearing cyanobacterial strains are adapted to low metal availability in the open ocean. Although it remains unclear why most organisms do not possess COG4337 and COG4875 genes together, it can be assumed that the coexistence of these two types of CAs may cause an unfavorable situation for cells.

Conclusions
In this study, previously uncharacterized COG4337 proteins were confirmed to be CA enzymes catalyzing CO2 hydration. COG4337 proteins were found to be metal-free enzymes, unlike any known CAs. COG4337 proteins exhibited similarity to ι-CAs in sequence, overall structure, and some other characteristics, except that ι-CAs have been reported to be metalloenzymes. We thus concluded that COG4337 proteins should be treated as a new variant of ι-class CAs. At present, ι-CAs can be divided into the metal-free COG4337 type and the metal-dependent COG4875 type. The metal independence of COG4337-type ι-CAs would be an advantage in avoiding competition with other metalloproteins for trace metals. In other words, they can function even in metal-poor environments (e.g., the open ocean); the COG4337 type might have evolved in an ancestral prokaryote in response to such an environment and subsequently have been inherited by various eukaryotic lineages. Considering the widespread prevalence of ι-class CAs across microalgae, including ecologically important species [4,19,43], this class of CAs may play a role in the global carbon cycle.

Protein expression and purification
The pET28-derived plasmids were transformed into the Rosetta 2 (DE3) strain of E. coli (Novagen). To express recombinant proteins, the E. coli cells were grown in LB medium at 37°C, and isopropyl β-D-1-thiogalactopyranoside (IPTG) was added at OD600 = 0.5 to a final concentration of 1 mM. After 4 h of incubation at 37°C, the cells were harvested (approximately 1.0 g) and resuspended in 10 ml BugBuster Protein Extraction Reagent (Novagen) containing 150 units of Benzonase Nuclease (Novagen), 30,000 units of rLysozyme Solution (Novagen), and Complete Protease Inhibitor Cocktail (Roche). After removing the insoluble components by centrifugation, recombinant proteins in the supernatants were purified with His GraviTrap columns (Cytiva), according to the manufacturer's instructions. The buffer of the eluted protein solution was exchanged for one containing 20 mM Tris-HCl (pH 8.0) and 100 mM NaCl using PD10 desalting columns (Cytiva). To remove the N-terminal His-tag, the protein solution (approximately 1 mg/ml) was treated with thrombin protease (Novagen) at a concentration of 0.5-2 units/ml for 12 h at 20°C. For crystallization, recombinant proteins were concentrated to 7-10 mg/ml using an Amicon Ultra-4 centrifugal filter (10 K and 30 K MWCO were used for all2909 and Bn86287, respectively). Protein concentration was determined using the Qubit Protein Assay kit with a Qubit 3.0 fluorometer (Thermo Fisher Scientific). All purified protein samples were stored at 4°C prior to analysis.

Enzymatic assays
CA activity was measured using the Wilbur and Anderson method [23] with some modifications [24]. The CO2 hydration reaction was monitored by the drop in pH from 8.3 to 7.8 when 4 ml of ice-cold CO2-saturated water was added to 6 ml of ice-cold 20 mM Tris-H2SO4 (pH 8.3 at 20°C), with or without 6-40 μg protein. Alternatively, the HCO3- dehydration reaction was monitored by the rise in pH from 5.6 to 5.9 when 4 ml of ice-cold 50 mM NaHCO3 was added to 6 ml of ice-cold 50 mM MES-NaOH (pH 5.3 at 20°C), with or without 10-100 μg protein. CA activity was calculated in Wilbur and Anderson units (WAU) mg−1 protein according to the following equation: WAU = T0/T1 − 1, where T1 is the time for the pH change in the presence of proteins and T0 is the time in the absence of proteins [23]. Bovine erythrocyte CA (BCA, Sigma-Aldrich, C3934) and bovine serum albumin (BSA, Sigma-Aldrich, A9647) were used as positive and negative controls, respectively. For the chelation of protein-bound metal ions, proteins were treated with either buffer A (50 mM EDTA and 6 M urea in 20 mM Tris-HCl, pH 8.0) or buffer B (50 mM 2,6-pyridinedicarboxylic acid, 12.5 mM MOPS, pH 7.0) for 5 h at 20°C, followed by overnight dialysis against 20 mM Tris-HCl (pH 8.0), 100 mM NaCl, and 1 mM EDTA in a Slide-A-Lyzer MINI 10K Device (Thermo Fisher Scientific). These two chelating buffers were selected based on previous studies [49,50]. Reactivation of apo-BCA was achieved by the addition of 2 mM ZnCl2. To test the effects of divalent metals on the activity of all2909 and Bn86287, 2 mM MnCl2, MgCl2, CaCl2, CoCl2, NiCl2, ZnCl2, FeCl2, or CdCl2 was added to the non-treated proteins and incubated for 2 h at room temperature. Several metal treatments (i.e., all2909+Co, all2909+Zn, all2909+Cd, all2909+Fe, Bn86287+Zn, Bn86287+Cd, and Bn86287+Fe) caused partial protein precipitation, which was removed by centrifugation.
Esterase activity was determined using 4-nitrophenyl acetate (Sigma-Aldrich), as described previously [51]. We measured the increase in absorption at 348 nm for 5 min at room temperature after the addition of 40 μg protein to a reaction mixture containing 0.3 ml of 3 mM 4-nitrophenyl acetate and 0.7 ml of 20 mM Tris-HCl (pH 8.0).
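
As an illustration of the Wilbur and Anderson calculation described in this section, the following minimal Python sketch computes WAU = T0/T1 − 1 and normalizes it to the amount of protein. The function names and the example timings are hypothetical and are not taken from the measurements in this study.

```python
def wilbur_anderson_units(t_blank_s: float, t_sample_s: float) -> float:
    """Return WAU = T0/T1 - 1.

    t_blank_s  -- T0, time for the pH change without protein (seconds)
    t_sample_s -- T1, time for the pH change with protein (seconds)
    """
    if t_blank_s <= 0 or t_sample_s <= 0:
        raise ValueError("times must be positive")
    return t_blank_s / t_sample_s - 1.0


def specific_activity(t_blank_s: float, t_sample_s: float, protein_ug: float) -> float:
    """Return CA activity in WAU per mg protein."""
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    protein_mg = protein_ug / 1000.0  # convert micrograms to milligrams
    return wilbur_anderson_units(t_blank_s, t_sample_s) / protein_mg


if __name__ == "__main__":
    # Hypothetical timings: 60 s without protein, 20 s with 40 ug protein.
    # This gives 2 WAU, i.e. about 50 WAU per mg protein.
    print(specific_activity(60.0, 20.0, 40.0))
```

A sample that does not accelerate the pH change (T1 equal to T0) gives 0 WAU, which is the behavior expected for the BSA negative control.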

ICP-OES
To remove free metals, BCA (Sigma-Aldrich, C3934) and the recombinant proteins Bn86287 and all2909 were dialyzed against a buffer containing 20 mM Tris-HCl (pH 8.0) and 100 mM NaCl in Slide-A-Lyzer MINI 10K Device (Thermo Fisher Scientific). The concentrations of resulting protein solutions were estimated to be 1.1 to 1.4 mg/ml. Each protein solution was diluted 50-fold with 10 ml of Milli-Q water, and the same amount of solution was injected into the spray chamber in the Optima 2100 DV ICP-OES (PerkinElmer). The mass percentages of the six metals (Mg, Ca, Mn, Co, Zn, and Cd) were obtained for each protein solution.
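
The "predicted concentration" of a stoichiometrically bound metal can be estimated from the protein concentration, as in the rough Python sketch below. It assumes one zinc ion per BCA molecule and a BCA molecular mass of about 29 kDa; both values, as well as the function name, are assumptions made for illustration and are not stated in this section.

```python
ZN_MOLAR_MASS = 65.38  # g/mol, standard atomic weight of zinc


def expected_metal_ug_per_l(protein_mg_per_ml: float,
                            dilution_factor: float,
                            protein_kda: float,
                            metal_molar_mass: float,
                            metals_per_protein: float = 1.0) -> float:
    """Estimate the bound-metal concentration (ug/L) in a diluted protein sample."""
    # mg/ml is numerically equal to g/L
    protein_g_per_l = protein_mg_per_ml / dilution_factor
    protein_mol_per_l = protein_g_per_l / (protein_kda * 1000.0)
    metal_g_per_l = protein_mol_per_l * metals_per_protein * metal_molar_mass
    return metal_g_per_l * 1e6  # g/L -> ug/L


if __name__ == "__main__":
    # Protein at 1.1 to 1.4 mg/ml, diluted 50-fold (values from this section);
    # 29 kDa and 1 Zn per protein are assumptions for BCA.
    for conc in (1.1, 1.4):
        print(conc, expected_metal_ug_per_l(conc, 50.0, 29.0, ZN_MOLAR_MASS))
```

Under these assumptions the expected zinc signal for BCA falls in the range of tens of micrograms per liter, which gives a sense of the scale against which a metal-free protein would be judged.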

Crystallization
Crystallization conditions were initially screened using Crystal Screen 1 and 2 (Hampton Research), Wizard Screens I and II (Rigaku), PEGsII (Qiagen), Index (Hampton Research), PEGIon/PEGIon2 (Hampton Research), and a Protein Complex Suite (Qiagen) with a Protein Crystallization System (PXS) at the Structural Biology Research Center, High Energy Accelerator Research Organization in Japan [52]. Screening was performed by the sitting-drop vapor-diffusion method with crystallization drops consisting of 0.2 μl protein solutions (7.0 mg/ml) and 0.2 μl screening solutions at 293 K and 277 K. Crystals of all2909 were observed after 1 week under the conditions of Index #27 (2.4 M sodium malonate, pH 7.0) at 293 K. Before diffraction data collection, crystals of all2909 were cryoprotected in a solution containing 30% glycerol and 1.7 M sodium malonate, pH 7.0, for 30 s. For the iodide-SAD phasing, crystals were soaked in iodide-containing artificial mother liquor (25 mM KI and 2.4 M sodium malonate, pH 7.0) for 1.5 h, and then cryoprotected in 30% glycerol solution with 25 mM KI and 1.7 M sodium malonate, pH 7.0, for 30 s. Crystals of the all2909-HCO3- complex were prepared by soaking crystals in a solution containing 50 mM NaHCO3, 30% glycerol, and 1.7 M sodium malonate, pH 7.0, for 30 s. Crystals of Bn86287 were observed after 1 month under the conditions of Protein Complex #31 (20% PEG4000, 20% 2-propanol, and 0.1 M sodium citrate, pH 5.6) at 277 K. Before diffraction data collection, crystals of Bn86287 were cryoprotected in a solution containing 30% glycerol, 14% PEG4000, 14% 2-propanol, and 70 mM sodium citrate, pH 5.6, for 15 s. Crystals of the Bn86287-HCO3- complex were prepared by soaking crystals in NaHCO3-containing solution supplemented with 30% glycerol, 14% PEG4000, 14% 2-propanol, and 70 mM sodium citrate, pH 5.6, for 2 min.

Data collection and structure determination
X-ray diffraction data were collected at 95 K using an Eiger X4M detector on BL-1A, or an Eiger X16M detector on BL-17A, of the Photon Factory, KEK (Tsukuba, Japan). Diffraction data were processed and scaled by XDS and XSCALE, respectively [53]. The phases of all2909 were determined using the program Crank2 by the iodide-SAD method [54]. The phases of Bn86287 were determined using an MR-native SAD method. The coordinates of all2909 were used as the initial model for MR calculation by MOLREP [55], and the obtained initial phases were used for the MR-native SAD calculation by Crank2. Crystallographic refinements and model building were performed using PHENIX.refine [56] and Coot [57], respectively.

Immunoblotting and immunolocalization
Polyclonal antibodies against Bn86287 were raised in rabbits using a recombinant protein corresponding to its third COG4337 domain (from 431 to 607 amino acids) by Kiwa Laboratory Animals. Co., Ltd. (Wakayama, Japan). The specificity of antibodies was tested by immunoblot analysis using total proteins of B. natans. The proteins were electrophoresed on an Any kD Mini-PROTEAN TGX gel (Bio-Rad) and blotted to a polyvinylidene difluoride membrane using a Trans-Blot Turbo Transfer System (Bio-Rad). Immunoblotting was performed using an iBind Western system (Life Technologies) with an anti-Bn86287 antibody diluted at 1:500, followed by a horseradish peroxidase (HRP)-linked secondary antibody (Cytiva, NA934VS, Lot#9790787) at a dilution of 1:10,000. The signals were detected with ECL Prime Western Blotting Detection Reagent (Cytiva) and a ChemiDoc MP System (Bio-Rad). Uncropped immunoblotting images are shown in Additional file 2: Figure S6. Immunofluorescence and immunoelectron microscopic analyses were performed according to the protocol described previously [58]. For immunofluorescence labeling, fixed B. natans cells were treated with a 1:100 dilution of anti-Bn86287 antibody and a fluorescein isothiocyanate (FITC)-conjugated secondary antibody (Sigma-Aldrich, F9887, Lot#108M4818V) diluted to 1:100. Fluorescence signals were observed under an inverted Zeiss LSM 510 laser scanning microscope (Carl Zeiss). For immunoelectron microscopy, immunogold labeling was performed with the anti-Bn86287 antibody at a dilution of 1:20 and a gold-conjugated secondary antibody (Sigma-Aldrich, G7402, Lot#SLBG4607V) diluted to 1:20. Labeled sections were stained with uranyl acetate and observed under a Hitachi H7650 transmission electron microscope at 80 kV.

Subcellular localization prediction
We predicted the subcellular localization of 23 bacterial and 62 eukaryotic COG4337 homologs that were used for phylogenetic analysis. Partial sequences were excluded from the prediction, based on an alignment with all homologs. N-terminal signals were predicted using four programs (PredSL [59], TargetP-2.0 [60], Predotar [61], and SignalP-5.0 [62]). For prasinophytes and rhodophytes, a "plant" setting was applied. In complex plastid-bearing algae (chlorarachniophytes, dinoflagellates, haptophytes, and ochrophytes), plastid-targeted proteins carry an N-terminal bipartite signal containing a signal peptide and a transit peptide [63]. Transit peptides have been characterized by positively charged residues in chlorarachniophytes [64] and by an aromatic residue at the first position in dinoflagellates, haptophytes, and ochrophytes [63]. These characteristics were used to evaluate their plastid-targeting signals.
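
The two transit-peptide characteristics mentioned above could be checked with a small script such as the Python sketch below. It is not the procedure used in this study: the function name, the 20-residue window, the counting of Lys and Arg as positive and Asp and Glu as negative, and the example sequence are all assumptions made for illustration, and the signal peptide length is taken as an input (e.g., from a signal peptide predictor).

```python
AROMATIC = set("FWY")   # Phe, Trp, Tyr
POSITIVE = set("KR")    # Lys, Arg
NEGATIVE = set("DE")    # Asp, Glu


def transit_peptide_features(protein_seq: str, signal_peptide_len: int, window: int = 20) -> dict:
    """Summarize the region immediately after a predicted signal peptide.

    protein_seq        -- full-length amino acid sequence (one-letter code)
    signal_peptide_len -- number of residues in the predicted signal peptide
    window             -- residues after the cleavage site to inspect (assumed value)
    """
    region = protein_seq.upper()[signal_peptide_len:signal_peptide_len + window]
    if not region:
        raise ValueError("no residues after the signal peptide")
    positives = sum(res in POSITIVE for res in region)
    negatives = sum(res in NEGATIVE for res in region)
    return {
        "first_residue": region[0],
        "aromatic_first": region[0] in AROMATIC,  # dinoflagellates, haptophytes, ochrophytes
        "net_charge": positives - negatives,      # chlorarachniophytes: expected positive
    }


if __name__ == "__main__":
    # Hypothetical sequence: 15-residue signal peptide followed by a transit-peptide-like region.
    seq = "MKTLLLAALVGSASA" + "FSPRKSLRAT" + "GGDDEE"
    print(transit_peptide_features(seq, signal_peptide_len=15, window=10))
```

Such a check only flags candidate sequences; the assignments in Additional file 1: Table S1 rest on the predictions of the four programs listed above.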

Localization of GFP fusion protein
For plasmid construction, a fragment encoding the N-terminal leader of Bn50950 (amino acids 1 to 38) was inserted at the 5′ end of the yfp gene in the pLaRYfp+mc vector. The chlorarachniophyte species Amorphochlora amoebiformis (CCMP2058) was transfected with this plasmid using a Gene Pulser Xcell electroporation system (Bio-Rad), as described previously [65,66], because no transformation system was available for B. natans. The mitochondria were stained with MitoTracker Orange CMTMRos (Molecular Probes) at a final concentration of 1 μM. Fluorescence signals were observed under an inverted Zeiss LSM 510 laser scanning microscope (Carl Zeiss).
Additional file 1: Table S1. COG4337 proteins of diverse eukaryotes and prokaryotes.
Additional file 2: Figure S1. Maximum-likelihood phylogenetic tree of COG4337 proteins. The tree was constructed with 214 COG4337 domain sequences extracted from 30 prokaryotic and 102 eukaryotic proteins. Sequences of multiple repeated domains are labelled by the ordinal number. Numbers at nodes indicate bootstrap supports (BS) that are shown only when they are higher than 50%. Black dots correspond to ≥95% BS. The scale bar represents the expected number of amino acid substitutions per site. Figure S2. Sequence alignment of COG4337 and COG4875 domains. The alignment includes COG4337 domains extracted from Bn86287 (JGI Bigna1: 86287), Bn50950 (JGI Bigna1: 50950), and all2909 (GenBank: BAB74608), and COG4875 domains of LCIP63 (JGI Thaps3: 9854) and 3H51 (GenBank: AAM40142). Numbers next to protein names represent the position of repeated domains. Asterisks show conserved amino acids, and the C-terminal motif "His-His-Ser-Ser" is highlighted by a yellow box. Figure S3. Esterase activity. Esterase activity was measured with 4-nitrophenyl acetate as substrate. Absorption at 348 nm was monitored for 5 min after the addition of each protein at the 60 s time point. Values of esterase activity are summarized in the table (mean ± SD of three independent experiments). BSA, bovine serum albumin. Figure S4. SEC-MALS analysis of recombinant COG4337 proteins. (A, B) Light scattering (LS, red line), differential refractive index (dRI, blue line), and the molecular weight of the protein (black line) are plotted against the elution volume. The theoretical molar masses of the Bn86287 and all2909 monomers are 55.3 kDa and 19.3 kDa, respectively. Bn86287 and all2909 were estimated to exist as dimers and tetramers in solution, respectively. (C) Analysis with the PISA (Protein Interfaces, Surfaces and Assemblies) software estimated that a tetramer of all2909 was assembled by a head-to-head interaction of two dimeric units. Figure S5
Additional file 3: Table S2. ICP-OES (inductively coupled plasma optical emission spectroscopy) analysis of six metals. Table S3. Primer sequences for plasmid construction.