Coiled-coil-forming segments of protein sequences can be identified relatively easily by looking for heptad repeats: amino-acid sequence motifs that, first, allow the formation of an α-helix and, second, have apolar residues periodically at positions a and d of the seven-residue repeat (abcdefg)n. It is the long-range regular disposition of hydrophobic residues that forces two α-helices into a superhelix, that is, a coiled-coil dimer, and their hydrophobic character also determines the strength of the interaction of the α-helices within the coiled coil [7]. Charged amino acids within the coiled coil serve as ‘trigger motifs’, essential for dimerization [1].
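The heptad criterion described above can be illustrated with a minimal Python sketch. It is not one of the published coiled-coil prediction programs: the set of residues treated as apolar, the function names and the simple fraction-based score are all assumptions made for illustration. The sketch tries each of the seven possible registers and reports the one in which positions a and d are most often apolar.

```python
# Residues counted as apolar here are an illustrative assumption,
# not a definition taken from the text.
APOLAR = set("AILMFVWY")
HEPTAD = "abcdefg"


def heptad_score(seq, offset):
    """Fraction of a/d positions holding an apolar residue.

    `offset` is the heptad index (0 = a ... 6 = g) assigned to seq[0].
    """
    hits = 0
    total = 0
    for i, residue in enumerate(seq.upper()):
        position = (i + offset) % 7
        if position in (0, 3):  # heptad positions a and d
            total += 1
            if residue in APOLAR:
                hits += 1
    return hits / total if total else 0.0


def best_register(seq):
    """Return (heptad position of the first residue, score) for the best register."""
    scores = {offset: heptad_score(seq, offset) for offset in range(7)}
    best = max(scores, key=scores.get)
    return HEPTAD[best], scores[best]


if __name__ == "__main__":
    # Synthetic test sequence (hypothetical, not a real protein):
    # leucine at a and d, charged residues elsewhere.
    print(best_register("LKELEDK" * 4))
```

A score close to 1 in one register and markedly lower scores in the others is what a regular heptad pattern looks like under this simple measure. The sketch does not test the first criterion, compatibility with an α-helix, and it ignores the irregularities found in real IF proteins.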
What then distinguishes intermediate filament proteins from other coiled-coil-forming proteins? Although many proteins follow a common fold without pronounced similarity in primary amino acid sequence, IF proteins contain conserved “consensus sequences” about 20 amino acids long, and they follow the organization of the central α-helical rod domain quite strictly: the individual coiled-coil-forming segments are separated by short variable linker motifs and exhibit a conserved number of amino acids (coil 1A: 35; coil 1B: 101; coil 2: 142; see Figure 1). A notable peculiarity of nuclear IF proteins is that their coil 1B is six heptads, or 42 amino acids, longer than that of vertebrate cytoplasmic IF proteins. Invertebrate cytoplasmic IF proteins still have the long coil 1B, except for those of some tunicates, which are chordates but nevertheless invertebrates. All these proteins have a very similar organization of coil 2, which carries in its middle region a so-called stutter, an irregularity in the heptad pattern, at the very same position relative to IF consensus motif 2.
As mentioned above, it has been thought until now that insects do not have cytoplasmic intermediate filaments, because cytoplasmic IF proteins are absent from Drosophila melanogaster. This notion has now been challenged by the discovery of a protein, isomin, in the hexapod Isotomurus maculatus that distinctly resembles cytoplasmic IF proteins from other invertebrates [4]. When the primary sequence of isomin is aligned to that of Drosophila lamin Dm0 (the B-type lamin), coil 1B and coil 2 match quite well in length (Figure 2). For coil 1A, the number of amino acids qualifying for an α-helical fold is somewhat lower than the conventional 35 amino acids of coil 1A in standard IF proteins (note that there are minor differences between our estimate of the length and location of these subdomains and that of Mencarelli et al. [4]; Figure 2b). Nevertheless, isomin coil 1A harbors a major part of the conserved IF consensus motif 1 (see [8]). The central part of this motif is LNDRLATY in the fly lamin and LNVRLADV in isomin (identical at five of the eight positions), and the six preceding amino acids are also highly homologous, indicating that they may take part in similar molecular interactions. The second consensus motif, IF consensus motif 2, is also significantly similar: AYDKLLVGEEAR in the fly lamin and KYDSLVKVEEVR in isomin. How far this sequence has drifted from the standard IF consensus motif 2, both in isomin and in lamin Dm0, becomes clear when one compares it with the corresponding sequences in lamins from lower invertebrates, which are nearly identical to that of human lamin A [1, 2]. Nevertheless, this sequence homology points to a consensus that may be functionally important (see below).
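The similarity of the two consensus motifs can be checked directly from the sequences quoted above. The following Python sketch is only an ungapped, position-by-position comparison, with function names chosen here for illustration. It counts identical residues and marks the shared positions, replacing differing residues with a dash.

```python
def identity(a, b):
    """Return (number of identical positions, motif length) for two equal-length motifs."""
    if len(a) != len(b):
        raise ValueError("motifs must have the same length for an ungapped comparison")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches, len(a)


def mark_shared(a, b):
    """Keep residues shared at the same position; replace the others with '-'."""
    return "".join(x if x == y else "-" for x, y in zip(a, b))


if __name__ == "__main__":
    # Motif sequences as quoted in the text: (fly lamin Dm0, isomin).
    motifs = {
        "IF consensus motif 1 (central part)": ("LNDRLATY", "LNVRLADV"),
        "IF consensus motif 2": ("AYDKLLVGEEAR", "KYDSLVKVEEVR"),
    }
    for name, (lamin, isomin) in motifs.items():
        same, length = identity(lamin, isomin)
        print(f"{name}: {same}/{length} identical, shared: {mark_shared(lamin, isomin)}")
```

On these sequences the comparison gives five identical positions out of eight for the central part of motif 1 and six out of twelve for motif 2. Identity alone understates the similarity, because conservative exchanges such as L for V count as mismatches.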
What may be the function of these motifs, and why are they found in all IF proteins? We have recently proposed that both consensus domains are essential for IF protein assembly: specifically, that the longitudinal head-to-tail association of two dimers, leading to a structural overlap of about 3 nm, is mediated by the formation of two ‘hetero-coiled coils’ in the overlap zone (Figure 3; [8]). Such an interaction is consistent with the primary sequences observed in isomin. However, this assumes that isomin is a homopolymeric IF protein, and on the basis of the data presented by Mencarelli and colleagues [4], other possibilities cannot be ruled out.
For example, extended keratin filaments form from obligate heterodimeric assembly pairs, yet keratin homopolymers may nonetheless form short fibrils (though not full-length ones) in vitro. Invertebrate cytoplasmic IF proteins often come in pairs too. Most notably, the mammalian IF multigene family harbors several members that are not able to assemble into bona fide IFs on their own, but require a ‘co-assembler’ partner. Nestin, synemin, syncoilin and individual type I and type II keratins are a few examples. Finally, the in vitro assembled filaments of isomin only partly resemble typical cytoplasmic IFs, especially when compared with those generated from Ascaris suum [9], although this may be attributable to different assembly conditions. Nevertheless, the tissue localization of isomin and its resistance to extraction with buffers containing high salt and non-ionic detergent, as well as its structural organization, all argue for its being a cytoplasmic IF-like protein.