SCOPe is a database developed at the Berkeley Lab and UC Berkeley that extends SCOP (version 1). SCOPe classifies many structures released since SCOP 1.75 through a combination of automation and manual curation, and corrects some errors, aiming to have the same accuracy as the fully hand-curated SCOP releases. SCOPe also incorporates and updates the Astral database.
In addition to new SCOPe releases, the SCOPe website provides integrated access to data found in all releases of the SCOP and Astral databases that feature stable identifiers (i.e., those since release 1.55). A history of all changes between consecutive releases of SCOP and SCOPe is available under the Stats & History menu.
In order to facilitate use of SCOPe data by SCOP and Astral users, we provide SCOPe data in parseable files in the same formats as the SCOP and Astral databases. SCOPe uses the same stable identifiers (e.g., sunid, sid, sccs) as were used for prior releases of SCOP and Astral.
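As an illustration of working with these identifiers, the following minimal Python sketch splits a family-level sccs string (for example, a.1.1.2) into its class, fold, superfamily, and family components. The names Sccs, parse_sccs, and same_superfamily are hypothetical helpers introduced here for illustration and are not part of any SCOPe distribution. The sketch assumes a full four-part sccs and rejects shorter identifiers that refer to higher levels of the hierarchy.

```python
from typing import NamedTuple


class Sccs(NamedTuple):
    """Components of a family-level sccs such as 'a.1.1.2' (hypothetical helper type)."""
    cls: str          # class letter, e.g. 'a'
    fold: int         # fold number within the class
    superfamily: int  # superfamily number within the fold
    family: int       # family number within the superfamily


def parse_sccs(sccs: str) -> Sccs:
    """Split a four-part sccs string into its components.

    Assumption: the input is a family-level sccs (class.fold.superfamily.family).
    Shorter identifiers for higher-level nodes are rejected.
    """
    parts = sccs.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated parts, got {len(parts)}: {sccs!r}")
    cls, fold, superfamily, family = parts
    if len(cls) != 1 or not cls.isalpha() or not cls.islower():
        raise ValueError(f"class must be a single lowercase letter: {sccs!r}")
    return Sccs(cls, int(fold), int(superfamily), int(family))


def same_superfamily(a: str, b: str) -> bool:
    """Return True if two sccs strings share class, fold, and superfamily."""
    return parse_sccs(a)[:3] == parse_sccs(b)[:3]
```

For example, same_superfamily("a.1.1.2", "a.1.1.3") compares only the first three components, so two domains from different families of the same superfamily are treated as related.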
In SCOPe 2.08, we have continued to perform manual curation of new Folds, Superfamilies, and Families. We classified members of 74 Pfam families having the most structures (all those with at least 25 PDB entries) that had not previously had a classified representative in SCOP or SCOPe. Among these families, 22 (30%) had at least one domain classified into a new SCOPe fold, 10 (14%) into a new superfamily in an existing fold, 33 (45%) into a new family within an existing superfamily, and 9 (12%) as a new protein within an existing family.
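The percentages above can be reproduced from the counts with a short calculation. The Python sketch below uses only the numbers stated in the text; the variable and function names are illustrative. Note that the four rounded values add up to 101% rather than 100% because each is rounded independently.

```python
# Counts of Pfam families by the most novel level at which a domain was
# classified, as stated in the text above.
TOTAL_FAMILIES = 74
counts = {
    "new fold": 22,
    "new superfamily in an existing fold": 10,
    "new family within an existing superfamily": 33,
    "new protein within an existing family": 9,
}


def percent_breakdown(counts: dict, total: int) -> dict:
    """Return each count as a percentage of total, rounded to a whole number."""
    return {label: round(100 * n / total) for label, n in counts.items()}


# The four categories partition the 74 families.
assert sum(counts.values()) == TOTAL_FAMILIES

percentages = percent_breakdown(counts, TOTAL_FAMILIES)
for label, pct in percentages.items():
    print(f"{label}: {counts[label]} ({pct}%)")
```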
Variant searches. To assist in the analysis of genetic variants and to enable easier access to structural classification data, we built a search tool that maps human genetic variants to protein structures and their associated SCOPe data. Users can search for structures relevant to a genetic variant of interest by providing HGVS expressions or genome coordinates in either the hg19 or GRCh38 assembly. Examples are on the advanced variant search page.
Annotation of structural heterogeneity. We have improved consistency in how structures in the same family are divided into domains, so that automated methods (e.g., deep learning classifiers) that rely on multiple alignments of homologous SCOPe domains will be less likely to produce incorrect results due to variable domain lengths within the same family. See our help page for further details.
Annotated repeat units. Some protein domains in SCOPe consist of a number of smaller tandem repeating units. The number of repeats may differ among domains in the same family. To facilitate the use of automated algorithms developed or trained on the SCOPe knowledgebase, we provide machine-parseable annotations of the extent of a single repeat unit for all families of repeats in classes a to g.
We have also improved our detection of cloning artifacts (e.g., expression tags). Including such artifacts in domain sequences can produce spurious similarity between non-homologous sequences, so they are classified in a special class (l: Artifacts) that separates them from the homology-based curation in the rest of the SCOPe hierarchy.
All data in SCOPe (including the data from older releases of SCOP and Astral) are freely available to all users.
There are several alternative pronunciations of the vowels in the word SCOPe. All are considered correct.