Changes from ASTRAL 1.53 to 1.55:

* New alignments

The Protein Data Bank now provides CIF files produced by the pdb2cif
program. These files include a mapping between the PDB-format records
SEQRES (representing the sequence of the molecule used in an
experiment) and ATOM (representing the atoms experimentally observed).
The ASTRAL Rapid Access Format (RAF) Sequence Maps extract the
SEQRES <-> ATOM relationship and summarize it in a form that can be
rapidly parsed in most computer languages. Known errors in the CIF
files are corrected manually, with the original PDB file serving as
the final arbiter in case of discrepancies. This allowed us to correct
all known bugs from previous versions of ASTRAL. The RAF maps replace
the old CIFMAP mappings, which are still available but deprecated.
Bug fixes are only available in the RAF maps.

* New SCOP classification (sccs)

The old classification page numbers in SCOP have been replaced by new
identifiers, sccs, which stands for SCOP Concise Classification
String. The sccs identifiers include only the class, fold,
superfamily, and family; note that an sccs is therefore not unique to
a single protein. The class is represented by a letter (a-g); other
levels are represented numerically. The sccs identifiers are expected
to be stable across SCOP releases. They can be used as keywords to
search SCOP, and to link to SCOP via the SCOP search engine. For more
information, please read the release notes for SCOP 1.55:

    http://scop.berkeley.edu/release-notes-1.55.html

* New FASTA header line

The header line for each sequence has changed, to reflect the new sccs
identifiers and to include case-sensitive chain information. To aid in
parsing, a null chain with no residue range is represented as (-).
Examples:

    >d1xer__ 4.51.1.3.1 Ferredoxin {Sulfolobus}
becomes
    >d1xer__ d.58.1.3 (-) Ferredoxin {Sulfolobus sp.}

    >d3crol_ 1.36.1.2.3 cro 434 {Bacteriophage 434}
becomes
    >d3crol_ a.35.1.2 (L:) cro 434 {Bacteriophage 434}

    >d1etpa1 1.3.1.3.1 (1-92) Cytochrome c4 {Pseudomonas stutzeri}
becomes
    >d1etpa1 a.3.1.4 (A:1-92) Cytochrome c4 {Pseudomonas stutzeri}

    >d1tpt_1 1.48.2.1.1 (1-70) Thymidine phosphorylase {Escherichia coli}
becomes
    >d1tpt_1 a.46.2.1 (1-70) Thymidine phosphorylase {Escherichia coli}

* Genetic Domains

A SCOP domain may include fragments from different PDB chains (see
d1cph.1, for example). A "genetic domain" is a domain consisting of
multiple chain fragments which appear to be genetically selected as
the product of a single gene. In these cases, the fragments are
concatenated in the order in which they appear in the original gene or
sequence.

In the standard ASTRAL sequences, there is a separate entry for each
chain (the section of d1cph.1 in chain A: becomes e1cph.1a, and the
part in chain B: becomes e1cph.1b). In the "genetic domain" sequence
sets, there is a single entry, g1cph.1, in which the sequences for
both chains appear in the correct order (in this case, B: before A:),
separated by the letter 'X'.

Here are the FASTA headers for this example:

In the standard sequences, d1cph.1 becomes:
    >e1cph.1a g.1.1.1 (A:) Insulin {Cow (Bos taurus)}
    >e1cph.1b g.1.1.1 (B:) Insulin {Cow (Bos taurus)}

In the genetic domain sequences, d1cph.1 becomes:
    >g1cph.1 g.1.1.1 (B:,A:) Insulin {Cow (Bos taurus)}

Here's a more complicated example.

In the standard sequences, d1bi6.2 becomes:
    >e1bi6.2h g.3.12.1 (H:1-7,H:32-41) Bromelain inhibitor VI...
    >e1bi6.2l g.3.12.1 (L:) Bromelain inhibitor VI...

In the genetic domain sequences, d1bi6.2 becomes:
    >g1bi6.2 g.3.12.1 (L:,H:1-7,H:32-41) Bromelain inhibitor VI...

All files related to the genetic domain sequences have -gd- in the
file names.
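
The new-style header lines shown in the examples (identifier, sccs,
parenthesized region list, free-text description) can be parsed with a
short routine. The following is only a minimal sketch in Python, not
part of ASTRAL: the names Region, Header, parse_region, parse_header
and split_sccs are hypothetical, and the regular expression assumes
that every new-style header carries a parenthesized region list
directly after the sccs, as in the examples.

```python
import re
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    chain: Optional[str]     # None for a null chain
    residues: Optional[str]  # None when no residue range is given


class Header(NamedTuple):
    sid: str
    sccs: str
    regions: List[Region]
    description: str


# Assumed layout: >identifier sccs (regions) free-text description
HEADER_RE = re.compile(r"^>(\S+)\s+(\S+)\s+\(([^)]*)\)\s+(.*)$")


def parse_region(text: str) -> Region:
    """Parse one region such as '-', 'L:', 'A:1-92' or '1-70'."""
    if text == "-":
        # Null chain with no residue range.
        return Region(None, None)
    if ":" in text:
        chain, _, residues = text.partition(":")
        return Region(chain, residues or None)
    # Residue range on a null chain, e.g. '1-70'.
    return Region(None, text)


def parse_header(line: str) -> Header:
    """Split a new-style header line into its four parts."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized header line: {line!r}")
    sid, sccs, regions, description = match.groups()
    parsed = [parse_region(r) for r in regions.split(",")]
    return Header(sid, sccs, parsed, description)


def split_sccs(sccs: str):
    """Return (class letter, fold, superfamily, family)."""
    cls, fold, superfamily, family = sccs.split(".")
    return cls, int(fold), int(superfamily), int(family)


header = parse_header(
    ">g1bi6.2 g.3.12.1 (L:,H:1-7,H:32-41) Bromelain inhibitor VI..."
)
for region in header.regions:
    print(header.sid, region.chain, region.residues)
```

In a genetic domain header the regions come back in the order listed
in the header (here L: before H:), which is the order in which the
fragments are concatenated in the sequence. Old-style headers without
a region list are rejected with a ValueError in this sketch.
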
Files related to the standard sequence sets (*.id and *.fa) do not
have -gd- in the file names.

* New translation table

Chemically modified residues are now included in our translation
table, which maps the 3-letter codes found in PDB files to the
one-letter codes in our sequences. This is the complete table (each
one-letter code is to the right of the corresponding 3-letter code):

    ala a  val v  phe f  pro p  met m  ile i  leu l  asp d
    glu e  lys k  arg r  ser s  thr t  tyr y  his h  cys c
    asn n  gln q  trp w  gly g  2as d  3ah h  5hp e  acl r
    aib a  alm a  alo t  aly k  arm r  asa d  asb d  ask d
    asl d  asq d  aya a  bcs c  bhd d  bmt t  bnn a  buc c
    bug l  c5c c  c6c c  ccs c  cea c  chg a  cle l  cme c
    csd a  cso c  csp c  css c  csw c  cxm m  cy1 c  cy3 c
    cyg c  cym c  cyq c  dah f  dal a  dar r  das d  dcy c
    dgl e  dgn q  dha a  dhi h  dil i  div v  dle l  dly k
    dnp a  dpn f  dpr p  dsn s  dsp d  dth t  dtr w  dty y
    dva v  efc c  fla a  fme m  ggl e  glz g  gma e  gsc g
    hac a  har r  hic h  hip h  hmr r  hpq f  htr w  hyp p
    iil i  iyr y  kcx k  llp k  lly k  ltr w  lym k  lyz k
    maa a  men n  mhs h  mis s  mle l  mpq g  msa g  mse m
    mva v  nem h  nep h  nle l  nln l  nlp l  nmc g  oas s
    ocs c  omt m  paq y  pca e  pec c  phi f  phl f  pr3 c
    prr a  ptr y  sac s  sar g  sch c  scs c  scy c  sel s
    sep s  set s  shc c  shr k  soc c  sty y  sva s  tih a
    tpl w  tpo t  tpq a  trg k  tro w  tyb y  tyq y  tys y
    tyy y  agm r  gl3 g  smc c  asx b  cgu e  csx c  glx z
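
Applying the table amounts to a dictionary lookup per residue. The
sketch below is illustrative only and is not the ASTRAL
implementation. THREE_TO_ONE holds the 20 standard residues plus a
small subset of the modified and ambiguous codes from the table above,
and the rest would be added the same way. The function name translate
and the use of 'x' for codes missing from the table are assumptions
made for this example; the notes above do not say how unlisted codes
are handled.

```python
# Subset of the translation table above (3-letter code -> one-letter code).
THREE_TO_ONE = {
    # Standard residues
    "ala": "a", "val": "v", "phe": "f", "pro": "p", "met": "m",
    "ile": "i", "leu": "l", "asp": "d", "glu": "e", "lys": "k",
    "arg": "r", "ser": "s", "thr": "t", "tyr": "y", "his": "h",
    "cys": "c", "asn": "n", "gln": "q", "trp": "w", "gly": "g",
    # A few chemically modified residues from the table
    "mse": "m", "sep": "s", "tpo": "t", "ptr": "y", "hyp": "p",
    "pca": "e", "cme": "c", "kcx": "k",
    # Ambiguous codes from the table
    "asx": "b", "glx": "z",
}


def translate(residues, unknown="x"):
    """Map a sequence of 3-letter codes to a one-letter string.

    PDB files use upper-case codes, so each code is lower-cased before
    the lookup. Codes missing from the table become `unknown`, which
    is an assumption of this sketch.
    """
    return "".join(
        THREE_TO_ONE.get(code.strip().lower(), unknown) for code in residues
    )


print(translate(["MSE", "ALA", "SEP", "GLX"]))
```
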