cpptraj教程-01

本文内容摘抄或翻译自AMBER手册或教程,为本人学习AMBER时的笔记,如有侵权请联系删除。 # 前言 ## 分析 mdout 文件中的能量信息

1
2
process_mdout.perl md.mdout 
mdout_analyzer.py md.mdout # 比 process_mdout.perl功能更全
## amber坐标文件转换为pdb格式
1
2
ambpdb -p file.prmtop -c file.rst (or file.inpcrd)           #输出到显示器
ambpdb -p file.prmtop < file.rst > file.pdb
## SPDB-Viewer 显示 clashes In the SwissPDB Viewer, there is an option called "aa making clashes" under the "Select" menu. This command will highlight the amino acid residues that are making bad contacts with its neighbors. To visualize the clashes, choose "by selection" from the "Color" menu.

cpptraj基本教程

1
2
cpptraj -i rmsd.cpptraj > rmsd.log
cpptraj -p file.prmtop -y file.mdcrd -tl # 查看轨迹文件的帧数

注意

1
2
3
4
5
6
7
8
help command  # eg help distance
扩展名可以决定输入、输出文件的类型
select <mask> # 将会显示选中部分的原子index
parm file_name # 拓扑文件可以是.parm7, pdb, mol2, cif, psf, sdf, arc,甚至是**gmx的top**
trajin file_name # 轨迹文件可以是pdb, mol2, **gro, trr**等多种格式
cpptraj.OMP, cpptraj.MPI # 并行
export OMP_NUM_THREADS=12 # 指定使用的OMP线程数目,默认使用所有 (为了使用openMP,configure的时候应加上`-openmp` 选项)
cpptraj --defines # 检查是否通过OpenMP编译,查看是否包含 -D_OPENMP

cpptraj 选项

1
2
3
4
5
6
7
8
9
10
11
-p <Top0> 
-i <Input0>
-y <trajin> Read from trajectory file <trajin>
-x <trajout> Write trajectory file <trajout>
-c <reference> Read <reference> as reference coordinates

-tl Print length of trajectories specified with ’-y’ to STDOUT.
-ms <mask> Print selected atom numbers to STDOUT.
-mr <mask> Print selected residue numbers to STDOUT.
–mask <mask> Print detailed atom selection to STDOUT.
–resmask <mask> Print detailed residue selection to STDOUT.

导入top文件和轨迹

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
parm file.prmtop
parminfo [parm <name> | parmindex <#> | <#>] [<mask>]
resinfo [parm <name> | parmindex <#> | <#>] [<mask>] [short] # short 表示输出更简洁的信息

list [parm | ref | parm | actions | dataset] # 查看对应的信息

resinfo
atominfo
molinfo

atominfo :13 #查看残基13的信息,不是原子index 13


trajin <filename> {[<start> [<stop> | last] [<offset>]]} | lastframe [parm <parmfile / tag> | parmindex <#>]

reference <name> [<frame#>] [<mask>] ([tag]) [lastframe] [crdset] [parm <parmfile / tag> | parmindex <#>]

trajout <filename> [<fileformat>] [append] [nobox] [parm <parmfile / tag> | parmindex <#>] [onlyframes <range>] [title <title>] [start <start>] [stop <stop>] [offset <offset>] [ <Format Options> ]

导出轨迹的时候,最好不要再带box.

outtraj 根据条件输出轨迹

1
2
3
4
5
6
7
8
9
10
11
12
13
outtraj <filename> [ trajout args ] [maxmin <dataset> min <min> max <max>] ... 

trajin mdcrd.crd
trajout output.crd
outtraj BeforeRmsd.crd
rms R1 first :1-20@CA out rmsd.dat
outtraj AfterRmsd.crd
trajout是在所有的action都执行完毕后再进行,outtraj是按其在action queue中顺序执行,所以这里输出的output.crd和AfterRmsd.crd是一样的


trajin tz2.truncoct.nc
rms R1 first :2-11
outtraj maxmin.crd maxmin R1 min 0.7 max 0.8

clear 命令

1
2
3
clear [{all | <type>}]   #(<type> = actions,trajin,trajout,ref,parm,analysis,datafile,dataset)

list <type> #(<type> = actions,trajin,trajout,ref,parm,analysis,datafile,dataset)

计算rmsd (输出单位为A)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
parm file.prmtop
trajin md.mdcrd.nc
rms ToFirst :1-13&!@H= first out rmsd1.agr mass # mass 表示质量加权,first 表示一第一帧为reference, 默认会进行fit,除非使用 "nofit"选项

reference trpzip2.1LE1.1.rst7 #载入参考构象
reference trpzip2.1LE1.10.rst7 [tenth_member]

reference trpzip2.gb.nc lastframe [last] #指定导入轨迹的最后一帧作为reference,或者给出帧的编号也行

#后面使用方括号来添加tag,[tenth_member],参考结构和拓扑文件都能添加tag,之后使用文件的名称调用,或者文件的index(从0开始计数),也可以使用文件的tag进行调用

reference: Use the first loaded reference structure.
refindex <#>: Use reference index number <#> (so 'reference' is like 'refindex 0').
ref <name | tag>: Use reference specified by file name or tag.

parmindex <#>: Use topology index number <#>.
parm <name | tag>: Use topology specified by file name or tag.

rms ToMember1 :1-13&!@H= reference out rmsd2.agr
rms ToMember10 :1-13&!@H= refindex 1 out rmsd2.agr
rms ToLast :1-13&!@H= ref [last] out rmsd2.agr
# out的文件名一样的话,会都输出到这个文件,不会覆盖,可以用xmgrace一起查看

与含有不同拓扑的结构进行rmsd计算,比如轨迹的1-12号残基,和2GB1的42-47,50-55号残基

parm 2GB1.pdb
reference 2GB1.pdb parm 2GB1.pdb [GB1] #指定使用的拓扑文件
rms ToGB1 ref [GB1] :1-12@CA :42-47,50-55@CA out rmsd3.agr #轨迹的1-12号残基,和2GB1的42-47,50-55号残基

trajout trpzip2.overlap.mol2 onlyframes 1 #输出fit之后的第一帧

计算drmsd

1
drms | drmsd (distance RMSD)

symmrmsd 考虑对称性的rmsd

Perform symmetry-corrected RMSD calculation. This is done by identifying potential symmetric atoms in each residue, performing an initial best-fit, then determining which configuration of symmetric atoms will give the lowest RMSD using atomic distance to reference atoms.

rms2d 查看轨迹中每一帧与其他帧的rmsd

1
2
3
4
5
6
7
rms2d [crdset <crd set>] [<name>] [<mask>] [out <filename>] [dme | nofit | srmsd] [mass] [reftraj <traj> [parm <parmname> | parmindex <parm#>] [<refmask>]] [corr <corrfilename>]

trajin test.nc
rms2d !(@H=) rmsout test.2drms.gnu

trajin test.nc
rms2d @CA rmsout test.2drms.gnu reftraj ref.nc

cpptraj按初始距离选择残基或原子,需要先输入一个参考结构(以参考结构中的距离为准)

1
2
3
reference mol.rst7
trajin mol.rst7
strip !(:4<:3.0)

mask 按距离动态选择残基或者原子

在每一帧中进行操作,比如选中distance-based masks, 因为选择会在每一帧中更新

1
2
3
mask <mask> [maskout <filename>] [maskpdb <pdbname>] [maskmol2 <mol2name>]

mask "(:151<:3.0)&:WAT" maskout Res195WAT.dat maskpdb Res195WAT.pdb

测量距离,角度

1
2
3
4
5
6
7
8
distance d1-2 :1 :2 out d1-2.dat
# geom 选项以mask 的几何中心来测距,而不是默认的质心
# mass 表示使用 mask 原子的质心而不是几何中心

dihedral phi :1@C :2@N :2@CA :2@C out phipsi.dat
dihedral psi :2@N :2@CA :2@C :3@N out phipsi.dat

angle angle1 :2@N :2@CA :2@C out out.dat

  You may need to go through the files and manually adjust any sections where the angle has jumped from -180 to +180 so that it looks clean when plotted, I won't go into details on how to do this here but suffice to say you can do it in xmgrace with some code like:

1
y = (y < -100) ? y=y+360 : y

计算RMSF

单位为A,但如果指定bfactor 选项,则输出单位为 A2*(PI2)*8/3 默认为 byatom

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
parm topology.parm7
loadcrd mdcrd.nc
## 要不要先 fit 一下呢??
crdaction mdcrd.nc rms first @CA

crdaction mdcrd.nc average avg.pdb @CA
parm avg.pdb
reference avg.pdb parm avg.pdb
crdaction mdcrd.nc rms reference @CA
# Calculate atomic fluctuations for @CA only
crdaction mdcrd.nc atomicfluct out fluct.dat bfactor @CA
----
parm myparm.parm7
trajin mytrajectory.crd
rms first
average crdset MyAvg
run
rms ref MyAvg
atomicfluct out fluct.agr

atomicfluct out back.agr @C,CA,N byres bfactor # backbone

COORDS 相关的操作

1
2
3
4
5
6
7
8
9
The following COORDS data set commands are available:

combinecrd Combine two or more COORDS sets.
crdaction Run a single Action on a COORDS set.
crdout Write a COORDS set to a file.
createcrd (Action) Create a COORDS set during a Run.
loadcrd Create or append to a COORDS set from a file.
loadtraj Create special COORDS set where frames remain on disk.
reference Load a single trajectory frame as a reference.

合并坐标文件(把两个分子坐标合并到一个文件,而不是合并两段轨迹)

1
2
3
4
loadcrd Tyr.mol2 CRD1
loadcrd Pry.mol2 CRD2
combinedcrd CRD1 CRD2 parmname Parm-1-2 crdname CRD-1-2
crdout CRD-1-2 Tyr.Pry.mol2

合并轨迹文件

1
2
3
4
parm top1.parm7 [top1]
trajin input0.crd
trajin input1.crd parm [top1]
trajout output.crd parm [top1]

导入、导出指定帧的构象

1
2
3
4
5
6
7
trajin input.crd 1 10
trajout output.crd onlyframes 2,5-7 # 如果使用onlyframes 参数,则只输出指定的帧

# 如果输出为 file.pdb(mol2), multi 关键字表示输出为separated files
# 如果输出为 file.mol2 ,sybyltype 关键字表示输出为SYBYL 格式

crdout md.mdcrd crd1.pdb multi crdframes 1,10,1

计算rmsd

1
2
3
parm com.prmtop
loadcrd md.mdcrd parm com.prmtop
crdaction md.mdcrd rmsd first @CA out rmsd-ca.agr crdframes 1,last,10

loadcrd 与 loadtraj

1
2
3
loadcrd <filename> [parm <parm> | parmindex<#>] [<trajin args>] [name <name>]

loadtraj name <setname> [<filename>] # 如果不提供 <filename>,目前通过 trajin命令已导入的所有轨迹都会被使用,除非之前使用过’clear trajin’命令

top, 轨迹去水(离子, 其它原子)

轨迹去水

1
2
3
4
5
parm dna.prmtop
trajin md.nc
autoimage
strip :WAT
trajout md-nowater.nc netcdf nobox

top文件去水,使用 CPPTRAJ 或者 parmed.py

1
2
3
4
5
6
parmed.py -i generate_top_nowater.in

parm dna.prmtop
strip :WAT
parmout dna-nowater.prmtop
go

使用cpptraj

parmstrip [parm | parmindex <#> | <#>]

1
2
3
4
5
parm mol.water.prmtop
parmstrip :WAT
#parmstrip !(:1-36&!@H=)
parmbox nobox # 去除box 的信息,也能修改box
parmwrite out strip.wat.nobox.prmtop

strip 同时在top和轨迹中进行修改

1
2
3
4
5
6
7
8
9
10
11
strip <mask> [outprefix <name>] [nobox]

[outprefix <name>] 输出strip之后对应的top文件的前缀,strip之后的原子编号也发生了变化

strip :WAT

parm ../com.prmtop
trajin ../md-no_water-last_10ns.mdcrd 1 1000 100
reference ../com.inpcrd
strip !(:1<:6.0) outprefix f1.6 nobox
trajout f1.6.mdcrd mdcrd nobox

unstrip 恢复被strip掉的部分

1
2
3
4
5
6
7
8
9
10
11
12
13
14
比如分别保存复合物,受体,配体的pdb

parm Complex.WAT.pdb
trajin Complex.WAT.pdb
strip :WAT
outtraj Complex.pdb pdb
# Reset to solvated Complex
unstrip
strip :WAT,LIG
outtraj Receptor.pdb pdb
unstrip
strip :WAT
strip !(:LIG)
outtraj Ligand.pdb pdb

cpptraj基础教程2

autoimage

默认以第一个分子的质心为中心 [firstatom] 选项: Image based on molecule first atom; default is to image by molecule center of mass.

image

一般使用 autoimage 就足够了 For periodic systems only, image molecules/residues/atoms that are outside of the box back into the box.

1
2
center :1
image
## unwrap 与image 相反??

重设 box center

Move all atoms so that the center of the atoms in is centered at the specified location: box center (default), coordinate origin, or reference coordinates

1
center [<mask>] [origin] [mass] [ reference | ref <name> | refindex <#> [<refmask>]]

average

一般先进行结构fit,再进行average

1
2
3
4
5
6
7
8
9
10
11
average {crdset <set name> | <filename>} [<mask>] [start <start>] [stop <stop>] [offset <offset>] [Trajout Args]

average test.pdb pdb
average test.mol2 mol2 start 1 stop 100 :1-10


trajin Input.nc
average crdset MyAvg @CA
run
rms ref MyAvg @CA out RmsToAvg.dat
run

选定距离mask最近的水分子

closest 10 :1-151 first closestout closestmols.dat outprefix closest

energy

Calculate the energy for atoms in .

lie

对每一帧,计算之间的非键相互作用 Periodic topologies and trajectories are required (i.e., explicit solvent is necessary A separate electrostatic and van derWaals cutoff can be applied, the default is 12.0 Angstroms for both

1
lie [<name>] <Ligand mask> [<Surroundings mask>] [out <filename>] [noelec] [novdw] [cutvdw <cutoff>] [cutelec <cutoff>] [diel <dielc>]

filter

过滤,比如输出rmsd在0.7-0.8A之间的帧

1
2
3
4
trajin ../tz2.truncoct.nc
rms R1 first :2-11
filter R1 min 0.7 max 0.8 out filter.dat
outtraj maxmin.crd

计算回转半径

1
2
3
4
5
6
7
8
radgyr [name>] [<mask>] [out <filename>] [mass] [nomax] [tensor]

[<mask>] Atoms to calculate radius of gyration for; default all atoms
[nomax] Do not calculate maximum radius of gyration.
[tensor] Calculate radius of gyration tensor, output format ’XX YY ZZ XY XZ YZ’.


radgyr :4-10&!(@H=) out RoG.dat mass nomax

计算溶剂可及面积(单位A^2)

1
2
3
4
5
6
7
surf [<dataset name>] [<mask>] [out <filename>]

(all solute atoms if no mask specified) using the LCPO algorithm of Weiser et al.[122]. In order for this to work, the topology needs to have bond information and atom
type information.

surf out surf.dat
surf :1 out surf.dat

watershell 计算距离溶质特定距离的水分子个数

1
2
3
4
5
(通过修改solventmask也可以计算其他粒子距溶质的距离)
watershell <solutemask> [out <filename>] [lower <lower cut>] [upper <upper cut>] [noimage] [<solventmask>]

[lower <lower cut>] Cutoff for the first water shell (default 3.4 Angstroms).
[upper <upper cut>] Cutoff for the second water shell (default 5.0 Angstroms).

氢键分析

A ... H - D (A和D一般都默认为 O N F 原子)   cpptraj中氢键定义为 A(受体重原子) ... H(供体氢原子) - D(供体重原子)之间的相互作用。当A-D之间距离小于某值(3A),且A-H-D角度大于某值(默认为135°)时,判定氢键存在。   CPPTRAJ will also track solute to solvent hydrogen bonds and solvent bridging interactions (i.e. two or more solute residues hydrogen bonded to a single solvent residue).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
hbond [<dsname>] [out <filename>] [<mask>] [angle <acut>] [dist <dcut>]
[donormask <dmask> [donorhmask <dhmask>]] [acceptormask <amask>]
[avgout <filename>] [printatomnum] [nointramol] [image]
[solventdonor <sdmask>] [solventacceptor <samask>]
[solvout <filename>] [bridgeout <filename>]
[series [uuseries <filename>] [uvseries <filename>]]

[angle <acut>] Angle cutoff for hydrogen bonds (default 135°). Can be disabled by specifying -1.

[dist <dcut>] Distance cutoff for hydrogen bonds (acceptor to donor heavy atom, default 3.0 Å).

Imaging of distances/angles is not performed by default, but can be turned on using the image keyword.

Intra-molecular hydrogen bonds can be ignored using the nointramol keyword. # 内部 intra

The printatomnum keyword can be used to print atom numbers as well.

比如:

hbond :1-174 angle 135 dist 3.0 nointramol printatomnum \
out nhb_per_frame.dat avgout uu_avg.dat series uuseries uuhbonds.dat

hbond donormask :1 acceptormask :2 out nhb.dat avgout avghb.dat

搜索分子间氢键、溶质-溶剂氢键:
hbond HB out nhb.dat avgout solute_avg.dat \
solventacceptor :WAT@O solventdonor :WAT \
solvout solvent_avg.dat bridgeout bridge.dat \
series uuseries uuhbonds.dat uvseries uvhbonds.dat

搜索溶质与Na+离子之间的氢键:
hbond HB-Ion out nhb.dat avgout ion_avg.dat \
solventacceptor :Na+ solventdonor :Na+

nativecontacts 相互作用 天然相互作用

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
nativecontacts [<mask1> [<mask2>]] [writecontacts <outfile>] [resout <resfile>]
[noimage] [distance <cut>] [out <filename>] [includesolvent]
[ first | reference | ref <name> | refindex <#> ]
[resoffset <n>] [contactpdb <file>] [pdbcut <cut>] [mindist] [maxdist]
[name <dsname>] [byresidue] [map [mapout <mapfile>]] [series [seriesout <file>]]

如果只提供一个 mask则在此mask内部寻找contacts。如果提供两个mask,则搜索两个mask之间的contacts

如果不提供reference,则以第一帧作为参考构象(天然相互作用)

非天然相互作用也会显示出来

Contact maps (matrices) are generated for native and non-native contacts. If byresidue is specified, contact
maps are summed over residues, and contacts between residues spaced <resoffset> residues apart in sequence are ignored.
If byresidue specified, the values for each individual atom pair are summed over the residues they belong to (this means for byresidue values greater
than 1.0 are possible).

[distance <cut>] Distance cutoff for determining native contacts in Angstroms
(default 7.0 Ang).

[resoffset <n>] (byresidue only) Ignore contacts between residues spaced less than <n> residues apart in sequence.

[contactpdb <file>] Write PDB with B-factor column containing relative contact strength (strongest is 100.0).
如果指定 contactpdb 会产生一个pdb,relative contact strengths 会写入到 B-factor 这一列.(100表示强度最大)

[mindist] If specified, determine the minimum distance between any atoms in the mask(s).
[maxdist] If specified, determine the maximum distance between any atoms in the mask(s).

[byresidue] Write out the contact map by residue instead of by atom.

eg.


parm ../com.prmtop
trajin ../md-no_water-last_10ns.mdcrd 1 1000 100
reference ../com.inpcrd
nativecontacts name dist :1-51&!@H= :152-174&!@H= \
byresidue distance 7.0 reference \
out nc_by_frame.dat mindist maxdist \
writecontacts contact_frac_byatom.dat \
resout contact_frac_byres.dat \
map mapout resmap.gnu \
contactpdb contact.pdb \
series seriesout series_byatom.dat
run

contacts

1
contacts [ first | reference | ref <ref> | refindex <#> ] [byresidue] [out <filename>] [time <interval>] [distance <cutoff>] [<mask>]

二级结构 (使用DSSP)

1
2
3
4
5
6
7
8
9
10
11
12
secstruct [<name>] [out <filename>] [<mask>] [sumout <filename>]
[assignout <filename>] [totalout <filename> [ptrajformat]
[namen <N name>] [nameh <H name>] [nameca <CA name>]
[namec <C name>] [nameo <O name>]

如果文件中氨基氢不是H而是HN,则需要 nameH HN
不同字母对应的二级结构,参考手册

secstruct :152-174 out dssp.gnu
secstruct :152-174 out dssp.gnu sumout dssp.agr

secstruct :152-174 out dssp.gnu sumout dssp.agr assignout dssp_assign.dat totalout dssp_total.agr

atom selection

ambmask命令可以查看并输出选择的结果 ambmask -p prmtop -c inpcrd -prnlev 0 -out pdb -find [maskstr]

Bug: 无法区分 alpha carbon CA and a calcium ion Ca(因为原子名称一样),但是可以通过添加残基名称来进行区分。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
按残基编号
:1-10 = "residues 1 to 10"
:1,3,5 = "residues 1, 3, and 5"
:1-3,5,7-9 = "residues 1 to 3 and residue 5 and residues 7 to 9"

按残基名称
:LYS = "all lysine residues"
:ARG,ALA,GLY = "all arginine and alanine and glycine residues"

按原子编号
@12,17 = "atoms 12 and 17"
@54-85 = "all atoms from 54 to 85"
@12,54-85,90 = "atom 12 and all atoms from 54 to 85 and atom 90"

按原子名称
@CA = all atoms with the name CA (i.e., all C-alpha atoms)
@CA,C,O,N,H = all atoms with names CA or C or O or N or H
(i.e., the entire protein backbone)

":*" means all residues and "@*" means all atoms


binary operators "&" (and) and "j" (or), unary operator "!" (negation),
distance binary operators "<:", ">:", "<@", ">@", and parentheses (圆括号)

通配符
’*’ Zero or more characters.
’=’ Same as ’*’
’?’ One character.
通配符 "=" 能匹配任何以给定字符(串)开头的名称。比如 [:AS=] 能匹配到
ASP和ASN, [@H=] 能匹配到所有以H开头的原子(所有氢原子)

[@C= & !@CA,C]
[(:1-3@CA j :5-7@CB)]
[:CYS,ARG & !(:1-10 j @CA,CB)]
[:* & !@H=] or [!@H=]

按距离选择残基或者原子
[:5 <@4.5] all atoms within 4.5A from residue 5
[(:1-55 <:3.0) & :WAT] all water molecules within 3A from residues 1-55

[:1-10@CA] is equivalent to [:1-10 & @CA]
[:LYS@H=] is equivalent to [:LYS & @H=]

:*&!@H= All non-hydrogen atoms (equivalent to "!@H=").
@CA,C,O,N,H All backbone atoms.
!@CA,C,O,N,H All non-backbone atoms (=sidechains for proteins only).

Cluster Analysis using the MMTSB Toolset

install mmtsb toolset

建立对应的文件夹

1
2
3
mkdir clustering
cd clustering
mkdir PDBfit

从轨迹中提取出pdb

1
2
3
4
5
trajin ../equil_1-10.mdcrd
#orient all frames best fit to backbone of helix in NMR structure
rms first mass :2-20@CA,C,N
#-- put all the pdb frames in a subdirectory
trajout PDBfit/trp.pdb pdb

pdb重新编号(将 1,10,100变成001,010,100)

  教程里有对应脚本,不如自己写一个

Do the clustering

1
ls -1 . > ../clustfils       #ls -1 means single column

  Remove first line since kclust can only handle < 50000 structures. We will fit structures 2 to 50,000. We need to remember that we did this as it means our pdbs actually start counting from two. This means that the ID's in the centroid files will be off by one. You will see what I mean later. Here is the clustfils file.

1
kclust -mode rmsd -centroid -cdist -heavy -lsqfit -radius 6 -maxerr 1 -iterate ../clustfils > ../Centroid_6
  • Centroid_6 文件内容
  • 97 trp.pdb.00098 5.221610 # ID member_of_this_centroid
  • rmsd_distance_from_its_centroid
  • the centeroid in pdb format

Process the Clusters

聚类中心的坐标是没有物理意义的(因为是平均结构),因而要寻找与聚类中心rmsd最小的结构 - 使用脚本 extract_centroids.awk 提取出每个聚类的成员及其中心结构

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
BEGIN{b0=2;}

{centind=index($1,"#Centroid");
c=$2;
getline;centind=index($0,"#Centroid");
FIL0 = sprintf("centroid%2.2d.member.dat",c)
while(centind != 1){
print $1,$3 > FIL0 ;
getline;centind=index($0,"#Centroid");
}

numcent=0; print $2,$1,NR-b0;
c=$2;
getline; endrec = index("End",$0);
while( endrec != 1 ){
FIL = sprintf("centroid%2.2d.pdb",c);
print > FIL;
getline; endrec = index($0,"#End");
}
b0=NR+2;
}

1
awk -f extract_centroids.awk Centroid_6 | tee Centroid_6.stats
  • 查看与聚类中心最接近的结构的编号
    1
    2
    3
    sort -n -k2 centroid03.member.dat | head -5      # Sort the member file numerically on the second column (rms distance from the centroid).

    cp PDBfit/trp.pdb.21341 centroid03_bestmember.pdb

Plot the RMS distances from the centroids

先将 centroid0*.member.dat 等导入到xmgrace

1
2
3
4
5
6
7
8
9
In xmgrace you can then do (xmgr menus may differ slightly):

Bring up Set Appearance window from xmgrace Plot menu
Select all sets
set Line properties Type to None
set Symbol properties to Type circle and adjust Size to 16
click Apply (at bottom)

Select each set, and change the Symbol colors (press Apply after selecting each color):