Amber残基重新编号

  使用Amber的tleap对蛋白进行处理之后,残基编号会从1开始,并且去掉了链ID。虽然可以通过简单计算得到新结构与原始结构的残基编号的对应关系,但在某些情况特别是蛋白比较复杂时,这种对应关系也会变得复杂。本文总结了多种能够对tleap处理之后的蛋白进行重新编号的方法: # tleap 处理过程 1. 原始结构(ori.pdb)

1
2
3
4
5
6
7
8
9
10
11
12
# 三条链,每条链的残基编号为2,3,4
grep CA ori.pdb

ATOM 2 CA ALA A 2 -0.001 0.064 -0.491 1.00 0.00 C
ATOM 12 CA ARG A 3 3.662 0.442 -1.288 1.00 0.00 C
ATOM 36 CA SER A 4 6.039 3.309 -0.708 1.00 0.00 C
ATOM 47 CA ALA B 2 9.740 3.624 -1.342 1.00 0.00 C
ATOM 57 CA ARG B 3 12.083 6.546 -0.928 1.00 0.00 C
ATOM 81 CA SER B 4 15.814 6.812 -1.390 1.00 0.00 C
ATOM 92 CA ALA C 2 18.134 9.772 -1.149 1.00 0.00 C
ATOM 102 CA ARG C 3 21.884 10.008 -1.436 1.00 0.00 C
ATOM 126 CA SER C 4 24.188 12.990 -1.375 1.00 0.00 C

  1. tleap 处理 pdb4amber 去除氢原子,得到 ori_noH.pdb
    1
    pdb4amber -i ori.pdb -o ori_noH.pdb --nohyd --dry
    tleap 加氢,得到 leap.prmtop,leap.inpcrd,leap.pdb
    1
    tleap -f leap.in 

parmed 方法

  parmed 处理,得到 leap-update.prmtop,会记录读取的pdb中的残基编号、链ID、occupancy、bfactor、原子编号等信息,记录在 leap-update.prmtop 文件的末尾

1
parmed -O -i parmed.in -n > parmed.log
  ambpdb 转换,得到与原始结构 ori.pdb 中残基编号、链ID一致的pdb结构 new_add.pdb
1
ambpdb -p leap-update.prmtop -ext -c leap.inpcrd > new_addH.pdb

RelabelChains.py 方法

  使用AmberUtilsRelabelChains.py脚本也可以将 leap 得到的 leap.pdb 转换为重新编号的 new_addH2.pdb

1
RelabelChains.py leap.pdb new_addH2.pdb ori.pdb

awk 方法

  awk 方法可以利用从 pdb4amber 输出的 ori_noH_renum.txt 文件中得到残基编号的映射关系,目前此脚本非常不智能

1
res_renum.awk leap.pdb > new_addH3.pdb

使用的文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
cat > leap.in 
source leaprc.protein.ff14SB
prot = loadpdb ori_noH.pdb
saveAmberParm prot leap.prmtop leap.inpcrd
savePdb prot leap.pdb
check prot
quit


cat > parmed.in
parm leap.prmtop
addPDB ori.pdb
outparm leap-update.prmtop
quit


cat > res_renum.awk
#!/usr/bin/awk -f

BEGIN{
# Reading Fixed-Width Data (see: https://goo.gl/SmjwUt)
FIELDWIDTHS = "6 5 1 4 1 3 1 1 4 1 3 8 8 8 6 6 6 4"
# $2: Atom serial number
# $4: Atom type
# $5: altLoc; alternate location indicator.
# $6: Resname
# $8: ChainID
# $9: Resid
# $12: x
# $13: y
# $14: z
}

{
if ($1 == "ATOM "){
if ($2 <= 48){
printf("%-6s%5s%1s%4s%1s%3s%1s%1s%4s%1s%3s%8s%8s%8s%6.2f%6.2f\n", $1,$2,$3,$4,$5,$6,$7,"A",$9+1,$10,$11,$12,$13,$14,1.00,0.00)
}
if ($2 > 48 && $2 <= 96){
printf("%-6s%5s%1s%4s%1s%3s%1s%1s%4s%1s%3s%8s%8s%8s%6.2f%6.2f\n", $1,$2,$3,$4,$5,$6,$7,"B",$9-2,$10,$11,$12,$13,$14,1.00,0.00)
}
if ($2 > 96 && $2 <= 144){
printf("%-6s%5s%1s%4s%1s%3s%1s%1s%4s%1s%3s%8s%8s%8s%6.2f%6.2f\n", $1,$2,$3,$4,$5,$6,$7,"C",$9-5,$10,$11,$12,$13,$14,1.00,0.00)
}

}
else { print $1}
}

Ref: - parmed - RelabelChains.py - PDB 文件格式说明