使用Bio3D进行互相关分析(cross-correlation analysis)
Bio3D是一款依托于R语言的工具包,能够分析生物大分子的序列,结构和模拟结果等数据。本文将记录Bio3D中"md"这个demo的运算流程。
Note: - R中,关于每一个函数的详细文档和示例,可通过 help() 和 example() 函数进行查看。比如,help(read.pdb)。 - 示例文件中分析的是对HIVpr的模拟轨迹,此轨迹为CHARMM/NAMD DCD 格式,并且只包含CA原子。
Getting Started
| 1 | library("bio3d") # 导入bio3d模块 | 
Reading Example Trajectory Data
导入pdb和dcd文件 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27dcdfile <- system.file("examples/hivp.dcd", package="bio3d")
pdbfile <- system.file("examples/hivp.pdb", package="bio3d")
dcd <- read.dcd(dcdfile)
pdb <- read.pdb(pdbfile)
print(pdb)    # 查看结构信息 
 Call:  read.pdb(file = pdbfile)
   Total Models#: 1
     Total Atoms#: 198,  XYZs#: 594  Chains#: 2  (values: A B)
     Protein Atoms#: 198  (residues/Calpha atoms#: 198)
     Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)
     Non-protein/nucleic Atoms#: 0  (residues: 0)
     Non-protein/nucleic resid values: [ none ]
   Protein sequence:
      PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD
      QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNFPQITLWQRPLVTIKIGGQLKE
      ALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTP
      VNIIGRNLLTQIGCTLNF
+ attr: atom, xyz, helix, sheet, calpha,
        call1
2
3
4print(pdb$xyz)     # 查看坐标,每一行为一帧   
print(dcd)        # 可知轨迹包含117帧,每一帧包含198个原子
dim(dcd) 
Trajectory Frame Superposition
选定所有的CA原子进行叠加 1
ca.inds <- atom.select(pdb, elety="CA")
1
2ca.inds$atom 
ca.inds$xyz 1
xyz <- fit.xyz(fixed=pdb$xyz, mobile=dcd, fixed.inds=ca.inds$xyz, mobile.inds=ca.inds$xyz)
1
dim(xyz) == dim(dcd)
1
pos_ave <- apply(xyz,2,mean)
RMSD
| 1 | rd <- rmsd(xyz[1,ca.inds$xyz], xyz[,ca.inds$xyz]) | 
RMSF
| 1 | rf <- rmsf(xyz[,ca.inds$xyz]) | 
Principal Component Analysis
Bio3D中,可使用 pca.xyz() 或者 pca.tor()
进行PCA的分析。 1
2
3
4pc <- pca.xyz(xyz[,ca.inds$xyz])
print(pc) 
+ attr: L, U, z, au, sdev, mean, call
L为特征值(从大到小排列),U为特征向量矩阵(每一列为一个归一化的特征向量),au的维度为原子数目*特征向量个数,每一列表示不同原子对某一个pc的贡献值。
1
2plot(pc, col=bwr.colors(nrow(xyz)) )
#plot.pca(pc, col=bwr.colors(nrow(xyz)) )
Below we perform a quick clustering in PC-space to further highlight
these distinct conformers. 1
2
3hc <- hclust(dist(pc$z[,1:2]))
grps <- cutree(hc, k=2)
plot(pc, col=grps)
接下来通过 plot.bio3d() 来查看不同残基对前两个主成分的贡献值:
1
2plot.bio3d(pc$au[,1], ylab="PC1 (A)", xlab="Residue Position", typ="l")
points(pc$au[,2], typ="l", col="blue")
To further aid interpretation, a PDB format trajectory can be
produced that interpolates between the most dissimilar structures in the
distribution along a given principal component. This involves dividing
the difference between the conformers into a number of evenly spaced
steps along the principal components, forming the frames of the output
multi-model PDB trajectory. Such trajectories can be directly visualized
in a molecular graphics program, such as VMD (Humphrey 1996).
Furthermore, the interpolated structures can be analyzed for possible
domain and shear movements with other Bio3D functions, or used as
initial seed structures for reaction path refinement methods (note you
will likely want to perform all heavy atom PCA for such applications).
1
2p1 <- mktrj.pca(pc, pc=1, b=pc$au[,1], file="pc1.pdb")
p2 <- mktrj.pca(pc, pc=2,b=pc$au[,2], file="pc2.pdb")write.ncdf()函数将轨迹保存为AMBER
NetCDF格式,然后通过vmd进行查看(display as tube representation)。
1
write.ncdf(p1, "trj_pc1.nc")
Cross-Correlation Analysis
通过两两之间的互相关系数能够确定系统中不同原子的波动之间的相关程度,使用Bio3D的dccm()函数(dynamical
cross-correlation
map)能够返回互相关系数矩阵。负的相关系数表示位移的方向是相反的。
1
2cij<-dccm(xyz[,ca.inds$xyz])
plot(cij,resno=pdb)pymol.dccm()函数可以画出3D的相关系数。
1
pymol.dccm(cij, pdb, type="launch")