Surflex docking basics
Surflex 2.7 includes Surflex-Dock and Surflex-Sim. The former combines a refined descendant of the Hammerhead scoring function with the alignment/conformation optimization procedures implemented for morphological similarity. The latter addresses aspects of computational drug design in the absence of a protein structure.
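Notes:
- Use Surflex-Dock to probe grooves on the protein surface (?)
- Use Surflex-Sim to compute molecular similarity (?)
- Score other bound-state structures
- Split molecules into fragments
- Align complexes
- Extract pockets and ligands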
SYBYL-X 2.1 ships Surflex v2.706.13294. The latest version is Surflex v3.066, and later Surflex releases also add the Surflex-QMOD module, which is described at the end of this article.
Getting started with Surflex-Dock
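Notes:
- Virtual screening usually uses the "-pscreen" or "-lscreen" option, and best-pose prediction usually uses the "-pgeom" option. In both cases, turning on ring search ("+ring") can improve results, at the cost of more computation time.
- The basic parameter sets are -pscreen and -pgeom. The former is mostly used for virtual screening and automatically adds +premin and +remin. These turn on ligand minimization prior to docking and all-atom in-pocket minimization after docking, respectively. -pgeom includes everything in -pscreen and adds three parameters: -multistart 4, -ndock_final 20, and -div_rms 0.5.
- Sybyl mol2 is the recommended input format. MDL or SD files (small molecules) and PDB files (proteins, error-prone) are also accepted. All input files must include hydrogens, including nonpolar hydrogens.
- Surflex-Dock supports multi-structure docking (multiple protein conformations for a single target) and protein pocket flexibility.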
Workflow:
- Build the protomol
- Dock
- Analyze and post-process the results
proto command: Protomol Generation
There are two ways to build a protomol: ligand-based and residue-based.
surflex-dock.exe proto ligand_reference.mol2 protein.mol2 p1    # ligand-based
surflex-dock.exe resproto residue-list protein.mol2 p1          # residue-based
The residue-list file has the following format:
ILE1
VAL2
ILE118
GLY122
ASN123
p1 is the output prefix. The outputs include:
- the protomol (p1-protomol.mol2)
- the individual protomol fragments (p1-probe.mol)
- an information file that defines the protomol space (p1-marked.pdb)
- an information file for inspecting the parameters (p1-thresh.mol)
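One way to prepare a residue-list file for resproto is to collect the residues that lie near a reference ligand. The Python sketch below is an illustration only and is not a Surflex feature. It assumes Tripos mol2 files whose @<TRIPOS>ATOM lines carry a subst_name field (8th column) that is already in the residue-name-plus-number form shown above (e.g. ILE118). The 4.0 Å cutoff and all function names are hypothetical.

```python
import math

def parse_mol2_atoms(text):
    """Return (subst_name, x, y, z) for each line of the @<TRIPOS>ATOM section."""
    atoms, in_atoms = [], False
    for line in text.splitlines():
        s = line.strip()
        if s.startswith("@<TRIPOS>"):
            in_atoms = s == "@<TRIPOS>ATOM"
            continue
        fields = s.split()
        # ATOM fields: atom_id name x y z type subst_id subst_name ...
        if in_atoms and len(fields) >= 8:
            atoms.append((fields[7], float(fields[2]), float(fields[3]), float(fields[4])))
    return atoms

def pocket_residues(protein_mol2, ligand_mol2, cutoff=4.0):
    """Residue names (assumed form: ILE118) with any atom within `cutoff` Å of the ligand."""
    ligand = [(x, y, z) for _, x, y, z in parse_mol2_atoms(ligand_mol2)]
    picked, seen = [], set()
    for subst, x, y, z in parse_mol2_atoms(protein_mol2):
        if subst in seen:
            continue
        if any(math.dist((x, y, z), p) <= cutoff for p in ligand):
            seen.add(subst)
            picked.append(subst)  # keep protein file order
    return picked

def write_residue_list(residues, path):
    """Write one residue name per line, matching the residue-list format above."""
    with open(path, "w") as fh:
        fh.write("\n".join(residues) + "\n")
```

Inspect the resulting list by eye before passing it to resproto. Residue naming conventions differ between preparation tools.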
Notes: When using a ligand to specify an active site, by default, the voxels occupied by the ligand are explored by the protomol even if they are not highly buried. This can be turned off by employing the -mark_lig switch.
If there is no ligand and the active site is unknown, use the following command:
surflex-dock.exe proto none protein.mol2 p1
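The none option makes Surflex explore the pockets on the protein surface and write a component file for each pocket (p1-comp-*.mol2). Any of these files can then be used to build a protomol: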
surflex-dock.exe proto p1-comp-002.mol2 protein.mol2 p2
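When proto none finds several pockets, each component file can be turned into its own protomol. The following is a minimal Python sketch. It assumes the components follow the p1-comp-NNN.mol2 naming shown above, and the -pocketN output prefixes are hypothetical. It only builds argument lists, which can then be passed to subprocess.run.

```python
import glob
import re

def component_files(prefix):
    """List the pocket component files written by `proto none` (<prefix>-comp-*.mol2)."""
    files = glob.glob(f"{prefix}-comp-*.mol2")

    def index(path):
        # Sort numerically on the component index, e.g. p1-comp-002.mol2 -> 2.
        m = re.search(r"-comp-(\d+)\.mol2$", path)
        return int(m.group(1)) if m else -1

    return sorted(files, key=index)

def proto_commands(prefix, protein="protein.mol2", exe="surflex-dock.exe"):
    """One `proto component protein outprefix` command per component (hypothetical prefixes)."""
    return [
        [exe, "proto", comp, protein, f"{prefix}-pocket{i}"]
        for i, comp in enumerate(component_files(prefix), start=1)
    ]
```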
Two parameters control the extent of the protomol:
- -proto_thresh (default 0.5, range 0-1): determines the degree of buried-ness for the primary volume used to generate the protomol. Larger values yield a smaller volume. The files p1-marked-thresh-<n>.pdb hold the volumes for the different threshold values.
- -proto_bloat (default 0): indicates how far beyond the primary volume (in Angstroms) the protomol volume should be expanded.
The following command therefore yields a more extended protomol:
surflex-dock.exe -proto_thresh 0.2 -proto_bloat 2 proto ligand_reference.mol2 protein.mol2 p1
Smaller protomols yield faster searches, and it is not the case that the docked ligands are strictly limited to the volume of the protomol.
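To see how -proto_thresh changes the protomol, one can generate a small sweep of proto commands and compare the resulting protomol files. The sketch below assumes that the atom count of each <prefix>-protomol.mol2 is a reasonable rough proxy for protomol size. The p_tNNN prefixes are hypothetical.

```python
def proto_sweep(ligand, protein, thresholds, bloat=0, exe="surflex-dock.exe"):
    """Build one proto command per -proto_thresh value (options precede the command)."""
    cmds = []
    for t in thresholds:
        prefix = f"p_t{int(round(t * 100)):03d}"  # hypothetical prefix, e.g. 0.2 -> p_t020
        cmds.append([exe, "-proto_thresh", str(t), "-proto_bloat", str(bloat),
                     "proto", ligand, protein, prefix])
    return cmds

def count_mol2_atoms(text):
    """Count the lines of the @<TRIPOS>ATOM section of a mol2 file."""
    n, in_atoms = 0, False
    for line in text.splitlines():
        s = line.strip()
        if s.startswith("@<TRIPOS>"):
            in_atoms = s == "@<TRIPOS>ATOM"
        elif in_atoms and s:
            n += 1
    return n
```

Run each command with subprocess.run(cmd, check=True), then pass the text of the matching <prefix>-protomol.mol2 to count_mol2_atoms to compare sizes.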
Another setting that affects the extent of the protomol is the adaptive protomol switch:
surflex-dock.exe +adapt_proto -adapt_thresh 0.4 proto lig.mol protein.mol2 p1
With this procedure, Surflex-Dock will build a protomol that gives a fixed degree of coverage against the residues that are proximal to the ligand (as specified) or explicitly listed residues (using the resproto command). Higher values of adapt_thresh yield higher degrees of coverage.
dock command: Docking a Single Molecule
dock ligand protomol protein
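The command outputs the ten highest-scoring poses in descending order of score (final-*.mol2). Each pose has two scores: affinity (-log(Kd)) and crash score (also in pKd units). A crash score closer to 0 is better, and the reported score already includes the crash score.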
Three parameters can noticeably improve docking quality:
1. -multistart <n>: docks from n initial starting positions and returns the best results. For flexible molecules, since the search is not exhaustive, using multiple starting points will frequently yield higher-scoring and more consistent results, independent of the initial starting pose. Generally speaking, -multistart 10 is as high as one sees returns on the investment of time, with the plateau beginning at -multistart 4.
2. -ndock_final <n>: sets how many top-scoring poses are written to the final output.
3. The third option affects the density of the alignment search. The new method can be controlled in terms of search density with the -spindense parameter (higher numbers indicate a denser search) as well as the -nspin parameter (higher numbers indicate denser sampling of axial rotations).
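For scripted runs it helps to assemble these options in one place. The minimal sketch below builds a dock command line using only the options named above. Options are placed before the command, as in the other examples in this article. Any option left as None is omitted, so Surflex's own defaults apply.

```python
def dock_command(ligand, protomol, protein, multistart=None, ndock_final=None,
                 spindense=None, nspin=None, exe="surflex-dock.exe"):
    """Build `surflex-dock.exe [options] dock ligand protomol protein` as an argv list."""
    args = [exe]
    for flag, value in (("-multistart", multistart), ("-ndock_final", ndock_final),
                        ("-spindense", spindense), ("-nspin", nspin)):
        if value is not None:
            args += [flag, str(value)]
    return args + ["dock", ligand, protomol, protein]
```

For example, dock_command("lig.mol2", "p1-protomol.mol2", "protein.mol2", multistart=4, ndock_final=20) reproduces the -multistart and -ndock_final values that -pgeom implies. The list can be passed directly to subprocess.run.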
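Use the +pflex option to turn on protein pocket flexibility, which allows hydrogen atoms to adapt. The default covalent force-field strength is 0.1. This is a fairly low value, and it lets hydrogens move substantially to accommodate ligand binding. The +hprot option allows all atoms in the binding pocket to move, including heavy atoms. In that case, raise the protein covalent force-field weight (e.g. -pcov 0.6) to restrain atom movement.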
opt command: Optimizing Docked Poses
opt ligand protein
The optimized pose is written to opt.mol2.
dock_list command: Docking a Molecule List
dock_list liglist protomol protein log
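liglist is interpreted as follows:
- If it has a .mol2 extension, it is treated as a mol2 file containing multiple molecules.
- If it has a .sd extension, it is treated as an MDL SD file containing multiple molecules.
- Otherwise, its contents are taken as molecule file paths, one per line.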
The output log file lists three scores for each molecule: affinity, crash, and polar contribution (related to hydrogen bonding). The corresponding poses are written to log-results.mol2.
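The rule for interpreting liglist can be mirrored in a quick pre-flight check that reports how a file will be read and how many entries it holds. This is a hypothetical helper, not part of Surflex. It counts @<TRIPOS>MOLECULE headers for mol2 files, $$$$ record terminators for SD files, and non-empty lines otherwise.

```python
from pathlib import Path

def classify_liglist(liglist):
    """Return (kind, count) following the .mol2 / .sd / path-list rule for liglist."""
    p = Path(liglist)
    text = p.read_text()
    if p.suffix == ".mol2":
        return "mol2", text.count("@<TRIPOS>MOLECULE")
    if p.suffix == ".sd":
        # Each SD record ends with a line containing only $$$$.
        return "sd", sum(1 for line in text.splitlines() if line.strip() == "$$$$")
    # Anything else: one molecule path per non-empty line.
    return "paths", sum(1 for line in text.splitlines() if line.strip())
```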
-fmatch option: Docking using Placed Fragments
Surflex can dock based on the positions of molecular fragments. It aligns fragments of the target molecule (ligand.mol2) with the fragments given in frag.mol2:
-fmatch fmol
surflex-dock.exe -fmatch frag.mol2 dock ligand.mol2 p1-protomol.mol2 protein.mol2
fragmentize command: Splitting Known Ligands into Fragments
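Use the fragmentize command to split known ligands into fragments. The +misc_ring option also splits rings out as fragments. Use the choose_frags command to select a smaller set from many fragments, to serve as input for -fmatch.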
fragmentize molarchive outprefix
choose_frags fragarchive aligned_mols outprefix
surflex-dock.exe fragmentize pde4b/train-ligands/ligand-1.mol2 pde4b/train-ligands/frag-1
surflex-dock.exe fragmentize pde4b/train-ligands/ligand-2.mol2 pde4b/train-ligands/frag-2
surflex-dock.exe fragmentize pde4b/train-ligands/ligand-3.mol2 pde4b/train-ligands/frag-3
surflex-dock.exe fragmentize pde4b/train-ligands/ligand-4.mol2 pde4b/train-ligands/frag-4
surflex-dock.exe fragmentize pde4b/train-ligands/ligand-5.mol2 pde4b/train-ligands/frag-5
cat pde4b/train-ligands/frag-*.mol2 > pde4b/train-ligands/fragall.mol2
surflex-dock.exe choose_frags pde4b/train-ligands/fragall.mol2 pde4b/train-ligands/ligandall.mol2 pde4b/train-ligands/chosenfrag
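The five fragmentize calls and the cat step above can be scripted. The Python version below also works where cat is unavailable, for example a Windows shell running surflex-dock.exe. It is a sketch: the command lists are meant for subprocess.run. Unlike plain cat, concat_mol2 adds a trailing newline to any file that lacks one.

```python
import glob

def fragmentize_commands(ligands, outprefixes, exe="surflex-dock.exe"):
    """One `fragmentize molarchive outprefix` command per ligand file."""
    return [[exe, "fragmentize", lig, out] for lig, out in zip(ligands, outprefixes)]

def concat_mol2(pattern, dest):
    """Concatenate files matching `pattern` into `dest` (dest must not match pattern)."""
    files = sorted(glob.glob(pattern))
    with open(dest, "wb") as out:
        for path in files:
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")  # keep records from running together
    return files

base = "pde4b/train-ligands"
cmds = fragmentize_commands(
    [f"{base}/ligand-{i}.mol2" for i in range(1, 6)],
    [f"{base}/frag-{i}" for i in range(1, 6)],
)
```

After running cmds, call concat_mol2(f"{base}/frag-*.mol2", f"{base}/fragall.mol2") to build the input for choose_frags.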
mdock_list command: Docking to Multiple Protein Conformations
mdock_list liglist targpath log
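The Targets file (targpath) describes the proteins and their corresponding protomols, so each protein conformation can use its own protomol. Its format is as follows: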
Nproteins 5
protein-opt-1.mol2 Nprotomol 2
p-opt-1-protomol.mol2
train-ligands/ligand-opt-1.mol2
protein-opt-2.mol2 Nprotomol 2
p-opt-2-protomol.mol2
train-ligands/ligand-opt-2.mol2
protein-opt-3.mol2 Nprotomol 2
p-opt-3-protomol.mol2
train-ligands/ligand-opt-3.mol2
protein-opt-4.mol2 Nprotomol 2
p-opt-4-protomol.mol2
train-ligands/ligand-opt-4.mol2
protein-opt-5.mol2 Nprotomol 2
p-opt-5-protomol.mol2
train-ligands/ligand-opt-5.mol2
Run mdock_list with this Targets file, either plainly or with fragment matching:
surflex-dock.exe mdock_list test-ligand/testlig.mol2 Targets logdef
surflex-dock.exe -fmatch train-ligands/frag.mol2 mdock_list test-ligand/testlig.mol2 Targets logdef
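With more protein conformations, generating the Targets file is easier than typing it. The sketch below reproduces the five-protein example above. It assumes, based on that example only, that the number after Nprotomol equals the number of file lines that follow each protein line (here a protomol and a reference ligand).

```python
def targets_text(entries):
    """entries: list of (protein_file, [files listed under that protein])."""
    lines = [f"Nproteins {len(entries)}"]
    for protein, files in entries:
        # Assumption: the Nprotomol count equals the number of lines that follow,
        # as in the 5-protein example (protomol + reference ligand = 2).
        lines.append(f"{protein} Nprotomol {len(files)}")
        lines.extend(files)
    return "\n".join(lines) + "\n"

entries = [
    (f"protein-opt-{i}.mol2",
     [f"p-opt-{i}-protomol.mol2", f"train-ligands/ligand-opt-{i}.mol2"])
    for i in range(1, 6)
]
text = targets_text(entries)  # write this to the file named "Targets"
```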
Self Scoring
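For larger ligands, intramolecular noncovalent interactions can affect the docking score. The +self_score option includes these self-interactions in the score. When the +pflex option is used, self scoring is turned on automatically for the protein, to account for adaptation of the protein conformation.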
Protein Pocket Alignment
Surface-based alignment of protein binding pockets uses four commands: psim_matrix, psim_buildtree, psim_one, and psim_list.
surflex-dock.exe psim_matrix ProteinList palign
surflex-dock.exe psim_buildtree palign-results out
psim_one and psim_list offer the opportunity to align one protein to another or a list of proteins to a single one and function analogously to above, without the requirement to separately build a final alignment.
Note also that the psim_buildtree command produces a file suitable for generating a visual depiction of the alignment tree using the Dot program from the GraphViz collection (out-tree.dot in the example above).
dot -Tpdf -o psimlog-tree.pdf out-tree.dot
Protein Pocket Preparation
Frequently, a protein structure will exhibit significant clashes with a cognate ligand. The pprep_protons or pprep_all command removes these clashes by adjusting the positions of the hydrogens, or of all atoms, in the binding pocket.
surflex-dock pprep_protons ligand protein outprefix
The output files are outprefix-ligand.mol2, outprefix-protein.mol2, and outprefix-protein-trim.mol2 (pocket only).
Scoring Function Optimization
optimize logprefix constraint-file init-params
Surflex-Dock Version 2.4 and higher offers the opportunity to tune the Surflex-Dock scoring function based on additional data. The data can be positive, which includes protein/ligand complexes with known affinity. The data can also include negative information.
Ring Flexibility
Surflex-Dock implements in-lined ring flexibility in a general way. The +ring option turns this procedure on. The behavior is modified by the -rthresh parameter (kcals above global energy minimum beyond which ring conformations are not kept). Turning on ring search ("+ring") can yield improved results for both screening and pose prediction at the expense of some additional computational cost.
Post-Processing Results
logprocess command
Docking results are usually ranked by the overall score, but two alternatives are available: 1) a derived combination score, and 2) thresholds placed on the crash and polar scores.
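For example, the following command sets a threshold of 1.0 on the polar score, -1.0 on the crash score, and 100 on the number of rotatable bonds. Highly flexible ligands are generally more likely to be false positives.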
logprocess logfile
surflex-dock.exe -lp_polar 1.0 -lp_crash -1.0 -lp_rot 100 logprocess log
A combination score is given first, which combines the reported score and crash values. The threshold that is supplied on the command line for crash (-1.0 in the example above) is allowed "for free", so the amount of crash that is below that level is given back to the affinity score. So, with an observed crash of -2.0, a threshold of -1.0, and an affinity score of 7.1, the combination score would be 8.1. The smaller the crash threshold, generally the better able Surflex is to reject false positives. However, this may come at the expense of some true positives that have particularly tight fits into the protein active site in question.
With Version 1.31 and later, we recommend running the logprocess command with all default parameters. This yields no change from the reported score in the log file and has been used for the results in all protein screening enrichment benchmark tests.
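The worked example above (crash -2.0, threshold -1.0, score 7.1, combination 8.1) can be written as a one-line formula. This is a sketch of one reading of the text, not Surflex's code. It assumes the reported score already includes the crash term, and that the crash penalty is given back up to the magnitude of the threshold. For non-positive crash and threshold, that gives combination = score - max(crash, threshold).

```python
def combination_score(score, crash, crash_thresh=-1.0):
    """Give back the crash penalty up to |crash_thresh| (assumed reading of logprocess)."""
    # Both crash and crash_thresh are <= 0; max() picks the smaller penalty to refund.
    return score - max(crash, crash_thresh)
```

With a threshold of 0 the function returns the reported score unchanged. A crash smaller in magnitude than the threshold is refunded in full.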
logprocsdf command
The logprocsdf command converts a log file and its corresponding mol2 file into an SD file.
logprocsdf logfile
surflex-dock.exe logprocsdf log
get command: extracting a specific molecule
get mol2archive molname outmolname
mget command
mget has the same form as get, but its second argument is a file listing the names of the molecules to extract.
mget mol2archive molnamelist outmolarchive
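get and mget can be emulated when Surflex is not at hand, for example to pull selected hits out of log-results.mol2. The sketch below assumes the molecule name is the line immediately after each @<TRIPOS>MOLECULE header, which is the mol2 convention. It keeps every record with a matching name, in archive order.

```python
def split_mol2(text):
    """Split a multi-molecule mol2 archive into (name, record_text) pairs."""
    records = []
    for line in text.splitlines(keepends=True):
        if line.startswith("@<TRIPOS>MOLECULE"):
            records.append([line])
        elif records:  # ignore anything before the first molecule
            records[-1].append(line)
    return [(rec[1].strip() if len(rec) > 1 else "", "".join(rec)) for rec in records]

def mget(archive_text, names):
    """Return the records whose names are in `names`; get is mget with a single name."""
    wanted = set(names)
    return "".join(rec for name, rec in split_mol2(archive_text) if name in wanted)
```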
score_list command
Scores the poses produced by other docking programs, or manually placed molecules:
score_list liglist protomol protein log
surflex-dock score_list ligarchive.mol2 protomol.mol2 protein.mol2 log
rescore_run
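Rescores the docked poses listed in a given log file. This is useful for testing the effect of different scoring settings (e.g. with -lparam) or of turning on protein flexibility (e.g. +pflex, which can strongly affect results in certain active sites).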
rescore_run logfile protomol protein prefix
surflex-dock.exe +pflex +hprot -pcov 0.6 rescore_run log protomol protein rescoreheavy
rescore_multi
The equivalent command for multi-protein docking runs:
rescore_multi logfile targpath prefix
surflex-dock.exe +pflex +hprot -pcov 0.6 rescore_multi log Targets rescoreheavy
Other commands:
prot: protonating molecules
prot mol_or_mollist output_prefix
Using +misc_remin is recommended. It eliminates conformational bias that may affect different scaffolds differently and skew the results. Note: +misc_ring will generate ring conformations as well. If -misc_outconfs is set to greater than 1, each molecule will generate an individual *-ring.mol2 file in addition to the single conformation in the molecule archive. The -fp option strips all hydrogens and then adds them back.
min
min mol_or_mollist output_prefix
Works like prot but does not add hydrogens, so hydrogens can be added beforehand with another program.
reorder
Takes a set of molecules and sorts them by flexibility (from most flexible to most rigid):
reorder mol2archive proportion outputarchive
search: searches over acyclic bonds
search mol1 mol2...
random: randomly places small molecules
random mol1 mol2...
rms
Computes the RMSD between mol1 and mol2:
rms mol1 mol2
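Pose RMSD in docking is usually computed in place, with no superposition and atoms paired by index. The sketch below implements that textbook definition. This text does not say how Surflex itself pairs atoms (symmetry, hydrogens), so treat the sketch as an approximation of what rms reports.

```python
import math

def rmsd(coords_a, coords_b):
    """In-place RMSD between two equally ordered lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```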
rms_list
rms_list multi_mol reference_mol logfile
Appends the RMS of each conformation relative to the reference to logfile.
trim
trim protein ligand outprotein distance
info: reports information about a molecule
info mol
posefam: classifies docked poses into families
surflex-dock.exe posefam logfile
Other options:
-multiproc
-lparam # This parameter takes a file as input that contains alternate scoring function parameters for Surflex-Dock. The file "default.param" contains the default parameters for Surflex-Dock, and it is at the top-level in the software distribution.
Surflex-Sim: ligand-based similarity screening
Steps:
1. Build a ligand-based hypothesis from multiple ligands.
2. Screen molecules by their similarity to the hypothesis.
3. Post-process the results.
SURFLEX-QMOD TECHNICAL MANUAL
The Surflex-QMOD set of algorithms integrates underlying ideas and algorithms from molecular similarity [1–6], molecular docking [7–14], and multiple-instance learning [7, 11, 13, 15–18] in order to permit the construction of protein binding site analogs. The theory and use of the method for binding affinity prediction and iterative lead optimization are discussed in the companion book to this manual as well as in several papers [19–22].
Full Surflex-Dock parameter list:
Usage: surflex-dock <options> <command> args
Full Surflex-Sim parameter list:
Usage: surflex-sim <options> <command> args