Surflex 2.7 includes Surflex-Dock and Surflex-Sim. The former combines a refined descendant of the Hammerhead scoring function with the alignment and conformation optimization procedures implemented for morphological similarity. The latter addresses aspects of computational drug design in the absence of a protein structure.
Notes:
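Use surflex-dock to probe grooves on the protein surface??
Use surflex-sim to compute molecular similarity??
Score bound-state structures from other sources
Split molecules into fragments
Align complexes
Extract pockets and ligands
SYBYL-X 2.1 bundles Surflex v2.706.13294; the latest version is Surflex v3.066. Later Surflex releases also added the Surflex-QMOD module; see the notes at the bottom of this article.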
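Getting started with surflex-dock
Notes:
- For virtual screening, the “-pscreen” or “-lscreen” option is typically used; for optimal pose prediction, the “-pgeom” option is typically used. In both cases, ring search (“+ring”) can improve results, at the cost of more compute time.
- The basic parameters are -pscreen and -pgeom. The former is used mostly for virtual screening and automatically adds +premin and +remin, which turn on ligand minimization prior to docking and all-atom in-pocket minimization after docking, respectively.
- -pgeom includes everything in -pscreen and adds three parameters: -multistart 4, -ndock_final 20, and -div_rms 0.5.
- Sybyl mol2 structures are the recommended input; MDL or sd format (small molecules) and PDB (proteins, error-prone) can also be used. All input files must have hydrogens added, including non-polar hydrogens.
- surflex-dock supports multi-structure docking (multiple protein conformations for a single target) and protein pocket flexibility.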
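The relationship between the two presets described in the notes above can be written down as a small lookup. This is only an illustrative sketch in Python of what the notes state; the `PRESETS` table and `expand_preset` helper are hypothetical names and say nothing about how Surflex implements the options internally.

```python
# Hypothetical lookup table restating the notes above:
# -pscreen adds +premin and +remin; -pgeom adds three more parameters on top.
PRESETS = {
    "-pscreen": ["+premin", "+remin"],
    "-pgeom": [
        "+premin", "+remin",
        "-multistart", "4",
        "-ndock_final", "20",
        "-div_rms", "0.5",
    ],
}


def expand_preset(option):
    """Return the individual options a preset stands for (or the option itself)."""
    return list(PRESETS.get(option, [option]))
```

For example, `expand_preset("-pgeom")` starts with the same two switches as `expand_preset("-pscreen")`, which mirrors the statement that -pgeom contains -pscreen.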
Workflow:
- Build the protomol
- Dock
- Analyze and post-process the results
proto command: Protomol Generation
There are two ways to build a protomol: ligand-based and residue-based
surflex-dock.exe proto ligand_reference.mol2 protein.mol2 p1 # ligand-based
surflex-dock.exe resproto residue-list protein.mol2 p1 # residue-based
The residue-based approach requires a residue-list file in the following format:
ILE1
VAL2
ILE118
GLY122
ASN123
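A residue-list file in the format shown above can be generated from (name, number) pairs. The following Python sketch assumes the format is exactly one residue name immediately followed by its number per line, as in the example; the helper names are hypothetical and not part of Surflex.

```python
from pathlib import Path


def format_residue_list(residues):
    """Build residue-list text: one NAME+NUMBER token per line, e.g. ILE118."""
    lines = [f"{name.upper()}{int(number)}" for name, number in residues]
    return "\n".join(lines) + "\n"


def write_residue_list(path, residues):
    """Write the residue list to a file that can be passed to resproto."""
    Path(path).write_text(format_residue_list(residues))
```

Calling `write_residue_list("residue-list", [("ILE", 1), ("VAL", 2), ("ILE", 118)])` would produce a file suitable as the second argument of the resproto command above.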
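All output files use the prefix p1: the protomol (p1-protomol.mol2), the individual protomol fragments (p1-probe.mol), an information file that defines the protomol volume (p1-marked.pdb), and an information file for inspecting the parameters (p1-thresh.mol).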
Notes: When using a ligand to specify an active site, by default, the voxels occupied by the ligand are explored by the protomol even if they are not highly buried. This can be turned off by employing the -mark_lig switch.
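If there is no ligand and the active site is uncertain, the following command can be used:
surflex-dock.exe proto none protein.mol2 p1
The none option makes surflex explore the pockets on the protein surface and write a component file for each pocket (p1-comp-*.mol2), which can then be used to build a protomol:
surflex-dock.exe proto p1-comp-002.mol2 protein.mol2 p2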
Two parameters control the extent of the protomol:
proto_thresh (default 0.5, range 0-1): determines the degree of buriedness for the primary volume used to generate the protomol. Larger values give a smaller volume. The files p1-marked-thresh-<n>.pdb correspond to the volumes at different threshold values.
proto_bloat (default 0): indicates how far beyond the primary volume (in Angstroms) the protomol volume should be expanded.
So the following command yields a more extended protomol:
surflex-dock.exe -proto_thresh 0.2 -proto_bloat 2 proto ligand_reference.mol2 protein.mol2 p1
Smaller protomols yield faster searches, and it is not the case that the docked ligands are strictly limited to the volume of the protomol.
Another option that affects the extent of the protomol is the adaptive protomol switch:
surflex-dock.exe +adapt_proto -adapt_thresh 0.4 proto lig.mol protein.mol2 p1
With this procedure, Surflex-Dock will build a protomol that gives a fixed degree of coverage against the residues that are proximal to the ligand (as specified) or explicitly listed residues (using the resproto command). Higher values of adapt_thresh yield higher degrees of coverage.
dock command: Docking a Single Molecule
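dock ligand protomol protein
Writes the ten highest-scoring poses, sorted by descending score (final-.mol2). Each pose has two scores: affinity (-log(Kd)) and crash score (also in pKd units). A crash score closer to 0 is better. Note that the reported score already includes the crash score.
Three parameters can markedly improve docking quality: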
- -multistart <n>: docks from n initial positions and returns the best result. For flexible molecules, since the search is not exhaustive, using multiple starting points will frequently yield higher-scoring and more consistent results, independent of initial starting pose. Generally speaking, -multistart 10 is as high as one sees returns on the investment of time, with the plateau beginning at -multistart 4.
- -ndock_final <n>: sets the number of top-scoring poses written out at the end.
- The third option affects the density of alignment search. The new method can be controlled in terms of search density with the -spindense parameter (higher numbers indicate more dense search) as well as the -nspin parameter (higher numbers indicate denser sampling of axial rotations).
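The +pflex option turns on protein pocket flexibility, allowing hydrogen atoms to adapt. The default covalent force-field strength is 0.1 (a low value that lets hydrogens move substantially to accommodate the bound ligand). The +hprot option allows all atoms in the binding pocket to move, including heavy atoms; in that case, raising the protein covalent force-field value (e.g. -pcov 0.6) is recommended to restrain atomic movement.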
opt command: Optimizing an already docked pose
opt ligand protein
The optimized pose is written to opt.mol2.
dock_list command: Docking a Molecule List
dock_list liglist protomol protein log
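If liglist has a .mol2 extension, it is treated as a mol2 file containing multiple molecules
If liglist has a .sd extension, it is treated as an MDL sd format file containing multiple molecules
Otherwise, its content is treated as a list of molecule paths, one molecule per line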
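The three rules above amount to a dispatch on the file extension. The following Python sketch restates them; `ligand_input_kind` and its return labels are hypothetical, and exact, case-sensitive extension matching is an assumption rather than documented Surflex behavior.

```python
from pathlib import Path


def ligand_input_kind(liglist):
    """Classify a ligand list argument by its extension, following the rules above."""
    suffix = Path(liglist).suffix
    if suffix == ".mol2":
        return "multi-mol2"   # one mol2 file holding several molecules
    if suffix == ".sd":
        return "multi-sd"     # one MDL sd file holding several molecules
    return "path-list"        # text file with one molecule path per line
```

A check like this can be useful before launching a long docking run, to confirm that a list file will be read the way it is intended.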
The output log file reports three scores (affinity, crash, and polar contribution, the last being related to hydrogen bonding); the corresponding poses are written to log-results.mol2
-fmatch option: Docking using Placed Fragments
surflex can dock based on the placement of molecular fragments, by aligning fragments of the target molecule ligand.mol2 with the fragments given in frag.mol2.
-fmatch fmol
surflex-dock.exe -fmatch frag.mol2 dock ligand.mol2 p1-protomol.mol2 protein.mol2
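By default, hydrogen atoms in frag.mol2 are ignored during alignment; the +fhmatch option forces hydrogens to be aligned as well.
frag.mol2 may contain multiple fragments.
When docking multiple molecules, target molecules that lack a matching fragment are docked in normal mode by default. The +fskip option overrides this behavior and automatically skips molecules without a matching fragment.
The -cpen option sets the strength of the constraint applied during fragment alignment (e.g. -cpen 100).
If both a protomol and frag.mol2 are provided, both are used by default. To use only the fragment and not the protomol, use the -fdockreg option.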
The fragment options allow for coarser conformational search (+fcoarse), varying the depth of initial conformational search from the placed fragment (-fidepth [default 3]), and varying the depth of successive conformational enumerations (-frdepth [default 3]).
fragmentize command: Splitting known ligands into fragments
The fragmentize command splits known ligand molecules into fragments; the +misc_ring option also splits off rings as fragments.
The choose_frags command selects a smaller set of fragments from many, for use as input to -fmatch.
fragmentize molarchive outprefix
choose_frags fragarchive aligned_mols outprefix
surflex-dock.exe fragmentize pde4b/train-ligands/ligand-1.mol2 pde4b/train-ligands/frag-1
surflex-dock.exe fragmentize pde4b/train-ligands/ligand-2.mol2 pde4b/train-ligands/frag-2
surflex-dock.exe fragmentize pde4b/train-ligands/ligand-3.mol2 pde4b/train-ligands/frag-3
surflex-dock.exe fragmentize pde4b/train-ligands/ligand-4.mol2 pde4b/train-ligands/frag-4
surflex-dock.exe fragmentize pde4b/train-ligands/ligand-5.mol2 pde4b/train-ligands/frag-5
cat pde4b/train-ligands/frag-*.mol2 > pde4b/train-ligands/fragall.mol2
surflex-dock.exe choose_frags pde4b/train-ligands/fragall.mol2 pde4b/train-ligands/ligandall.mol2 pde4b/train-ligands/chosenfrag
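The five near-identical fragmentize calls above follow a single pattern, so they can be generated in a loop. This Python sketch only builds the argument lists; the function name is hypothetical, and the paths and executable name are taken from the example above.

```python
def fragmentize_commands(n_ligands,
                         ligand_dir="pde4b/train-ligands",
                         exe="surflex-dock.exe"):
    """Build one fragmentize argument list per ligand, numbered from 1."""
    commands = []
    for i in range(1, n_ligands + 1):
        commands.append([
            exe, "fragmentize",
            f"{ligand_dir}/ligand-{i}.mol2",   # input ligand
            f"{ligand_dir}/frag-{i}",          # output prefix
        ])
    return commands
```

Each list could then be handed to `subprocess.run(cmd, check=True)` on a machine where surflex-dock is installed.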
mdock_list command: Docking to Multiple Protein Conformations
mdock_list liglist targpath log
The Targets file describes the proteins and their corresponding protomols, allowing each protein conformation to use its own protomol. The file format is:
Nproteins 5
protein-opt-1.mol2 Nprotomol 2
p-opt-1-protomol.mol2
train-ligands/ligand-opt-1.mol2
protein-opt-2.mol2 Nprotomol 2
p-opt-2-protomol.mol2
train-ligands/ligand-opt-2.mol2
protein-opt-3.mol2 Nprotomol 2
p-opt-3-protomol.mol2
train-ligands/ligand-opt-3.mol2
protein-opt-4.mol2 Nprotomol 2
p-opt-4-protomol.mol2
train-ligands/ligand-opt-4.mol2
protein-opt-5.mol2 Nprotomol 2
p-opt-5-protomol.mol2
train-ligands/ligand-opt-5.mol2
This declares five receptor proteins, with structure files protein-opt-*.mol2.
Each protein is given two protomols: a standard protomol, and the cognate ligand used directly as a protomol.
surflex-dock.exe mdock_list test-ligand/testlig.mol2 Targets logdef
surflex-dock.exe -fmatch train-ligands/frag.mol2 mdock_list test-ligand/testlig.mol2 Targets logdef
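Because the Targets file repeats the same three lines per protein, it is easy to generate. The Python sketch below reproduces the layout of the example above; the function name is hypothetical and the file-naming pattern is simply copied from that example, so other projects will need their own names.

```python
def targets_text(n_proteins):
    """Build Targets file text: a header, then three lines per protein conformation."""
    lines = [f"Nproteins {n_proteins}"]
    for i in range(1, n_proteins + 1):
        lines.append(f"protein-opt-{i}.mol2 Nprotomol 2")       # protein and protomol count
        lines.append(f"p-opt-{i}-protomol.mol2")                # standard protomol
        lines.append(f"train-ligands/ligand-opt-{i}.mol2")      # cognate ligand as protomol
    return "\n".join(lines) + "\n"
```

Writing `targets_text(5)` to a file named Targets gives the 16-line file shown above.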
Self Scoring
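For larger ligands, intramolecular non-covalent interactions can affect the docking score. The +self_score option includes these self-interactions in the score.
When the +pflex option is used, self scoring is turned on automatically for the protein, to account for the adaptation of the protein conformation.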
Protein Pocket Alignment
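Surface-based alignment of protein binding pockets uses four commands: psim_matrix, psim_buildtree, psim_one, and psim_list
surflex-dock.exe psim_matrix ProteinList palign
Each line of the ProteinList file contains the paths to a protein and its ligand; the ligand defines the protein's pocket. The output file, palign-results, contains the alignment information. Next, build the optimal tree of alignments:
surflex-dock.exe psim_buildtree palign-results out
The output files out-lig-.mol2 and out-.mol2 contain the aligned ligand and protein structures.
out-ligands-aligned.mol2 and out-proteins-aligned.mol2 contain the aligned structures (binding pocket only for the proteins), with multiple structures per file.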
psim_one and psim_list offer the opportunity to align one protein to another or a list of proteins to a single one and function analogously to above, without the requirement to separately build a final alignment.
Note also that the psim_buildtree command produces a file suitable for generating a visual depiction of the alignment tree using the Dot program from the GraphViz collection (out-tree.dot in the example above):
dot -Tpdf -o psimlog-tree.pdf out-tree.dot
Protein Pocket Preparation
Frequently, a protein structure will exhibit significant clashes with a cognate ligand. The pprep_protons or pprep_all command can therefore be used to adjust the positions of the hydrogens, or of all atoms, in the binding pocket to remove the clashes.
surflex-dock pprep_protons ligand protein outprefix
The output files are outprefix-ligand.mol2, outprefix-protein.mol2, and outprefix-protein-trim.mol2 (pocket only).
Scoring Function Optimization
optimize logprefix constraint-file init-params
Surflex-Dock Version 2.4 and higher offers the opportunity to tune the Surflex-Dock scoring function based on additional data. The data can be positive, which includes protein/ligand complexes with known affinity. The data can also include negative information.
Ring Flexibility
Surflex-Dock implements in-lined ring flexibility in a general way. The +ring option turns this procedure on. The behavior is modified by the -rthresh parameter (kcals above global energy minimum beyond which ring conformations are not kept).
Turning on ring search (“+ring”) can yield improved results for both screening and pose prediction at the expense of some additional computational cost.
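The role of -rthresh can be illustrated with a small energy filter. This Python sketch keeps ring conformations whose energy lies within a threshold of the lowest one; the function name is hypothetical, treating the boundary as inclusive is an assumption, and the code is not Surflex's actual ring search.

```python
def keep_ring_conformers(energies, rthresh):
    """Return indices of conformers within rthresh kcal of the lowest energy."""
    if not energies:
        return []
    e_min = min(energies)  # stands in for the global energy minimum
    return [i for i, e in enumerate(energies) if e - e_min <= rthresh]
```

With energies `[3.0, 1.0, 2.5, 6.0]` and a threshold of 2.0, the last conformer is dropped because it sits 5.0 kcal above the minimum.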
Post-Processing Results
logprocess command
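Normally docking results are ranked by overall score, but they can also be ranked in two other ways:
1) a derived combination score, and 2) a method for placing thresholds on the crash and polar score
For example, to apply a threshold of 1.0 to the polar score, -1.0 to the crash score, and 100 to the number of rotatable bonds (highly flexible ligands are generally more likely to be false positives):
logprocess logfile
surflex-dock.exe -lp_polar 1.0 -lp_crash -1.0 -lp_rot 100 logprocess log
The output file is log-processed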
A combination score is given first, which combines the reported score and crash values.
The threshold that is supplied on the command line for crash (-1.0 in the example above) is allowed “for free” so the amount of crash that is below that level is given back to the affinity score. So, with an observed crash of -2.0, a threshold of -1.0, and an affinity score of 7.1, the combination score would be 8.1.
The smaller the crash threshold, generally the better able Surflex is to reject false positives. However, this may come at the expense of some true positives that have particularly tight fits into the protein active site in question.
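The worked example (crash -2.0, threshold -1.0, score 7.1, combination 8.1) can be reproduced with a one-line rule. The Python sketch below assumes that the crash penalty is refunded up to the magnitude of the threshold; this is one reading of the description, checked only against that single example, and the function name is hypothetical.

```python
def combination_score(score, crash, crash_threshold):
    """Refund the crash penalty up to the 'free' threshold (assumed interpretation).

    score           -- reported score, which already includes the crash term
    crash           -- observed crash score (zero or negative)
    crash_threshold -- crash allowance given on the command line (zero or negative)
    """
    refunded = min(abs(crash), abs(crash_threshold))
    return score + refunded
```

Under this reading a threshold of 0 refunds nothing, so the combination score equals the reported score.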
With Version 1.31 and later, we recommend running the logprocess command with all default parameters. This yields no change from the reported score in the log file and has been used for the results in all protein screening enrichment benchmark tests.
logprocsdf command
The logprocsdf command converts the log file and the corresponding mol2 file into an sdf file
logprocsdf logfile
surflex-dock.exe logprocsdf log
get command: Retrieving a specific molecule
get mol2archive molname outmolname
mget command
The mget command has the same format, but its second argument is a file listing the names of the molecules to extract.
mget mol2archive molnamelist outmolarchive
score_list command
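Processes docking results from other docking programs, or manually placed molecules
score_list liglist protomol protein log
surflex-dock score_list ligarchive.mol2 protomol.mol2 protein.mol2 log
The poses are scored first, then optimized and scored again; the log file and the corresponding optimized poses are written out.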
rescore_run
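Rescores a given docking run based on the supplied log file.
Useful for testing how scores change under different scoring settings (e.g. with -lparam) or with protein flexibility turned on (e.g. +pflex, which has a large effect on results in certain active sites).
rescore_run logfile protomol protein prefix
surflex-dock.exe +pflex +hprot +pcov 0.6 rescore_run log protomol protein rescoreheavy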
rescore_multi
For multi-protein docking runs
rescore_multi logfile targpath prefix
surflex-dock.exe +pflex +hprot +pcov 0.6 rescore_multi log Targets rescoreheavy
Other commands:
prot: Protonates molecules
prot mol_or_mollist output_prefix
It is suggested to make use of +misc_remin in order to eliminate conformational bias that may affect different scaffolds differently and lead to bias in results.
Note: +misc_ring will generate ring conformations as well. If -misc_outconfs is set to greater than 1, then each molecule will generate an individual *-ring.mol2 file in addition to the single conformation in the molecule archive.
The -fp option strips all hydrogen atoms and then adds hydrogens back.
min
min mol_or_mollist output_prefix
Works like prot but does not add hydrogens, so another program can be used to add hydrogens first.
reorder
Takes a set of molecules and sorts them by flexibility (from flexible to rigid)
reorder mol2archive proportion outputarchive
search: Searches acyclic bonds
search mol1 mol2...
random: Randomly places small molecules
random mol1 mol2...
rms
rms computes the RMSD between mol1 and mol2
rms mol1 mol2
rms_list
rms_list multi_mol reference_mol logfile (appends information on rms of confs to reference)
trim
trim protein ligand outprotein distance
info: Provides information about a molecule
info mol
posefam: Groups docked poses into families
surflex-dock.exe posefam logfile
Other options:
-multiproc
-lparam # This parameter takes a file as input that contains alternate scoring function parameters for Surflex-Dock. The file “default.param” contains the default parameters for Surflex-Dock, and it is at the top level of the software distribution.
Surflex-Sim:
Ligand-based similarity screening
Steps:
- Build a ligand-based hypothesis from multiple ligands
- Screen molecules by similarity to the hypothesis
- Post-process the results
Pre-Searching Molecules
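The search_library command converts multiple molecular structures into library files in one pass; these can later be used by -lscreen and -lscreenopt.
surflex-sim.exe search_library mol_list pre
Each line of the mol_list file is the path to an input molecule, for example:
BasisMols/basis01.mol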
BasisMols/basis02.mol
BasisMols/basis03.mol
BasisMols/basis04.mol
BasisMols/basis05.mol
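The output files are pre-.mol2, a single mol2 file holding multiple conformations (the maximum number is set by -sl_nconfs, default 200), and pre-.mol2_im, which holds the 20-dimensional molecular imprint. The default parameters are sufficient.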
Aligning Molecules
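surflex-sim.exe align mol1.mol mol2.mol
The aligned pose file is written as final-
Mutual alignment of multiple molecules:
surflex-sim.exe hypo liglist log
If liglist has a .mol2 extension, it is treated as a mol2 file containing multiple molecules
If liglist has a .sd extension, it is treated as an MDL sd format file containing multiple molecules
Otherwise, its content is treated as a list of molecule paths, one molecule per line
If the input molecules have flexible rings, the +ring parameter can also be used.
The output also includes scores and internal clash values
Generally, the first hypothesis (hypo-0) is the most sensible to use. However, when something is known about the SAR of analogs of the ligands used to generate a hypothesis, a better selection of hypothesis may be possible by browsing the top-scoring ones.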
Lists of molecules can be aligned to either a single molecule or to mutually aligned sets of molecules using align_list:
surflex-sim align_list LigList1 TargList log
surflex-sim +ring align_list TestAll.mol2 BestHypo.mol2 logtest
The -nsim_final parameter controls the number of final poses written out.
The -fscreen option yields very fast similarity computations. It does this by turning off pre-minimization and all-atom post-alignment optimization, eliminating molecular fragmentation in favor of conformer sampling, and limiting the degree of local pose optimization performed in the last stages of molecular alignment. This can be an especially useful parameterization for going through large databases quickly, especially if followed by a more thorough screen of the top fraction of ligands. The -pscreen option is the recommended setting for screening, and it is fast enough for large databases in situations where multiple processors are available. For detailed studies of relative alignments, the -pgeom option is suggested, and where ring geometries are also to be considered the -pgeomx option is preferred.
Reference Sets and Similarity Vectors
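Alignment and conformational optimization are both fairly fast, but large-scale similarity computation is time-consuming. A vector-based approximation is introduced here: each molecule is represented as a vector of similarities to a reference set of molecules, and the morphological similarity between molecules is then computed from these vectors.
surflex-sim.exe vector liglist reflist vectorfile
liglist contains the molecules to be compared (its content is the storage paths of the small molecules)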
Reflist is a well-chosen set of (usually 20) molecules in fixed conformations and alignments.
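Each line of the output vectorfile holds the similarity of one lig molecule to each reference molecule
The similarity of two molecules is then obtained by computing the Euclidean distance between their two vectors.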
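The distance step itself is standard and can be sketched in a few lines of Python. The function name is hypothetical, and the vectors are assumed to be already parsed into lists of numbers, since the exact vectorfile layout is not described here.

```python
import math


def vector_distance(u, v):
    """Euclidean distance between two similarity vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must be computed against the same reference set")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

A smaller distance means the two molecules relate to the reference set in a more similar way; identical vectors give a distance of 0.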
-multiproc enables the use of multiple cores; -fscreen can improve speed and accuracy
Choosing a suitable reference set is important
1) the molecules must be orthogonal (i.e. dissimilar from each other); and 2) the molecules must come from the population of molecules for which comparisons are to be made
In the case of small-molecule drugs, selecting a diverse reference set from the CMC database (available from MDL) has worked well.
The basis set of molecules used in Cleves and Jain (2006) is included in the distribution (see the BasisMols folder under the Similarity examples). Note that we do not suggest that this set will be generally useful for all populations of molecules. We have found that the set works well for reproducing similarities within the space of approved therapeutics. It will probably be the case that for specific collections, different sets may be better.
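To pick a reference set from a liglist, the following command selects the N molecules most likely to be mutually orthogonal:
surflex-sim choose_ref liglist reflist N
This is time-consuming. An effective approach is to randomly split liglist into several sub-lists, each containing 3-5 times as many molecules as the final reference set.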
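The random split described above can be sketched in Python as follows. The function name, the default factor of 4, and the fixed seed are illustrative assumptions; the final sub-list may be smaller than the others when the list does not divide evenly.

```python
import random


def split_into_sublists(paths, n_ref, factor=4, seed=0):
    """Randomly partition molecule paths into sub-lists of factor * n_ref entries."""
    if not 3 <= factor <= 5:
        raise ValueError("factor should be between 3 and 5")
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    size = factor * n_ref
    return [shuffled[i:i + size] for i in range(0, len(shuffled), size)]
```

Each sub-list can be written to its own list file and passed to choose_ref separately.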
Molecular Similarity Examples
surflex-sim.exe align mols/ligand1tng.mol2 mols/ligand3ptb.mol2
SURFLEX-QMOD TECHNICAL MANUAL
The Surflex-QMOD set of algorithms integrates underlying ideas and algorithms from molecular similarity [1–6], molecular docking [7–14], and multiple-instance learning [7, 11, 13, 15–18] in order to permit the construction of protein binding site analogs. The theory and use of the method for binding affinity prediction and iterative lead optimization is discussed in the companion book to this manual as well as several papers [19–22].
Full surflex-dock parameter list:
Usage: surflex-dock <options> <command> args
Full surflex-sim parameter list:
Usage: surflex-sim <options> <command> args