To convert an SDF file to SMILES, I usually write code like the following.
    # ..snip..
    sdf = Chem.SDMolSupplier('some.sdf')
    with open('smiles.smi', 'w') as f:
        for mol in sdf:
            # SDMolSupplier yields None for records it cannot parse.
            if mol is None:
                continue
            smi = Chem.MolToSmiles(mol)
            f.write("{}\n".format(smi))
With this approach, writing SMILES strings together with properties means fetching each property with GetProp("some prop"). If I need several properties, the code tends to get long.
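To make the point concrete, here is a minimal sketch of what the manual GetProp approach looks like once several properties are involved. The helper name write_smiles_with_props, the tab delimiter, and the toy molecule are my own assumptions for illustration; they are not part of RDKit or of the original notebook.

```python
import io

from rdkit import Chem


def write_smiles_with_props(mols, prop_names, out, delimiter="\t"):
    """Write one SMILES per line followed by the requested properties.

    Hypothetical helper for illustration; it is not an RDKit function.
    """
    out.write(delimiter.join(["SMILES"] + list(prop_names)) + "\n")
    for mol in mols:
        if mol is None:  # skip records that failed to parse
            continue
        fields = [Chem.MolToSmiles(mol)]
        for name in prop_names:
            # GetProp raises KeyError for a missing property, so check first.
            fields.append(mol.GetProp(name) if mol.HasProp(name) else "")
        out.write(delimiter.join(fields) + "\n")


# Toy input so the sketch runs without an SDF file.
mol = Chem.MolFromSmiles("CCO")
mol.SetProp("Cluster", "1")

buf = io.StringIO()
write_smiles_with_props([mol], ["Cluster"], buf)
print(buf.getvalue())
```

Every extra property adds another field to handle, which is the bookkeeping a dedicated writer class can take over.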
Greg, the developer of RDKit, advised me to use SmilesWriter. ;)
I had never used this class before, so I tried it and found it very useful.
Here is an example (copied and pasted from a Jupyter notebook).
    import pprint
    from rdkit import rdBase
    from rdkit import Chem
    from rdkit.Chem.rdmolfiles import SmilesWriter
    print(rdBase.rdkitVersion)
    [OUT]
    2017.09.2

    mols = [mol for mol in Chem.SDMolSupplier('cdk2.sdf') if mol is not None]
    # Make a writer object with a file name.
    writer = SmilesWriter('cdk2smi1.smi')
    # Check the property names.
    pprint.pprint(list(mols[0].GetPropNames()))
    [OUT]
    ['id',
     'Cluster',
     'MODEL.SOURCE',
     'MODEL.CCRATIO',
     'r_mmffld_Potential_Energy-OPLS_2005',
     'r_mmffld_RMS_Derivative-OPLS_2005',
     'b_mmffld_Minimization_Converged-OPLS_2005']

    # SetProps selects the properties that are written to the file alongside the SMILES.
    writer.SetProps(['Cluster'])
    # Molecules are then written in the usual way.
    for mol in mols:
        writer.write(mol)
    writer.close()
Then check the file.
    !head -n 10 cdk2smi1.smi
    [OUT]
    SMILES Name Cluster
    CC(C)C(=O)COc1nc(N)nc2[nH]cnc12 ZINC03814457 1
    Nc1nc(OCC2CCCO2)c2nc[nH]c2n1 ZINC03814459 2
    Nc1nc(OCC2CCC(=O)N2)c2nc[nH]c2n1 ZINC03814460 2
How about writing all of the properties?
    writer = SmilesWriter('cdk2smi2.smi')
    writer.SetProps(list(mols[0].GetPropNames()))
    for mol in mols:
        writer.write(mol)
    writer.close()

    !head -n 10 cdk2smi2.smi
    [OUT]
    SMILES Name id Cluster MODEL.SOURCE MODEL.CCRATIO r_mmffld_Potential_Energy-OPLS_2005 r_mmffld_RMS_Derivative-OPLS_2005 b_mmffld_Minimization_Converged-OPLS_2005
    CC(C)C(=O)COc1nc(N)nc2[nH]cnc12 ZINC03814457 ZINC03814457 1 CORINA 3.44 0027 09.01.2008 1 -78.6454 0.000213629 1
    Nc1nc(OCC2CCCO2)c2nc[nH]c2n1 ZINC03814459 ZINC03814459 2 CORINA 3.44 0027 09.01.2008 1 -67.4705 9.48919e-05 1
    Nc1nc(OCC2CCC(=O)N2)c2nc[nH]c2n1 ZINC03814460 ZINC03814460 2 CORINA 3.44 0027 09.01.2008 1 -89.4303 5.17485e-05 1
    Nc1nc(OCC2CCCCC2)c2nc[nH]c2n1 ZINC00023543 ZINC00023543 3 CORINA 3.44 0027 09.01.2008 1 -70.2463 6.35949e-05 1
Wow, it works fine. SmilesWriter is useful and easy to use.
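For readers who do not have cdk2.sdf at hand, the same idea can be tried on molecules built in memory. This is only a sketch: the two toy molecules, their names, the temporary file, and the comma delimiter are my assumptions. Note that one MODEL.SOURCE value in the output above contains spaces, so with the default space delimiter the columns are ambiguous; the delimiter argument of SmilesWriter is one possible way around that.

```python
import os
import tempfile

from rdkit import Chem
from rdkit.Chem.rdmolfiles import SmilesWriter

# Toy molecules standing in for the SDF records (assumed data).
records = [("CCO", "ethanol", "1"), ("c1ccccc1", "benzene", "2")]
mols = []
for smi, name, cluster in records:
    mol = Chem.MolFromSmiles(smi)
    mol.SetProp("_Name", name)       # written to the Name column
    mol.SetProp("Cluster", cluster)  # extra property to export
    mols.append(mol)

out_path = os.path.join(tempfile.mkdtemp(), "toy.smi")

# A comma delimiter keeps values that contain spaces in a single column.
writer = SmilesWriter(out_path, delimiter=",")
writer.SetProps(["Cluster"])
for mol in mols:
    writer.write(mol)
writer.close()

with open(out_path) as f:
    lines = f.read().splitlines()
for line in lines:
    print(line)
```

A comma is safe here only because none of the toy values contain one; a tab may be a better choice for real data.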
Do most RDKiters already use this class?
I’m totally out of fashion!!!
I pushed my Fast Clustering code to rdkit/Contrib repository. It was big news for me.
Interesting!
Thanks. ;-)
Very interesting. However, when fingerprints are used for a computation, is there a way to convert them (the fingerprints) back to SMILES or mols so that the result can be written back to a file? Thanks!
Hi,
Thank you for your question. As far as I know, that is difficult to do, because a fingerprint is generated with a hash function, and hashing is one-way.
If you have a large set of chemical structures, you can use the fingerprint to find similar (or, with luck, identical!) compounds in it. Does that answer your question?
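A minimal sketch of that lookup idea: keep the molecules next to their fingerprints, and map a query fingerprint back to the closest known structure instead of trying to invert it. The tiny library, the helper name most_similar, and the choice of RDKFingerprint with Tanimoto similarity are my assumptions; any fingerprint and similarity measure could be swapped in.

```python
from rdkit import Chem, DataStructs

# Assumed toy library; in practice this would be a large compound collection.
library_smiles = ["CCO", "CCN", "c1ccccc1O", "CC(=O)O"]
library = [Chem.MolFromSmiles(s) for s in library_smiles]
library_fps = [Chem.RDKFingerprint(m) for m in library]


def most_similar(query_mol):
    """Return the library molecule closest to the query and its similarity.

    Hypothetical helper for illustration; it is not an RDKit function.
    """
    query_fp = Chem.RDKFingerprint(query_mol)
    sims = DataStructs.BulkTanimotoSimilarity(query_fp, library_fps)
    best = max(range(len(sims)), key=sims.__getitem__)
    return library[best], sims[best]


# "OCC" is ethanol written in a different atom order.
hit, score = most_similar(Chem.MolFromSmiles("OCC"))
print(Chem.MolToSmiles(hit), score)
```

Because the molecules are stored alongside the fingerprints, the hit can be written back to a file as SMILES without ever decoding the fingerprint itself.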