Simple way for making SMILES file #RDKit

To convert SDF to SMILES I write like a following code.

..snip..
sdf = Chem.SDMolSupplier( 'some.sdf' )
with open('smiles.smi', 'w') as f:
    for mol in sdf:
        smi = Chem.MolToSmiles(mol)
        f.write("{}\n".format(smi)

In this way, to write smiles strings with properties it is needed to get properties by using GetProp(“some prop”). If I need several properties my code tend to be long.
Greg who is developer of RDKit advised me to use SmilesMolWriter. ;)

I have never used this function so I used it and found it was very useful.
Let me show the example. (copy & paste from Jupyter notebook)

import pprint
from rdkit import rdBase
from rdkit import Chem
from rdkit.Chem.rdmolfiles import SmilesWriter
print(rdBase.rdkitVersion)
[OUT]
2017.09.2

mols = [mol for mol in Chem.SDMolSupplier('cdk2.sdf') if mol != None]
# make writer object with a file name.
writer = SmilesWriter('cdk2smi1.smi')
#Check prop names.
pprint.pprint(list(mols[0].GetPropNames()))
[OUT]
['id',
 'Cluster',
 'MODEL.SOURCE',
 'MODEL.CCRATIO',
 'r_mmffld_Potential_Energy-OPLS_2005',
 'r_mmffld_RMS_Derivative-OPLS_2005',
 'b_mmffld_Minimization_Converged-OPLS_2005']

#SetProps method can set properties that will be written to files with SMILES.
writer.SetProps(['Cluster'])
#The way of writing molecules can perform common way.
for mol in mols:
    writer.write( mol )
writer.close()

Then check the file.

!head -n 10 cdk2smi1.smi
[OUT]
SMILES Name Cluster
CC(C)C(=O)COc1nc(N)nc2[nH]cnc12 ZINC03814457 1
Nc1nc(OCC2CCCO2)c2nc[nH]c2n1 ZINC03814459 2
Nc1nc(OCC2CCC(=O)N2)c2nc[nH]c2n1 ZINC03814460 2

How about set all props ?

writer = SmilesWriter('cdk2smi2.smi')
writer.SetProps(list(mols[0].GetPropNames()))
for mol in mols:
    writer.write( mol )
writer.close()
!head -n 10 cdk2smi2.smi
[OUT]
SMILES Name id Cluster MODEL.SOURCE MODEL.CCRATIO r_mmffld_Potential_Energy-OPLS_2005 r_mmffld_RMS_Derivative-OPLS_2005 b_mmffld_Minimization_Converged-OPLS_2005
CC(C)C(=O)COc1nc(N)nc2[nH]cnc12 ZINC03814457 ZINC03814457 1 CORINA 3.44 0027  09.01.2008 1 -78.6454 0.000213629 1
Nc1nc(OCC2CCCO2)c2nc[nH]c2n1 ZINC03814459 ZINC03814459 2 CORINA 3.44 0027  09.01.2008 1 -67.4705 9.48919e-05 1
Nc1nc(OCC2CCC(=O)N2)c2nc[nH]c2n1 ZINC03814460 ZINC03814460 2 CORINA 3.44 0027  09.01.2008 1 -89.4303 5.17485e-05 1
Nc1nc(OCC2CCCCC2)c2nc[nH]c2n1 ZINC00023543 ZINC00023543 3 CORINA 3.44 0027  09.01.2008 1 -70.2463 6.35949e-05 1

Wow it works fine. This method is useful and easy to use.
Do most of the RDKiters use the function?
I’m totally out of fashion!!!

I pushed my Fast Clustering code to rdkit/Contrib repository. It was big news for me.

Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

4 thoughts on “Simple way for making SMILES file #RDKit

  1. Very interesting. However when fingerprints are used for computation, is there a way to convert them(fingerprints) back to smiles or mols such that the result be rewritten back to a file? Thanks!

    1. Hi,
      Thank you for your query and in my knowledge, it is difficult to do it. Because finger print is generated from hash function. It is one way.
      If you have huge amount set of chemical structures, you can find similar ( or fortunately same! ) compounds with the fingerprint. Is it answer for you?

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.