WOLFRAM NOTEBOOK

Stereoisomers are molecules made with identical components in different orientations. Among stereoisomers, there are geometric isomers and optical isomers. Identifying symmetries in stereoisomers can be confusing, especially as molecules become bigger and the number of symmetries grow. This project aims to automate and generalize the process of identifying stereoisomers using Wolfram Language and its Molecule interface.

Defining the Problem

Isomers are chemicals - molecules, in our case - that have the same chemical formulas (for example, the formula of water is
H
2
O
) but different structures. Stereoisomers are special types of isomers that have not only the same formula but also the same sequence of bonded atoms, but different arrangements. The two types of stereoisomers, geometric and optical isomers, can be labeled based on their layout. We’ll focus on labeling coordination complexes (molecules with central atoms). Specifically, we’ll write methods for square planar molecules - molecules that, as expected, appear as squares - and octahedral molecules - which appear as octahedra.

Geometric Isomers

We can look at square planar molecules (which are simpler than octahedral molecules) to begin with:
Square planar molecules
Out[]=
The second molecule above is a “cis” molecule. “cis” molecules have two same-element bonds at a 90 degree angle. The third molecule is a “trans” molecule. “trans” molecules have two same-element bonds at a 180 degree angle. The first molecule is an unlabeled molecule.
Now we can look at some octahedral molecules:
Octahedral molecules
Out[]=
The second and third molecules above are “cis” and “trans” molecules respectively. Note that both follow the same rules as in the square-planar molecules: 90 and 180 degree angles between same-element bonds. The fourth molecule is a “fac” molecule - “fac” being short for facial, so named because the three same-element bonds form a face that doesn’t include the central atom. The final molecule is a “mer” molecule, with the atoms forming a plane including the central atom. The first molecule, as before, is unlabeled.

Optical Isomers

Optical isomers are isomers that, unlike geometric isomers, are distinguished not by the number or orientation of their ligands but rather by their optical “direction”. In more specific terms, optical isomers (also called “enantiomers”) rotate plane-polarized light left and right. Below, we have a left-handed and right-handed isomer respectively.
Optical isomers
Out[]=
By hand they can be identified by looking at ligands and their relative “depth” position.
We can take a look at a “lambda” (left-handed) isomer:
Lambda molecule
Out[]=
If we take the three atoms closest to us - that is, atoms 4, 5, and 19 - we see they form a triangle:
Graphics[
]
Out[]=
We can then look at the three atoms farther from us - that is, atoms 2, 3, and 6 - and see that they also form a triangle:
Graphics[
]
Out[]=
If we combine the two triangles, we get:
Graphics[
]
Out[]=
We can also label the bridges - that is, the links between atoms 3 and 4, 2 and 19, and 5 and 6.
Graphics[
]
Out[]=
And then turn it upside down:
Graphics[
]
Out[]=
Because there is a bridge on the rightmost side in this orientation (front triangle with its apex up and back triangle with its apex down), this molecule is a lambda molecule. If there was no bridge, it would be a delta molecule (as below).
Graphics[
]
Out[]=

Helper Functions

Custom Molecule Object Structure

molStruct

To make molecular geometry easier to work with, we standardize data in a common structure. We want to reduce molecules to just central and central-adjacent atoms and their bonds.
molStruct takes in a Molecule object and returns a list of rules pairing atoms with their coordinates
In[]:=
molStruct[mol_Molecule]:=Module[{},central=Position[AtomList[mol],Keys[Sort[Counts[AtomList[mol]]]][[1]](*centralatom*)][[1]][[1]](*positionreturns{{index}}*);indicesDirect=findAllBondsWith[mol,central];(*cutoutanyatomsthatarenotcentral-adjacentsothat*)AppendTo[indicesDirect,central];(*anyligandsarereducedtoonlycentral-adjacentatoms*)Thread[AtomList[mol][[indicesDirect]]->QuantityMagnitude[mol["AtomCoordinates"][[indicesDirect]]]]]

Helper Methods

We also need some helper methods to shorten reused operations on molStruct and Molecule objects.
findAllBondsWith takes in a Molecule, not a molStruct, because it’s used primarily in the Optical Isomers section - where we only use molStruct internally and instead work with Molecules
getComponentVectors calculates the vectors that make up central-adjacent atoms in a molStruct
memDePairs is not a helper method for Molecule-related structures specifically, but rather just a way to determine if a two-element list is a part of a list of two-element lists

Display Functions

We also want to display some of our input molecules in sets.
moleculeRow is meant to be mapped across a list, so it takes a single Molecule as input and returns a display with pertinent information
moleculeRowApply just applies moleculeRow to a list of Molecule objects
testCaseRow is very similar to moleculeRowApply, bar one difference - it’s meant for test cases, so it runs and displays analysis with molecules
massAnal is very similar to testCaseRow, but displays molecules in a grid because it’s meant for much larger batches of test case analyses
displayValue is just a trivial function to display values with a title

Molecule Modification Functions

We also need a function to modify molecules to create test molecules - this function isn’t used in any method code, it’s just used to generate tests.
changeSubstructure is a shorthand for replacing multiple substructures in a Molecule

Test Molecules

We need some test molecules to validate our methods throughout this project, so we begin by creating a few.

Square Planar

Our first molecules are square planar molecules (so named because they take the shape of a square) with monoatomic ligands (single-atom “arms”).
baseMolSquare is a MX4 molecule used as a base testing case and as a starting point for the other test molecules
cisMolSquare and transMolSquare are MX2Y2 molecules made by modifying baseMolSquare

Octahedral

Our next molecules are octahedral molecules with monoatomic ligands.
baseMolOct is a MX6 molecule used as a base testing case and as a starting point for the other test molecules
cisMolOct and transMolOct are MX2Y4 molecules made by modifying baseMolOct
facMolOct and merMolOct are MX3Y3 molecules made by modifying baseMolOct

Diatomic Ligands

We now consider molecules with diatomic ligands - that is, molecules with arms containing two atoms.
__MolDiaSquare molecules are made using changeSubstructure on baseMolSquare
__MolDiaOct molecules are made using changeSubstructure on baseMolOct

Polyatomic Ligands

We now consider molecules with polyatomic ligands.
__MolPolySquare molecules are made differently from diatomic square planar molecules - we start from an initial molecule and remove substructures
__MolPolyOct molecules are made similarly to __MolPolySquare molecules, starting from initMolPolyOct

Optical Isomerism

Optical isomers are trickier because they differ only in the parity of their revolutions, and are large enough that many molecules end up out of place to begin with.
optDelta is made using a Molecule created through Wolfram’s MoleculeDraw[], and optLambda is made by modifying initOpt

Full Batch

We’d also like to eventually generalize a method for all of our molecules, so in preparation, we need a full batch set.
allTestCases is just a list of every test case we have

Square Planar Molecules

Square planar molecules are molecules that appear, as the name implies, in the shape of a square. Here, we’ll examine an MX4 molecule and two MX2Y2 molecules. The two classes of square planar isomers we hope to identify are “cis” and “trans”. “cis” isomers have same-element ligands that form perpendicular angles with each other, while “trans” isomers have same element ligands that are opposite each other. Any other isomer will not be labeled.
Because we already have helper functions and test molecules, we now only have to write short analysis functions for different types of molecules.
analSquare takes a molStruct and checks angles between the component vectors
One benefit of using a custom structure is that pre-processing in that structure allows us to use this same analysis method for molecules with any arbitrary ligands. We can see in the test cases below that the same method works for not only monoatomic ligands but also for polyatomic ligands.

Test Cases

Octahedral Molecules

Octahedral molecules are molecules that take, as the name implies, the shape of an octahedron. In comparison to square planar molecules, octahedral isomers can take on four forms - “cis”, “trans”, “fac”, and “mer”. “cis” octahedral molecules are MX2Y4 molecules that have a pair of perpendicular same-element ligands. “trans” molecules, as in square planar, are molecules that have a pair of opposite same-element ligands. “fac” molecules are MX3Y3 molecules with three clustered same-element ligands that form a plane not including the central atom. “mer” molecules are MXY3 molecules with three same-element ligands that form a plane including the central atom.
Similarly to above, we can write a simple method to analyze octahedral molecules.
analOct takes a molStruct and checks angles between component vectors - but unlike before, it rounds angles to multiples of 90° and so is a little more susceptible to problems with odd angles
And again, thanks to molStruct, we’re able to use analOct on any octahedral molecule regardless of the ligand.

Test Cases

Optical Isomers

Now that we can identify geometric isomers, we want to also be able to take any optical isomer and determine its handedness - that is, whether an isomer is “left-handed” or “right-handed”, defined formally by the direction in which plane-polarized light is turned.
deltaLambdaProper identifies delta/lambda molecules under ideal conditions given specific information
Unfortunately, we aren’t able to make all the assumptions we need to and gather the information necessary without some steps to clean up Molecule data. We wrap these steps into the next function.
deltaLambda takes a Molecule, not a molStruct, because it needs access to more information than just the central-adjacent atoms - specifically, it needs to know about the bridge atoms to determine which atoms are bridged

Generalization

It would also be nice to have one method that can infer the type of analysis to run given any molecule.
analMolecule takes any Molecule and infers type by examining the distribution of ligand-types in central-adjacent atoms
And we can test this new general method on all of our test cases:
You might notice that some molecules are now labeled “Missing[NotApplicable]” when they were actually labeled before. The difference is that we’ve now added an extra pre-processing step to check for symmetry before passing molecules to analysis functions. Specifically, molecules that are now labeled “Missing[NotApplicable]” are not actually “cis” or “trans” or “fac” or “mer” because they are asymmetrical. Before we had ignored external ligands and only analyzed central-adjacent ligands. We’d also disregarded bond length, meaning a large number of our test molecules are actually not symmetrical at all. This might seem like something we don’t want to do, but it’s actually more faithful to the goal of the project. The only analysis for which we don’t run this pre-processing step is for optical isomers, where we were already checking for “true symmetry” in the analysis function.
We also include a “pseudo” function below that maintains original functionality to check for pseudo-symmetry.

Just a Bit of Fun

Now, after building out our classifier functions, we wonder if it’s possible to do this with some machine learning - just a bit of fun.

Generator Functions

We start by creating generator functions to create lots of procedural training data:
generateSquares generates square-planar molecules with random ligands and central atom
generateOct generates octahedral molecules with random ligands and central atom

Data

We then want some training data:
Because generating large numbers of molecules does take quite a while, we generate 10,000 of them once and use those for all of our models
and some testing data:
Testing data can be in smaller quantities because they’re just for verification, not for training
Then it’s the important part: data preparation.
There’s a lot of ways to prep the molecule data we’re working with, and we don’t really know which one will perform best. So we just try them all:

First Preparation Method

Our first method is to take the Molecule object itself as input and the results of our already existing analysis method as output.
This first preparation method is simple: just create a list of rules from a molecule to its classification
Then it’s just training and measuring:
We can look at the results using some display functions:
We can break down these figures pretty easily: the accuracy is, as it sounds, a quantification of the accuracy of a model. The second chart is a plot of the of how the accuracy changes as the model chooses to simply not classify more and more of the testing data. The third plot is the most important: it’s a confusion matrix, or a grid that compares the true classification and the learned classification of a set. We want a distinct diagonal line from the top left to the bottom right - in other words, a matrix where most of the set is classified as it should be. Having too much white space, however, can be indicative of “overfitting”, otherwise explained as a model becoming really good at training the data it was trained on but not much else.
So in other words, there’s really nothing good here. The accuracy is low, at 0.632 and 0.427, but far worse, the confusion matrices reveal that every single molecule is just being labeled “cis”. We suspect this is because the model’s preprocessing is struggling to unwrap the data of the Molecule object, and so just classifies everything as the majority group (“cis”). In other words, we’ve achieved our already low accuracy purely by chance. Let’s try again.

Second Preparation Method

Our next method is to take SMILE strings (string representations of a molecule) as input and the results of our already existing analysis method as output.
This preparation function is a little more complicated, but marginally: we use a function from the Wolfram Function Repository to get the SMILES representation of molecules
Then it’s just training and measuring:
While our accuracy has significantly decreased, we’re encouraged by the far greater diversity in our confusion matrices. It’s also great that the models trained far, far quicker (though that’s not apparent in the final product) and appears to be more efficient in general. We continue onward.

Third Preparation Method

Our next method is to take atoms of the molecule as input and the results of our already existing analysis method as output.
This preparation function is simple enough: just get the atom list of every molecule in the set
Then it’s just training and measuring:
We like most of these results: all-time high accuracy for the square-planar model, diverse confusion matrices with pretty defined diagonals. The thing that doesn’t seem right, however, is the accuracy of the octahedral molecule. Why is it lower than the accuracy of the first model, which guessed “cis” on literally every molecule?
Here’s a thought - when we provide the list of atoms, it’s unsorted. What if we grouped the same types of atoms together? As a byproduct of sorting, we’d naturally be grouping the central-adjacent atoms together, which might help improve our training model’s pattern recognition. So we try that next.

Final Preparation Method

This preparation function uses a some obfuscated trickery to get a sorted list of atom coordinates for each molecule
We’re really happy with these results: a high accuracy for the square-planar model and a mostly acceptable one for the octahedral molecule, with clear diagonals on both of the confusion matrices.
These models aren’t perfect, of course. The accuracy rejection plot indicates some over-fitting on the square-planar model and some under-fitting on the octahedral molecule, which also lines up with the differences in their accuracies. But to make more accurate models, we’d need more data and more computing power, and while we can always generate more molecules, we unfortunately can’t just generate RAM.
Nonetheless, these models are a really interesting look into the complexities of labeling arbitrary molecules. For now, our non-machine-learning models are the superior classification technique. But maybe with some time and better models, we could create a model more flexible and more accurate than the deterministic classifier we have now.
After all, it’s just a bit of fun.

Conclusion

The results of this project provide a comprehensive and extensible system of identifying stereoisomers using Wolfram Language. Methods defined throughout this project are able to identify “cis”, “trans”, “mer”, and “fac” types of both square planar and octahedral molecules, as well as identify lambda and delta optical isomers. A final generalized method is also available to apply to any molecule, with type being inferred based on the structure of the provided argument.
The deterministic classifier created during the project is available as a Wolfram Function Repository Resource Function.

Acknowledgements

Though many minds contributed to this project, I would specifically like to thank my mentor, Daniele Ceravolo, for his help with the vector math used in this project, as well as for helping me with the intricacies of Wolfram Language. I’d also like to extend a special thanks to Dr. Jason Sonnenberg for his help with the Wolfram Chemistry interface and his ability to put up with my seemingly endless struggles with the Molecule system. Of course, I would also like to thank my advocate, Macy Levin, and my teaching assistant, Alisa Zaitseva, for fostering a wonderful environment at WSRP. A final thanks goes to my parents and family for their lifelong support.

References

  • Isomerism in Coordination Compounds. (n.d.). The Department of Chemistry, UWI, Mona, Jamaica. http://wwwchem.uwimona.edu.jm/courses/IC10Kiso.html
  • Optical Isomers. (n.d.). Purdue University James Tarpo Jr. and Margaret Tarpo Department of Chemistry. https://www.chem.purdue.edu/jmol/cchem/opti.html
  • Point Group Symmetry: New in Wolfram Language 12. (n.d.). Wolfram: Computation Meets Knowledge. https://www.wolfram.com/language/12/molecular-structure-and-computation/point-group-symmetry.html
  • Character tables for chemically important point groups. (n.d.). Character tables for chemically important point groups. http://symmetry.jacobs-university.de/
  • Melissa Garrett. (2013, September 25). Symmetry: IR and Raman Spectroscopy [Video]. YouTube. https://www.youtube.com/watch?v=1Q45PpodjJY
  • CITE THIS NOTEBOOK

    Wolfram Cloud

    You are using a browser not supported by the Wolfram Cloud

    Supported browsers include recent versions of Chrome, Edge, Firefox and Safari.


    I understand and wish to continue anyway »

    You are using a browser not supported by the Wolfram Cloud. Supported browsers include recent versions of Chrome, Edge, Firefox and Safari.