-
Notifications
You must be signed in to change notification settings - Fork 239
Make surface molecules from smiles using Argon workaround #2607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Make surface molecules from smiles using Argon workaround #2607
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Could we have a couple unit tests? I think there are some templates to copy. Can we do a round-trip (i.e. generate SMILES-like strings with X in, that reproduce the original molecule on import)?
This pull request is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant pull request, otherwise it will automatically be closed in 30 days. |
This pull request is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant pull request, otherwise it will automatically be closed in 30 days. |
2b17b3c
to
4732aa1
Compare
This is working with the test code described above, so now I just need to put that into an actual unit test and it'll be ready to go. |
Unit tests added, but don't work yet because the SMILES string from the rdkit translator's to_smiles() uses 'Pt' instead of 'X'. Need to figure out where is the most appropriate place to make this final translation, but then it should be ready |
Some mission creep: RMG molecule had no way to distinguish between specific Pt atom and more general surface X atom. I wanted So, I added a specific Pt atom type with the intention of being able to add other metals later on. It passes my simple test cases, but now I need to clean up the PR and look for other places this might cause problems for RMG-cat. |
Richard's working on a related PR: #2701. The rdkit * might be a more elegant way of doing this than the Ar workaround |
This pull request is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant pull request, otherwise it will automatically be closed in 30 days. |
be6e89a
to
cc93abf
Compare
The code to build a surface molecule from smiles was in the init function, but this fails to account for when people create a Molecule() and then add .from_smiles(). By moving this code to the from_smiles() function, it avoids code duplication and handles both cases.
Regression Testing ResultsWARNING:root:Initial mole fractions do not sum to one; normalizing. Detailed regression test results.Regression test aromatics:Reference: Execution time (DD:HH:MM:SS): 00:00:01:07 aromatics Passed Core Comparison ✅Original model has 15 species. aromatics Failed Edge Comparison ❌Original model has 106 species. Non-identical thermo! ❌
Identical thermo comments:
Observables Test Case: Aromatics Comparison
✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! aromatics Passed Observable Testing ✅Regression test liquid_oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:02:20 liquid_oxidation Passed Core Comparison ✅Original model has 37 species. liquid_oxidation Failed Edge Comparison ❌Original model has 214 species.
Observables Test Case: liquid_oxidation Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! liquid_oxidation Passed Observable Testing ✅Regression test nitrogen:Reference: Execution time (DD:HH:MM:SS): 00:00:01:27 nitrogen Failed Core Comparison ❌Original model has 41 species. nitrogen Failed Edge Comparison ❌Original model has 133 species.
Observables Test Case: NC Comparison
✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions! nitrogen Passed Observable Testing ✅Regression test oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:02:27 oxidation Passed Core Comparison ✅Original model has 59 species. oxidation Passed Edge Comparison ✅Original model has 230 species.
Observables Test Case: Oxidation Comparison
✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! oxidation Passed Observable Testing ✅Regression test sulfur:Reference: Execution time (DD:HH:MM:SS): 00:00:00:55 sulfur Passed Core Comparison ✅Original model has 27 species. sulfur Failed Edge Comparison ❌Original model has 89 species.
Observables Test Case: SO2 Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! sulfur Passed Observable Testing ✅Regression test superminimal:Reference: Execution time (DD:HH:MM:SS): 00:00:00:39 superminimal Passed Core Comparison ✅Original model has 13 species. superminimal Passed Edge Comparison ✅Original model has 18 species. Regression test RMS_constantVIdealGasReactor_superminimal:Reference: Execution time (DD:HH:MM:SS): 00:00:02:23 RMS_constantVIdealGasReactor_superminimal Passed Core Comparison ✅Original model has 13 species. RMS_constantVIdealGasReactor_superminimal Passed Edge Comparison ✅Original model has 13 species.
Observables Test Case: RMS_constantVIdealGasReactor_superminimal Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_constantVIdealGasReactor_superminimal Passed Observable Testing ✅Regression test RMS_CSTR_liquid_oxidation:Reference: Execution time (DD:HH:MM:SS): 00:00:06:16 RMS_CSTR_liquid_oxidation Passed Core Comparison ✅Original model has 37 species. RMS_CSTR_liquid_oxidation Passed Edge Comparison ✅Original model has 248 species.
Observables Test Case: RMS_CSTR_liquid_oxidation Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_CSTR_liquid_oxidation Passed Observable Testing ✅Regression test fragment:Reference: Execution time (DD:HH:MM:SS): 00:00:00:42 fragment Passed Core Comparison ✅Original model has 10 species. fragment Passed Edge Comparison ✅Original model has 33 species.
Observables Test Case: fragment Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! fragment Passed Observable Testing ✅Regression test RMS_constantVIdealGasReactor_fragment:Reference: Execution time (DD:HH:MM:SS): 00:00:03:03 RMS_constantVIdealGasReactor_fragment Passed Core Comparison ✅Original model has 10 species. RMS_constantVIdealGasReactor_fragment Passed Edge Comparison ✅Original model has 27 species.
Observables Test Case: RMS_constantVIdealGasReactor_fragment Comparison
✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions! RMS_constantVIdealGasReactor_fragment Passed Observable Testing ✅Regression test minimal_surface:Reference: Execution time (DD:HH:MM:SS): 00:00:00:45 minimal_surface Passed Core Comparison ✅Original model has 11 species. minimal_surface Passed Edge Comparison ✅Original model has 38 species.
Observables Test Case: minimal_surface Comparison
✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions! minimal_surface Passed Observable Testing ✅beep boop this comment was written by a bot 🤖 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a workaround for initializing surface molecules via SMILES by temporarily replacing surface site symbols (“X” and “Pt”) with “Ar” to generate a valid adjacency list and then restoring the original symbols while resetting extra electron pair, unpaired electron, and charge information.
- Replaces “X” and “Pt” in the SMILES string with “Ar” and then post-processes the generated adjacency list to convert “Ar” back to the surface site symbols.
- Updates the internal _smiles attribute accordingly.
Motivation or Problem
The issue is described here, #2603, but basically RDKit complains when you try to make a surface molecule using smiles.
Description of Changes
This PR adds a workaround where you initialize the molecule using Ar as a replacement for X to get the adjacency list and then remove the extra pairs of electrons and change Ar back to X to get the originally intended molecule.
Testing
NOTE TO SELF -- DON'T FORGET TO REBUILD CYTHON FILES!
I added a bunch of unit tests, including a round trip test where it goes through a bunch of smiles, generates the molecule and then checks that the smiles matches the original.
I ran the minimal surface example and it works just fine.
I ran the minimal surface example and included an initial species HX using the smiles definition.
Here's a very basic example of this in action (note that some orders will not work with RDKit. [HX] fails but [XH] is fine and I do not plan to worry about it because it's an rdkit issue):
Additional Context
The point of this PR is NOT to be able to replace all the X's with Pt in an input file and run RMG. It happens to be a step in that direction, but the real purpose is to be able to specify surface species using SMILES. I never intended to mess with atom types at all, but the rdkit layer erases the metal element label, and so without RMG keeping track of atom type, there's no way to do a roundtrip test on the smiles that works for both 'HX' and 'PtX.'