Skip to content

Make surface molecules from smiles using Argon workaround #2607

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

sevyharris
Copy link
Contributor

@sevyharris sevyharris commented Feb 7, 2024

Motivation or Problem

The issue is described here, #2603, but basically RDKit complains when you try to make a surface molecule using smiles.

Description of Changes

This PR adds a workaround where you initialize the molecule using Ar as a replacement for X to get the adjacency list and then remove the extra pairs of electrons and change Ar back to X to get the originally intended molecule.

Testing

NOTE TO SELF -- DON'T FORGET TO REBUILD CYTHON FILES!

  1. I added a bunch of unit tests, including a round trip test where it goes through a bunch of smiles, generates the molecule and then checks that the smiles matches the original.

  2. I ran the minimal surface example and it works just fine.

  3. I ran the minimal surface example and included an initial species HX using the smiles definition.

  4. Here's a very basic example of this in action (note that some orders will not work with RDKit. [HX] fails but [XH] is fine and I do not plan to worry about it because it's an rdkit issue):

import rmgpy.species
display(rmgpy.species.Species(smiles='[XH]'))
display(rmgpy.species.Species(smiles='[H][X]'))
  1. Here's some slightly more advanced code to test out instantiating all the surface species in the Surface family training reactions:
import os
import glob
import rmgpy
import rmgpy.chemkin
import rmgpy.molecule

# Get all of the surface training reactions
search_str = os.path.join(rmgpy.settings['database.directory'], 'kinetics', 'families', 'Surface_*', 'training', 'dictionary.txt')
species_dict_files = glob.glob(search_str)

# Test on all the species in surface training reactions
for dict_file in species_dict_files:
    species_dict = rmgpy.chemkin.load_species_dictionary(dict_file)
    for key in species_dict.keys():
        sp = species_dict[key]
        if 'X' not in sp.smiles:
            continue
        test_smiles = sp.smiles.replace('Pt', 'X')
        test_sp = rmgpy.molecule.Molecule(smiles=test_smiles)
        
        assert(test_sp.is_isomorphic(sp.molecule[0]))    

Additional Context

The point of this PR is NOT to be able to replace all the X's with Pt in an input file and run RMG. It happens to be a step in that direction, but the real purpose is to be able to specify surface species using SMILES. I never intended to mess with atom types at all, but the rdkit layer erases the metal element label, and so without RMG keeping track of atom type, there's no way to do a roundtrip test on the smiles that works for both 'HX' and 'PtX.'

@sevyharris sevyharris self-assigned this Feb 7, 2024
@sevyharris sevyharris linked an issue Feb 7, 2024 that may be closed by this pull request
Copy link
Member

@rwest rwest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Could we have a couple unit tests? I think there are some templates to copy. Can we do a round-trip (i.e. generate SMILES-like strings with X in, that reproduce the original molecule on import)?

Copy link

github-actions bot commented Jun 7, 2024

This pull request is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant pull request, otherwise it will automatically be closed in 30 days.

@github-actions github-actions bot added the stale stale issue/PR as determined by actions bot label Jun 7, 2024
@sevyharris sevyharris removed the stale stale issue/PR as determined by actions bot label Jun 7, 2024
Copy link

github-actions bot commented Sep 6, 2024

This pull request is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant pull request, otherwise it will automatically be closed in 30 days.

@github-actions github-actions bot added the stale stale issue/PR as determined by actions bot label Sep 6, 2024
@sevyharris sevyharris removed the stale stale issue/PR as determined by actions bot label Sep 6, 2024
@sevyharris sevyharris force-pushed the surface_smiles_to_molecule branch from 2b17b3c to 4732aa1 Compare October 18, 2024 19:21
@sevyharris
Copy link
Contributor Author

This is working with the test code described above, so now I just need to put that into an actual unit test and it'll be ready to go.

@sevyharris
Copy link
Contributor Author

Unit tests added, but don't work yet because the SMILES string from the rdkit translator's to_smiles() uses 'Pt' instead of 'X'. Need to figure out where is the most appropriate place to make this final translation, but then it should be ready

@sevyharris
Copy link
Contributor Author

Some mission creep: RMG molecule had no way to distinguish between specific Pt atom and more general surface X atom.

I wanted rmgpy.molecule.Molecule(smiles="[Pt]").to_smiles() to return '[Pt]' and rmgpy.molecule.Molecule(smiles="[X]").to_smiles() to return '[X]', but there was no good way to do this since the internal RMG molecule adjacency list always converted Pt to X and then converted all X's back to Pt in the to_smiles() function.

So, I added a specific Pt atom type with the intention of being able to add other metals later on.

It passes my simple test cases, but now I need to clean up the PR and look for other places this might cause problems for RMG-cat.

@sevyharris
Copy link
Contributor Author

Richard's working on a related PR: #2701. The rdkit * might be a more elegant way of doing this than the Ar workaround

Copy link

This pull request is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant pull request, otherwise it will automatically be closed in 30 days.

@github-actions github-actions bot added the stale stale issue/PR as determined by actions bot label Jan 22, 2025
@github-actions github-actions bot added the abandoned abandoned issue/PR as determined by actions bot label Feb 21, 2025
@github-actions github-actions bot closed this Feb 21, 2025
@sevyharris sevyharris removed stale stale issue/PR as determined by actions bot abandoned abandoned issue/PR as determined by actions bot labels Feb 21, 2025
@sevyharris sevyharris reopened this Feb 21, 2025
@sevyharris sevyharris force-pushed the surface_smiles_to_molecule branch from be6e89a to cc93abf Compare May 19, 2025 14:11
@sevyharris sevyharris added the Status: Ready for Review PR is complete and ready to be reviewed label May 19, 2025
The code to build a surface molecule from smiles was in the init
function, but this fails to account for when people create a Molecule()
and then add .from_smiles(). By moving this code to the from_smiles()
function, it avoids code duplication and handles both cases.
Copy link

Regression Testing Results

WARNING:root:Initial mole fractions do not sum to one; normalizing.
WARNING:root:Initial mole fractions do not sum to one; normalizing.
WARNING:root:Initial mole fractions do not sum to one; normalizing.
⚠️ One or more regression tests failed.
Please download the failed results and run the tests locally or check the log to see why.

Detailed regression test results.

Regression test aromatics:

Reference: Execution time (DD:HH:MM:SS): 00:00:01:07
Current: Execution time (DD:HH:MM:SS): 00:00:01:07
Reference: Memory used: 2737.84 MB
Current: Memory used: 2726.36 MB

aromatics Passed Core Comparison ✅

Original model has 15 species.
Test model has 15 species. ✅
Original model has 11 reactions.
Test model has 11 reactions. ✅

aromatics Failed Edge Comparison ❌

Original model has 106 species.
Test model has 106 species. ✅
Original model has 358 reactions.
Test model has 358 reactions. ✅

Non-identical thermo! ❌
original: C=CC1C=CC2=CC1C=C2
tested: C=CC1C=CC2=CC1C=C2

Hf(300K) S(300K) Cp(300K) Cp(400K) Cp(500K) Cp(600K) Cp(800K) Cp(1000K) Cp(1500K)
83.22 82.78 35.48 45.14 53.78 61.40 73.58 82.20 95.08
83.22 84.16 35.48 45.14 53.78 61.40 73.58 82.20 95.08

Identical thermo comments:
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds- CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsHH) + Estimated bicyclic component: polycyclic(s3_5_6_ane) - ring(Cyclohexane) - ring(Cyclopentane) + ring(1,3-Cyclohexadiene) + ring(Cyclopentadiene)

Observables Test Case: Aromatics Comparison

✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!

aromatics Passed Observable Testing ✅

Regression test liquid_oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:20
Current: Execution time (DD:HH:MM:SS): 00:00:02:22
Reference: Memory used: 2861.54 MB
Current: Memory used: 2847.51 MB

liquid_oxidation Passed Core Comparison ✅

Original model has 37 species.
Test model has 37 species. ✅
Original model has 241 reactions.
Test model has 241 reactions. ✅

liquid_oxidation Failed Edge Comparison ❌

Original model has 214 species.
Test model has 214 species. ✅
Original model has 1593 reactions.
Test model has 1593 reactions. ✅
The original model has 1 reactions that the tested model does not have. ❌
rxn: CCCC[CH]OO(96) <=> C[CH]CCCOO(63) origin: intra_H_migration
The tested model has 1 reactions that the original model does not have. ❌
rxn: C[CH]CCCOO(63) <=> [OH](22) + CCCCC=O(60) origin: intra_H_migration

Observables Test Case: liquid_oxidation Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

liquid_oxidation Passed Observable Testing ✅

Regression test nitrogen:

Reference: Execution time (DD:HH:MM:SS): 00:00:01:27
Current: Execution time (DD:HH:MM:SS): 00:00:01:27
Reference: Memory used: 2856.71 MB
Current: Memory used: 2850.50 MB

nitrogen Failed Core Comparison ❌

Original model has 41 species.
Test model has 41 species. ✅
Original model has 359 reactions.
Test model has 360 reactions. ❌
The tested model has 1 reactions that the original model does not have. ❌
rxn: HNO(48) + HCO(13) <=> NO(38) + CH2O(18) origin: H_Abstraction

nitrogen Failed Edge Comparison ❌

Original model has 133 species.
Test model has 133 species. ✅
Original model has 981 reactions.
Test model has 983 reactions. ❌
The tested model has 2 reactions that the original model does not have. ❌
rxn: HNO(48) + HCO(13) <=> NO(38) + CH2O(18) origin: H_Abstraction
rxn: HON(T)(83) + HCO(13) <=> NO(38) + CH2O(18) origin: Disproportionation

Observables Test Case: NC Comparison

✅ All Observables varied by less than 0.200 on average between old model and new model in all conditions!

nitrogen Passed Observable Testing ✅

Regression test oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:27
Current: Execution time (DD:HH:MM:SS): 00:00:02:31
Reference: Memory used: 2727.48 MB
Current: Memory used: 2711.98 MB

oxidation Passed Core Comparison ✅

Original model has 59 species.
Test model has 59 species. ✅
Original model has 694 reactions.
Test model has 694 reactions. ✅

oxidation Passed Edge Comparison ✅

Original model has 230 species.
Test model has 230 species. ✅
Original model has 1526 reactions.
Test model has 1526 reactions. ✅

Observables Test Case: Oxidation Comparison

✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!

oxidation Passed Observable Testing ✅

Regression test sulfur:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:55
Current: Execution time (DD:HH:MM:SS): 00:00:00:55
Reference: Memory used: 2828.73 MB
Current: Memory used: 2821.51 MB

sulfur Passed Core Comparison ✅

Original model has 27 species.
Test model has 27 species. ✅
Original model has 74 reactions.
Test model has 74 reactions. ✅

sulfur Failed Edge Comparison ❌

Original model has 89 species.
Test model has 89 species. ✅
Original model has 227 reactions.
Test model has 227 reactions. ✅
The original model has 1 reactions that the tested model does not have. ❌
rxn: O(4) + SO2(15) (+N2) <=> SO3(16) (+N2) origin: primarySulfurLibrary
The tested model has 1 reactions that the original model does not have. ❌
rxn: O(4) + SO2(15) (+N2) <=> SO3(16) (+N2) origin: primarySulfurLibrary

Observables Test Case: SO2 Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

sulfur Passed Observable Testing ✅

Regression test superminimal:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:39
Current: Execution time (DD:HH:MM:SS): 00:00:00:38
Reference: Memory used: 2892.35 MB
Current: Memory used: 2891.52 MB

superminimal Passed Core Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 21 reactions.
Test model has 21 reactions. ✅

superminimal Passed Edge Comparison ✅

Original model has 18 species.
Test model has 18 species. ✅
Original model has 28 reactions.
Test model has 28 reactions. ✅

Regression test RMS_constantVIdealGasReactor_superminimal:

Reference: Execution time (DD:HH:MM:SS): 00:00:02:23
Current: Execution time (DD:HH:MM:SS): 00:00:02:24
Reference: Memory used: 3459.32 MB
Current: Memory used: 3457.48 MB

RMS_constantVIdealGasReactor_superminimal Passed Core Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 19 reactions.
Test model has 19 reactions. ✅

RMS_constantVIdealGasReactor_superminimal Passed Edge Comparison ✅

Original model has 13 species.
Test model has 13 species. ✅
Original model has 19 reactions.
Test model has 19 reactions. ✅

Observables Test Case: RMS_constantVIdealGasReactor_superminimal Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_constantVIdealGasReactor_superminimal Passed Observable Testing ✅

Regression test RMS_CSTR_liquid_oxidation:

Reference: Execution time (DD:HH:MM:SS): 00:00:06:16
Current: Execution time (DD:HH:MM:SS): 00:00:06:12
Reference: Memory used: 3466.72 MB
Current: Memory used: 3452.18 MB

RMS_CSTR_liquid_oxidation Passed Core Comparison ✅

Original model has 37 species.
Test model has 37 species. ✅
Original model has 202 reactions.
Test model has 202 reactions. ✅

RMS_CSTR_liquid_oxidation Passed Edge Comparison ✅

Original model has 248 species.
Test model has 248 species. ✅
Original model has 2057 reactions.
Test model has 2057 reactions. ✅

Observables Test Case: RMS_CSTR_liquid_oxidation Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_CSTR_liquid_oxidation Passed Observable Testing ✅

Regression test fragment:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:42
Current: Execution time (DD:HH:MM:SS): 00:00:00:42
Reference: Memory used: 2654.41 MB
Current: Memory used: 2648.01 MB

fragment Passed Core Comparison ✅

Original model has 10 species.
Test model has 10 species. ✅
Original model has 2 reactions.
Test model has 2 reactions. ✅

fragment Passed Edge Comparison ✅

Original model has 33 species.
Test model has 33 species. ✅
Original model has 47 reactions.
Test model has 47 reactions. ✅

Observables Test Case: fragment Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

fragment Passed Observable Testing ✅

Regression test RMS_constantVIdealGasReactor_fragment:

Reference: Execution time (DD:HH:MM:SS): 00:00:03:03
Current: Execution time (DD:HH:MM:SS): 00:00:03:03
Reference: Memory used: 3611.11 MB
Current: Memory used: 3601.15 MB

RMS_constantVIdealGasReactor_fragment Passed Core Comparison ✅

Original model has 10 species.
Test model has 10 species. ✅
Original model has 2 reactions.
Test model has 2 reactions. ✅

RMS_constantVIdealGasReactor_fragment Passed Edge Comparison ✅

Original model has 27 species.
Test model has 27 species. ✅
Original model has 24 reactions.
Test model has 24 reactions. ✅

Observables Test Case: RMS_constantVIdealGasReactor_fragment Comparison

✅ All Observables varied by less than 0.100 on average between old model and new model in all conditions!

RMS_constantVIdealGasReactor_fragment Passed Observable Testing ✅

Regression test minimal_surface:

Reference: Execution time (DD:HH:MM:SS): 00:00:00:45
Current: Execution time (DD:HH:MM:SS): 00:00:00:44
Reference: Memory used: 2837.00 MB
Current: Memory used: 2827.37 MB

minimal_surface Passed Core Comparison ✅

Original model has 11 species.
Test model has 11 species. ✅
Original model has 3 reactions.
Test model has 3 reactions. ✅

minimal_surface Passed Edge Comparison ✅

Original model has 38 species.
Test model has 38 species. ✅
Original model has 38 reactions.
Test model has 38 reactions. ✅

Observables Test Case: minimal_surface Comparison

✅ All Observables varied by less than 0.500 on average between old model and new model in all conditions!

minimal_surface Passed Observable Testing ✅

beep boop this comment was written by a bot 🤖

@rwest rwest requested review from Copilot and rwest May 22, 2025 19:28
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a workaround for initializing surface molecules via SMILES by temporarily replacing surface site symbols (“X” and “Pt”) with “Ar” to generate a valid adjacency list and then restoring the original symbols while resetting extra electron pair, unpaired electron, and charge information.

  • Replaces “X” and “Pt” in the SMILES string with “Ar” and then post-processes the generated adjacency list to convert “Ar” back to the surface site symbols.
  • Updates the internal _smiles attribute accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Ready for Review PR is complete and ready to be reviewed Topic: Catalysis
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't construct surface rmgpy.molecule.Molecule()
2 participants