October 2022¶

Monday 3^rd October¶

Today I’m doing this ‘Data Fluency in Git and Github’ workshop to get some of my MDP hours up, and I’m teaching this arvo, so I probably won’t be doing my research today.

Edit: This training is really slow and dry… I think self directed learning might be a bit better for me.

Tuesday 4^th October¶

I didn’t quite make it in for group meeting, but I have a meeting with Katya this arvo to talk about directions for my IPR, so I’m heading in for that regardless. I’ll probably try to focus on getting some more reading in today.

Wednesday 5^th October¶

Today I was teaching, but afterwards, I spent some time running a few tests for some benchmark levels of theory. Admittedly on BODIPY which seems to be notoriously difficult. Unless otherwise specified, they all use the ωB97X-D3/aug-cc-pvtz optimised $S_1$ geometry (Ref: $\Delta E_{s^1\to s^0}^{adia} =2.504\:eV$)

Method	Emission (eV)	Error (eV)
Correlated
EOM-CCSD/aug-cc-pvdz//EOM-CCSD/aug-cc-pvdz	2.660	-0.155
RI-DLPNO-STEOM-CCSD/aug-cc-pvtz	2.334	0.171
SS-CASSCF(10,10)/aug-cc-pvdz//RI-DLPNO-STEOM-CCSD/aug-cc-pvtz	crash
Multireference
SS-CASSCF(10,10)/aug-cc-pvdz//NEVPT2(10,10)/aug-cc-pvtz	1.560	0.945
SS-CASSCF(10,10)/aug-cc-pvdz//CASCI(10,10)/aug-cc-pvtz	2.619	-0.114
SS-RI-CASSCF(10,10)/aug-cc-pvtz//SS-RI-CASCI(10,10)/aug-cc-pvtz	2.659	-0.155
RI-NEVPT2/aug-cc-pvtz	2.863	-0.358
SS-CASSCF(10,10)/aug-cc-pvtz//RI-NEVPT2/aug-cc-pvtz	2.923	-0.428
RI-NEVPT2/aug-cc-pvdz//RI-NEVPT2/aug-cc-pvtz
Double Hybrids
RI-SOS-PBE-QIDH/aug-cc-pvtz	2.703	-0.198
RI-SOS-ωPBEPP86/aug-cc-pvtz	2.581	-0.076
SS-CASSCF(10,10)/aug-cc-pvdz//RI-SOS-PBE-QIDH/aug-cc-pvtz	2.755	-0.250
SS-CASSCF(10,10)/aug-cc-pvdz//RI-SOS-ωPBEPP86/aug-cc-pvtz	2.621	-0.116
RI-SOS-PBE-QIDH/aug-cc-pvdz//RI-SOS-PBE-QIDH/aug-cc-pvtz (numerical opt)	2.685	-0.181
RI-SOS-ωPBEPP86/aug-cc-pvdz//RI-SOS-ωPBEPP86/aug-cc-pvtz

Thursday 6^th October¶

I’ve mostly been cleaning up the calculations from yesterday in the table above, doing a bit of PySCF reading and “play” for trying to figure out optimised CASSCF optimisation procedures (density fitting, frozen core, natural orbs, etc.)

The results from yesterday’s test were interesting. The double hybrids seem to be performing really well, and RI-DLPNO-STEOM-CCSD and NEVPT2 performed surprisingly poorly.
The surprise to me was that the CASSCF optimised BODIPY energy performed really well, so I’m re-testing the other methods as optimised by that. It might be the best way to try to include multireference character into my species.
- This made my errors worse… I’m now trying a numerical optimisation at the double-hybrid level

I’m helping a friend at RMIT with their research at 4, so for now I’m plonked in the city and am going to continue trying to get some reading done.

So it looks like you can tell whether you need multireference character by looking at the $T_1$ amplitudes of a CC calc. If $T_1>0.02$, then you need a MR method. (10.1002/qua.560360824)

Monday 10^th October¶

Findings from my little correlated methods test:

I think that the exact QM solution for the BODIPY species is probably higher than the experimental (~2.66 eV). I guess this is the energy that we want to compare to, but I’m not sure what the experimental discrepancy is. I might need to test charge/multiplicity variations.
- In looking deeper, it looks like this method compares the 0-0 transition energy? at least, that’s what the SI claims. TBH though, this paper does not look like they understand the photophysics particularly well, as they’re calculating the 0-0 energy as $\Delta E_{adiabatic}+\Delta E_{ZPVE}$, and 0-0 is distinctly non-adiabatic… Time to do a 0-0 calculation, I guess?
  - Ref: $\Delta E_{0-0} =2.534\:eV$

With that in mind, it looks like DLPNO-STEOM-CCSD is just too unreliable a method to use, CASSCF//CASCI seems to be pretty good, but has no dynamic correlation, so is probably a fluke.

EOM-CCSD looks good, but is way too expensive with those big ERI matrices, even with density fitting. The double hybrid methods do look like a decent compromise, especially if we optimise numerically. with SOS-ωPBEPP8/aug-cc-pvtz is way too slow for this, SOS-PBE-QIDH (which is the best functional in benchmarks) seems fast enough to get away with this approach.

Method	$\Delta E_{S^0}$	$\Delta E^{ZPVE}_{S^0}$	$\Delta E_{S^1}$	$\Delta E^{ZPVE}_{S^1}$	$\Delta E_{0-0}$	$\Delta E ^{adia}_{S^1\to s^0}$
RI-SOS-PBE-QIDH/aug-cc-pvdz//RI-SOS-PBE-QIDH/aug-cc-pvtz	-680.485533	0.154578
RI-SOS-ωPBEPP8/aug-cc-pvdz//RI-SOS-ωPBEPP8/aug-cc-pvtz	-680.768591	0.153766

I’ve also decided to check the CASSCF/CASCI results for Oxazine $4^+$, since I have gas phase data for that as well. ($\lambda_{max}^{emission}$ at 2.160 and 2.094 eV at RT)

Method	Emission (eV)	Error (eV)
SS-CASSCF(6,6)/aug-cc-pvdz//CASCI(6,6)/aug-cc-pvtz	3.290	BAD!

Since every paper seems to find CASPT2/NEVPT2 opts as being really good, I’m going to try and NEVPT2(8,8) numerical opt for BODIPY. I looked at the CASSCF results from before and saw occupations outside the 0.05-0.95 range on the two most extreme orbitals, so I’ve culled my active space to (8,8)

Method	Emission (eV)	Error (eV)
RI-NEVPT2(8,8)/aug-cc-pvdz//RI-NEVPT2(8,8)/aug-cc-pvdz	3.017	BAD!

Tuesday 11^th October¶

Now that the Oxazine 4+ CASSCF job has optimised, it looks like we could probably get away with a (4,4) active space.

I’ve mostly been reading today, and I got some headings together for my IPR to see what things I need to cover that I haven’t read up enough on.

Ways to make Adrea not trust your whole paper:

“We finally recommend to use the linear response LR-CPCM solvation model as it produce a slight improvement in terms of consistency and predictability over gas phase calculations”

(No shit a PCM makes your theoretical calculations better match your solvated energies!? Who’d have thought…)

I ended up revisiting the double hybrid paper as I realised that they only benchmarked absorbance and not emission, which kinda puts a spanner in the works for that as our reference method.I suspect we might need to do some benchmarking work to assess the viability of the methods, perhaps using the TD-TDT geometry assessment 0-0 transition energies or even better 10.1039/C2CP42694C as our reference.

So the Oxazine 4+ CASSCF test was a failure… I guess it was just a fluke with the BODIPY species.

Friday 14^th October¶

I have been doing work, but it’s also been intermittent and boring. Today I’ve been trying to work through the VEM paper to get my head around the method and then I remembered a few things that need clarification in my head:

cLR = state specific LR
VEM = PTES (PTE + PTD)
IBSF = PTED
GSRF = PTD
LR is not state specific, cLR adds this this as per^
the LR method uses the transition dipole moment, where the cLR formalism comes from a solution of the Schrodinger eqn directly and uses the density differerence instead.
Equilibrium means no fast-slow separation, so if you’re optimising geometry, it should always be equilibrium

Okay so this all gets even dirtier in my mind, because It looks like in basic LR, equilibrium and non-equilibirum are MUCH more simple concepts. In doing some testing, there doesn’t seem to be any change with the energy of the states when the “state of interest” is changed, which implies that all states are treated equally.

In basic-LR, equilibrium does reduce the transition energy, but it does so to all states, and doesn’t change their relative spacing to each other. I’ve also been playing with IBSF(PTED) and cLR in Gaussian, which are both state specific approaches, but while cLR and both seem to change the energy of states relative to each other!

Useful quote from the Herbert review:

“Though it is found that the accuracy is largely unaffected if the LR-PCM procedure is used for the excited-state geometry optimization, followed by a ptSS-PCM single-point calculation, which simplifies the procedure.”

Sunday 16^th October¶

Code snippet for a very multilinear regression in python with statsmodels using the simple formula $\langle y\rangle=x_a\cdot m_a+x_b\cdot m_b+x_c\cdot m_c + c$. Here:

$y=$ Sales
$x_a=$ Radio
$x_b=$ Newspaper
$x_c=$ TV
$c=y$ -intercept

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv('~/Downloads/Advertising.csv') # read in the data


x = df[['Radio', 'Newspaper', 'TV']] # assign our independent variables
y = df['Sales'] # assign our dependent variable

x = sm.add_constant(x) # set the model's x and y variables 
model = sm.OLS(y,x).fit() # and fit the data
predictions = model.predict(x) # extract the expected values for each entry according to our model
print(model.params) # print out the coefficients for each independent variable and our y intercept 
model.summary() # print a summary of the model

Could also be done with scikit learn, though it seems to be less fully featured?:

from sklearn import linear_model

regr = linear_model.LinearRegression() # defines our model
regr.fit(x,y) #fits our model
print(regr.coef_) # prints the coefficients
print(regr.intercept_) # prints our y intercept

This could also be accomplished with scipy, defining a custom function, as shown here with morse potential fitting:

from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt

def model(x, a, b, c, d, e):
    return (a * x**4) + (b * x**3) + (c * x**2) + (d * x) + e


params, covariance = curve_fit(model, x, y)
print(params)

x_new = np.arange(-10, 20000, 1)
y_new = model(x_new, *params)

plt.scatter(x, y)
plt.plot(x_new, y_new, c='r')
plt.show()

In the above example, scipy optimize fits as a 4^th order polynomial to a series of data, with coefficients a though c

Monday 17^th October¶

New Idea!

What if we scale the solvent parameters in SMD to better match the excited state!?
Solvents are described through 7 parameters, 5 of which control the non-electrostatic terms, so ignoring the issues with the baked in parameters (e.g. interatomic distances) we could theoretically re-parameterise the solvent properties alone, though this is INCREDIBLY hacky since the SMD model is derived from stat-mech…
Perhaps we can just apply a correction term to SMD, though this would still have to be implemented at a deeper level, and I’m not sure if I want this PhD to be a code integration challenge.

Tuesday 18^th October¶

New Idea! (cont.)

The new idea could also be extended to SMSSP solvation, since it’s a more simple model with only two parameters

I’ve been playing with NWchem since it both has source code for SMD solvation (in case I want to modify it) and it has SMSSP and VEM. It looks like the SMSSP correction for the BODIPY species I’m playing with is minor to say the least,

SMSSP seems like a cool method, but its applicability to excited states is mostly based on it using excited state polarisability, which you have to calculate beforehand. It very much looks to be used in conjunction with SMD radii, which means that in total you need:

Polarisability of the ground and excited state of interest
Solvent dielectric, h-bonding acidity and macroscopic surface tension

0-0 Transition Correction¶

I probably should have picked up on this from another paper, but a correction needs to be applied when comparing with experimental 0-0 transitions. This seems to be because the 0-0 transition is an equilibrium response, where the crossing point of our absorbance/fluorescence spectra (our experimental 0-0 transition) is a non-equilibirum response:

\[ \begin{align*} E^{0-0}&=E^{0-0}(cLR,eq)+\frac{[\Delta E^{vert-a}+\Delta E^{vert-f}]}{2}\\ \Delta E^{vert-a}&=\Delta E^{vert-a}(cLR,neq)-\Delta E^{vert-a}(cLR,eq)\\ \Delta E^{vert-f}&=\Delta E^{vert-f}(cLR,neq)-\Delta E^{vert-a}(cLR,eq) \end{align*} \]

This is discussed fully in (10.1021/ct300326f)

Wednesday 19^th October¶

It’s the WINC AGM dinner tonight, so I’m trying to work form home, which means I’m also trying to paint my nails while catching up on seminar recordings that I’ve neglected to watch for months.

Thursday 20^th October¶

Talk to Baden about doing IR as well, Talk to Toby about making a dataset.

Katya is suggesting a choice of 10 fluorophores in 10 solvents

Maybe 5 distinct structure with a few variations of each

Using 0-0 transitions from absorbance/fluorescence crossings
For error checking, only using the smallest systems for double hybrids
- Try using a dz basis for these
Might have to use the multi-functional averaging approach
Katya has no papers that are critical of the field
To assess MR character, use CASSCF and see if changing the active space changes the energy
They have to have to done a peer reviewed component
- Any other research counts if it’s peer reviewed

Friday-Sunday 21^st-23^rd October¶

I’ve been putting a lot of effort into getting nwchem compiled and profiled to figure out what combination of MPI, BLAS/LAPACK/SCALAPACK, compilers will be best for MonARCH

In profiling, MPI-PR seems to be twice as fast as using OPENIB (Inifiband verbs)

Intel’s 2018 stack seems marginally slower than Intel’s 2022 stack, but has a lower memory footprint (though this could also be due to how memory was specified)

TD-CAM-B3LYP/aug-cc-pvdz with SMD + VEM + SMSSP gradient on `md05` :¶

All seem to be above 95% CPU efficient

Compiler	MPI	MPI Protocol	BLAS/LAPACK	SCALAPACK	Threads	Time (min)	Max mem (GB)
Intel 2022	Intel 2019	MPI-PR	MKL 2022	MKL 2022	17	85	5.36
Intel 2022	Intel 2019	OPENIB	MKL 2022	MKL 2022	16	205	5.64
Intel 2022	Intel 2019	MPI-PT	MKL 2022	MKL 2022	16	very slow
Intel 2022	Intel 2019	MPI-MT	MKL 2022	MKL 2022	16	very slow

OpenMPI OPENIB with GNU compilers seems to give the best multicore performance, but I’m still figuring out MKL/SCALAPACK.

TD-CAM-B3LYP/aug-cc-pvdz with SMD + VEM + SMSSP gradient on `md01` :¶

Compiler	MPI	MPI Protocol	BLAS/LAPACK	SCALAPACK	Threads	Time (min)	Max mem (GB)
Intel 2022	Intel 2019	MPI-PR	MKL 2022	MKL 2022	17	83	5.45
GNU 10.2	OpenMPI 4.1.2	MPI-PR	MKL 2018	SCALAPACK 2.0.2	16	75	2.78
GNU 10.2	OpenMPI 4.1.2	OPENIB	MKL 2018	SCALAPACK 2.0.2	16	69	2.80
GNU 11.2	OpenMPI 4.1.2	MPI-PR	MKL 2022	SCALAPACK 2.0.2	16	78	2.72
GNU 11.2	OpenMPI 4.1.2	OPENIB	MKL 2022	SCALAPACK 2.0.2	16	69	2.97
GNU 11.2	OpenMPI 4.1.2	OPENIB	MKL 2022	Inbuilt (compiled)	16	73	3.09
GNU 11.2	OpenMPI 4.1.2	OPENIB	Inbuilt OpenBLAS (Compiled)	Inbuilt (compiled)	16	1.3 (WTF!)	1.85
Conda					16	36	2.55

Other comp notes:¶

Intel link lined figured out with their link line advisor
GNU 5.* stack will not build (tried for cross compatible OpenMPI)
Have not tried compiling OpenMPI directly
MPI-PR needs one extra thread for managing the worker threads, so need to request n+1 cores
Intel MPI doesn’t work with /usr/bin/time -v
Compile takes ~30-35 mins with GNU stack, but is slower with intel

Monday 24^th October¶

AAAAAAH crap. I just did a quick test against the conda-forge version of nwchem, and it destroys all of my locally compiled versions. I think I might need to try compiling against all the conda dependencies, and see how that goes.

In digging into the conda-forge build scripts, it looks like all the BLAS is using USE_64TO32=y, and no ARMCI interface (which I think defaults to MPI_TS) and no OpenMPI so I might try that…

Okay… so uhhh… The inbuilt OpenBLAS/SCALAPACK gave me a 90 second run!!!! WTFFFFF!! (This is replicatable)

The scaling of this (wrt #cores) is also pretty good!

scaling

Okay, with that exercise done, I guess I should get back to reading…

So I noticed that the github latest version of nwchem has LibXC functionals, so I compiled that, and it’s even faster, which means wB97X-D3 in nwchem .

LibXC functionals also seem to be compatible with TDDFT gradients, but the VEM component is still gradient only.

A new and useful table:¶

	ORCA	Psi4	PySCF	GAMESS	Gaussian	QChem	NWChem
PCM	A-CPCM	A-CPCM A-IEFPCM	A-ddCOSMO	A-CPCM A-IEFPCM A-SS(V)PE	A-CPCM A-IEFPCM	A -CPCM A-IEFPCM	A-COSMO

ES Treatment
LR-PCM	A	N	A	N	A	A	A
ptLR (transition density)						ptLR-A (LR-PTE(S?))
cLR/PTE/ptSS (density difference)	CCSD-PTE-E				cLR(ptSS)-A	ptSS-A PTE-ADC-A (no iterations)	?
PTES/VEM	CCSD-PTES-E					PTES-ADC-A (1+iteration)	VEM-E
PTED/IBSF					CCSD-PTED-E IBSF-E (ExternalIteration)	PTED-ADC-A
SMSSP							E

Non-el Terms
SMD	A-(g/e)			y-(?)	A-(?)	A-(g)	A-(g/e)
SMSSP							E

Explicit Embedding
EFP		A-(solute)		A		E-(+Excited States)
Density Embedding		CPPE	CPPE			CPPE Frozen Projection Based (polarisable ONIOM)

Tuesday 25^th October¶

SS uses $\Delta\rho_k=\rho_k-\rho_0$, LR uses $\rho_{0n}$

The version of nwchem I compiled yesterday has some issues with lots of failing tests, so I’m trying a few different variants of the compile

Version	Compiler	MPI	MPI Protocol	BLAS/LAPACK	SCALAPACK	Threads	Time (min)	Max mem (GB)	Stability
7.0.2	GNU 11.2	OpenMPI 4.1.2	OPENIB	Inbuilt OpenBLAS (Compiled)	Inbuilt (compiled)	16	1.4	2.01	Stable
7.0.2 (No OpenMP)	GNU 11.2	OpenMPI 4.1.2	OPENIB	Inbuilt OpenBLAS (Compiled)	Inbuilt (compiled)	16	1.4	2.0	Stable

Latest	GNU 11.2	OpenMPI 4.1.2	OPENIB	Inbuilt OpenBLAS (Compiled)	Inbuilt (compiled)	16	1.3	2,35
Latest (No OpenMP)	GNU 11.2	OpenMPI 4.1.2	OPENIB	Inbuilt OpenBLAS (Compiled)	Inbuilt (compiled)	16	1.1	2.04	Unstable
Latest (No OpenMP)	GNU 11.2	OpenMPI 4.1.2	MPI-PR	Inbuilt OpenBLAS (Compiled)	Inbuilt (compiled)	16	2.0	1.83	Stable
“	“	“	“	“	“	17 –oversubscribe	1.78	1.96	Stable
Latest (No OpenMP)	GNU 11.2	OpenMPI 4.1.2	MPI-PT	Inbuilt OpenBLAS (Compiled)	Inbuilt (compiled)	16
Latest (No OpenMP)	GNU 11.2	OpenMPI 4.1.2	MPI-MT	Inbuilt OpenBLAS (Compiled)	Inbuilt (compiled)	16	4.1	1.5
Latest (No OpenMP)	GNU 11.2	OpenMPI 4.1.2	MPI-TS	Inbuilt OpenBLAS (Compiled)	Inbuilt (compiled)	16	2.9	2.16
Latest (No OpenMP)	GNU 11.2	OpenMPI 4.1.2	MPI-SPAWN	Inbuilt OpenBLAS (Compiled)	Inbuilt (compiled)	16	1.0	2.05	Unstable

Fun definition; A dark state is one in which the molecule cannot absorb or emit photons, as in, the transitions to all other states are forbidden

Thursday 27^th October¶

I know that I should just start writing my IPR, but I’ve gotten REALLY distracted using pytorch to do multilinear regression of a really large dataset… Tell me this isn’t cool to watch:

This is a basic multilinear regression taking the form. but it’s being performed on the WHO life expectancy dataset:

\[ A x^{(i)} + b = \begin{bmatrix} a_1, a_2, \cdots, a_j, \cdots, a_n \end{bmatrix} \begin{bmatrix} x^{(i)}_1 \\ x^{(i)}_2 \\ \vdots \\ x^{(i)}_j \\ \vdots \\ x^{(i)}_n \end{bmatrix} + b = a_1 x^{(i)}_1 + a_2 x^{(i)}_2 + \cdots + a_j x^{(i)}_j + \cdots + a_n x^{(i)}_n + b = y'^{(i)} \]

It’s very cool, but still MASSIVELY faster to do it using scikit.learn or statsmodels, and fitting anything non-linear is also far more sensitive in pytorch. In a message to a friend:

“Well, I’ve officially decided that doing multilinear regression within an ML framework is like carving a pumpkin with a Swiss Army knife

The knife may be absolutely beautiful and very versatile, but it’s going to do physical damage to you, and it’s going to make a mess of the pumpkin, assuming you have the skill not to end up with pumpkin fragments everywhere. Oh, and it takes 20x-50x as long.”

I am very tired but trying to pick apart NWChem’s VEM cycle

SCF + COSMO PCM + Print MO analysis
TDDFT + COSMO PCM + Print roots and oscillators
TDDFT gradient + COSMO CM
COSMO VEM Results → cGSRF energy
Another SCF cycle (short one this time) + Print MO analysis
TDDFT COSMO PCM + Print roots and oscillators
TDDFT gradient + COSMO CM
COSMO VEM Results → VEM iteration energy (cLR)
Repeat from 5 for full VEM energy

Between VEM in eq and VEM in non-eq, the TDDFT excitation energy is the same. After the gradient module kicks in, the energy is different, which implies some fast/slow separation going on there.

With pure COSMO, no corrections are being made to the TDDFT excitations, so I’m guessing this is using basic LR formalism.

I guess this means that the second VEM output is the cLR equivalent, which seems to be pretty close to what gaussian is producing.

Comparing the VEM paper’s description to this description:

VEM Paper	Output steps
First, we evaluate the equilibrium ground-state reaction field and the slow (orientational or inertial) polarization charges using the procedure described above.	1 → 2
Then, we evaluate the excited-state wave function with the effective Hamiltonian of eqn (4) using the ground-state reaction field as an initial approximation to the excited-state reaction field, which results in the zero-order approximation b(0) to the solute’s excited-state electrostatic potential b.	3 → 4
Using the excited-state column vector b(0) and the fixed slow polarization charges saved earlier, and solving eqn (18) if Partition I is used or eqn (19) if Partition II is used, we can evaluate a new excited-state reaction field using eqn (13) or eqn (14).	5 → 6
Using the new total nonequilibrium excited-state reaction field, we evaluate new approximations to the excited-state electronic wave function of the solute molecule in solution and to the column vector b…	7 → 8
until the procedure converges self-consistently with respect to these quantities.	9 →

Friday 28^th October¶

It’s going to be a very long and busy day today, with a 4pm meeting with Toby to wrap it all up. I need to talk to him about getting started on building my dataset.

What a fantastically productive meeting with Toby. We went through the lab and found a bunch of compounds that Toby already has and identified a set of solvents to use/test

Fluorophore	Class	Other notes	Price
Rhodamine 6G	Rhodamine
Rhodamine 123(?)	Rhodamine
AlexaFluor 532	Rhodamine-like	In azide form
NDI (of some description)	NDI
Naphthalamide (of some description)	NDA
Prodan/ANAP/Laurdan	Prodan		Prodan $245/25mg l-ANAP $114/1mg Laurdan $760/100mg
DAPI	DAPI	“Someone in bio will have some”	$128/10mg
FITC	Fluorescein	Can we just use fluorescein?	$66 USD/100mg
Coumarin 343/519	Coumarin
Texas Red	Rhodamine-like	“Alison will have some”	$256/5mg
Nile Red	Oxazine
BODIPY 493/503	BODIPY	Toby wants this to be investigated	$467/500mg
Merocyanine 540	Cyanine
Dansyl	Naphthalene	Which dansyl?	Hydrazine $68/250mg Amide $150/1g Chloride $160/1g
Azulene	Azulene	$S_2\to S_0$ emitter	$55/50mg

To Ask Toby:

Fluorophore	Class	Other notes	Price
Indigo Carmine			$87/25g
Fluorescein	Fluorescein	Cheaper than FITC	$50/250mg
Cascade Blue	Pyrene		$208/1g

Solvent	$\varepsilon_0$	$n$	$\alpha$	$\beta$	$\gamma\:(mN/m)$ $293.15K$	$\gamma\:(cal\cdot mol^{-1})$ $(mN/m\times 1.439)$	$\phi$	$\psi$	Price
Toluene	2.37	1.4969	0.00	0.14	27.94	40.21	0.86	0.00
Anisole	4.22	1.517	0.00	0.29	35.37	50.90	0.75	0.00
Diethyl ether	4.24	1.3473	0.00	0.41	17.15	24.68	0.00	0.00
$\ce{CH3Cl}$	4.71	1.4458	0.15	0.02	27.10	39.00	0.00	0.50
THF	7.43	1.4072	0.00	0.48	27.37	39.39	0.00	0.00	$150-240/L
DCM	8.93	1.4241	0.10	0.05	27.84	40.06	0.00	0.67
Long chain alcohol (octanol)	9.86	1.430	0.37	0.48	26.02	37.44	0.00	0.00	$100-190/L
EtOH	24.85	1.3614	0.37	0.48	22.28	32.06	0.00	0.00
ACN	35.69	1.3441	0.07	0.32	28.37	40.82	0.00	0.00
DMF	37.22	1.4305	0.00	0.74	36.73	52.85	0.00	0.00
DMSO	46.83	1.4793	0.00	0.88	43.36	62.40	0.00	0.00

There’s no point in doing water, since SMD uses its own parameters for it and apart from nile red, nothing will fluoresce in it.

Sunday 30^th October¶

What a day… I wasn’t going to do any work, but I got lost in coding when I realised that I could use python Enums to build objects for each fluorophore, solvent,, basis set and method, since it makes EVERYTHING so much easier. I don’t have to make LUTs and pass through a truckload of variables for each operation I do if I can just pass through 4 objects.

Using the aenum package made this even better, since it means that I can nest Enums within Enums to build classes REALLY easily that mean that I can essentially just build thee objects from an input table, and have sub-types for things like what family a basis set is from, or what rank on the ladder a specific functional is.

Papers¶

Studied in other papers, double hybrids (+MP2 correlation energy) reduce the errors by half.

Potential research questions:¶

Is there a custom combination of PCM components that can be identified with ML to build the optimal model?
Quantifying non-electrostatic contributions to vertical transitions
Can we generalise non-electrostatic terms from other models into our own?
- Perhaps taking the SM dataset and re-parameterising for a different model?
- Seems to be valid to do this with isodensity cavities, as according to the Herbet review (in the conclusion) this is seemingly the best way to get electrostatic terms that are separable from the non-electrostatic ones

Is there a gold standard solvation method that can be used?
What is the PCM specific error contribution when looking at excited states? EOM-CCSD (gas) → TDDFT (gas) → TDDFT (solv)
No Double hybrid 0-0 transition benchmark study. Do we need top do this to know our errors before starting?
- I think we should at least for our reference functional, so we can understand what sort of errors we’re dealing with.

Issues with the research¶

No high level optimised geometry. Everything is compared to TD-DFT or MP2 opt, with benchmark levels of theory for SP only. Due to the cost, everything seems to be either at relatively low levels of correlated theory, or using a dataset of really small molecules.
- https://arxiv.org/abs/1402.3008 is a good comparison of correlated methods, however didn’t use augmentation.
No correlation between high level gas phase calculations and experiment.
- This is even more difficult since the methods needed to get the fluorophores into the gas phase use ionisation, and thus cannot be directly compared with the solvated species.
0-0 transition energy benchmarks exist for highly correlated WF approaches and TD-DFT, but not so much for double hybrids.
“Chemical accuracy” for gas phase calculations can be obtained statistically, but the errors are very scattered in general and are very system specific.
What is the scope of my studies?
- Do I only care about fluorescence, neglecting absorbance?
  - Only singlet transitions?
- Do I only focus on top ranked TD-DFT methods and ignore all ab initio ones?
- Do I only focus on organic fluorophores? Metals transitions? (I guess then I’d need multireference methods)
From Katya
- Keep is simple for the time being
  - Probably organic fluorophores
  - Maybe metals as a “future direction”
- Don’t focus too much on the technical descriptions
  - Comp chem is useful as a high throughput screening tool, but atm, PCMs for excited states are not that. the errors are not systematic.
- Using double hybrid methods as reference data.
  - As Mennucci did in Reading Band Shapes (10.1021/ct4000795), we can treat the functional error as systematic
- Multilinear regression before ML
- Formation of generalisable method to calculate non-electrostatic components
- We do care about about absorbance because it’s practically free…
- Focus on LR-PCM for now (maybe cLR).
  - Let’s see if simple things can be fixed before we delve too deep

Important (play) coding:¶

Write a server-side script for reading files? (should be faster than catting over ssh)
- Potentially explore sftp file reads instead of ssh catting
QM/MM in Q-Chem
Can the isodensity be used to generate a cavity made of atom-centered spheres for backwards compatibility?
~~QM solvation shell + EFP for long range~~
- I’ve decided that this is ultimately a bad idea, as it defeats the purpose of the non configuration-specific nature of PCM. Perhaps EFP fragments within a PCM region? that could be kinda cool! LEt’s Do that!

Method	\(\Delta E_{S^0}\)	\(\Delta E^{ZPVE}_{S^0}\)	\(\Delta E_{S^1}\)	\(\Delta E^{ZPVE}_{S^1}\)	\(\Delta E_{0-0}\)	\(\Delta E ^{adia}_{S^1\to s^0}\)
RI-SOS-PBE-QIDH/aug-cc-pvdz//RI-SOS-PBE-QIDH/aug-cc-pvtz	-680.485533	0.154578
RI-SOS-ωPBEPP8/aug-cc-pvdz//RI-SOS-ωPBEPP8/aug-cc-pvtz	-680.768591	0.153766

October 2022¶

Monday 3rd October¶

Tuesday 4th October¶

Wednesday 5th October¶

Thursday 6th October¶

Monday 10th October¶

Tuesday 11th October¶

Friday 14th October¶

Sunday 16th October¶

Monday 17th October¶

Tuesday 18th October¶