Problematic Polymorphs
I recently found a blog post from regular Chemistry World contributor Derek Lowe, highlighting an Early View Angewandte Chemie communication (doi: 10.1002/anie.201406886) in which the authors determined the crystal structures of two new polymorphs of the amino acid L-Phenylalanine. The paper also helps to clarify the relationship between several other Phenylalanine structures published over the last 20 years. Although Derek was surprised that determining the structure of a seemingly simple molecule had proved such a challenge for small-molecule crystallography, this type of challenge is not unusual. A notable example is the case of the two polymorphs of D-Ribose which evaded full determination for over 50 years (see ZZZFEE in the CSD from 1956!) until Jack Dunitz and co-workers published an article triumphantly exclaiming “The Crystal Structure of D-Ribose—At Last!” in 2010 (doi: 10.1002/anie.201001266).
The challenges involved in obtaining good-quality single crystals to determine a structure should not be underestimated. Prior to the findings of this latest paper, the Cambridge Structural Database (CSD) contained five determinations of the structure of L-Phenylalanine (QQQAUJ to QQQAUJ04), from four different groups of researchers, all proposing different polymorphic forms based on the crystal structure data that they obtained.
The two polymorphs of L-Phenylalanine reported: left, Form I (established as Z′ = 4 in P2₁); right, Form IV
In my opinion, this paper highlights the chemical and crystallographic expertise that is required to understand crystal structures fully, even now when data collection and structure solution are routine procedures. Thankfully, not all of the tens of thousands of crystal structures added to the CSD every year are this problematic, but the paper does underline the need for us at the Cambridge Crystallographic Data Centre (CCDC) to expertly maintain the CSD and provide tools to help our users evaluate the data we receive. The authors of this paper state there are now four well-characterized polymorphs of L-Phenylalanine and call into question the structure of DL-Phenylalanine as a true racemate (suggesting instead a twinned conglomerate).
The CSD categorises polymorphs in several ways. The most familiar to many may be the refcode family, in which all determinations of a compound are grouped together, including those at different temperatures and pressures and those of different polymorphs (L-Phenylalanine can be found in QQQAUJ). In such cases, we rely on the interpretation of the scientists who report the compounds, and later developments may require us to modify these decisions. The CCDC’s ConQuest also provides ‘best representative’ lists, which are generated by analysing structures programmatically to select a single best example of each unique polymorph (for further information on the best representative lists see doi: 10.1107/S0108768106019677). Researchers at the CCDC are also providing tools to help scientists evaluate the stability of their compounds and the likelihood of other polymorphic forms being present.
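To make the refcode family idea concrete, here is a minimal Python sketch that groups a handful of refcodes by family. It assumes a refcode is a six-letter family code optionally followed by a two-digit suffix, as in QQQAUJ to QQQAUJ04; the function name `group_by_family` and the sample list are purely illustrative and are not part of any CCDC software.

```python
import re
from collections import defaultdict

# Assumption: a refcode is six letters (the family) plus an optional
# two-digit suffix that distinguishes later determinations.
REFCODE_PATTERN = re.compile(r"^([A-Z]{6})(\d{2})?$")


def group_by_family(refcodes):
    """Group refcodes by their six-letter family code (hypothetical helper)."""
    families = defaultdict(list)
    for refcode in refcodes:
        match = REFCODE_PATTERN.match(refcode.strip().upper())
        if match is None:
            raise ValueError(f"Unexpected refcode format: {refcode!r}")
        families[match.group(1)].append(match.group(0))
    # Sort members so the unsuffixed entry comes first in each family.
    return {family: sorted(members) for family, members in families.items()}


# Illustrative input: the L-Phenylalanine entries and the 1956 D-Ribose entry.
entries = ["QQQAUJ02", "QQQAUJ", "ZZZFEE", "QQQAUJ04", "QQQAUJ01", "QQQAUJ03"]
families = group_by_family(entries)

for family, members in sorted(families.items()):
    print(family, len(members), members)
```

Grouping by family only says that the entries describe the same compound. Deciding which members are distinct polymorphs still depends on the reported interpretation or on a structural comparison, which this sketch does not attempt.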