I put my extra line on the exact same spot as you. Your code attached is completely right.
The reason why mine took such a long time was that I not only did the calculation for the 2*2*2 supercell but also did it for the 3*3*3 supercell. I need the max3/max2 ratio to find out the dimensionality of the structure. I am sorry that I did not make it clear at first. What I did was that I wrote a little function called dimension_test which covers from expansion to finding max atoms using shell.components (please see the attached code for more details). I set the upper limit for the atoms_not_of_interest list to be a parameter called 'threshold' and I called dimension_test in the main function so that I could run it twice to calculate max_atoms in 2*2*2 supercell and max_atoms in 3*3*3 supercell. I timed each sections of my code and it was the to_be_bonded of 3*3*3 supercell that took the most time: 1796 seconds for VOLMAL and 1766 seconds for IXASUW. To_be_bonded of 2*2*2 supercell was: 144 seconds for VOLMAL and 149.72 seconds for IXASUW.
I have tried to put the to_be_bonded line and add_bond line under the "for mol in expansion" loop, hoping that the running time will decrease if we find to_be_bonded on the smaller scale. However, this change didn't seem to result in a shorter running time (I have tested the influence of the change on running time on VOLMAL).
I don't understand whether there will be such a big difference on running time if there is none in the fractional coordinates. I have ran the code on a lot of vanadium tellurites structures (32 in total): except for IXASUW and VOLMAL, others' running time range from 15 seconds to 4 mins. Is it still possible to cut the running time of the to_be_bonded line for the structures that have atoms with no fractional coordinates?
An idea that I have so far is that we can pre-process the CIFs, delete the lines of atoms which have no fractional coordinates and read in the revised CIFs into the code to avoid a long running time.
Many thanks!! Really appreciate your help over the past :)
Thanks a lot! Your fragment definitely cuts the running time and it takes fewer than 5 mins to run one structure.
Only 2 structures are acting weird: VOLMAL and IXASUW. "none" will be printed out every time I call mol.atoms.fractional_coordinates. I have looked into their CIFs and there were "?" marks besides some atoms. Therefore, I added one line to resolve this issue:
mol.remove_atoms( a for a in mol.atoms if a.fractional_coordinates is None )
The code was finally able to run on those two structures instead of encountering "nonetype object has no attribute" type issue. But the running time was quite long: each of those two structures takes about half an hour to get an result. At first, I wonder whether this has to do with remove_atoms() reshuffling atom index after every removal, an issue that you mention in the last post. So, I tried to sort mol.atoms based on its index:
mol.atoms.sort(key=lambda x: x.index, reverse=True)
Still, sorting based on atom index doesn't help. Do you have any idea of what's going on here? Is there any possibility that we can shorten the running time on those two structures?
No worries and thanks for keeping track! I have revised the code so that now it seems to be able to extract right values. The revision that I did is : after occupying the cell and expanding the cell by using symmop, I delete atoms that are out of the 2*2*2 supercell and out of the 4*4*4 supercell by using atoms' fractional coordinates before proceeding to the next step. What I also did was that I make the range of i, j, k in cif.symmetric_molecule(symmop,(i,j,k)) large: "for i in range(0,5) for j in range(0,5) for k in range(0,5) " so that I can make sure all atoms in the 2*2*2 supercell and 4*4*4 supercell can be generated after applying symmetric_molecules method. I am able to visualize the outcomes of my code in mercury and they seem right to me.
The only problem that I have is that the code takes a long time to run: about 2 hours per structure. I wonder whether it's the problem with my algorithm or it is the problem with the old-fashioned computer where I install the CSD package. The cif.symmetric_molecule(symbol, (i,j,k)) for i in range(0,5) for j in range(0,5) for k in range(0,5) seems to take a long time. Some examples of the structures that need a long time to run are:LAQNOH, QIFPAY, CAPBEB.
Here is my code:
for symmop in cif.symmetry_operators
for i in range(0,5)
for j in range(0,5)
for k in range(0,5)
for mol in expansions:
for a in shell.atoms:
[a for l in unique_dict.values() for a in l[1:]]
a.fractional_coordinates for a in shell.atoms
a for a in shell.atoms
if a.fractional_coordinates.x<0 or a.fractional_coordinates.x>2
or a.fractional_coordinates.y<0 or a.fractional_coordinates.y>2
or a.fractional_coordinates.z<0 or a.fractional_coordinates.z>2
shell.remove_atoms(i for i in atoms_not_of_interest)
shell.add_bonds(('Single',a,b) for a,b in to_be_bonded_1)
shell.add_bonds(('Single',a,b) for a,b in to_be_bonded_2)
max_atoms_cell=max(len(c.atoms) for c in shell.components)
min_atoms_cell=min(len(c.atoms) for c in shell.components)
print "max_unit_cell is", max_atoms_cell
print "min_unit_cell is",min_atoms_cell
(a,c) for a in shell.atoms for c in shell.atoms
and a not in c.neighbours
print descriptors('REF CODE').dimension_test()
Thanks so much and have a great weekend!
Thanks for the responses!
May I ask a follow-up question regarding this topic?
After making all the revisions based on your replies, my code can run successfully and is able to extract the minimum and maximum number of atoms in the covalently bonded clusters for both the original unit cell (min1, max1) and the 2*2*2 supercell (min2,max2). However, the extracted number is quite weird. For example, for structure YOTHUM, min1=max1=min2=max2=18. However, YOTHUM is a 1D chain structure. From the original unit cell to the 2*2*2 supercell, max2/max1 should be 2 instead of 1.
Another example is AXOHUR, a vanadium tellurite structure : min1=18 max1=25 min2=128 max2=181 max2/max1=7.24. With AXOHUR being a 2D layer, max2/max1 should equal to 4 while the code prints out 7.24.
One more example is WEMBOG01: min1=max1=22 min2=max2=40 max2/max1=1.82 . However, WEMBOG01 is a 2D layer, max2/max1 should equal to 4 instead of 1.82.
There are quite a lot of structures that have incorrect min1/max1/min2/max2 number. I opened the cif in VESTA to get a rough idea of what max2/max1 ratio is when I expanded the unit cell into a 2*2*2 supercell. What I got from VESTA was inconsistent with the numbers extracted. I have been looking at my code for a couple of days but ended up with little ideas on how to revise it. Therefore, I post again, asking for help. I wonder whether the failure of the extraction has something to do with Molecule.components because the code prior to this method seems quite correct to me. Or maybe we miss some important steps?
I am very grateful for your help in the past. Will appreciate your advices/suggestions!! :) Feel free to ask me more if I didn't state my question clearly.
Have a great day,
Thanks so much for your replies! Your inputs are really helpful for my research. My research has been about coding in python with API to achieve an efficient extraction of chemical descriptors of interest, which include dimensionality, atom min distance, coordination number and so on. Chemical descriptors can then be mapped to crystals' structures.
In addition, when I tried to run the code on AXOHOL and other crystal structures (i.e. IKEOLA, YOTHUM, VOLMAL, VOLMEP [all of which are vanadium teullrites]), I was getting a error message of "add(HBondType, HAtomType, HAtomType) atom1 and atom2 are already in a different bond in this ChemicalGraph object" on the line "shell.add_bonds(('Single',a,c) for a,c in to_be_bonded_2)". I wonder what I can do on the code to avoid this. Thanks!
Thanks for replying! Your code and comments are all super helpful. However, I still have a couple of questions for you:
1. For the purpose of importing only atoms of interest (in my case, it will be 'O', 'Te', 'V'), I modified your code but not sure whether it is right way to do so:
(a,b) for a in shell.atoms if 'O' in a.label and for b in shell.atoms if 'V' in b.label
if a.index<b.index and a not in b.neighbours and MD.atom_distance(a,b)<=2.6
(a,c) for a in shell.atoms if 'O' in a.label and for b in shell.atoms if 'Te' in c.label
if a.index<c.index and a not in c.neighbours and MD.atom_distance(a,c) <=2.8
#I used 2.6 for V-O bond and 2.8 for Te-O bond for my structures of interest; other parts of the code remain the same
I wonder whether modifying the code in this way can achieve a correct import of atoms of interest. Or should I do it some other ways.
2. I am having a small trouble implementing the last line of your code---writer.write(shell). I checked the API documentation and there were a variety of writers. Are you referring to the MoleculeWriter.write( ) method? Please correct me if I interpret it the wrong way.
3. I am also interested in calculating the number of atoms in the covalently bonded clusters in the filled unit cell. (by 'filled', I mean that symmetry operators have been applied to the unit cell). I wonder whether I can just modify the 2*2*2 supercell code slightly to do the calculation:
changing: expansions = [
for symmop in axohol.symmetry_operators
for i in range(0,2)
for j in range(0,2)
for k in range(0,2)
into: expansions = [
for symmop in axohol.symmetry_operators
for i in range(0,1)
for j in range(0,1)
for k in range(0,1)
while keeping the other parts of the code the same.
4. I am interested in the maximum number of atoms and the minimum number of atoms in the covalently bonded clusters in both the filled unit cell and the 2*2*2 supercell. My current way of doing it is to use the components method on the newly written shell (in my following code, it will be shell_unitcell and shell_supercell):
print "number of atoms in the unit cell is (max1==min1)", len(components_unitcell.atoms)
for i in range(len(components_unitcell)):
print "max1==min1 and the # of atoms in the clusters is", max(result_unitcell)
print max_unitcell, min_unitcell
#the code for extracting min# atoms and max# atoms in covalently bonded clusters in components_supercell will be similar.
I wonder whether using the components method on the written shell molecule will be a good way to extract the two parameters.
Thanks a lot! I highly appreciate for your replies!
Thanks for replying! I am glad that API has built-in methods for finding covalently bonded structures.
I am sorry that I did not define 'covalently bonded structures' clear enough when I posted the question. Strongly bonded pairs of atoms are identified if the interatomic distance of the pair is less than the sum of their reference atomic bond lengths plus a tolerance of 0.45 Å to account for bond length variability. The reference values of bond radii can be found on page 11 of the link: http://pubs.acs.org/doi/suppl/10.1021/acs.nanolett.6b05229/suppl_file/nl6b05229_si_001.pdf
Therefore, I wish to use a bespoken distance range to define my clusters. In your last post, I tried the method---components and it was great. I am not interested in the whole molecule of the imported crystal but only on the inorganic layers. For example, in my case, I am interested in vanadium tellurite structures and therefore I only want V, O, Te ----these 3 types of atoms to be imported when the components function is used.
In addition, I am indeed interested in building larger representations the molecule. I tried shell.add_molecule(crystal.symmetric_molecule(symbol)) method suggested by your post. However, I am little bit confused about the line: shell=Molecule('SHELL'). Molecule() doesn't seem to be defined on the API documentation. Also, is the line connected with crystal=csd.crystal('ABACUF') by any means? It'll be great if you can explain a bit about the line.
Also, I am interest in finding covalently bonded structures in 2*2*2 supercell. The code that I have written is:
for symmop in cif.crystal.symmetry_operators
for i in range(0,2)
for j in range(0,2)
for k in range(0,2)
However, expansions is a list and therefore the method---components does not seem to be able to apply on. Therefore, I wonder what method is available for finding covalently bonded clusters on supercell.
Thanks so much!
I currently work on identifying the dimensionality of crystal structures using python. The paper (http://pubs.acs.org/doi/full/10.1021/acs.nanolett.6b05229) introduces an algorithm that can screen and identify 1D, 2D, 3D structures. One of the important parameters needed for this algorithm is to find covalently bonded clusters in crystal structures. I already have the covalent bond length threshold needed to determine whether a pair of atoms is bonded or not for my structures of interest. I wonder whether there is a built-in function in CSD python API in helping to find covalently bonded clusters and identify the number of atoms in the clusters or even dimensionality if possible? Thanks a lot!
The molecular_shell( ) definitely helps and I have extracted the right min_distance using that function. Thank you very much for your help!
Thanks for replying!
I have considered using the symmetry operators of the crystals to calculate the distance between two atoms in the crystal before. I think the symmetry operators only allow populating the atoms within the unit cells. However, I am interesting in calculating and finding the minimum distance between any two atoms in the crystal structure. This requires applying translational symmetry for the central unit cell and creating a supercell by expanding the central unit cell in 3D.
Therefore, I wonder whether CCDC python API library has methods that will allow interatomic distance calculation across the unit cell boundary. Thanks a lot!