Host institution: Department of Chemistry and Biochemistry,
University of South Carolina, Columbia, SC, USA
Organizer: Michael L. Myrick
Lecturers: R. Patrick Xian (remote), Santosh Adhikari, Sourin Dey
Acknowledgements: Lee Hallman, Christopher A. Sutton
Day 1 – August 3rd
Lecture topics
Python basics, programming environment, software repositories
- the Anaconda distribution of Python (https://www.anaconda.com/products/distribution)
- “Hello world!” script
- introduction to Jupyter (https://jupyter.org/)
- repositories: GitHub (https://github.com/), PyPI (https://pypi.org/)
- core data structures (numeric, string, list, tuple, dictionary) and their usage
- composite data structures (named tuple, ordered dictionary)
- control flow
- errors and their handling
- basic file handling
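As a rough preview of the Day 1 topics, the sketch below touches the core and composite data structures, control flow, error handling, and basic file handling in one short script. It is not taken from the teaching notebooks; the element data, the `Element` tuple, the `lookup_mass` helper, and the `masses.txt` file name are all illustrative assumptions.

```python
import os
import tempfile
from collections import OrderedDict, namedtuple

# Core data structures
atomic_number = 6                       # numeric (int)
symbol = "C"                            # string
isotopes = [12, 13, 14]                 # list (mutable)
position = (0.0, 0.0, 0.0)              # tuple (immutable)
masses = {"H": 1.008, "C": 12.011}      # dictionary

# Composite data structures (hypothetical example records)
Element = namedtuple("Element", ["symbol", "number"])
carbon = Element(symbol="C", number=6)

ordered = OrderedDict([("H", 1), ("C", 6)])
ordered.move_to_end("H")                # key order is now C, H


def lookup_mass(sym):
    """Return the mass for a symbol, or None if the symbol is unknown."""
    try:
        return masses[sym]
    except KeyError:                    # error handling
        return None


# Control flow: keep only the symbols we have a mass for
known = []
for s in ["H", "Xx", "C"]:
    if lookup_mass(s) is not None:
        known.append(s)

# Basic file handling: write a small text file, then read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "masses.txt")
    with open(path, "w") as fh:
        for sym, mass in masses.items():
            fh.write(f"{sym} {mass}\n")
    with open(path) as fh:
        lines = fh.read().splitlines()
```

The `with` statements close the file and remove the temporary directory automatically, which is the pattern usually preferred over calling `close()` by hand.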
Lecture materials
- Teaching notebooks: Python basics and Jupyter (optional material)
- Exercise notebooks: Exercise_01 and its solution
References
- How to Think Like a Computer Scientist
- Software Carpentry Python fundamentals
- Awesome Python, curated materials
Day 2 – August 4th
Lecture topics
Functional programming and data visualization with Python.
Bring your own data (BYOD), if possible!
- functional programming in Python
- matplotlib (https://matplotlib.org/)
- seaborn (https://seaborn.pydata.org/)
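To give a flavor of the functional programming topic, here is a minimal sketch using only the standard library. The temperature values and the helper names `to_kelvin` and `scale` are made up for illustration and are not part of the lecture materials; plotting with matplotlib or seaborn is left out to keep the example dependency-free.

```python
from functools import partial, reduce

# Hypothetical measurements in degrees Celsius
temperatures_c = [20.0, 25.0, 30.0, 35.0]


def to_kelvin(t_c):
    """Convert a temperature from Celsius to Kelvin."""
    return t_c + 273.15


# map: apply a function to every element
kelvin = list(map(to_kelvin, temperatures_c))

# filter: keep the elements that satisfy a predicate (here a lambda)
warm = list(filter(lambda t: t >= 30.0, temperatures_c))

# reduce: fold a sequence into a single value
total = reduce(lambda a, b: a + b, temperatures_c)


def scale(x, factor):
    """Multiply x by factor."""
    return x * factor


# partial: fix one argument to build a new, more specific function
double = partial(scale, factor=2)
doubled = [double(t) for t in temperatures_c]
```

In everyday Python, list comprehensions often replace `map` and `filter`, and `sum` replaces this particular `reduce`; the explicit forms are shown here only to make the functional style visible.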
Lecture materials
- Teaching notebooks: Functional programming and data visualization
- Exercise notebooks: Exercise_02 and its solution
References
- Python functional programming tutorial
- David Mertz, Functional Programming in Python, O’Reilly (2016)
- John D. Hunter, Matplotlib: A 2D graphics environment, Computing in Science and Engineering (2007)
- Nicolas R. Rougier, Scientific visualization book (2021)
- Matplotlib tutorial
- Michael L. Waskom et al., seaborn: statistical data visualization, Journal of Open Source Software (2021)
- Seaborn tutorial
Day 3 – August 5th
Lecture topics
Scientific and numeric Python software packages
- numpy (https://numpy.org/)
- pandas (https://pandas.pydata.org/)
- operator (https://docs.python.org/3/library/operator.html)
- scipy (https://scipy.org/)
Advanced programming in Python
- some useful built-in functions
- introduction to object-oriented programming (OOP) in Python (https://realpython.com/python3-object-oriented-programming/)
- layout of a Python program, including the import statements
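The following sketch shows one conventional program layout (imports first, then definitions, then a `__main__` guard) and combines a small class with the `operator` module and a few built-in functions. The `Atom` class and the water-molecule data are illustrative assumptions, not code from the teaching notebooks.

```python
# Imports come first
import operator
from functools import reduce


# Then class and function definitions
class Atom:
    """A minimal class holding an element symbol and an atomic mass."""

    def __init__(self, symbol, mass):
        self.symbol = symbol
        self.mass = mass

    def __repr__(self):
        return f"Atom({self.symbol!r}, {self.mass})"


# Hypothetical data: the three atoms of a water molecule
atoms = [Atom("O", 15.999), Atom("H", 1.008), Atom("H", 1.008)]

# Built-ins with the operator module: sort objects by an attribute
by_mass = sorted(atoms, key=operator.attrgetter("mass"))

# Built-in sum over a generator expression
total_mass = sum(a.mass for a in atoms)

# operator.mul used as a function: 1 * 2 * 3 * 4
product = reduce(operator.mul, [1, 2, 3, 4])

# operator.itemgetter picks one field out of each tuple
pairs = [("C", 6), ("H", 1)]
numbers = list(map(operator.itemgetter(1), pairs))

# Finally, code that should run only when the file is executed as a script
if __name__ == "__main__":
    print(by_mass)
    print(round(total_mass, 3))
```

Functions such as `operator.attrgetter` and `operator.itemgetter` are drop-in alternatives to short lambdas like `lambda a: a.mass`.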
Lecture materials
- Teaching notebooks: Advanced programming in Python, scientific Python, and tabular data processing
- Exercise notebooks: Exercise_03 and its solution
References
- Charles R. Harris et al., Array programming with NumPy, Nature (2020)
- NumPy tutorial
- Pauli Virtanen et al., SciPy 1.0: fundamental algorithms for scientific computing in Python, Nature Methods (2020)
- Wes McKinney, Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference (2010)
- Wes McKinney’s online book Python for Data Analysis, O’Reilly (3rd ed., 2022)
- ODSC pandas workshop tutorials
- SciPy tutorial by phoenixNAP
- Python operator module tutorial
- A somewhat pedantic intro to OOP in Python from Microsoft
- Advanced concepts for OOP in Python
- Brandon Rhodes, Python design patterns
Day 4 – August 8th
Lecture topics
Common machine learning frameworks in Python
- scikit-learn (https://scikit-learn.org/)
- PyTorch (https://pytorch.org/)
Lecture materials
- Teaching notebooks: Introduction to machine learning with scikit-learn, PyTorch and deep learning
References
- Fabian Pedregosa et al., Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research (2011)
- Scikit-learn official tutorials
- Andreas Mueller’s scikit-learn tutorial
- Adam Paszke et al., PyTorch: an imperative style, high-performance deep learning library, NeurIPS (2019)
- PyTorch tutorials
- Awesome Python machine learning, curated materials
Day 5 – August 9th
Lecture topics
Python packages for molecules and materials, with a focus on basic data structures and functionality.
- mendeleev (https://github.com/lmmentel/mendeleev)
- chemical name conversion using cirpy (https://github.com/mcs07/CIRpy) and pubchempy (https://github.com/mcs07/PubChemPy)
- RDKit (https://www.rdkit.org/) and molecular visualization
- pymatgen (https://pymatgen.org/)
- ASE (https://wiki.fysik.dtu.dk/ase/)
- molecular visualization
Lecture materials
- Teaching notebooks: RDKit and chemistry and materials packages
References
- RDKit tutorials
- RDKit cookbook
- Alexandre Varnek (ed.), Tutorials in Chemoinformatics, Wiley (2017)
- Daniel S. Wigh et al., A review of molecular representation in the age of machine learning, WIREs Computational Molecular Science (2022)
- Shyue Ping Ong et al., Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis, Computational Materials Science (2013)
- Pymatgen tutorial videos on its YouTube channel
- Ask Hjorth Larsen et al., The atomic simulation environment—a Python library for working with atoms, Journal of Physics: Condensed Matter (2017)
- ASE tutorials