.. CLM documentation master file, created by
   sphinx-quickstart on Thu Apr 18 03:52:24 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to CLM's documentation!
===============================

A package to train and evaluate language models of chemical structures, as described in :cite:t:`Qiang2024`.

Note that training and evaluating chemical language models requires significant computational resources. Running the default ``config.yaml`` file distributed with the repository, for example, submits a total of 2,102 jobs, with a total requested runtime of more than 75,000 hours and a maximum memory request of 256 GB. Actual runtimes will generally fall substantially below these requests, although unusually large datasets may take longer and require modification of the default resource requests. Resource requirements also vary substantially with the configuration; key parameters include the dataset size, the number of cross-validation folds, the degree of non-canonical SMILES enumeration, and the amount of sampling to perform from the trained models. For instance, performing three-fold cross-validation and non-canonical SMILES enumeration at a fixed augmentation factor of 10x would reduce this resource request by approximately 20-fold (an illustrative sketch of these settings appears at the end of this page). Nonetheless, access to a high-performance computing cluster with GPU resources is strongly recommended to train and evaluate new models.

If you are simply looking to use the trained DeepMet model to annotate metabolomics data, consider using the DeepMet `web application `_.

.. toctree::
   :titlesonly:

   clm/installation.md
   clm/workflow.md
   clm/workflow_steps.md

.. bibliography::
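
As a rough sketch of the reduced-resource configuration described above, a run might combine three-fold cross-validation, a fixed 10x non-canonical SMILES augmentation factor, and a smaller sampling budget. The key names below are hypothetical and chosen for illustration only; they are not guaranteed to match the actual schema of the ``config.yaml`` distributed with the repository, which should be consulted for the real parameter names and defaults.

.. code-block:: yaml

   # Illustrative sketch only: these key names are hypothetical and do not
   # necessarily match the schema of the bundled config.yaml.
   dataset: data/molecules.smi    # input SMILES dataset (hypothetical path)
   cv_folds: 3                    # three-fold cross-validation
   augmentation_factor: 10        # fixed 10x non-canonical SMILES enumeration
   sample_size: 100000            # molecules to sample from each trained model

In general, smaller values for these parameters reduce the number of submitted jobs and the total requested runtime.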