Open Issues Need Help
AI Summary: The task is to create a GitHub Actions workflow that automates the process of building and publishing the `code-explainer` Python package to PyPI (the Python Package Index). This involves using a build backend compliant with PEP 517, the `twine` tool for uploading, and a PyPI API token for authentication. An optional test deployment to `testpypi` is also requested.
AI Summary: Update the project's CONTRIBUTING.md file to include a quickstart checklist (around 5 minutes), links to the project's Discussions section, and reference the examples/README.md file and the model presets table from the main README.
AI Summary: The task is to add a Continuous Integration (CI) job that tests the provided Colab notebook example to ensure it remains functional. This involves choosing a suitable testing tool (nbval or papermill), integrating it into the CI pipeline, and potentially handling conditional execution to skip the test on forks.
AI Summary: This task requires expanding the existing code explanation datasets by adding 20-50 more small code examples to the JSON files located in the `data` directory. Additionally, it involves creating documentation detailing the JSON schema and fields used in these datasets, which should be added to the `examples/README.md` file (and potentially a `data/README.md` file). The changes should ensure that the evaluation tests run quickly.
AI Summary: Update the project's README file to include a table summarizing available model presets (DistilGPT-2, CodeT5 Small/Base, CodeGPT, StarCoderBase-1B), along with their corresponding training and evaluation commands. The information should be concise and easily understandable, referencing the existing examples directory for more detailed instructions.
AI Summary: Create a Google Colab notebook demonstrating the installation, training, evaluation, and optional serving of the code explainer project using the provided configuration and data. The notebook should be added to the `examples` directory and linked from the project's README.
AI Summary: Update the project's README to include documentation for new CodeT5 and StarCoder model presets, adding a table summarizing the available presets (architecture, base model, config file, training and evaluation commands), and instructions on how to switch between them using the `configs/default.yaml` file.
AI Summary: The task requires writing unit tests for a code explainer project, specifically focusing on the tokenization and masking processes for both seq2seq and causal language models. The tests should cover source/target tokenization, handling of padding tokens, prompt masking, and padding masking. The tests should use small configuration files to minimize download times and be compatible with the project's CI system.
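A minimal sketch of what a prompt-masking and padding-masking test for the causal case might look like. It uses plain Python lists rather than the project's tokenizer, so nothing is downloaded. The helper `build_causal_labels` and its parameters are hypothetical and are not taken from the project; the only outside fact relied on is that `-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_causal_labels(prompt_ids, target_ids, pad_id, max_len):
    """Hypothetical helper: build one causal-LM example.

    Returns (input_ids, attention_mask, labels), where prompt positions
    and padding positions are masked out of the loss with IGNORE_INDEX.
    """
    # Concatenate prompt and target, then truncate to the maximum length.
    input_ids = (prompt_ids + target_ids)[:max_len]
    # Prompt masking: the loss is computed on target tokens only.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + target_ids)[:max_len]
    attention_mask = [1] * len(input_ids)

    # Padding masking: right-pad and ignore the padded positions.
    pad = max_len - len(input_ids)
    input_ids = input_ids + [pad_id] * pad
    attention_mask = attention_mask + [0] * pad
    labels = labels + [IGNORE_INDEX] * pad
    return input_ids, attention_mask, labels


def test_prompt_and_padding_are_masked():
    ids, mask, labels = build_causal_labels([5, 6, 7], [8, 9], pad_id=0, max_len=8)
    assert ids == [5, 6, 7, 8, 9, 0, 0, 0]
    assert mask == [1, 1, 1, 1, 1, 0, 0, 0]
    assert labels == [-100, -100, -100, 8, 9, -100, -100, -100]


test_prompt_and_padding_are_masked()
```

In a real test the token IDs would come from a small tokenizer configured by the project's test config; the assertions on the label and attention masks would keep the same shape.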
AI Summary: Create small example datasets (train, eval, test) with about 10 code-explanation pairs each in JSON format and update the examples/README.md to include instructions and copy-pastable commands for training, evaluating, and serving the code explainer using different model presets (DistilGPT-2, CodeT5-small, etc.).
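For the JSON datasets of code-explanation pairs mentioned in these tasks, a quick structural check could look roughly like the sketch below. The field names `code` and `explanation` are assumptions made for illustration; the summaries do not state the actual schema, so they would need to be replaced with whatever the project's files in `data` use.

```python
import json

# Hypothetical field names; the real dataset schema may differ.
REQUIRED_FIELDS = ("code", "explanation")


def validate_records(records):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list of objects"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: expected an object")
            continue
        for field in REQUIRED_FIELDS:
            value = rec.get(field)
            # Each required field must be present and be a non-empty string.
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: '{field}' must be a non-empty string")
    return problems


# One valid record and one record with an empty 'code' and no 'explanation'.
sample = json.loads(
    '[{"code": "x = 1", "explanation": "Assigns 1 to x."}, {"code": ""}]'
)
problems = validate_records(sample)
for line in problems:
    print(line)
```

A check of this kind runs without loading any model, so it would fit the request that evaluation tests stay fast.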