Open Issues Need Help
The task involves modifying the TICO library to convert the LlamaAttention module of the Hugging Face Llama model into a single, optimized 'attention' operation in the Circle model format. This requires folding RoPE (Rotary Position Embedding) and the other LlamaAttention-specific operations into a custom Circle opcode, whereas previous work only mapped PyTorch's standard scaled_dot_product_attention function. The goal is to make Llama model inference on ONERT more efficient.
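One plausible way to make the whole attention block appear as a single node in the exported graph is to wrap the projections, RoPE, and scaled-dot-product attention behind one PyTorch custom operator that a backend could then lower to a single Circle opcode. The sketch below is illustrative only: the `circle::attention` op name, the weight layout, and the cache-free (prefill-only) signature are assumptions, not TICO's actual design.

```python
# A minimal sketch (PyTorch >= 2.4): fuse the QKV projections, RoPE, SDPA, and
# the output projection behind one custom op so torch.export records a single
# node. The "circle::attention" name and this signature are hypothetical.
import torch
import torch.nn.functional as F


@torch.library.custom_op("circle::attention", mutates_args=())
def circle_attention(
    hidden: torch.Tensor,  # [batch, seq, hidden_size]
    wq: torch.Tensor, wk: torch.Tensor, wv: torch.Tensor, wo: torch.Tensor,
    cos: torch.Tensor,     # [seq, head_dim] rotary embedding tables
    sin: torch.Tensor,
    num_heads: int,
) -> torch.Tensor:
    b, s, h = hidden.shape
    d = h // num_heads
    # Project and split into heads: [batch, heads, seq, head_dim].
    q = (hidden @ wq.T).view(b, s, num_heads, d).transpose(1, 2)
    k = (hidden @ wk.T).view(b, s, num_heads, d).transpose(1, 2)
    v = (hidden @ wv.T).view(b, s, num_heads, d).transpose(1, 2)

    def rope(x: torch.Tensor) -> torch.Tensor:
        # Standard rotate-half RoPE, as in the Hugging Face Llama code.
        x1, x2 = x.chunk(2, dim=-1)
        return x * cos + torch.cat((-x2, x1), dim=-1) * sin

    q, k = rope(q), rope(k)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    out = out.transpose(1, 2).reshape(b, s, h)
    return out @ wo.T


@circle_attention.register_fake
def _(hidden, wq, wk, wv, wo, cos, sin, num_heads):
    # Shape/dtype propagation for export: output mirrors the hidden states.
    return torch.empty_like(hidden)
```

With LlamaAttention's forward rerouted through such an op, the exported program would contain one call that a Circle-emitting backend can map to the custom attention opcode, instead of the dozens of primitive ops the default decomposition produces.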
A Python library for converting PyTorch modules into a Circle model, a lightweight and efficient representation in ONE designed for optimized on-device neural network inference.
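For context, a typical conversion flow looks roughly like the following; the `tico.convert(module, example_inputs)` entry point and the `save()` method are taken from the project's examples, but exact signatures may differ across versions.

```python
# A minimal end-to-end sketch, assuming TICO exposes tico.convert() and a
# save() method on the result, as in the project's examples; details may vary.
import torch
import tico


class AddModule(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return x + y


model = AddModule().eval()
example_inputs = (torch.ones(4), torch.ones(4))

# Export the module and lower it to a Circle model that ONERT can run.
circle_model = tico.convert(model, example_inputs)
circle_model.save("add.circle")
```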