Hopes for a more accurate and functional Siri voice assistant currently lean heavily on the short-term fix: Apple’s recently announced partnership with Google to use the latter’s Gemini tech to improve its own AI offerings. But in the longer term, a new research paper offers a method that could allow Apple to make Siri faster all by itself.
The paper, Principled Coarse-Grained Acceptance for Speculative Decoding in Speech, was written by five researchers working for Apple and Tel Aviv University and published late last month (via 9to5Mac). It proposes a new approach that could, in the researchers’ words, “accelerate speech token generation while maintaining speech quality.”
The key to speed, the researchers argue, is avoiding unnecessary strictness. “For speech LLMs that generate acoustic tokens,” they write, “exact token matching is overly restrictive: many discrete tokens are acoustically or semantically interchangeable, reducing acceptance rates and limiting speedups.” In other words, past a certain level of similarity, it doesn’t matter which of two possible speech tokens is selected, since they sound or mean essentially the same thing, and it wastes time and processing resources to insist on working out which one is “right.”
The proposed solution is to group acoustically similar tokens together.
“We propose Principled Coarse-Graining (PCG), a framework that replaces exact token matching with group-level verification,” the paper explains. “We construct Acoustic Similarity Groups (ASGs) in the target model’s token embedding space, capturing its internal organization of semantic and acoustic similarity. PCG performs speculative sampling on the coarse-grained distribution over ASGs and carries out rejection sampling at the group level.”
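To make the idea concrete, here is a minimal sketch of group-level acceptance in Python. This is an illustrative toy, not the paper’s implementation: the tiny vocabulary, the fixed group assignments, and the example distributions are all hypothetical (in the actual PCG framework, the Acoustic Similarity Groups are built from the target model’s token embedding space), but the accept/reject logic mirrors the described scheme of verifying the draft token’s group rather than the exact token.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 8 acoustic tokens partitioned into three
# Acoustic Similarity Groups. In the paper these groups come from
# clustering in the target model's embedding space; here they are fixed.
GROUPS = {0: [0, 1, 2], 1: [3, 4], 2: [5, 6, 7]}
TOKEN_TO_GROUP = {t: g for g, toks in GROUPS.items() for t in toks}

def coarse(dist):
    """Collapse a token-level distribution into a distribution over groups."""
    return {g: sum(dist[t] for t in toks) for g, toks in GROUPS.items()}

def group_level_accept(draft_token, p_draft, p_target):
    """Group-level rejection sampling: verify the draft token's *group*
    instead of requiring an exact token match (a sketch of the PCG idea)."""
    g = TOKEN_TO_GROUP[draft_token]
    q, p = coarse(p_draft), coarse(p_target)
    if rng.random() < min(1.0, p[g] / q[g]):
        # Group accepted: emit a token drawn from the target
        # distribution restricted to that group.
        toks = GROUPS[g]
        w = np.array([p_target[t] for t in toks])
        return toks[rng.choice(len(toks), p=w / w.sum())]
    # Group rejected: resample a group from the residual distribution,
    # then pick a token within it from the target distribution.
    residual = np.array([max(p[g2] - q[g2], 0.0) for g2 in GROUPS])
    residual /= residual.sum()
    g2 = int(rng.choice(len(GROUPS), p=residual))
    toks = GROUPS[g2]
    w = np.array([p_target[t] for t in toks])
    return toks[rng.choice(len(toks), p=w / w.sum())]

# Example: the draft model proposes token 1. The target model mostly
# agrees on the group containing tokens 0-2, so even if it would have
# preferred token 0 or 2, the draft is usually accepted at group level.
p_draft = np.array([0.30, 0.30, 0.10, 0.10, 0.05, 0.05, 0.05, 0.05])
p_target = np.array([0.20, 0.25, 0.15, 0.12, 0.08, 0.08, 0.06, 0.06])
out = group_level_accept(1, p_draft, p_target)
print(out, "is in group", TOKEN_TO_GROUP[out])
```

The payoff is that a draft token only needs to land in the right group to be accepted, so acceptance rates (and thus speedups) rise, while sampling within the accepted group from the target distribution keeps the final output close to what the target model would have produced.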
The researchers claim this will increase speed without significantly lowering reliability. In experiments (see page 4 of the paper), increasing the number of tokens per second slightly lowers accuracy, but far less than with standard speculative decoding.
The paper is rather technical, but it’s not very long. Check out the PDF to read the whole thing.