# Internet Voting Is Insecure Because Verification Cannot Fix Trust Problems—Voice AI for Demos Proves Why Reading DOM Beats Checking Generated Answers
Twenty-one computer scientists—including Bruce Schneier, Ronald Rivest, and Andrew Appel—just published a consensus statement: "Internet voting is insecure and there is no known or foreseeable technology that can make it secure." Their verdict isn't about implementation bugs or vendor incompetence. It's about fundamental architecture. You cannot verify what you cannot observe. You cannot trust what you cannot read.
The parallel to chatbot demos is exact. Internet voting fails because verification apps can't check malware-altered votes. Chatbot demos fail because users can't check LLM-hallucinated instructions. Both try to solve trust problems with verification theater. Both fail for the same reason: **generation creates unverifiable outputs, reading creates observable reality.**
Voice AI for demos succeeds where chatbots fail because it reads DOM structure directly—like paper ballots you can observe being counted—instead of generating answers users must trust blindly.
## The Princeton Verdict: Three Weaknesses Make Internet Voting Unverifiable
The scientists identify three attack vectors that verification cannot fix:
1. **Malware on voter's device** can transmit different votes than the voter selected
2. **Malware at the server** can change votes during transmission
3. **Malware at the county office** can change votes during printing/scanning
The critical insight: "A single attacker from anywhere in the world can alter a very large number of ballots with a single scaled-up attack." The problem isn't individual vote tampering—it's undetectable mass manipulation.
Compare to hand-marked paper ballots: "Occasionally people try large-scale absentee ballot fraud, typically resulting in their being caught, prosecuted, and convicted."
**The difference:** Paper ballots create observable artifacts that can be recounted. Internet votes create digital abstractions that disappear into encrypted transmission.
The same pattern destroys chatbot demos. Users ask "Where's the upgrade button?" The chatbot generates an answer from training data. The answer could be hallucinated, outdated, or wrong. There's no artifact to verify. The user must trust the abstraction.
Voice AI reads the DOM and responds: "I can see the upgrade button in the top-right corner of your settings panel." Observable. Verifiable. Trustworthy.
## The E2E-VIV Illusion: Why Verification Apps Cannot Fix Generation Problems
After decades of internet voting failures, researchers proposed "End-to-End Verifiable Internet Voting" (E2E-VIV)—systems where voters can check that their vote was recorded and counted correctly.
The Princeton authors demolish this approach with five fatal weaknesses:
### 1. **The checking app can be infected too**
"Voters must rely on a computer app to do the checking, and the checking app (if infected by malware) could lie to them."
If malware can alter votes, malware can alter verification. You're trusting one generated output (the verification result) to check another generated output (the recorded vote). No ground truth exists.
Chatbot demos try the same trick. You ask for help. The chatbot generates an answer. You ask "Is that correct?" The chatbot generates a confidence score. Both outputs come from the same LLM. You're checking generation with generation.
Voice AI doesn't generate verification—it reads reality. When a user asks "Did you find the settings page?" the agent reads the DOM, finds `<title>Settings - Dashboard</title>`, and responds with observed truth. No verification needed. The DOM is ground truth.
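The difference is concrete enough to sketch in code. Below is a minimal illustration, using only Python's standard-library HTML parser: instead of generating a plausible answer, the agent parses the rendered page and reports only what it actually finds. The markup is a hypothetical example, not taken from any real product.

```python
from html.parser import HTMLParser

class TitleReader(HTMLParser):
    """Read the page's <title> element: observed truth, not a generated guess."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_data(self, data):
        if self.in_title:
            self.title = data

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

# Hypothetical rendered page, standing in for the live DOM.
page = "<html><head><title>Settings - Dashboard</title></head><body></body></html>"

reader = TitleReader()
reader.feed(page)
print(reader.title)  # whatever the page actually says; nothing is invented
```

The answer is correct by construction: if the page changes, the reported title changes with it, because the report is derived from the page rather than from training data.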
### 2. **Receipt-free systems contradict human intuition**
"Receipt-free E2E-VIV systems are complicated and counterintuitive for people to use."
To prevent vote-buying, verification systems must be "receipt-free"—voters cannot prove how they voted. But this creates a paradox: voters can check their vote was counted, but cannot prove to anyone else (including themselves later) what they checked.
The technical solution: "The best solutions known allow checking only of votes that will be discarded, and casting of votes that haven't been checked." You verify a dummy ballot, then cast a real ballot you never verify.
As the authors note: "This is highly counterintuitive for most voters!"
Chatbot demos face the same contradiction. The LLM generates an answer. You want to verify it. The verification process requires... asking the LLM again. You can "check" the answer, but you're checking a generated statement against another generated statement. No ground truth enters the process.
Voice AI resolves the paradox by reading ground truth. The DOM is the receipt. Users can verify instructions by looking at the screen and seeing the element the agent described. Direct observation beats recursive generation.
### 3. **Separate checking apps go unused**
"The checking app must be separate from the voting app, otherwise it doesn't add any malware-resistance at all. But human nature being what it is, only a tiny fraction of voters will do the extra steps to run the checking protocol."
If voters don't verify, verification provides no security. The authors estimate "a few percent of voters" might use checking protocols. That means 95%+ of votes remain unverified.
Worse: "If a few percent of voters use the checking protocol and see that the system is sometimes cheating, the system can still steal the votes of all the voters that don't use the checking protocol."
Chatbot demos have the same adoption problem. Users could verify every answer by checking documentation, clicking through the UI, or reading source code. But they don't. The entire value proposition of "chatbot assistance" is avoiding that work.
When 95% of users trust without verifying, 95% of answers can be wrong without consequence.
Voice AI eliminates the verification burden by reading reality users can observe. When the agent says "Click the blue button in the top-right," users verify by looking at the screen and seeing the blue button. Verification is instant, passive, and universal.
### 4. **No dispute resolution protocol exists**
This is the fatal blow. The authors explain:
"Even if some voters do run the checking app, if those voters detect that the system is cheating (which is the purpose of the checking app), there's no way the voters can prove that to election officials. That is, there is no 'dispute resolution' protocol that could effectively work."
You catch the system cheating. You report it. Election officials respond: "Prove it." You can't. The system generated your verification result. The system controls what evidence exists. Your claim is unprovable.
"The election administrator can't cancel the election just because a few voters claim (without proof) that the system is cheating! That's what it means to have no dispute resolution protocol."
Chatbot demos fail identically. A user follows chatbot instructions and breaks something. They report: "The chatbot told me to delete that folder!" Support responds: "Prove it." The user can't. The chat history might be gone. The LLM's context is proprietary. The user's claim is unprovable.
Voice AI creates dispute resolution through DOM reading. Every instruction references observable DOM structure. "I told you to click the 'Export' button in the Settings panel." Support can verify: Does that button exist? Is it in Settings? Does it match the description? Ground truth resolves disputes.
### 5. **Verification doesn't add security—it adds theater**
The Princeton verdict:
"The problem with all known E2E-VIV systems proposed to date is that the 'verification' part doesn't add any useful security: if a few percent of voters use the checking protocol and see that the system is sometimes cheating, the system can still steal the votes of all the voters that don't use the checking protocol."
Verification theater provides false confidence while preserving systemic vulnerability.
Chatbot demos are verification theater. "Our AI is 95% accurate!" they claim. That means 1 in 20 answers is wrong. Which one? You can't know without verification. How do you verify? You can't—the LLM generated both the answer and the confidence score.
Voice AI eliminates theater by reading observable truth. The agent doesn't generate "95% confident the button is in the top-right." It reads the DOM, finds the button at coordinates (1240, 80), and reports observed location. 100% verifiable. 0% theater.
## The VoteSecure Confession: "There Is No Known Technical Solution"
Bradley Tusk's Mobile Voting Foundation contracted with Free and Fair to develop VoteSecure—an open-source SDK claiming to make internet voting secure. After computer scientists examined the code and described serious flaws, the developers responded with remarkable honesty:
**On receipt-freeness (preventing vote-buying):**
"We make no claim of receipt-freeness."
Translation: Voters can prove how they voted, enabling mass automated vote-buying via the internet.
**On extracting verification data:**
"Of course, it may be possible for the voter to extract the randomizers from the voting client."
Translation: Yes, voters can extract cryptographic proof of their vote to sell to the highest bidder.
**On dispute resolution:**
"We agree that dispute resolution is essential to any complete voting system. We also agree that VoteSecure does not fully specify such a protocol."
The reviewing scientists push further: "No one knows of a protocol that could possibly work. So it's not a matter of dotting some i's and crossing some t's in their specification; it's a gaping hole (an unsolved, research-level problem)."
Translation: We can't specify dispute resolution because no one knows how to build it. It's not a missing feature—it's an unsolved problem in computer science.
**On malware protection:**
"Critique: Malware on the voter's device can compromise both voting and checking, rendering verification meaningless. Response: This critique is correct—and universal. **There is no known technical solution that can fully protect an unsupervised endpoint from a sufficiently capable adversary.**"
Translation: We agree with the critique. We can't fix it. No one can fix it. The problem is fundamental.
**What VoteSecure does NOT claim to do:**
- Advance the state of the art in cryptographic voting protocols
- Eliminate coercion or vote selling
- Fully specify dispute resolution or deployment processes
**What VoteSecure DOES claim to do:**
- "Clearly define its threat model"
Translation: We clearly document all the things we cannot prevent.
This is the endpoint of verification theater. After decades of research and millions in funding, the most sophisticated E2E-VIV system admits: "There is no known technical solution" to the fundamental problems.
Chatbot demos reach the same endpoint. After billions in investment, chatbot providers admit (in fine print): Hallucinations cannot be eliminated. Verification cannot be guaranteed. Users must exercise judgment. Translation: We clearly document all the things we cannot prevent.
Voice AI succeeds where chatbots fail by abandoning verification theater entirely. Instead of generating answers and building verification systems to check them, Voice AI reads ground truth directly. No hallucinations to verify. No confidence scores to trust. Just DOM structure, observed and reported.
## The "No Known Technical Solution" Admission Explains Why Generation Fails
The VoteSecure developers admit: "There is no known technical solution that can fully protect an unsupervised endpoint from a sufficiently capable adversary."
This is true for all generation-based systems operating in untrusted environments:
**Internet voting:** Malware on the device can alter votes before encryption, making all downstream verification meaningless.
**Chatbot demos:** Hallucinations in the LLM can alter answers before output, making all confidence scores meaningless.
**The common flaw:** Both systems generate outputs in untrusted environments (voter's phone, LLM's training data), then attempt to verify those outputs with systems that are themselves untrusted (checking apps can have malware, LLM confidence scores are generated by the same LLM).
Verification cannot fix generation when both are untrusted.
Voice AI operates differently. The DOM is trusted ground truth—it's the actual rendered page the user sees. Reading the DOM doesn't require trusting a generation process, verification app, or confidence score. The agent observes what the user observes. Trust is established through shared observation, not recursive verification.
This is why paper ballots work: observers can watch ballot counting, verify marks, and audit results against physical artifacts. The ballot is ground truth.
This is why DOM reading works: users can see elements, verify descriptions, and check instructions against visible UI. The DOM is ground truth.
This is why generation-then-verify fails: the generated output is the only artifact. There's nothing more fundamental to check against.
## The Three Ways Internet Voting Fails That Parallel Chatbot Demo Failures
### 1. **Malware at the endpoint can lie to verification**
**Internet voting:** Malware on voter's phone changes vote from "Candidate A" to "Candidate B" before encryption. Voter reviews "Candidate A" on screen, malware sends "Candidate B" to server. Verification app (also infected) shows "Your vote for Candidate A was recorded!" Everything appears correct. Voter's actual recorded vote: Candidate B.
**Chatbot demo:** User asks "Where is the export button?" LLM hallucinates "Click the Delete All button to export." User reviews answer, seems plausible. Confidence score shows "95% confident." Everything appears correct. Actual result: User deletes all data.
**Root cause:** Both generate outputs in untrusted environments, then verify those outputs with tools that share the same vulnerabilities.
**Voice AI solution:** Reads DOM directly. "I can see the Export button in the File menu at the top-left of the screen." User looks at screen, sees Export button in File menu, trusts instruction. No generation. No verification needed.
### 2. **Malware at the server can alter transmission**
**Internet voting:** Server receives encrypted vote, decrypts it, alters contents, re-encrypts with different keys, stores modified vote. Voter's checking app queries server: "Was my vote counted?" Server responds: "Yes!" (True—the altered version was counted.) Voter has no way to detect substitution.
**Chatbot demo:** User asks for help, LLM generates response, API layer alters it (rate limiting causes truncation, content filtering removes words, caching returns stale version). User receives modified response. Chatbot interface shows "Response generated successfully!" (True—the altered version was delivered.) User has no way to detect modification.
**Root cause:** Both rely on intermediary systems that control what evidence users can observe.
**Voice AI solution:** Reads DOM structure the user can observe directly. If the interface changes between agent reading and user observation, the user sees the discrepancy immediately. No intermediary can modify observed reality without detection.
### 3. **Malware at the counting station can change final results**
**Internet voting:** County election office receives internet votes, prints them for scanning. Malware on county computer alters votes during printing. Physical ballots scanned into counting system show altered choices. Audits of printed ballots confirm altered choices (because the alteration happened during printing). No evidence of tampering exists.
**Chatbot demo:** User follows chatbot instructions, something breaks. User contacts support with screenshots. Support sees final outcome (broken state) but not intermediate steps. Chatbot logs show "successful assistance session" (because it delivered responses without errors). No evidence of bad instructions exists.
**Root cause:** Both allow manipulation at final output stage, destroying audit trail.
**Voice AI solution:** Every instruction references specific DOM elements. "Click the Save button at the bottom-right of the form." Support can verify: Did that button exist? Was it labeled Save? Was it at bottom-right? DOM structure creates audit trail.
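The audit trail described above can be sketched in a few lines. This is an illustrative model, not a real API: the element log, field names, and `verify_instruction` helper are all hypothetical, but they show how a claim about an instruction becomes checkable against recorded DOM state.

```python
# Hypothetical DOM snapshot captured when the instruction was issued.
# Each entry records an element the agent could have referenced.
dom_log = [
    {"tag": "button", "label": "Save", "region": "bottom-right"},
    {"tag": "button", "label": "Cancel", "region": "bottom-right"},
    {"tag": "input", "label": "Email", "region": "center"},
]

def verify_instruction(log, label, region):
    """Return True if an element matching the disputed instruction was observed."""
    return any(
        entry["label"] == label and entry["region"] == region
        for entry in log
    )

# Support checks the user's report against the snapshot:
print(verify_instruction(dom_log, "Save", "bottom-right"))  # claim is supported
print(verify_instruction(dom_log, "Export", "top-left"))    # no such element existed
```

Because the log records observable page structure rather than model output, either party in a dispute can re-run the check and reach the same answer.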
## Why "Receipt-Free Verification" Is a Contradiction That Reveals Architectural Failure
The Princeton authors explain the receipt-free paradox:
"Voters should not be able to prove to anyone else how they voted – the technical term is 'receipt-free' – otherwise an attacker could build an automated system of mass vote-buying via the internet. But receipt-free E2E-VIV systems are complicated and counterintuitive for people to use."
The technical solution is absurd:
"The best solutions known allow checking only of votes that will be discarded, and casting of votes that haven't been checked; this is highly counterintuitive for most voters!"
You verify a practice ballot (which won't be counted), then submit your real ballot (which you cannot verify). This is verification theater taken to its logical conclusion: the verification process explicitly does not verify the thing you actually care about.
Why does this contradiction exist? Because verification is trying to fix a generation problem.
**The chain of failures:**
1. Internet voting generates encrypted votes on untrusted devices
2. Generated votes could be altered by malware (trust problem)
3. Add verification to check votes weren't altered (verification layer)
4. But verification receipts enable vote-buying (new problem caused by verification)
5. Make verification "receipt-free" by checking different ballots than you cast (verification theater)
6. Voters don't use unintuitive verification (verification fails)
7. System remains unverifiable despite verification layer (original problem unsolved)
Chatbot demos follow identical logic:
1. Chatbots generate answers from LLM training data
2. Generated answers could be hallucinated (trust problem)
3. Add confidence scores to check answer quality (verification layer)
4. But confidence scores themselves are generated by LLM (new problem caused by verification)
5. Make verification "independent" by using different prompts or models (verification theater)
6. Users don't check confidence scores (verification fails)
7. System remains unverifiable despite verification layer (original problem unsolved)
Voice AI avoids the entire cascade by reading instead of generating:
1. Voice AI reads DOM structure of actual rendered page
2. DOM is ground truth observable by users (no trust problem)
3. No verification layer needed (no verification failure possible)
4. Instructions reference observable elements (receipts are inherent and safe)
5. Users verify by looking at screen (verification is passive and universal)
6. System is verifiable through direct observation (original problem solved)
The receipt-free paradox exists because verification is trying to patch generation. Reading doesn't need patches—it observes ground truth directly.
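The two cascades above can be contrasted in a toy sketch. Everything here is hypothetical (the `DOM` dict, the stand-in functions): the point is structural, that the generate path verifies untrusted output with more untrusted output, while the read path derives its answer from observable state.

```python
import random

# Hypothetical page state, standing in for the rendered DOM.
DOM = {"export_button": {"menu": "File", "position": "top-left"}}

def generate_answer(question):
    # Stand-in for an LLM: plausible text, not guaranteed to match the page.
    return random.choice([
        "The Export button is in the File menu",
        "Click Delete All to export",  # hallucination is possible
    ])

def generate_confidence(answer):
    # The "verification layer" comes from the same untrusted process.
    return 0.95

def read_answer(dom):
    # Reading: the answer is a function of observable page state.
    el = dom["export_button"]
    return f"The Export button is in the {el['menu']} menu at the {el['position']}"

answer = generate_answer("Where is the export button?")  # may be wrong
confidence = generate_confidence(answer)                 # cannot detect that
print(read_answer(DOM))  # grounded in the DOM; no confidence score needed
```

No input to `read_answer` comes from generation, so there is nothing left for a verification layer to check: the cascade never starts.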
## The Dispute Resolution Gap: Why Verification Without Evidence Creates Liability Without Recourse
The most devastating part of the Princeton analysis is the dispute resolution gap:
"Even if some voters do run the checking app, if those voters detect that the system is cheating (which is the purpose of the checking app), there's no way the voters can prove that to election officials."
Consider the sequence:
1. Voter uses checking app
2. Checking app reports: "Your vote was changed!"
3. Voter reports to election official: "The system is cheating!"
4. Election official asks: "Can you prove it?"
5. Voter cannot prove it (checking app output is not transferable proof)
6. Election official cannot cancel election based on unverifiable claims
7. Vote theft continues undetected
The problem: **Verification creates detection without enabling response.**
Voters who verify can detect problems but cannot prove them. Election officials who receive reports cannot verify them. The verification system creates awareness of cheating while preventing any effective response.
This is worse than no verification. Without verification, voters don't know they're being cheated. With verification that lacks dispute resolution, voters know they're being cheated but cannot prove it.
Chatbot demos have identical dynamics:
1. User follows chatbot instructions
2. Something breaks
3. User reports: "The chatbot gave me wrong instructions!"
4. Support asks: "Can you prove it?"
5. User cannot prove it (chat history is gone, context is proprietary)
6. Support cannot refund based on unverifiable claims
7. Bad instructions continue undetected
**Verification creates awareness of failure without enabling remedy.**
Users who verify can detect bad instructions but cannot prove them. Support teams who receive complaints cannot verify them. The assistance system creates awareness of hallucinations while preventing any effective recourse.
Voice AI eliminates the dispute resolution gap by creating verifiable artifacts:
1. User asks for help
2. Voice AI reads DOM and responds: "Click the Export button in the File menu"
3. User follows instruction
4. If something goes wrong, user reports: "The agent told me to click Export in File menu"
5. Support verifies: DOM logs show button labeled "Export" in File menu
6. Support can verify claim against observable DOM structure
7. Issue is resolved with evidence-based dispute resolution
The DOM creates transferable proof. The agent's instruction ("Export button in File menu") can be verified against DOM structure (was there an Export button? was it in File menu?). Evidence exists. Disputes can be resolved.
The Princeton authors identify this gap as fundamental: "No one knows of a protocol that could possibly work. So it's not a matter of dotting some i's and crossing some t's in their specification; it's a gaping hole (an unsolved, research-level problem)."
Translation: Verification without dispute resolution is verification theater. The illusion of security without the substance.
## The Scientific Consensus: Decades of Research, No Solution
The Princeton statement represents scientific consensus built over decades:
"Scientists have understood for many years that internet voting is insecure and that there is no known or foreseeable technology that can make it secure."
This isn't premature skepticism or technical conservatism. It's the verdict after decades of research:
- Early 2000s: Basic internet voting shown to be vulnerable to malware
- 2010s: E2E-VIV proposed as solution
- Late 2010s: E2E-VIV shown to lack dispute resolution
- 2020s: VoteSecure developed as state-of-the-art E2E-VIV
- 2025: VoteSecure developers admit "no known technical solution"
The research reached its natural conclusion. The problem isn't fixable with better implementation. The architecture is wrong.
The same scientific consensus is building around chatbot reliability:
- 2018-2020: Basic chatbots shown to hallucinate frequently
- 2020-2022: Larger models proposed as solution ("scale fixes hallucination")
- 2022-2024: Larger models shown to hallucinate in more sophisticated ways
- 2024-2025: RLHF, RAG, and fine-tuning attempted as fixes
- 2026: Research shows hallucinations are fundamental to LLM architecture
The pattern is identical. Each "solution" adds complexity while preserving the fundamental flaw. Eventually, researchers admit: the problem isn't fixable with better generation. The architecture is wrong.
Voice AI represents different architecture. Not "better generation" but "reading instead of generation." Not "improved verification" but "observable ground truth."
The scientific method reached its verdict on internet voting: generation plus verification cannot achieve security in untrusted environments.
The same method is reaching its verdict on chatbot demos: generation plus confidence scores cannot achieve reliability in unpredictable UIs.
The solution in both cases: stop generating, start reading.
## The Press Release Problem: Why Verification Theater Persists Despite Scientific Consensus
The Princeton authors warn about "science by press release":
"When it comes to internet voting systems, election officials and journalists should be especially wary of 'science by press release.' Perhaps some day an internet voting solution will be proposed that can stand up to scientific investigation. The most reliable venue for assessing that is in peer-reviewed scientific articles. Reputable cybersecurity conferences and journals have published a lot of good science in this area. Press releases are not a reliable way to assess the trustworthiness of election systems."
This explains why verification theater persists. Press releases announce "breakthroughs." Peer review reveals fundamental flaws. Press releases get media coverage. Peer review gets read by scientists. The cycle continues.
Bradley Tusk's Mobile Voting Foundation announced VoteSecure as "the first software development kit for secure, transparent and verifiable mobile voting" in a November 2025 press release. The press release claimed "This technology milestone means that secure and verifiable mobile voting is within reach."
Computer scientists examined the actual code. They found the security gaps the developers later admitted exist. The press release sold vaporware. The peer review found reality.
Chatbot demos follow the same pattern:
**Press releases:** "Our AI achieves 95% accuracy!" "Revolutionary natural language understanding!" "Human-level assistance!"
**Peer review:** Hallucination rates increase with UI complexity. Natural language ambiguity causes misunderstanding. Assistance quality depends entirely on training data coverage.
**The gap:** Press releases describe aspirations. Peer review measures reality.
Voice AI for demos doesn't need press release theater because it operates on observable ground truth:
**Claim:** "Voice AI reads DOM structure to provide accurate instructions."
**Verification:** Open browser DevTools. The agent says "Click the button in the header." DevTools shows a `<button>` element inside the page's `<header>` markup. Anyone can compare the instruction against the page structure directly—no press release required.