Show Notes
Claude Haiku 3.5 is put to the test in Parker’s hands-on look at coding and UI prompts, with a practical lens on cost, performance, and real-world workflows. Here are the key takeaways and hands-on findings.
Benchmark landscape at a glance
- Claude Haiku 3.5 performance
  - On the Aider coding leaderboard: around 4th place with ~75% on coding tasks, trailing the top model.
  - Pricing vs. capability: marketed as a cheaper option, but user feedback flags the price as high for a model that isn’t the absolute leader.
- Reasoning benchmarks
  - Demonstrates solid chain-of-thought reasoning, but still not on par with the flagship, top-tier models.
  - Compared to Gemini and other premium models, it’s competitive in some respects but notably costlier.
- Practical takeaway
  - If you’re optimizing for price-to-performance on coding tasks, Haiku 3.5 is a mixed bag: better than many models, but expensive for what you get relative to best-in-class.
Hands-on testing: prompts and results
- Test setup (summary)
  - Two primary prompts/tests: a Pixel-Perfect clone prompt (component copy) and a small web app UI task, comparing Sonnet and Haiku outputs.
  - The same prompts were run against both models to compare results and speed.
- Test 1: Pixel-Perfect clone (component copy)
  - Sonnet: produced more usable, structured results with clearer code blocks and narrative guidance.
  - Haiku: struggled to start or render clean outputs; results were less consistent and harder to translate into workable code.
  - Takeaway: Sonnet tends to be stronger for direct coding tasks that require clean structure and stepwise delivery.
- Test 2: One-shot web app with an ingredient history UI
  - The prompt asked for a UI showing how ingredient history changes over time (a rough sketch of this kind of component follows below).
  - Sonnet: produced a more complete artifact (code plus architecture notes) but acknowledged gaps needing refinement.
  - Haiku: often failed to render a coherent TSX flow or skipped key pieces, making it less reliable for this kind of task.
  - Takeaway: for frontend app generation, Sonnet more consistently delivers usable scaffolds; Haiku lags behind in this test.
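The episode doesn’t reproduce the generated code itself, so purely as an illustration, here is a minimal sketch of the kind of TSX component the prompt targets. The `IngredientChange` shape and sample data are assumptions, not output from either model.

```tsx
// Illustrative sketch only -- not output from Sonnet or Haiku.
// Assumed shape: each record captures one change to an ingredient over time.
import React from "react";

type IngredientChange = {
  ingredient: string;      // e.g. "Sugar"
  date: string;            // ISO date of the change
  previousAmount: string;
  newAmount: string;
  note?: string;
};

// Hypothetical sample data so the component renders on its own.
const sampleHistory: IngredientChange[] = [
  { ingredient: "Sugar", date: "2024-01-10", previousAmount: "20 g", newAmount: "15 g", note: "Reduced sweetness" },
  { ingredient: "Butter", date: "2024-02-02", previousAmount: "50 g", newAmount: "60 g" },
];

// Renders ingredient changes as a simple chronological list.
export function IngredientHistory({ history = sampleHistory }: { history?: IngredientChange[] }) {
  const sorted = [...history].sort((a, b) => a.date.localeCompare(b.date));
  return (
    <ul>
      {sorted.map((change, i) => (
        <li key={i}>
          <strong>{change.date}</strong>: {change.ingredient} changed from {change.previousAmount} to {change.newAmount}
          {change.note ? ` (${change.note})` : ""}
        </li>
      ))}
    </ul>
  );
}
```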
Prompt engineering and architect-style workflow
- Architect-focused workflow
  - The idea, borrowed from Aider-style practice: start with planning, then define data models, APIs, and the TypeScript/React stack steps.
  - Example high-level approach (condensed prompt pattern; a sketch of the kind of output it targets follows this list):
    - Act as the software architect
    - Define data models and API endpoints
    - Output a structured plan in Markdown with code blocks and explanations
- Model behavior differences
  - Sonnet tends to produce clean Markdown with distinct sections, code blocks, and step-by-step guidance.
  - Haiku tends to mix formats and can require more manual post-processing to extract usable code.
- Practical note
  - If the goal is documentation-to-code or vice versa, Sonnet’s output tends to align better with developer workflows and tooling.
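As a concrete illustration of what the “define data models and API endpoints” step might yield inside that Markdown plan, here is a minimal TypeScript sketch; the recipe/ingredient domain and route names are assumptions, not content from the episode.

```ts
// Illustrative sketch of architect-step output -- the domain below is assumed,
// not shown in the episode.

// Data models
interface Ingredient {
  id: string;
  name: string;
  amount: string; // e.g. "20 g"
}

interface Recipe {
  id: string;
  title: string;
  ingredients: Ingredient[];
  updatedAt: string; // ISO timestamp of the last change
}

// API endpoints, captured as typed route signatures the plan would spell out
type Api = {
  "GET /recipes": () => Promise<Recipe[]>;
  "GET /recipes/:id": (id: string) => Promise<Recipe>;
  "PUT /recipes/:id/ingredients": (id: string, ingredients: Ingredient[]) => Promise<Recipe>;
};
```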
Practical workflows and takeaways
- Low-code evaluation workflow
  - Make.com (formerly Integromat) is a practical path to automating model testing without writing code.
  - Build a workflow that sends repeated prompts, collects outputs, and stores the results in a spreadsheet for quick comparison (a code-based sketch of the same loop appears after this list).
- Documentation-ready AI
  - For turning a website or doc into well-formatted Markdown with snippets, you can feed a URL and have the AI generate structured docs, checklists, and examples.
- Quick-start prompts for coding
  - Use an architect-style prompt to define scope, then drill down with TS/React/Tailwind specifics.
  - Example minimal prompt snippet:
    - Act as the software architect. Define data models, API endpoints, and a TypeScript/React plan. Output in Markdown with code blocks for key parts.
- What to test with your team
  - Run side-by-side comparisons on your own coding tasks (component scaffolds, API schemas, UI flows) to see which model handles your typical patterns best.
  - Track both quality and speed to judge whether the cost aligns with your productivity gains.
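For teams that prefer scripting the comparison loop over building it in Make.com, a rough TypeScript equivalent using the Anthropic SDK might look like the sketch below; the model IDs, prompts, and CSV format are placeholders for illustration.

```ts
// Rough code equivalent of the Make.com workflow: send the same prompts to
// two models and append the outputs to a CSV for side-by-side comparison.
// Model IDs and prompts are placeholders -- adjust them to what you test.
import Anthropic from "@anthropic-ai/sdk";
import { appendFileSync } from "node:fs";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const models = ["claude-3-5-sonnet-latest", "claude-3-5-haiku-latest"];
const prompts = [
  "Act as the software architect. Define data models and API endpoints for a recipe app.",
  "Generate a React component that shows ingredient history changes over time.",
];

async function main() {
  for (const prompt of prompts) {
    for (const model of models) {
      const started = Date.now();
      const response = await client.messages.create({
        model,
        max_tokens: 1024,
        messages: [{ role: "user", content: prompt }],
      });
      // Concatenate the text blocks from the response.
      const text = response.content
        .map((block) => (block.type === "text" ? block.text : ""))
        .join("");
      const seconds = ((Date.now() - started) / 1000).toFixed(1);
      // One row per run: model, elapsed time, prompt, and the (quoted) output.
      appendFileSync(
        "results.csv",
        `"${model}","${seconds}s","${prompt.replaceAll('"', '""')}","${text.replaceAll('"', '""')}"\n`
      );
    }
  }
}

main().catch(console.error);
```

Either way, the idea is the same: run identical prompts against both models repeatedly and keep the outputs somewhere you can compare quality and speed side by side.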
Final takeaways
- Haiku 3.5 is not the top dog on coding benchmarks, but it’s a credible option with strong consistency in some tasks.
- Sonnet generally outperforms Haiku for coding-focused prompts and architecture-style workflows, often at a better price-performance point.
- If you’re building workflows that rely on repeated AI tasks, consider low-code automation (Make.com) to test and compare models efficiently.
- Use architecture-first prompts to drive clearer outputs (data models, APIs, and structured steps) before diving into code generation.
Actionable next steps
- Do a two-model test for your key coding tasks and measure time-to-useful-output versus cost.
- Set up a Make.com workflow to automate iterative prompts and export results for quick side-by-side analysis.
- Experiment with an architecture prompt to see how each model formats its plan; use Sonnet’s Markdown/code structure to scaffold your project.
Links
- AI Coding Leaderboards (coding benchmarks and discussion)
- Claude 3.5 Haiku (Anthropic)
- Claude Models (Sonnet and other models)
- Make.com (low-code automation platform)
- Framer Motion (animation/UI tooling)