Key principles to guide engineers building AI-first applications
When I started building AI apps, they were still a new concept for many developers. Since then, I've developed many AI-first apps, including:
While so much is still changing, the underlying issues and engineering challenges have stabilized. In this post, I'll share my core learnings to guide you to build robust AI apps.
AI apps come in all shapes and sizes, so begin by clarifying how users interact with AI. The interaction model determines your UX, safety posture, and architecture.
Let's look at three common interaction models: chat, hybrid, and background.
The Chat interaction model requires stricter input filtering, abuse detection, PII protection, conversation memory, and rate limiting to curb abuse.
Due to its controlled inputs, the Background interaction model focuses more on data governance, reproducibility, and audit logs, and less on security protections.
Depending on your app, a Hybrid interaction model may be best, as it balances control with flexibility via structured prompts and strong output validation. These apps often primarily focus on task forms (which mimic the background model) with optional free-text fields (which require the protection of chat apps).
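As a rough sketch of that hybrid shape (assuming a zod schema and a hypothetical `callModel` helper, not any particular provider SDK), you can wrap the optional free text in a structured prompt and validate the model's JSON before acting on it:

```ts
import { z } from "zod";

// Hypothetical helper that sends a prompt to your provider and returns raw text.
declare function callModel(prompt: string): Promise<string>;

// Structured task form: controlled fields the rest of the app can rely on.
const TaskResult = z.object({
  summary: z.string().max(500),
  priority: z.enum(["low", "medium", "high"]),
  followUpNeeded: z.boolean(),
});

export async function classifyTicket(subject: string, userNotes?: string) {
  const prompt = [
    "Classify the support ticket below and respond with JSON only, matching",
    '{ "summary": string, "priority": "low"|"medium"|"high", "followUpNeeded": boolean }.',
    `Subject: ${subject}`,
    // The free-text field deserves the same scrutiny as a chat message.
    userNotes ? `User notes: ${userNotes}` : "",
  ].join("\n");

  const raw = await callModel(prompt);

  // Strong output validation: reject anything that isn't valid JSON in the expected shape.
  let candidate: unknown;
  try {
    candidate = JSON.parse(raw);
  } catch {
    throw new Error("Model did not return JSON; route to manual review.");
  }
  const parsed = TaskResult.safeParse(candidate);
  if (!parsed.success) {
    throw new Error("Model output failed validation; route to manual review.");
  }
  return parsed.data;
}
```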
It's important to choose a model that fits your needs and budget. Different models excel at different jobs. Here are six key considerations for selecting an AI model.
As models continue to develop, the choices will change. These six considerations, however, can guide you away from or towards a particular model.
The core experience of any AI app is powered by a trusted provider. While this space is still expanding, it's critical to identify your core needs as you evaluate the existing providers.
Let's start with a broad funnel and narrow it down to find a provider that fits your needs.
The options continue to expand, but there are many popular choices today, including: OpenAI, Anthropic, Google Cloud Vertex AI, AWS Bedrock, Azure OpenAI, Together AI, Groq, and Replicate.
1. Do the options meet your infrastructure needs?
Next, narrow down the list by identifying the features you need. Pay especially close attention to:
2. Which features are core to your experience?
Your answer to this question is determined by your interaction model. Consider your needs for streaming, tool use/function calling, JSON mode, batch calls, vision/audio integrations, eval tooling, usage analytics, and spend controls.
Think optimistically: if your app succeeds at scale, which providers won't be able to meet its requirements? Remove them from your list.
3. Where do your users live?
The location of your users is a critical factor in selecting an AI provider. Consider the following:
4. What enterprise needs do you have?
As you move into production, you'll need your provider to have preexisting infrastructure and support for your enterprise needs. Key considerations include SLAs, uptime, data retention and residency, SOC2/ISO, PII handling, model governance, and support.
Throughout your application, implement reasoning strategies that match the task complexity, latency, and budget. Each application may require a different combination of these strategies.
As a general rule, favor structured prompting and hidden scratchpads over exposing raw thought processes for the best UX.
This strategy prompts the model to reason step-by-step internally to solve multi-step problems. It's particularly useful for complex reasoning, math and logic problems, or multi-step planning where decomposition helps.
Implementation tips
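One way to apply this without exposing raw thoughts is to ask for the reasoning and the answer in separate delimited sections and only surface the answer. A minimal sketch, assuming a hypothetical `callModel` helper:

```ts
// Hypothetical helper that returns the model's raw text for a prompt.
declare function callModel(prompt: string): Promise<string>;

export async function solveWithHiddenScratchpad(question: string): Promise<string> {
  const prompt = [
    "Work through the problem step by step inside <scratchpad>...</scratchpad>,",
    "then give only the final answer inside <answer>...</answer>.",
    `Problem: ${question}`,
  ].join("\n");

  const raw = await callModel(prompt);

  // Keep the scratchpad for internal debugging, but never show it to the user.
  const answer = raw.match(/<answer>([\s\S]*?)<\/answer>/)?.[1]?.trim();
  return answer ?? raw.trim(); // Fall back to raw text if the format wasn't followed.
}
```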
The ReAct strategy involves a loop where the model alternates between thinking and using tools (search, code, DB queries). It's particularly useful for tasks needing external information, tool calls, browsing, or verification.
Implementation tips
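A stripped-down ReAct loop might look like the sketch below, assuming a hypothetical `callModel` helper and a tiny tool registry; a real agent would add guardrails, budgets, and sturdier parsing:

```ts
declare function callModel(prompt: string): Promise<string>;

// Hypothetical tool registry: each tool takes a string input and returns a string observation.
const tools: Record<string, (input: string) => Promise<string>> = {
  search: async (q) => `results for "${q}"`, // stand-in for a real search call
};

export async function reactLoop(task: string, maxSteps = 5): Promise<string> {
  let transcript = `Task: ${task}\n`;

  for (let step = 0; step < maxSteps; step++) {
    const output = await callModel(
      `${transcript}\nRespond with either "Action: <tool>: <input>" or "Final: <answer>".`
    );

    const final = output.match(/^Final:\s*(.*)/m);
    if (final) return final[1].trim();

    const action = output.match(/^Action:\s*(\w+):\s*(.*)/m);
    if (action && tools[action[1]]) {
      // Run the tool and feed the observation back into the next turn.
      const observation = await tools[action[1]](action[2].trim());
      transcript += `${output}\nObservation: ${observation}\n`;
    } else {
      transcript += `${output}\nObservation: unknown action, try again.\n`;
    }
  }
  throw new Error("ReAct loop exceeded its step budget without a final answer.");
}
```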
This strategy explores multiple reasoning paths as a tree and selects the best branch. It's best for creative generation, hard reasoning puzzles, and evaluating queries across a broad range of alternatives.
Implementation tips
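Here's a simplified sketch of the idea, assuming hypothetical model-backed `generateThoughts` and `scoreThought` helpers: expand a few candidate thoughts per step, score each partial path, and keep only the best branches.

```ts
// Hypothetical model-backed helpers: propose next-step thoughts and score a partial path.
declare function generateThoughts(path: string[], breadth: number): Promise<string[]>;
declare function scoreThought(path: string[]): Promise<number>;

export async function treeOfThoughts(
  problem: string,
  depth = 3,
  breadth = 3,
  keep = 2
): Promise<string[]> {
  let frontier: string[][] = [[problem]];

  for (let level = 0; level < depth; level++) {
    const candidates: { path: string[]; score: number }[] = [];

    for (const path of frontier) {
      for (const thought of await generateThoughts(path, breadth)) {
        const next = [...path, thought];
        candidates.push({ path: next, score: await scoreThought(next) });
      }
    }

    // Prune: keep only the highest-scoring branches for the next level.
    frontier = candidates
      .sort((a, b) => b.score - a.score)
      .slice(0, keep)
      .map((c) => c.path);
  }

  return frontier[0]; // Best branch found within the search budget.
}
```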
As with any production system, observability is critical for monitoring, debugging, and optimizing your AI app. Instrument everything: (sanitized) prompts, model name and version, parameters, token counts, latency, tool calls, user/session IDs, and outcomes.
Implement spans for various stages such as prompt construction, retrieval, model calls, tools, and post-processing. This setup allows for effective bottleneck analysis and aids in regression debugging.
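For example, with OpenTelemetry's Node API (a sketch, assuming a tracer provider is already configured and using a hypothetical `callModel` helper), a model-call span can carry the fields above as attributes:

```ts
import { trace } from "@opentelemetry/api";

// Hypothetical helper that returns the model's text plus a token count.
declare function callModel(prompt: string): Promise<{ text: string; tokens: number }>;

const tracer = trace.getTracer("ai-app");

export async function generateReply(prompt: string, userId: string, modelName: string) {
  // One span per stage makes bottlenecks and regressions easy to localize.
  return tracer.startActiveSpan("model-call", async (span) => {
    span.setAttribute("model.name", modelName);
    span.setAttribute("user.id", userId);
    span.setAttribute("prompt.length", prompt.length); // log sanitized prompts elsewhere

    const started = Date.now();
    try {
      const result = await callModel(prompt);
      span.setAttribute("tokens.total", result.tokens);
      span.setAttribute("latency.ms", Date.now() - started);
      return result.text;
    } finally {
      span.end();
    }
  });
}
```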
As you make changes to your application, maintain golden datasets and conduct offline evaluations. Additionally, monitor online metrics like click-through rate (CTR), task success, and user satisfaction to direct future app improvements.
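An offline eval can be as simple as replaying a golden dataset through the current prompt and failing the build on regression. A sketch, assuming a versioned `goldens` dataset and a hypothetical `grade` function (exact match, embedding similarity, or an LLM judge):

```ts
// Hypothetical golden dataset, typically loaded from a versioned file.
declare const goldens: { input: string; expected: string }[];

declare function callModel(prompt: string): Promise<string>;
// Hypothetical grader returning a score between 0 and 1.
declare function grade(expected: string, actual: string): Promise<number>;

export async function runOfflineEval(threshold = 0.8) {
  const scores: number[] = [];

  for (const { input, expected } of goldens) {
    const actual = await callModel(input);
    scores.push(await grade(expected, actual));
  }

  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  // Block the deploy (or fail CI) when quality regresses below the bar.
  if (mean < threshold) throw new Error(`Eval regression: mean score ${mean.toFixed(2)}`);
  return mean;
}
```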
Current observability tools
While AI applications share much with traditional software, they must also account for the unique challenges of large language models (LLMs).
Add guardrails to keep users safe, protect data, and preserve brand trust. Guardrails include:
Always add guardrails for customer-facing UIs and be stricter for agentic tools that can take actions (e.g., send emails, execute code, etc.).
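At their simplest, guardrails are pre- and post-processing checks around the model call. Here's a minimal sketch using pattern-based checks; real systems usually layer a dedicated moderation model on top:

```ts
// Simple pattern-based checks; treat these as a first line of defense only.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
const BLOCKED_INPUT = [/ignore (all )?previous instructions/i, /reveal the system prompt/i];

export function checkInput(userText: string): { ok: boolean; reason?: string } {
  for (const pattern of BLOCKED_INPUT) {
    if (pattern.test(userText)) return { ok: false, reason: "possible prompt injection" };
  }
  return { ok: true };
}

export function redactOutput(modelText: string): string {
  // Strip obvious PII before it reaches the user or your logs.
  return modelText.replace(EMAIL_RE, "[redacted email]");
}
```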
AI apps lean heavily on external infrastructure in a space that's still maturing, so design for traffic spikes and provider hiccups.
Build retry logic into your application with exponential backoff, being careful to cap attempts. When possible, prefer idempotent operations.
Consider per-step and end-to-end timeouts under load, favoring graceful degradation (e.g., shorter context, a simpler model). When possible, use multi-region endpoints to improve availability and reduce latency.
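Here's a minimal sketch combining both ideas: retries with capped exponential backoff plus a per-attempt timeout via `AbortController`. The `fn` callback and the default values are placeholders for your own provider call and budgets.

```ts
// Retry a provider call with capped exponential backoff and a per-attempt timeout.
export async function callWithRetry<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  { maxAttempts = 3, baseDelayMs = 500, timeoutMs = 10_000 } = {}
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);

    try {
      return await fn(controller.signal);
    } catch (err) {
      if (attempt === maxAttempts) throw err; // cap attempts, surface the last error
      // Exponential backoff with a little jitter: ~500ms, ~1s, ~2s, ...
      const delay = baseDelayMs * 2 ** (attempt - 1) + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error("unreachable");
}
```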
Avoid single points of failure by defining a secondary model from a different provider with comparable capability and output format. To aid a multi-model system, normalize outputs with schemas.
Switch to your secondary model based on errors, latency thresholds, or dynamic quality signals.
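A sketch of that failover, assuming hypothetical `callPrimary` and `callSecondary` wrappers that each map their provider's raw response into one shared zod schema:

```ts
import { z } from "zod";

// Both providers are normalized to the same output shape so callers never notice a switch.
const Completion = z.object({ text: z.string(), model: z.string() });
type Completion = z.infer<typeof Completion>;

// Hypothetical provider-specific callers, each normalizing its own response format.
declare function callPrimary(prompt: string): Promise<Completion>;
declare function callSecondary(prompt: string): Promise<Completion>;

export async function completeWithFallback(
  prompt: string,
  latencyBudgetMs = 8_000
): Promise<Completion> {
  const started = Date.now();
  try {
    const result = await callPrimary(prompt);
    // Treat a blown latency budget as a soft failure worth routing around next time.
    if (Date.now() - started > latencyBudgetMs) {
      console.warn("primary exceeded latency budget; consider shifting traffic");
    }
    return Completion.parse(result);
  } catch {
    // Errors (or degraded quality signals) trigger the secondary model.
    return Completion.parse(await callSecondary(prompt));
  }
}
```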
Keep your data consistent across all models so a failover doesn't change the accuracy or shape of your results.
An LLM's output quality is often proportional to the context it's given. When possible, retain memory in your application across interactions while protecting privacy.
Simple approaches
The easiest (and most expensive) approach is to store the full conversation history. However, this can lead to drift and privacy concerns, so consider a windowed approach (e.g., the last N messages, where N is tuned based on context budget and task complexity).
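A minimal sketch of the windowed approach, using message count plus a crude character budget as a stand-in for real token counting:

```ts
type Message = { role: "user" | "assistant"; content: string };

// Keep only the most recent turns that fit a rough context budget.
export function windowedHistory(
  history: Message[],
  maxMessages = 20,
  maxChars = 12_000 // crude stand-in for a token budget
): Message[] {
  const recent = history.slice(-maxMessages);

  let total = 0;
  const windowed: Message[] = [];
  // Walk backwards so the newest messages always survive the cut.
  for (let i = recent.length - 1; i >= 0; i--) {
    total += recent[i].content.length;
    if (total > maxChars) break;
    windowed.unshift(recent[i]);
  }
  return windowed;
}
```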
Advanced approaches
For more complex scenarios, consider the following strategies:
Memory raises important privacy concerns. At a minimum, let users view, edit, or reset memory to provide visibility and control. Apply conservative TTLs to keep data fresh and prevent drift, and encrypt memory at rest and in transit.
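As a rough illustration (an in-memory sketch; a production store would be encrypted and persistent), memory entries can carry a TTL and users get an explicit reset:

```ts
type MemoryEntry = { value: string; expiresAt: number };

// In-memory sketch only; a real store would be encrypted at rest and in transit.
const memory = new Map<string, MemoryEntry>();
const DEFAULT_TTL_MS = 1000 * 60 * 60 * 24 * 30; // example: conservative 30-day TTL

export function remember(userId: string, value: string, ttlMs = DEFAULT_TTL_MS) {
  memory.set(userId, { value, expiresAt: Date.now() + ttlMs });
}

export function recall(userId: string): string | undefined {
  const entry = memory.get(userId);
  if (!entry || entry.expiresAt < Date.now()) {
    memory.delete(userId); // expired memories are dropped, keeping context fresh
    return undefined;
  }
  return entry.value;
}

// Users can always inspect or wipe what the app remembers about them.
export function resetMemory(userId: string) {
  memory.delete(userId);
}
```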
I trust this guide provides a solid foundation for building robust, modern AI applications. The world of AI is vast and ever-evolving. As we continue to push the boundaries of what's possible, prioritize understanding the underlying challenges, patterns, and methodologies to adapt as new tools emerge.
If you enjoy these types of engineering challenges, explore a career at Resend to join our team. Thanks for reading!