Artificial Intelligence FAQ

What it is. How it works. What it can do.

This text, mainly written in the form of a technical terms glossary, is meant as a quick-start guide on AI ("AI" is short for "Artificial Intelligence"), a large and fascinating area of computer science. Short paragraphs per term summarize often complex matters to provide you with an overview of the field, a first primer on this fascinating topic. Get a grip on AI concepts, AI lingo and technical terms, common abbreviations, LLMs, language processing and more. Micropolis believes in the advancement of scientific and engineering excellence through support of education and research and by sharing knowledge. So, let's dive right in.

Dear Reader: Please note that this article is a work in progress.

AI Disciplines

What is intelligence? Is it memory, knowing facts and figures? Is it learning, insight and problem-solving? Is it reflection and reasoning? How important is the combination of all of these, to be capable of emotional understanding, of sympathy, of compassion? Is true creativity the sole domain of a living and breathing being? While academia has a fixed set of terms to describe what intelligence is and what it should be able to do, the common-sense understanding of intelligence is more diverse and part of an ongoing debate. Similarly, artificial intelligence is a well-described phenomenon and has its technical terms. But beyond academic analysis, there is again a real-world, common-sense approach to looking at and evaluating artificial intelligence today. The AI community has developed metrics to rank the performance and overall quality of AI and its capabilities. Below is a set of disciplines that has evolved in the area of language models. Such categories collect challenges in reasoning, creativity - like in writing or coding - problem-solving or communication skills. These categories can help us determine where progress is made and where the technology stands right now:
  • Expert
    A model's knowledge-domain-specific assistance and reasoning quality.
  • Occupational
    Work- or Business-related task handling and problem-solving.
  • Math
    Calculation and mathematical proficiency of a model.
  • Instruction Following
    Understanding of and accuracy in obeying user input.
  • Multi-Turn
    Is a model able to maintain coherence, even over long conversations?
  • Creative Writing
    Measures the quality of imaginative, emotional and stylistic text generation.
  • Coding
    How adept is a model at programming and debugging?
  • Hard Prompts
    Humans enter prompts in a colloquial way with "noise", ambiguous remarks or contradictions.
  • Longer Query
    Approx. 10% of all prompts exceed 500 tokens. How well does a model perform on such inputs?
  • Language
    Ability to handle multiple languages or help with language-related tasks.

AI Search

is a combination of text generation through an LLM with traditional information retrieval. A few years ago, search was mostly based on SQL databases and exact matching of keywords. This approach has many shortcomings, as a word like "Documents" won't match an entry that has the keyword "Document". The problem that information technology researchers have been chasing for decades now is how to implement a fast and smart fuzzy matching function. An earlier solution is word stemming, where plural forms or suffixes are broken away from words and basic (truncated) forms of keywords are added to the database index. Stemming the word "documents" yields "document", and stemming a user's search input likewise will produce a match, for "document" and "documents". Later, more advanced algorithms were introduced to solve this problem and find database entries that are a close fit without requiring the user to input the exact search term. Some calculate "edit distance" (Levenshtein distance), others use trigram matching (e.g., PostgreSQL's pg_trgm extension) or ranking functions like BM25. A different approach to solve this problem is vector search. In vector search, input is converted into numerical vectors, so that input tokens define vectors in a high-dimensional vector space. A search query as a whole, expressed as a vector, can be thought of as a zigzag line projected into virtual space. A similar query input will render as a similar vector. Doing the same for the search corpus and then comparing arbitrary search input with stored vectors yields closely related search results. And while such vector representations are robust against misspellings and can even find entries thematically linked to queries, vector search is always vague and struggles with exact matches, proper nouns, cryptic product numbers, specific named entities and similar. This is why modern search implementations usually follow a hybrid approach. Each query triggers two separate searches.
One query is run against a traditional database setup, with fulltext inverted index, stemming, exact matching and ranking logic. From this search only the top results are kept and put into a ranked list (via BM25, "Okapi Best Matching No. 25"). The second query is issued against a vector database and returns a ranked list of entries that are closely related to the query input vectors. These two ranked lists are then merged via an algorithm like "Reciprocal Rank Fusion" (RRF). This way a final ranked search results list is generated. Internally, the system then decides how many of the relevant entries are used for continued processing. These results, in fulltext or excerpts, are then programmatically inserted into a prompt template to produce a "Super-Prompt", along the lines of: "You are a helpful assistant. Here are three documents about 'Documents'. With these as context, answer the user's question: [user question]." This prompt is then locally or remotely fed into a Large Language Model to infer the final answer. This overall scheme - using an LLM to encode knowledge as vectors, merging traditional with vector search in a hybrid approach and presenting search results as conversational, summarizing text instead of a list of entries - is what is commonly known as "AI Search" or "AI enhanced" search. The term for it is "Retrieval-Augmented Generation" (RAG). Compare Vector Database.
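The merging step described above can be sketched in a few lines. Below is a minimal illustration of Reciprocal Rank Fusion, assuming two already-ranked lists of document ids; the document names and the constant k=60 (a common default) are our own placeholders:

```python
def rrf_merge(ranked_lists, k=60):
    """Merge several ranked result lists via Reciprocal Rank Fusion."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Each list contributes 1/(k + rank) for every document it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]   # e.g. BM25-ranked fulltext results
vector_hits  = ["doc1", "doc5", "doc3"]   # e.g. vector-similarity results
print(rrf_merge([keyword_hits, vector_hits]))
# → ['doc1', 'doc3', 'doc5', 'doc7']
```

Documents found by both searches accumulate score from both lists, which is why a hybrid hit like doc1 rises to the top even though neither search ranked the lists identically.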

Cloud LLM

is a Large Language Model (LLM) hosted by a cloud/AI provider as part of a cloud platform where it can be accessed via API (Application Programming Interface). During the early roll-out of enterprise-level artificial intelligence processing services, vendors chose a cloud-hosted business model to solve multiple challenges. Preparing and running an LLM is a complicated matter, and a managed operation allowed vendors to lower the technical and financial entry barriers of this challenging technology for customers. Further, Cloud LLM providers regard their model training and large-scale inference operation as key intellectual property. Keeping these systems closed source and the structure of backend systems behind tight security allows businesses to protect their business secrets. Third, hosting a model remotely and offering access only via API allows vendors to monitor and adapt systems much more easily and in tighter update-release cycles than would be possible with a deployed system. This way, an LLM provider can learn from common client input patterns, adapt training and model behavior, optimize operations by aligning with real-world workloads and develop effective, production-hardened model input/output content filtering on planet-scale input corpora. That said, most if not all Cloud LLM providers have shifted away from training their models on client input received under enterprise contracts, but they can still benefit from lessons learned in content marshalling. Read the article "Legalities of User Content: The Shift in Ownership" for more on that. Regarding guardrails and content filtering, compare Prompt Injection.

GGUF

short for "GPT-Generated Unified Format", is an AI model file format. It was developed by Georgi Gerganov (user @ggerganov on GitHub) for his influential C++ LLM inference engine llama.cpp and introduced as the file format successor to GGML. GGUF stores both tensors and metadata in one binary file and tries to remedy the difficult metadata handling found in its predecessor. Both the file format and the llama.cpp LLM runner are endorsed by the Hugging Face portal, and GGUF is popular within the AI community due to its single-file simplicity.
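As a small illustration of the single-file layout, the fixed leading fields of a GGUF file can be read with a few lines of Python. This is a sketch based on the published GGUF header layout (magic "GGUF", version, tensor count, metadata key/value count, all little-endian); the helper name and the synthetic values are ours:

```python
import struct

def read_gguf_header(data: bytes):
    """Parse the fixed-size leading fields of a GGUF file (little-endian)."""
    magic, version = struct.unpack_from("<4sI", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    tensor_count, metadata_kv_count = struct.unpack_from("<QQ", data, 8)
    return {"version": version, "tensors": tensor_count, "metadata_kvs": metadata_kv_count}

# Synthetic header for illustration: GGUF v3, 291 tensors, 24 metadata entries
header = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(read_gguf_header(header))
```

After these fixed fields, a real file continues with the metadata key/value pairs and the tensor data itself.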

Hugging Face

sometimes "HuggingFace" or "HF" for short, misspelled "Huggin Face" or "HugginFace", is a Web community portal. It has become the central hub for distribution and sharing of AI models, datasets, essential tools and software libraries (e.g. Transformers) of the open-source AI ecosystem. It can be described as the "GitHub of AI". Hugging Face is a strong supporter of the GGUF model format. Hugging Face takes its name and logo from the popular "hugging face" emoji, which depicts a smiling face that either "grabs its own face" or has "its arms open in an inviting gesture".

Legalities of User Content: The Shift in Ownership

When AI appeared on the wider Internet and a large audience started to interact with Chatbots and image generation, questions of ownership, of copyright and liability arose quickly. After all, where did the AI's knowledge come from? From curated datasets, human annotation, supervised learning and closed licensed sources, but also in large part from protected and/or copyrighted public sources. How could this be? How can private operations use protected intellectual property to train their models? The answer lies in a recent change in Copyright law that likens AI model training to how humans learn. In Copyright legislation, a growing number of countries and supranational organizations opted for a treatment where Copyright is "conditionally relaxed" (some would say suspended) for AI model training applications. Clever counsels worked this into legislation shortly before the AI boom, namely in the form of the European Union's Text and Data Mining (TDM) exception in the 2019 DSM Directive (Articles 3 & 4). In the US, model training rests on the existing fair use doctrine (17 U.S.C. §107). When people absorb knowledge - common knowledge, folklore, or content encountered through articles and videos - nobody litigates for copyright breach, unless content is literally copied, of course. Now, with AI, machines were able to consume large quantities of knowledge and distill summarizing, paraphrasing or near-verbatim content from it with ease. Does this change views on the matter? During the early stage of this development, some entities approached the generated content with traditional corporate rigor and claimed full ownership of any generated output, in turn only granting users a license to use it. This had been the de facto norm of handling such issues: a strategy of maximization. On social media, it is usually similar, as user content, once uploaded, is fully and irrevocably licensed to the platform operator. Corporations were used to claiming ample rights.
With AI, in contrast, it became clear very soon that this stance could not be upheld. The paraphrases of Chatbots, the images generated by generative AI, were just too often too similar to what was already out there. All too often, copyrighted works trickled into generated content. Legal issues popped up in quick succession, from offensive or illegal content to infringement of personality rights, to misappropriation of likeness and Right of Publicity violations. Generative AI at its core is probabilistic. Combined with temperature and sampling settings, AI is often too unpredictable to really control. The resulting outputs make automated content moderation and human review an ongoing challenge. It was clear: legal terms for AI usage had to change. Today, many if not all AI providers define AI as a mere tool and their act of offering access to it as a service. Everything else a user does with a provided AI falls under the responsibility of the user - and this in broad strokes. It was a paradigm shift. It could be poetic justice. Operators now defer ownership to the user, for everything that is provided as "User Input" and everything that is generated for the user based on this User Input, the User Output. What started as an asset is now regarded as a liability that is better given away. Cloud AI providers in turn only ask users to license User Content back so that it may be used to improve the service, through model training or analysis - but this licensing is very often coupled with generous opt-out options. As of 2026, this is also the legal line Micropolis assumes for all Micropolis AI services. Users own their content. Of course, this short article is only an overview and can't replace a close reading of each AI provider's terms or professional legal advice.

Logit

Large Language Models are multi-layered constructs. When an LLM processes text, it converts human-readable text into "Tokens" and then these Tokens into Embeddings. While input is transferred through these layers, the model's network does its actual "associative work" in various stages of filtering and weighing. At the end of this pipeline, when a model produces its output (via a weight matrix), it emits a raw score for each candidate next output token, the "Logit". Logits are usually floating-point values but may be quantized to integers. By applying a normalizing exponential function to the Logit values (usually the softmax function), a probability distribution over K possible outcomes is calculated - the "probability for the next Token". These probabilities can then be used to calculate Perplexity for a given token stream or for the model's output as a whole. Logits do not carry a semantic meaning, while "Embeddings" do. Compare "Token" and "Perplexity".
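The step from logits to next-token probabilities is just the softmax function. A minimal sketch, with three invented raw scores standing in for three candidate tokens:

```python
import math

def softmax(logits):
    """Convert raw logit scores into a probability distribution."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                  # one raw score per candidate token
probs = softmax(logits)
print(probs)       # highest logit -> highest probability
print(sum(probs))  # the probabilities sum to 1.0
```

Note that softmax preserves the ordering of the logits; it only rescales them into values that can be interpreted (and sampled) as probabilities.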

OpenClaw

is an open-source, self-hostable, private AI Agent framework. In early 2026 it took many technically interested circles by storm, as it offered a simple installation wizard and combined a slew of then-available but fractured tools, plugins, API endpoints and mechanisms under one unified interface. Through a combination of broad permissions and immediate execution of its own LLM-generated instructions, OpenClaw is able to produce surprising and impressive results in a matter of minutes. But as an LLM can't think, all of this is unreflected and unfiltered by common sense. OpenClaw is a digital accelerant and an important contribution to the AI safety and ethics debate. The project was kicked off by Austrian software developer Peter Steinberger, who decidedly employed the mode of Vibe Coding to develop the OpenClaw source code. He considers it a bold and highly experimental undertaking and an early proof-of-concept. Initially called "Clawdbot" and "Moltbot", the software was later renamed to "OpenClaw" due to naming conflicts. The code is released under the MIT License, as this license explicitly excludes any liability for damage done. In its current state, with large portions of unreviewed source code, incomplete testing and, at the same time, an installer wizard urging users to give the AI agent unlimited control over a system, network resources, passwords, accounts, private data etc., computer experts consider OpenClaw highly insecure and potentially dangerous software.

Perplexity

is a technical term from information theory and a measure used in AI research. The magnitude of perplexity describes the degree of uncertainty of a discrete probability distribution. Discrete means there is a defined number of possible outcomes, for example 6 in a dice roll or 2 in a coin flip. Likewise, the perplexity of a fair coin flip is 2. The technical term was first used in the article "Perplexity - a measure of the difficulty of speech recognition tasks" in The Journal of the Acoustical Society of America in 1977, written by Jelinek, Mercer, Bahl and Baker. Perplexity measures not "chance" but uncertainty, or more precisely the "effective branching factor" in a situation, like deciding which path to follow at a fork in a road. The term rose to prominence in AI research as well, where it can be used to describe, simply put, the confidence of an LLM (or "its own surprise") when predicting the next token, the next word. Or rephrased: Perplexity is calculated from a logarithmic assessment of likelihood, averaged over multiple steps. Lower values describe a more confident prediction here. By aligning perplexity with benchmark values and measuring the performance of different models or model generations, it is possible to evaluate overall model quality or training progress. Perplexity in mathematical notation is often written "PPL(X)" (read "perplexity of X"). Compare "Token" and "Logit". Apart from the technical term, which is not so well known outside AI, the word "perplexity" is often associated with the web app of Perplexity AI, Inc. from San Francisco. The venture-backed startup rose to prominence during the AI boom of 2022/2023 and became one of the main players in this wave of companies. The company offers an LLM-supported query interface that distills colloquial answers to user input from large-scale traditional web search combined with language model reasoning.
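The "logarithmic assessment of likelihood, averaged over multiple steps" can be written out directly. A minimal sketch, taking the model's probability for each token it actually emitted:

```python
import math

def perplexity(token_probs):
    """PPL = exp(-1/N * sum(log p_i)) over the probability of each emitted token."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

print(perplexity([0.5, 0.5]))   # a fair coin flip -> 2.0
print(perplexity([1/6] * 4))    # a series of fair dice rolls -> 6.0
```

The coin and dice examples reproduce the "effective branching factor" reading from the definition above: a model that is always as uncertain as a coin flip has perplexity 2.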

Prompt

in the olden days of computing, a prompt was the blinking cursor on a black-and-white telnet terminal. The blinking urged the user to enter something, to start typing. It signalled: the computer is ready to take your commands. A prompt, traditionally, was answered in some form of rule-based syntax the computer could understand. Entered commands centered around verbs or keywords, alternated with numeric or textual values. The computer, in turn, was unforgiving with typos or the user deviating from the norm. It was asking for a specific pattern that had to be learned by the user. Such text-based prompt interfaces were opaque and syntax-heavy, and the learning curve was steep. Yet, to this day, the CLI (Command Line Interface) is still the norm for low-level system control. Over the years, there were occasional experiments to make computers more human, to align the interface with what users, humans, normally do in communication. But natural language parsing was only slowly advancing, and everyday language continued to escape rigid parsing algorithms. When graphical user interfaces appeared, the point-and-click nature of telling a computer what to do covered up that behind these much more accessible interfaces the command nature was still the same. But as a picture is worth a thousand words, the desktop metaphor and windowed user interfaces hold up to this day, and decades of research put into finding intuitive usage patterns or nudging users with assistive technologies improved the chores of using a computer a lot. With the advent of AI technologies now, the introduction of LLMs, telling a computer what to do became what engineers had dreamed of for many years. Even technical newbies can now tell a computer what to do, what their intent is, and the computer will answer such "input prompts" either with the wanted output or assist the user in getting what she or he was going after.

Prompt Caching

is a feature of many Cloud LLMs that is meant to help with latency (time to first token, TTFT) and lower processing costs for customers. Many prompts issued against cloud LLM APIs contain repetitive content. This is especially true for chat-completion-style prompts. With these interactions, a series of subsequent assistant and user messages is preceded by one System Prompt that provides the most important instructions to the LLM for the role and tone of the following conversation. Also, it is common to provide essential facts or conversation structure to the model as a Few-Shot Learning baseline. These System Prompts can thus grow quite large and typically contain static or rarely updated data, and consequently it makes sense to cache these first input tokens on the LLM vendor side. How exactly this is implemented is usually not disclosed and varies slightly from vendor to vendor. One common requirement, though, is that prompt tokens meant to be cached must be static and exactly the same between requests. So in order for early tokens to be cacheable, clients need to provide truly static content to the remote LLM first and only after this append variables like day-to-day instructions, time- or locale-based facts, user-specific instructions or any other variable content. Some vendors require clients to insert specific break-marks after the to-be-cached leading token content. As prompt caching improves efficiency on both ends, for vendors and customers, it is often an auto-enabled feature and can effectively lower costs quite dramatically. Compare System Prompt and Few-Shot Learning.
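The "static content first" rule can be illustrated with a chat-completion message list. The company name and instructions below are invented placeholders; the point is the ordering of cacheable versus variable content:

```python
# Static, cacheable content first; variable content only afterwards.
STATIC_SYSTEM_PROMPT = (
    "You are a support assistant for the (fictional) ACME Corp.\n"
    "Rules: be concise, cite the manual, never guess part numbers.\n"
    # ... large few-shot examples and reference material would go here ...
)

def build_messages(user_question: str, today: str):
    return [
        # byte-identical across requests -> eligible for prompt caching
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        # variable content comes after the cacheable prefix
        {"role": "system", "content": f"Current date: {today}"},
        {"role": "user", "content": user_question},
    ]

msgs = build_messages("How do I reset the device?", "2026-02-14")
```

Had the date been interpolated into the first message, the leading tokens would differ between requests and the cache would never hit.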

Self-supervised

in LLM model training, "Self-supervised" means that a model during its training process is "only supervising itself" - technically, it extracts labels that are statistically inherent in the training corpus itself. As such, self-supervised is the opposite of a "supervised" approach, where humans precondition, label or tag content in a training corpus to prepare the data before model training or guide the model in applying a specific label to a certain training data trait. Self-supervised training can be useful for extracting raw traits as they appear in a dataset or for producing a foundational reference model for comparative or downstream applications. That said, it might be noted that extracted traits are not unbiased or factual, but merely align with what is found in the underlying data - which in turn might be biased or counterfactual.
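For language models, the canonical self-supervised task is next-token prediction, where the labels come straight from the text itself. A toy sketch with an invented token sequence:

```python
# Next-token prediction: the labels are simply the input shifted by one position.
tokens = ["the", "cat", "sat", "on", "the", "mat"]

inputs = tokens[:-1]
labels = tokens[1:]           # the "supervision" comes from the data itself

pairs = list(zip(inputs, labels))
print(pairs)   # [('the', 'cat'), ('cat', 'sat'), ...]
```

No human annotated anything here: every (input, label) pair was derived mechanically from the raw corpus, which is exactly what "the model only supervises itself" means.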

SSE

short for "Server-Sent Events". SSE is a Web technology and a web standard that describes a scheme and protocol for streaming text data from a server to a client, usually a browser. In contrast to WebSockets, SSE is unidirectional (only one direction). It is a lightweight solution to push real-time updates from a server to a client. As of 2026, SSE is very well supported in browsers. Its MIME type is "text/event-stream". SSE saw a proliferation on the Internet with the advent of Internet AI chatbots and AI search engines. Stemming from the iterative nature of AI-generated content, which usually renders in "successive bursts" or token chunks, there was a need to push these incremental updates to the client. Instead of the server waiting for the AI subsystem to fully complete its output, AI providers usually align with the segmented generation and stream updates to the client as they arrive. From a UX (user experience) perspective, this is much better than letting a user wait for seconds until content is done. Instead, users can see the generation as it happens, with a first incomplete output coming in after fractions of a second. One interesting note is that cloud LLM providers like OpenAI usually use the MIME type "text/event-stream" but intentionally break the SSE scheme's specification by providing an event stream as the response to a POST request. By specification, SSE streams are only established via GET requests, but providers use POST requests as client requests may be very large. So effectively, providers chose to use some aspects of the SSE scheme, like record separation and its MIME type, but actually only do a simple text stream to clients. And clients, in turn, usually implement around this oddity and use XHR or fetch() operations in lower-level reads instead of a real standards-conforming SSE event consumer.
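The wire format behind the record separation mentioned above is simple: records are terminated by a blank line, and payload lines start with "data:". A minimal parser sketch for such a stream (a standards-conforming browser client would use the EventSource API instead):

```python
def parse_sse(stream: str):
    """Split a text/event-stream payload into its 'data:' records."""
    events = []
    for record in stream.split("\n\n"):           # a blank line terminates a record
        data_lines = [line[len("data:"):].lstrip()
                      for line in record.split("\n")
                      if line.startswith("data:")]
        if data_lines:                            # multi-line data is joined with \n
            events.append("\n".join(data_lines))
    return events

chunk = "data: Hello\n\ndata: world\n\ndata: [DONE]\n\n"
print(parse_sse(chunk))   # ['Hello', 'world', '[DONE]']
```

The "[DONE]" sentinel mimics a convention some LLM streaming APIs use to signal the end of generation; it is not part of the SSE standard itself.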

System Prompt

A "system prompt" (or "role prompt") is a high-priority first prompt given by a developer or interface supervisor. Instead of starting with a normal message prompt, doing "role prompting" and supplying the model with a first guiding input generally improves output quality significantly. Prefacing a chat session with a role prompt like "You are a seasoned medical professional, overseeing a large department of a city hospital" is more than just silly roleplay. A system prompt helps the AI to focus and work within a more guided corridor of possibilities. This channeling elevates accuracy and overall model performance. Aside from that, a well-crafted system prompt helps by defining the output tone, verbosity and wording, and aligns output better with the target context of the generated content ("anchoring"). System prompts are more important in chat-style model interactions, where conversations start from a specific first message. In RAG scenarios, user input is typically embedded into a Prompt Template to have the model reason over user input in relation to template content and retrieved context or (local/remote) external data. Prompt Templates, by contrast, may be provided to a model as a system prompt or as regular user messages. The name "System Prompt" comes from the de-facto OpenAI API JSON protocol standard, where clients can define a "role" key/value as part of the request. Note that cloud LLM providers and model vendors sometimes rename roles, like "system" to "developer", or introduce finer-grained role-level hierarchies. As models don't throw an error when an unknown role token is used, developers need to check which role token a model has been trained with and use the model or API accordingly. Also compare Prompt Template and Prompt Caching.
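In the OpenAI-style JSON request mentioned above, the system prompt is simply the first message carrying a special role. A sketch that also makes the role token configurable, since some vendors document "developer" instead of "system" (the model id here is a placeholder):

```python
def build_request(model_id, system_text, user_text, system_role="system"):
    # Some model families were trained with "developer" instead of "system";
    # pick the role token the vendor documents for the given model.
    return {
        "model": model_id,
        "messages": [
            {"role": system_role, "content": system_text},
            {"role": "user", "content": user_text},
        ],
    }

req = build_request("example-model",
                    "You are a seasoned medical professional.",
                    "Summarize triage priorities.",
                    system_role="developer")
```

Because models don't reject unknown role tokens, a mismatch here fails silently: the request succeeds, but the instructions lose their elevated priority.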

What's in a good system prompt

In the deployment of chat assistants, for user support, on websites or in telephone dialog systems, a system prompt is usually one of the most important elements to set the tone, domain and rules for what an assistant is expected to answer, do and know. Here, a system prompt acts like a "configuration file" in establishing how an LLM assistant works. That's why crafting a solid system prompt is one of the first steps towards success with an LLM-based chat system. That said, it is important to note that the system prompt counts against a model's context window, so a verbose text here will reduce the total "memory" left for the actual conversation. Especially with older models with smaller context windows, this may be relevant. Writing a good system prompt is thus a balancing act between the conflicting priorities of brevity and all-encompassing instructions and guardrails. The idea is maximum effectiveness at minimum token use. Compare Context Window and Prompt Engineering.
  • Be explicit, not implicit. Models are not humans. Drop the fluff and give clear instructions. Models tend to follow a clear rule better than a vague or implicit suggestion. No use in being polite.
  • Earlier tokens receive higher priority. Although message roles like "system" are a paramount concept, models also weigh information inside messages according to position, so put your most important instructions at the beginning. But read the docs: some models apply higher weighting towards the end (recency bias).
  • Structure and order your prompt. Mixing style and rules, for example, adds unnecessary noise. A clear structure like: Order, Rules, Constraints and Output is a solid foundation.
  • Use structuring markup to guide the model. From training, models already know markup like Markdown-style headings, lists or XML-style sections and nesting markers. Using these can help the model to digest the structure of your prompt. Newlines are important to convey segmentation.
  • Punctuation and capitalization. Some models ignore capitalization, others can be guided by some words in all-caps. Using an exclamation mark is often no stronger than a simple full stop. Models read text differently than humans. Generally, it is better to avoid exclamation marks as they may lead to unpredictable weighting.
  • Use lists and clear instructions. Verbose text might be more readable, but long prose is blurry for a model in comparison with an enumeration of short sentences or bullet points.
  • No use in repetition. Repeating instructions is usually not needed as models don't forget things or skim over sections like humans may do. That said, repetition is effective for emphasis, especially in high verbosity prompts and for core constraints.
  • The model already knows things. Try to avoid reiterating things the model already knows from training. When adding knowledge, the idea is to amend the model's data where it runs thin. And try to find a balance between knowledge autonomy and grounding with added facts.
  • Use AI. While authoring text is a creative domain for humans, aligning an assembled instruction sheet with an LLM's requirements is not. There is much use in using AI itself to check and optimize a system prompt.
  • Read vendor docs. Not every AI model is the same. And vendors know their models best. Look for prompting guides and best practices in official documentation to help you fine-tune your system prompt for a specific model family or model.
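Putting several of the rules above together, a compact, structured system prompt might be assembled like this. The product and the rules are invented placeholders; the point is the structure (explicit rules, headings as markup, lists, most important parts first):

```python
# A compact system prompt following the structure advice above.
SYSTEM_PROMPT = "\n".join([
    "# Role",
    "You are the support assistant for the (fictional) ACME Router X-200.",
    "# Rules",
    "- Answer only questions about the X-200.",
    "- Cite the manual section when possible.",
    "- If unsure, say so and refer to human support.",
    "# Output",
    "- Plain text, max. 3 short paragraphs.",
])
print(SYSTEM_PROMPT)
```

At roughly 60 tokens this leaves nearly the whole context window for the conversation itself, while still covering role, rules and output format.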

Temperature

Although AI systems are intrinsically deterministic (non-probabilistic), some deliberately introduced parameters modify this behavior. This tuning of a model is what leads to the common perception of AI models being probabilistic or "random" in their output. The Temperature parameter is one key factor here. The Temperature parameter internally controls a model's leeway in finding its stochastic pathways through the knowledge encoded in its weights. A higher temperature leads to a higher degree of randomness. In combination with sampling settings, the possible path through the model can be tuned from very unpredictable to nearly fully predictable. This is conceptually similar to image rendering, where light "portals" are used to bracket forks in the light path, thereby narrowing or widening the possible trajectories after a node portal. The actual temperature scaling occurs after the model arrives at its Logits and before these values are passed into the Softmax function. By controlling the "value corridor" between these steps, the resulting diversity or "creativity" of a model's output can be effectively controlled. Compare "Logit".
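The placement of temperature between Logits and Softmax can be shown directly: the logits are divided by the temperature before normalization. A minimal sketch with three invented logit values:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/T before softmax; T > 1 flattens, T < 1 sharpens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.5))  # sharper: the top token dominates
print(softmax_with_temperature(logits, 2.0))  # flatter: more "creative" sampling
```

At very low temperatures the distribution collapses towards the single most likely token (nearly deterministic output); at high temperatures the gaps between candidates shrink and sampling becomes more varied.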

Token

Tokens, in general, emerged from the field of text analysis and "Natural Language Processing" (NLP), where tokens, the "granular units" resulting from a tokenization process, are used as segmenting "windows" on a stream of characters, of text. Tokens may be Unigrams (an arbitrary self-contained unit of text), N-grams (an ordered sequence of characters or symbols of arbitrary length) or Bigrams (a two-element n-gram, for example a unit of two words, two symbols or two tokens). In the context of large language models (LLMs), the word "Token" usually designates arbitrarily sized and often quite long chunks of characters, whole words or text fragments. In Byte-Pair Encoding (BPE), tokenization begins with only two characters, a Digram, but through iterations of statistical compression forms a representational Token. Conceptually, this is similar to, for example, UTF-8 encoding, where two or three bytes are used to encode a specific literal - but with LLM tokenization this happens on a higher, more abstract level. Tokens are unique to a specific family or even generation of LLMs and map a character sequence to a numeric representation. Libraries like "tiktoken" for OpenAI's models can be used to tokenize arbitrary text. This separation of tokenization into a process that can be done on the client side, in the same or a very similar manner as internal systems at OpenAI do, makes tokenization and token count an important metric when cloud LLMs are used. This is because Cloud LLM Platforms usually use the "number of tokens used" to measure "credit use". The idea here is that there is a certain cost associated with processing input tokens or generating output tokens, and by metering the "token burn" of a user, an AI vendor can bill its customers for services rendered. As of 2026 and with popular models, a rule of thumb is that one token generally corresponds to about four characters of common English text, and roughly 100 tokens correspond to about 75 words.
As a simplification, a token is sometimes described as a "probabilistic token", conveying that a model works to predict the "next probable token", i.e. to answer the question "what is the most probable word or character sequence after this series of previous tokens". This mixes terminology: Tokens are not characters, not words and not Embeddings, but numerically arbitrary chunked stream units unique to a model or model family. Compare "Logit" and "Perplexity".
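The BPE merge idea can be illustrated with a toy sketch (not a real tokenizer): start from single characters, then repeatedly fuse the most frequent adjacent pair into one larger token. The training text here is invented for illustration.

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent pairs (digrams) in the token sequence.
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge_pair(tokens, pair):
    # Replace every occurrence of the pair with a single merged token.
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# Start from single characters and apply a few merge iterations.
tokens = list("low lower lowest")
for _ in range(3):
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
```

After a few iterations the frequent fragment "low" has been compressed into a single token; real BPE vocabularies simply run many thousands of such merges over a large corpus.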

Tokenization

is the process of partitioning or segmenting text into defined pieces of information. Traditionally, a token means a "lexical token" and is any single- or multi-character identifier (unit) that carries a certain defined meaning. In natural language this can be the actual word meaning; in codified syntaxes it can be any arbitrary meaning or value. Tokenization describes the deconstruction of an input string into its contained tokens for a subsequent pass of applying meaning to a sequence of tokens (e.g. part-of-speech tagging) or for treatment of tokens in storage and retrieval (e.g. in inverted indexes). More recently, with the proliferation of AI, the label "Tokenization" is increasingly associated with machine learning, where tokens are likewise extracted from an input string. The way AI systems utilize tokens is analogous to how inverted-index search engines use them: search engines also process tokens, align them with a defined vocabulary, assign a numeric equivalent per token (an integer index from a translation table) and handle these numeric representations exclusively in subsequent processes. The reason is efficiency: in storage and retrieval, processing numbers is cheaper than processing character strings, as computers operate on numbers internally anyway. In AI, a model likewise works only with numbers (Tensors) after Tokenization. Compare "Token".
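The vocabulary-lookup step described above can be sketched in a few lines. The vocabulary and the reserved unknown-word id are invented for illustration; real systems use vocabularies of tens of thousands of entries.

```python
# Hypothetical word-level tokenizer: align words with a fixed vocabulary
# and translate them into integer indices, as inverted-index search
# engines and ML pipelines do.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}

def tokenize(text):
    # Words missing from the vocabulary map to the reserved <unk> index.
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

ids = tokenize("The cat sat on the mat")
# ids == [1, 2, 3, 4, 1, 5]
```

Everything downstream of this point - index lookups in a search engine, tensor operations in a model - operates purely on the integer ids, never on the original characters.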

Tool Calling

also "tool use", is a model's ability to execute commands or call external tools, like a human user would. This means that when a user query is processed, the model can be granted permission to use a tool by supplying a specific array of available tools. The model then uses the description of each available tool to decide whether one of these tools could fill in knowledge that it does not yet possess, either from training data or from conversation context. Implementation requires a more elaborate setup within the framework calling the model, plus support in the inference engine running it, as tool use usually follows a three-step pattern. In a first model call, the request carries the tool definitions to trigger structured output in case the model decides to use a tool; this may be left to the model to decide or be explicitly requested. The model then decides which tool to use based on the tools' descriptions. For example, if a user asks about a specific device, the model decides whether it knows anything about it; if not, it checks whether a tool on the list may fill in the missing data. In this case, the resulting model output is a structured response stating which tool the model wants to call and with which parameters (the device name, model id, etc.). The wrapping framework then executes the actual request against the tool. In a third and final step, the model is again prompted with the user's input message to generate the free-form text answer, this time amended with the added context the tool call returned. In this third step, the framework would omit the tool definitions in order to avoid an endless loop in case the tool call did not yield useful context. Tool Calling is related to Retrieval Augmented Generation (RAG), although in RAG retrieval is usually done before the model is queried. That is why Tool Calling is sometimes called "Agentic RAG".
The concept of tool calling may also be seen as a rudimentary form of a model's ability to use a computer.
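The three-step pattern can be sketched as follows. Note that chat(), lookup_device(), the tool schema and the "XR-200" device are all invented stand-ins, not a specific vendor's API; real frameworks differ in naming and message format but follow the same flow.

```python
import json

# Hypothetical tool registry offered to the model on the first call.
TOOLS = [{
    "name": "lookup_device",
    "description": "Look up specifications for a device by model id.",
    "parameters": {"model_id": "string"},
}]

def lookup_device(model_id):
    # Stand-in data source the framework queries on the model's behalf.
    return {"model_id": model_id, "ram": "8 GB"}

def chat(messages, tools=None):
    # Placeholder for an inference call. A real model would decide here
    # whether to answer directly or emit a structured tool request.
    if tools and "XR-200" in messages[-1]["content"]:
        return {"tool_call": {"name": "lookup_device",
                              "arguments": {"model_id": "XR-200"}}}
    return {"content": "The XR-200 ships with 8 GB of RAM."}

# Step 1: first call, with tool definitions supplied.
messages = [{"role": "user", "content": "How much RAM has the XR-200?"}]
reply = chat(messages, tools=TOOLS)

if "tool_call" in reply:
    # Step 2: the framework executes the requested tool.
    call = reply["tool_call"]
    result = lookup_device(**call["arguments"])
    messages.append({"role": "tool", "content": json.dumps(result)})
    # Step 3: second call WITHOUT tool definitions, avoiding an endless loop.
    reply = chat(messages)
```

The key design point is in step 3: the tool result is appended to the conversation as added context, and the tool definitions are withheld so the model must now produce a free-form answer.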

TTFT

short for "Time To First Token". TTFT is a metric describing the performance, specifically the inference latency, of an AI hardware and software stack. Available RAM, CPU or GPU type and throughput, in combination with the efficiency of the LLM runner, all determine how long it takes from submitting a query to the inference engine until the first token is emitted. It is not uncommon for an LLM stack to take 100-500 ms to output a first token. This, together with the sequential nature of token output, has led to streamed responses and iterative placeholders to improve User Experience (UX) in AI web interfaces. Compare SSE.
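Measuring TTFT is straightforward once the engine streams its output: time the gap between submitting the request and receiving the first token. The generator below fakes an engine's latency profile for illustration.

```python
import time

def fake_stream():
    # Stand-in for an inference engine's streamed token output.
    time.sleep(0.05)          # prefill / prompt-processing latency
    yield "Hello"
    for tok in [",", " world", "!"]:
        time.sleep(0.01)      # per-token decode latency
        yield tok

start = time.monotonic()
stream = fake_stream()
first = next(stream)                 # blocks until the first token arrives
ttft = time.monotonic() - start      # Time To First Token, in seconds
rest = "".join(stream)               # remaining tokens stream in afterwards
```

This also shows why streaming helps UX: the user sees "Hello" after the TTFT window instead of waiting for the full response to complete.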

Vibe Coding

is a work approach in computer programming where a developer uses mostly AI coding tools to produce software source code. It usually focuses on fast iteration and quick results instead of structured, systematic design. Vibe Coding may be done in a heavily assisted mode of operation, where the developer mainly engineers prompts, then reviews the output and prompts again to form the final code. Such an approach is becoming increasingly popular among experienced developers, who use AI tools for a first draft, a quick prototype or to bootstrap a project (boilerplate code) and then work traditionally from there. At its extreme, Vibe Coding may describe a mode in which a completely inexperienced, non-technical person instructs an AI to produce source code they do not understand, assessed only by the code's ability to fulfill a certain task - ignoring the fact that the software may contain unintended features, flawed logic or senseless code. After all, such code has never been reviewed by a knowledgeable, computer-savvy human.

Note on trademarks

Many of the designations used by manufacturers and sellers to distinguish their products or services are claimed as trademarks. Where those designations appear in this text and Micropolis and/or the authors were aware of a trademark claim, the designations are mentioned along with their owners and may be additionally marked with a trademark symbol. Their use here in this FAQ on artificial intelligence (AI) is for educational use of the reader and is covered under nominative fair use. Micropolis is in no way suggesting support, sponsorship or endorsement of the owner of these trademarks. Only as much of such marks is used as is necessary to identify the trademark owner, product, or service.