Auto

by monolyth

When your model slug is unknown, your prompts will be processed by llama-3-70b-instruct.

Last update: 5/6/2024

Chat Requests

The chat completion API processes a list of messages, returning a single, model-generated response. It handles both multi-turn conversations and single-turn tasks efficiently.

Chat Completion

fetch("https://api.monolyth.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${MONOLYTH_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional; shows in the analytics dashboard on monolyth.ai
    "X-Name": `${YOUR_SITE_NAME}`, // Optional; shows in the analytics dashboard on monolyth.ai
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "model-slug", // Ex: gpt-3.5-turbo
    messages: [{ role: "user", content: "What is your dream?" }],
  }),
});

OpenAI SDK Integration

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.monolyth.ai/v1",
  apiKey: process.env.MONOLYTH_API_KEY,
  defaultHeaders: {
    "HTTP-Referer": YOUR_SITE_URL, // Optional; shows in the analytics dashboard on monolyth.ai
    "X-Name": YOUR_APP_NAME, // Optional; shows in the analytics dashboard on monolyth.ai
  },
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: "model-slug", // Ex: gpt-3.5-turbo
    messages: [{ role: "user", content: "Say this is a simulation" }],
  });
  console.log(completion.choices[0].message);
}

main();
 

Chat Responses

Responses are similar to the OpenAI Chat API: choices are always presented as an array, even for a single completion. Each choice includes a delta property for streamed responses and a message property for all other cases, so the same handling code works across different models.

The returned model slug may differ from the one you requested, depending on the model provider. The model property indicates the specific model utilized by the API.

Response Body Example

{
  "id": "chatcmpl-9EoHX1mAXByHDuTXqINVHMxxlQ8sX",
  "object": "chat.completion",
  "created": 1713316872,
  "model": "gpt-3.5-turbo-0125", // This model slug can be slightly different from the one you used in the request, depending on the model provider.
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm a helpful assistant here to provide you with information, answer your questions, and assist you with anything you need. How can I help you today?"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 33,
    "total_tokens": 56
  },
  "system_fingerprint": "fp_c22vve73ad"
}

Streaming Chat Responses

Monolyth supports streaming responses using Server-Sent Events (SSE). To enable streaming, add stream: true to the request body.
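As a sketch, each SSE event carries a line of the form data: {...} whose choices[0].delta holds a content fragment, with a final data: [DONE] sentinel; this follows the OpenAI-compatible streaming format, and the helper name below is our own:

```javascript
// Collect the content fragments from a block of SSE text into one string.
// Assumes the OpenAI-compatible event shape: data: {"choices":[{"delta":{...}}]}
function collectDeltas(sseText) {
  let out = "";
  for (const line of sseText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank/comment lines
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const delta = JSON.parse(payload).choices[0].delta;
    if (delta.content) out += delta.content;
  }
  return out;
}
```

In practice you would issue the fetch with stream: true, read the response body incrementally, decode each chunk, and feed the decoded text through a parser like this one.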

Assistant Prefill

Monolyth lets models complete a partial response that you supply, which is useful for directing model behavior. However, this feature is not universally supported across all models. In role-play scenarios, prefilling responses helps the model maintain its character, ensuring consistency in persona throughout extended interactions.

| Role | Prompt |
| --- | --- |
| System | You are an AI English teacher named Catherine. Your goal is to provide English language teaching to users who visit the AI English Teacher Co. website. Users will be confused if you don't respond in the character of Catherine. Please respond to the user's question within tags. |
| User | (the user's question) |
| Assistant (Prefilled) | (the partial response for the model to continue) |
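Concretely, a prefilled conversation is just a messages array whose final entry has the assistant role; the persona and wording below are illustrative, not part of the API:

```javascript
// The model continues from the final assistant message rather than
// starting a fresh reply, keeping it in character.
const messages = [
  { role: "system", content: "You are an AI English teacher named Catherine." },
  { role: "user", content: "How do I use the present perfect tense?" },
  // Prefilled partial response the model will complete:
  { role: "assistant", content: "[Catherine] The present perfect is formed with" },
];
```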

Image Inputs for Vision LLM

Some models, like LLaVA, allow the model to take in images and answer questions about them. We recommend sending only one image per request. Each image will be counted as 576 tokens.

...
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What's in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ],
...

Function Calling

Some models, like Hermes 2 Pro, allow the model to call functions. This is useful for creating custom applications.

await fetch("https://api.monolyth.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Accept: "application/json",
    "Content-Type": "application/json",
    Authorization: "Bearer <API_KEY>",
  },
  body: JSON.stringify({
    model: "hermes-2-pro-mistral-7b",
    messages: [{ role: "user", content: "What is the stock price of AAPL?" }], // example prompt
    tools: [
      {
        type: "function",
        function: {
          name: "get_stock_price",
          description: "Get a stock price",
          parameters: {
            type: "object",
            properties: {
              symbol: {
                type: "string",
                description: "The stock symbol to get the price for",
              },
            },
            required: ["symbol"],
          },
        },
      },
    ],
  }),
});
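When the model decides to call a function, the response's message carries a tool_calls array in the OpenAI-compatible shape. A minimal sketch of dispatching those calls to local handlers (handleToolCalls and the handler map are our own names, not part of the API):

```javascript
// Run each tool call in the completion against a map of local handlers.
// Assumes the OpenAI-compatible shape: choices[0].message.tool_calls, where
// each call has function.name and a JSON string in function.arguments.
function handleToolCalls(completion, handlers) {
  const calls = completion.choices[0].message.tool_calls ?? [];
  return calls.map((call) => {
    const args = JSON.parse(call.function.arguments);
    return handlers[call.function.name](args);
  });
}
```

You would then send each result back to the model in a follow-up message so it can compose its final answer.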

Third-Party Integration Examples

LangChain

LangChain can be integrated with Monolyth to develop context-aware, reasoning-driven applications using language models.

import { ChatOpenAI } from "langchain/chat_models/openai";

const chat = new ChatOpenAI(
  {
    modelName: "claude-3-opus",
    streaming: true,
    openAIApiKey: $MONOLYTH_API_KEY,
  },
  {
    basePath: $MONOLYTH_API_URL + "/v1",
  },
);

Vercel AI SDK

import { Configuration, OpenAIApi } from "openai-edge";

const config = new Configuration({
  basePath: $MONOLYTH_API_URL + "/v1",
  apiKey: $MONOLYTH_API_KEY,
});
 
const monolyth = new OpenAIApi(config);

Chat Parameters

Chat parameters are settings used to control how a large language model generates text. These parameters can significantly affect the model's output.

Some models or providers may not support all parameters; unsupported parameters are usually ignored.

| Parameter | Type | Range/Options | Default | Description |
| --- | --- | --- | --- | --- |
| temperature | float | 0.0 to 2.0 | 1.0 | Affects the range of the model's outputs. Lower settings result in more consistent and expected outputs, while higher settings promote a wider array of unique and varied responses. A setting of 0 produces the identical response to the same input. |
| top_p | float | 0.0 to 1.0 | 1.0 | Restricts the model to consider only a subset of the most probable tokens, specifically those whose cumulative probability reaches a certain threshold, P. Smaller values result in more deterministic outputs, while the default value allows exploration across the entire spectrum of possible tokens. |
| top_k | integer | 0 or above | 0 | Limits the model to a smaller set of token choices at each step. A value of 1 forces the model to select the most probable next token, resulting in predictable outcomes. By default, this parameter is disabled, allowing the model to explore all possible choices. |
| frequency_penalty | float | -2.0 to 2.0 | 0.0 | Reduces token repetition by penalizing tokens based on their frequency in the input. The penalty increases with the token's occurrence, discouraging the use of frequently appearing tokens. Negative values promote the reuse of tokens. |
| presence_penalty | float | -2.0 to 2.0 | 0.0 | Modifies the likelihood of reusing tokens from the input. Higher values decrease repetition, whereas negative values increase it. The penalty is constant and does not depend on the frequency of token occurrence. |
| repetition_penalty | float | 0.0 to 2.0 | 1.0 | Minimizes token repetition from the input. Increasing this value decreases the likelihood of repeating tokens, enhancing output uniqueness. Excessively high values may disrupt output coherence, leading to less fluent sentences. |
| min_p | float | 0.0 to 1.0 | 0.0 | Sets the threshold for the least probable token to be considered, as a fraction of the most likely token's probability. For example, a setting of 0.1 means only tokens with at least 10% of the highest probability token's likelihood are included. |
| seed | integer | - | - | Specifying a seed ensures deterministic sampling, where identical requests yield consistent results. However, some models may not guarantee this determinism. |
| max_tokens | integer | 1 or above | 1024 or undefined | Defines the maximum number of tokens the model can generate, capped by the context length minus the prompt length. |
| logit_bias | map | - | - | Takes a JSON object mapping token IDs to bias values ranging from -100 to 100. This bias adjusts the model's logits before sampling. While the impact varies by model, biases between -1 and 1 modify token selection likelihood. Extremes (-100 or 100) effectively ban or ensure a token's selection. |
| logprobs | boolean | - | - | Returns the log probabilities of each output token if set to true. |
| top_logprobs | integer | 0 to 20 | - | Specifies the number of top tokens to return with their log probabilities at each position. Requires logprobs to be true. |
| response_format | map | - | - | Dictates the output format of the model. Use { "type": "json_object" } for JSON mode, ensuring the output is valid JSON. Only a few models support this feature. |
| stop | array | - | - | Halts generation upon encountering any specified token in the array. |
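Putting several of these together, a request body might look like the following sketch; the specific values are illustrative, not recommendations:

```javascript
// Example request body combining common sampling parameters.
const body = {
  model: "model-slug",
  messages: [{ role: "user", content: "Write a haiku about the sea." }],
  temperature: 0.7,       // narrower, more predictable sampling
  top_p: 0.9,             // nucleus sampling over the top 90% probability mass
  frequency_penalty: 0.5, // discourage frequently repeated tokens
  max_tokens: 256,        // cap on generated tokens
  stop: ["\n\n"],         // halt at the first blank line
};
```

This object is what you would pass to JSON.stringify for the body of the chat completions request.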