Auto

by monolyth

When your model slug is unknown, your prompts will be processed by llama-3-70b-instruct.

Last update: 5/6/2024

Chat Requests

The chat completion API processes a list of messages, returning a single, model-generated response. It handles both multi-turn conversations and single-turn tasks efficiently.

Chat Completion

fetch("https://api.monolyth.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${MONOLYTH_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional; shows in the analytics dashboard on monolyth.ai
    "X-Name": `${YOUR_SITE_NAME}`, // Optional; shows in the analytics dashboard on monolyth.ai
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "model-slug", // Ex: gpt-3.5-turbo
    messages: [{ role: "user", content: "What is your dream?" }],
  }),
});

OpenAI SDK Integration

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.monolyth.ai/v1",
  apiKey: process.env.MONOLYTH_API_KEY,
  defaultHeaders: {
    "HTTP-Referer": YOUR_SITE_URL, // Optional; shows in the analytics dashboard on monolyth.ai
    "X-Name": YOUR_APP_NAME, // Optional; shows in the analytics dashboard on monolyth.ai
  },
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: "model-slug", // Ex: gpt-3.5-turbo
    messages: [{ role: "user", content: "Say this is a simulation" }],
  });
  console.log(completion.choices[0].message);
}

main();
 

Chat Responses

Responses are similar to the OpenAI Chat API: choices are always presented as an array, even for a single completion. Each choice includes a delta property for streamed responses and a message property for all other cases, so the same handling code works across different models.

The returned model slug may differ from the one you requested, depending on the model provider. The model property indicates the specific model utilized by the API.

Response Body Example

{
  "id": "chatcmpl-9EoHX1mAXByHDuTXqINVHMxxlQ8sX",
  "object": "chat.completion",
  "created": 1713316872,
  "model": "gpt-3.5-turbo-0125", // This model slug can be slightly different from the one you used in the request, depending on the model provider.
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm a helpful assistant here to provide you with information, answer your questions, and assist you with anything you need. How can I help you today?"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 33,
    "total_tokens": 56
  },
  "system_fingerprint": "fp_c22vve73ad"
}

Streaming Chat Responses

Monolyth supports streaming responses using Server-Sent Events (SSE). To enable streaming, add stream: true to the request body.
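As a sketch, each SSE event carries a line of the form data: {...} whose choices[0].delta holds a content fragment, with a final data: [DONE] sentinel; this follows the OpenAI-compatible streaming format, and the helper name below is our own:

```javascript
// Collect the content fragments from a block of SSE text into one string.
// Assumes the OpenAI-compatible event shape: data: {"choices":[{"delta":{...}}]}
function collectDeltas(sseText) {
  let out = "";
  for (const line of sseText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank/comment lines
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const delta = JSON.parse(payload).choices[0].delta;
    if (delta.content) out += delta.content;
  }
  return out;
}
```

In practice you would issue the fetch with stream: true, read the response body incrementally, decode each chunk, and feed the decoded text through a parser like this one.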

Assistant Prefill

Monolyth lets models complete a partial response that you supply, which is useful for directing model behavior. However, this feature is not universally supported across all models. In role-play scenarios, prefilling responses helps the model maintain its character, ensuring consistency in persona throughout extended interactions.

| Role | Prompt |
| --- | --- |
| System | You are an AI English teacher named Catherine. Your goal is to provide English language teaching to users who visit the AI English Teacher Co. website. Users will be confused if you don't respond in the character of Catherine. Please respond to the user's question within tags. |
| User | (the user's question) |
| Assistant (Prefilled) | (the partial response for the model to continue) |
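Concretely, a prefilled conversation is just a messages array whose final entry has the assistant role; the persona and wording below are illustrative, not part of the API:

```javascript
// The model continues from the final assistant message rather than
// starting a fresh reply, keeping it in character.
const messages = [
  { role: "system", content: "You are an AI English teacher named Catherine." },
  { role: "user", content: "How do I use the present perfect tense?" },
  // Prefilled partial response the model will complete:
  { role: "assistant", content: "[Catherine] The present perfect is formed with" },
];
```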

Image Inputs for Vision LLM

Some models, like LLaVA, allow the model to take in images and answer questions about them. We recommend sending only one image per request. Each image will be counted as 576 tokens.

...
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What's in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ],
...

Function Calling

Some models, like Hermes 2 Pro, allow the model to call functions. This is useful for creating custom applications.

await fetch("https://api.monolyth.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Accept: "application/json",
    "Content-Type": "application/json",
    Authorization: "Bearer <API_KEY>",
  },
  body: JSON.stringify({
    model: "hermes-2-pro-mistral-7b",
    messages: [{ role: "user", content: "What is the stock price of AAPL?" }], // example prompt
    tools: [
      {
        type: "function",
        function: {
          name: "get_stock_price",
          description: "Get a stock price",
          parameters: {
            type: "object",
            properties: {
              symbol: {
                type: "string",
                description: "The stock symbol to get the price for",
              },
            },
            required: ["symbol"],
          },
        },
      },
    ],
  }),
});
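When the model decides to call a function, the response's message carries a tool_calls array in the OpenAI-compatible shape. A minimal sketch of dispatching those calls to local handlers (handleToolCalls and the handler map are our own names, not part of the API):

```javascript
// Run each tool call in the completion against a map of local handlers.
// Assumes the OpenAI-compatible shape: choices[0].message.tool_calls, where
// each call has function.name and a JSON string in function.arguments.
function handleToolCalls(completion, handlers) {
  const calls = completion.choices[0].message.tool_calls ?? [];
  return calls.map((call) => {
    const args = JSON.parse(call.function.arguments);
    return handlers[call.function.name](args);
  });
}
```

You would then send each result back to the model in a follow-up message so it can compose its final answer.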

Third-Party Integration Examples

LangChain

LangChain can be integrated with Monolyth to develop context-aware, reasoning-driven applications using language models.

import { ChatOpenAI } from "langchain/chat_models/openai";

const chat = new ChatOpenAI(
  {
    modelName: "claude-3-opus",
    streaming: true,
    openAIApiKey: $MONOLYTH_API_KEY,
  },
  {
    basePath: $MONOLYTH_API_URL + "/v1",
  },
);

Vercel AI SDK

import { Configuration, OpenAIApi } from "openai-edge";

const config = new Configuration({
  basePath: $MONOLYTH_API_URL + "/v1",
  apiKey: $MONOLYTH_API_KEY,
});
 
const monolyth = new OpenAIApi(config);

Chat Parameters

Chat parameters are settings used to control how a large language model generates text. These parameters can significantly affect the model's output.

Some models or providers may not support all parameters; unsupported parameters are usually ignored.

| Parameter | Type | Range/Options | Default | Description |
| --- | --- | --- | --- | --- |
| temperature | float | 0.0 to 2.0 | 1.0 | Affects the range of the model's outputs. Lower settings result in more consistent and expected outputs, while higher settings promote a wider array of unique and varied responses. A setting of 0 produces the identical response to the same input. |
| top_p | float | 0.0 to 1.0 | 1.0 | Restricts the model to consider only a subset of the most probable tokens, specifically those whose cumulative probability reaches a certain threshold, P. Smaller values result in more deterministic outputs, while the default value allows exploration across the entire spectrum of possible tokens. |
| top_k | integer | 0 or above | 0 | Limits the model to a smaller set of token choices at each step. A value of 1 forces the model to select the most probable next token, resulting in predictable outcomes. By default, this parameter is disabled, allowing the model to explore all possible choices. |
| frequency_penalty | float | -2.0 to 2.0 | 0.0 | Reduces token repetition by penalizing tokens based on their frequency in the input. The penalty increases with the token's occurrence, discouraging the use of frequently appearing tokens. Negative values promote the reuse of tokens. |
| presence_penalty | float | -2.0 to 2.0 | 0.0 | Modifies the likelihood of reusing tokens from the input. Higher values decrease repetition, whereas negative values increase it. The penalty is constant and does not depend on the frequency of token occurrence. |
| repetition_penalty | float | 0.0 to 2.0 | 1.0 | Minimizes token repetition from the input. Increasing this value decreases the likelihood of repeating tokens, enhancing output uniqueness. Excessively high values may disrupt output coherence, leading to less fluent sentences. |
| min_p | float | 0.0 to 1.0 | 0.0 | Sets the threshold for the least probable token to be considered, as a fraction of the most likely token's probability. For example, a setting of 0.1 means only tokens with at least 10% of the highest probability token's likelihood are included. |
| seed | integer | - | - | Specifying a seed ensures deterministic sampling, where identical requests yield consistent results. However, some models may not guarantee this determinism. |
| max_tokens | integer | 1 or above | 1024 or undefined | Defines the maximum number of tokens the model can generate, capped by the context length minus the prompt length. |
| logit_bias | map | - | - | Takes a JSON object mapping token IDs to bias values ranging from -100 to 100. This bias adjusts the model's logits before sampling. While the impact varies by model, biases between -1 and 1 modify token selection likelihood. Extremes (-100 or 100) effectively ban or ensure a token's selection. |
| logprobs | boolean | - | - | Returns the log probabilities of each output token if set to true. |
| top_logprobs | integer | 0 to 20 | - | Specifies the number of top tokens to return with their log probabilities at each position. Requires logprobs to be true. |
| response_format | map | - | - | Dictates the output format of the model. Use { "type": "json_object" } for JSON mode, ensuring the output is valid JSON. Only a few models support this feature. |
| stop | array | - | - | Halts generation upon encountering any specified token in the array. |
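Putting several of these together, a request body might look like the following sketch; the specific values are illustrative, not recommendations:

```javascript
// Example request body combining common sampling parameters.
const body = {
  model: "model-slug",
  messages: [{ role: "user", content: "Write a haiku about the sea." }],
  temperature: 0.7,       // narrower, more predictable sampling
  top_p: 0.9,             // nucleus sampling over the top 90% probability mass
  frequency_penalty: 0.5, // discourage frequently repeated tokens
  max_tokens: 256,        // cap on generated tokens
  stop: ["\n\n"],         // halt at the first blank line
};
```

This object is what you would pass to JSON.stringify for the body of the chat completions request.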