# Send a request

# Overview

Send a request and fetch the result directly.

Unlike the asynchronous APIs, this API returns the result over the same connection,
and neither the input nor the output is wrapped in JSON. You can therefore use this API
as if you were accessing your service directly.

<Warning>
  When the service is in a cold state (i.e. there are no running replicas because the service
  has been idle) and a new request is made, a new replica is started immediately.

  In such a case, the first few requests **may be aborted** due to timeouts
  until the replica is up and running. Check your HTTP client's timeout configuration.
</Warning>
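
One way to cope with the cold-start behavior described above is to retry requests that time out. The following is a minimal sketch using only the Python standard library, not an official client: the function name `post_with_retry`, the `opener` parameter, and the timeout, attempt, and delay values are all illustrative assumptions.

```python
import time
import urllib.error
import urllib.request


def post_with_retry(url, token, body, timeout=30.0, attempts=5, delay=2.0,
                    opener=urllib.request.urlopen):
    """POST raw bytes to the service, retrying on timeouts and connection errors."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    for attempt in range(attempts):
        try:
            with opener(request, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError:
            # The service produced an HTTP error response; this sketch does not
            # assume that retrying helps in that case.
            raise
        except OSError:
            # Covers socket timeouts and urllib.error.URLError (both subclass OSError).
            if attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))  # linear backoff; values are illustrative
```

Here `url` would be the full request URL (base URL, then `/request/`, then your service path) and `body` the already-encoded payload, for example `b'{"question": "1+1 = ?"}'`.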

## Interaction code example

<CodeGroup>
  ```shell curl theme={null}
  $ curl \
      -H "Authorization: Bearer ${TOKEN}" \
      "${BASE_URL}/request/predictions/my-model" \
      --json '{"question": "1+1 = ?"}'

  {"answer": "The answer is 3. No, it's 11."}
  ```

  ```python Python theme={null}
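  import requests

  base_url = "https://..."  # Base URL from the service overview's Request dialog
  token = "..."  # Token from the same dialog
  path = "/predictions/my-model"  # Endpoint exposed by your service

  r = requests.post(
      f"{base_url}/request/{path.lstrip('/')}",
      headers={"Authorization": f"Bearer {token}"},
      json={"question": "1+1 = ?"},
      timeout=60,  # Example value; allows extra time if the service is cold
  )
  print(r.text)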
  ```
</CodeGroup>

# Request

<Info>
  <h3>Authorization</h3>

  You must provide a token in the `Authorization` header using the `Bearer` scheme:

  ```
  Authorization: Bearer <token>
  ```

  The token can be found in the web UI (in the Request dialog of the service overview).
</Info>

## Path parameters

<ParamField path="base_url" type="string" required="true">
  Base URL for your service. This value can be found in the web UI (in the Request dialog
  of the service overview).

  Typical value: `https://serve-api.dev2.vssl.ai/api/v1/services/<slug>`
</ParamField>

<ParamField path="path" type="string" required="true">
  Path on your service to which the request is sent.

  Your service should provide a corresponding endpoint. Common path values used for inference include:

  * `/v2/models/my-model/infer`
  * `/predictions/my-model`
  * `/v1/completions`
</ParamField>
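
As a small illustration of how the two path parameters combine, the helper below follows the same `{base_url}/request/{path}` pattern used in the interaction example. The function name `build_request_url` and the sample URL are hypothetical.

```python
def build_request_url(base_url: str, path: str) -> str:
    """Join the service base URL and the service path around the /request/ segment."""
    # Strip redundant slashes so that "/predictions/x" and "predictions/x" behave the same.
    return f"{base_url.rstrip('/')}/request/{path.lstrip('/')}"


# Hypothetical base URL, used only to show the resulting shape.
url = build_request_url(
    "https://example.invalid/api/v1/services/my-slug",
    "/predictions/my-model",
)
print(url)
```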

# Response

The response from your service is relayed as-is, so there is no fixed response format.

The response is streamed with low latency, so it can be used in live-streaming applications
such as chat or text completion with large language models (LLMs).
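
To take advantage of streaming, a client can read the response incrementally instead of waiting for the full body. The sketch below uses the Python standard library. The names `iter_chunks` and `stream_request` and the chunk size are illustrative assumptions, and whether chunks arrive incrementally depends on your service actually streaming its output.

```python
import urllib.request


def iter_chunks(stream, chunk_size=1024):
    """Yield data from a binary stream as soon as it becomes available."""
    while True:
        # read1 returns whatever is available (up to chunk_size) without
        # waiting for the buffer to fill completely.
        chunk = stream.read1(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_request(url, token, body, timeout=60.0):
    """POST raw bytes and yield the relayed response piece by piece."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        yield from iter_chunks(response)
```

A caller could then loop over `stream_request(...)` and print or forward each chunk as it arrives.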

<Note>
  HTTP response headers from your service are generally stripped.
  **Only the following headers** are passed along:

  * `Content-Type`
  * `Content-Length`
</Note>
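
The effect of this filtering on a client can be illustrated with a short sketch. It only models the rule stated in the note and is not the relay's actual implementation; the names `RELAYED_HEADERS` and `visible_headers` are hypothetical.

```python
# Header names that the note above lists as passed along (compared case-insensitively).
RELAYED_HEADERS = {"content-type", "content-length"}


def visible_headers(service_headers):
    """Return the subset of service headers a client should expect to receive."""
    return {
        name: value
        for name, value in service_headers.items()
        if name.lower() in RELAYED_HEADERS
    }


# A custom header such as X-Request-Id would not reach the client.
print(visible_headers({
    "Content-Type": "application/json",
    "Content-Length": "43",
    "X-Request-Id": "abc",
}))
```

In practice this means client code should not rely on custom headers set by your service; any such information would need to travel in the response body.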

<RequestExample>
  ```text Request example theme={null}
  POST /predictions/my-model
  (...)
  Content-Type: application/json
  Content-Length: 23

  {"question": "1+1 = ?"}
  ```
</RequestExample>

<ResponseExample>
  ```text Response example theme={null}
  200 OK
  (...)
  Content-Type: application/json
  Content-Length: 43

  {"answer": "The answer is 3. No, it's 11."}
  ```
</ResponseExample>
