llama-2-13b-chat-awq Beta
Text Generation • theblokeLlama 2 13B Chat AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Llama 2 variant.
Playground
Try out this model with Workers AI LLM Playground. It does not require any setup or authentication and an instant way to preview and test a model directly in the browser.
Launch the LLM PlaygroundUsage
Worker - Streaming
Worker
Python
curl
Parameters
Input
-
0
object-
prompt
string min 1 max 131072The input text prompt for the model to generate a response.
-
image
-
0
arrayAn array of integers that represent the image data constrained to 8-bit unsigned integer values
-
items
numberA value between 0 and 255
-
-
1
stringBinary string representing the image contents.
-
-
raw
booleanIf true, a chat template is not applied and you must adhere to the specific model's expected formatting.
-
stream
booleanIf true, the response will be streamed back incrementally using SSE, Server Sent Events.
-
max_tokens
integer default 256The maximum number of tokens to generate in the response.
-
temperature
number default 0.6 min 0 max 5Controls the randomness of the output; higher values produce more random results.
-
top_p
number min 0 max 2Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
-
top_k
integer min 1 max 50Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
-
seed
integer min 1 max 9999999999Random seed for reproducibility of the generation.
-
repetition_penalty
number min 0 max 2Penalty for repeated tokens; higher values discourage repetition.
-
frequency_penalty
number min 0 max 2Decreases the likelihood of the model repeating the same lines verbatim.
-
presence_penalty
number min 0 max 2Increases the likelihood of the model introducing new topics.
-
lora
stringName of the LoRA (Low-Rank Adaptation) model to fine-tune the base model.
-
-
1
object-
messages
arrayAn array of message objects representing the conversation history.
-
items
object-
role
stringThe role of the message sender (e.g., 'user', 'assistant', 'system', 'tool').
-
content
string max 131072The content of the message as a string.
-
-
-
image
-
0
arrayAn array of integers that represent the image data constrained to 8-bit unsigned integer values
-
items
numberA value between 0 and 255
-
-
1
stringBinary string representing the image contents.
-
-
functions
array-
items
object-
name
string -
code
string
-
-
-
tools
arrayA list of tools available for the assistant to use.
-
items
-
0
object-
name
stringThe name of the tool. More descriptive the better.
-
description
stringA brief description of what the tool does.
-
parameters
objectSchema defining the parameters accepted by the tool.
-
type
stringThe type of the parameters object (usually 'object').
-
required
arrayList of required parameter names.
-
items
string
-
-
properties
objectDefinitions of each parameter.
-
additionalProperties
object-
type
stringThe data type of the parameter.
-
description
stringA description of the expected parameter.
-
-
-
-
-
1
object-
type
stringSpecifies the type of tool (e.g., 'function').
-
function
objectDetails of the function tool.
-
name
stringThe name of the function.
-
description
stringA brief description of what the function does.
-
parameters
objectSchema defining the parameters accepted by the function.
-
type
stringThe type of the parameters object (usually 'object').
-
required
arrayList of required parameter names.
-
items
string
-
-
properties
objectDefinitions of each parameter.
-
additionalProperties
object-
type
stringThe data type of the parameter.
-
description
stringA description of the expected parameter.
-
-
-
-
-
-
-
-
stream
booleanIf true, the response will be streamed back incrementally.
-
max_tokens
integer default 256The maximum number of tokens to generate in the response.
-
temperature
number default 0.6 min 0 max 5Controls the randomness of the output; higher values produce more random results.
-
top_p
number min 0 max 2Controls the creativity of the AI's responses by adjusting how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
-
top_k
integer min 1 max 50Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
-
seed
integer min 1 max 9999999999Random seed for reproducibility of the generation.
-
repetition_penalty
number min 0 max 2Penalty for repeated tokens; higher values discourage repetition.
-
frequency_penalty
number min 0 max 2Decreases the likelihood of the model repeating the same lines verbatim.
-
presence_penalty
number min 0 max 2Increases the likelihood of the model introducing new topics.
-
Output
-
0
object-
response
stringThe generated text response from the model
-
tool_calls
arrayAn array of tool calls requests made during the response generation
-
items
object-
arguments
objectThe arguments passed to be passed to the tool call request
-
name
stringThe name of the tool to be called
-
-
-
-
1
string
API Schemas
The following schemas are based on JSON Schema