OpenAI’s text embeddings measure the relatedness of text strings. Embeddings are most commonly used for:
- Search (where results are ranked by relevance to a query string)
- Clustering (where text strings are grouped by similarity)
- Recommendations (where items with related text strings are recommended)
- Anomaly detection (where outliers with little relatedness are identified)
- Diversity measurement (where similarity distributions are analyzed)
- Classification (where text strings are classified by their most similar label)
An embedding is a vector (list) of floating-point numbers. The distance between two vectors measures their relatedness: small distances suggest high relatedness and large distances suggest low relatedness.
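To make the idea of distance between two vectors concrete, here is a minimal sketch that uses cosine similarity, one common relatedness measure, and ranks candidates against a query as in the search use case. The module name `Similarity` and both functions are hypothetical helpers written for illustration; they are not part of this package.

```elm
module Similarity exposing (cosineSimilarity, rankByRelevance)

-- Hypothetical helper module for illustration; not part of this package.


-- Cosine similarity of two vectors, in the range -1 to 1.
-- Returns Nothing when the lengths differ or either vector has zero magnitude.
cosineSimilarity : List Float -> List Float -> Maybe Float
cosineSimilarity a b =
    let
        dot =
            List.sum (List.map2 (*) a b)

        norm v =
            sqrt (List.sum (List.map (\x -> x * x) v))

        denominator =
            norm a * norm b
    in
    if List.length a /= List.length b || denominator == 0 then
        Nothing

    else
        Just (dot / denominator)


-- Sort labelled candidates from most to least similar to the query embedding.
-- Candidates whose similarity cannot be computed are dropped.
rankByRelevance : List Float -> List ( String, List Float ) -> List ( String, Float )
rankByRelevance query candidates =
    candidates
        |> List.filterMap
            (\( label, embedding ) ->
                cosineSimilarity query embedding
                    |> Maybe.map (\score -> ( label, score ))
            )
        |> List.sortBy (\( _, score ) -> negate score)
```

A higher score means the two strings are more closely related, so the first element returned by `rankByRelevance` is the best match for the query.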
See https://beta.openai.com/docs/guides/embeddings
create : Input -> Ext.Http.TaskInput (Ext.Http.Error String) Output
See https://beta.openai.com/docs/api-reference/embeddings/create
create
    { model = OpenAI.ModelID.TextEmbeddingAda002
    , input = "The food was delicious and the waiter..."
    , user = Nothing
    }
    |> OpenAI.withConfig cfg
    |> Http.task
-- > Task.succeed
-- >     { data =
-- >         [ { embedding = [0.0023064255,-0.009327292,...,-0.0028842222 ]
-- >           , index = 0
-- >           , object = "embedding"
-- >           }
-- >         ]
-- >     , model = Custom "text-embedding-ada-002-v2"
-- >     , object = "list"
-- >     , usage = { prompt_tokens = 8, total_tokens = 8 }
-- >     }
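Once the task succeeds, the embedding vectors usually need to be pulled out of the response. The following is a minimal sketch of how that could look; the module name `EmbeddingResult` and both functions are hypothetical and not part of this package. The extensible record types assume only the `data`, `index` and `embedding` fields shown in the response above.

```elm
module EmbeddingResult exposing (embeddingsInOrder, firstEmbedding)

-- Hypothetical helpers for illustration; not part of this package.
-- They accept any record that has the `data`, `index` and `embedding`
-- fields shown in the documented response.


-- All embedding vectors, sorted by their `index` field.
embeddingsInOrder : { a | data : List { b | index : Int, embedding : List Float } } -> List (List Float)
embeddingsInOrder output =
    output.data
        |> List.sortBy .index
        |> List.map .embedding


-- The first embedding vector, or Nothing when `data` is empty.
firstEmbedding : { a | data : List { b | index : Int, embedding : List Float } } -> Maybe (List Float)
firstEmbedding output =
    List.head (embeddingsInOrder output)
```

Because the input here is a single string, `firstEmbedding` is normally all that is needed; it could be applied to the successful result with `Task.map firstEmbedding`.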
type alias Input =
    { model : OpenAI.ModelID.ModelID
    , input : String
    , user : Maybe String
    }

type alias Output =
    { object : String
    , data : List Data
    , model : OpenAI.ModelID.ModelID
    , usage : Usage
    }

type alias Data =
    { object : String
    , index : Basics.Int
    , embedding : List Basics.Float
    }

type alias Usage =
    { prompt_tokens : Basics.Int
    , total_tokens : Basics.Int
    }