Custom Scores via API/SDKs
Langfuse gives you full flexibility to ingest custom scores
via the Langfuse SDKs or API. This allows you to run custom quality checks on the output of your workflows at runtime, or to run custom human evaluation workflows.
Example use cases:
- Deterministic rules at runtime: e.g. check whether the output contains a certain keyword, adheres to a specified structure/format, or exceeds a certain length (see the sketch after this list).
- Custom internal workflow tooling: build internal tooling to manage human-in-the-loop evaluation workflows and ingest the resulting scores back into Langfuse.
- Automated data pipeline: continuously monitor output quality by fetching traces from Langfuse, running custom evaluations on them, and ingesting the scores back into Langfuse.
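For instance, a deterministic length check at runtime could look like the following sketch; the `MAX_LENGTH` threshold, the `output_length_ok` score name, and the helper function are assumptions for illustration:

```python
from langfuse import Langfuse

langfuse = Langfuse()  # credentials via LANGFUSE_* environment variables

MAX_LENGTH = 500  # hypothetical threshold for this example

def score_output_length(trace_id: str, output: str) -> None:
    # Deterministic rule: score 1 if the output stays within the length limit, else 0
    langfuse.score(
        trace_id=trace_id,
        name="output_length_ok",
        value=1 if len(output) <= MAX_LENGTH else 0,
        comment=f"Output length: {len(output)} characters",
    )
```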
How to add scores
```python
from langfuse import Langfuse

langfuse = Langfuse()  # credentials via LANGFUSE_* environment variables

# `message` is assumed to be an application object that holds the trace context
langfuse.score(
    trace_id=message.trace_id,
    observation_id=message.generation_id,  # optional: attach the score to a specific observation
    name="quality",
    value=1,
    comment="Factually correct",  # optional
    id="unique_id",  # optional: acts as an idempotency key to update the score later
)
```
→ More details in Python SDK docs
Data pipeline example
You can run custom evaluations on existing data by fetching traces from Langfuse (e.g. via the Python SDK), running your evaluation logic on them, and adding the results as scores back to the traces.
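A minimal sketch of such a pipeline, assuming the Python SDK's `fetch_traces` method and a hypothetical `evaluate` function standing in for your evaluation logic:

```python
from langfuse import Langfuse

langfuse = Langfuse()  # credentials via LANGFUSE_* environment variables

def evaluate(trace) -> float:
    # Hypothetical evaluation logic, e.g. a keyword or format check on the trace output
    return 1.0 if trace.output else 0.0

# Fetch a batch of recent traces (fetch_traces supports filters and pagination)
traces = langfuse.fetch_traces(limit=50).data

# Score each trace and write the result back to Langfuse
for trace in traces:
    langfuse.score(
        trace_id=trace.id,
        name="custom_eval",
        value=evaluate(trace),
    )

# Make sure queued score events are sent before the script exits
langfuse.flush()
```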