
[Data/LLM] Error handling for invalid input rows #57140

@PawaritL

Description


When using build_llm_processor for batch inference, there is currently no mechanism to skip or filter out invalid rows, or rows that raise errors during processing. As a result, the whole pipeline can fail because of just a few bad rows.
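For context, a minimal usage sketch (model name and config values here are illustrative, roughly following the Ray Data LLM examples). Today, a single malformed row is enough to abort the whole run:

```python
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",  # illustrative values only
    concurrency=1,
    batch_size=64,
)
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=128),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

# One malformed row (prompt=None) can make the tokenize/map stage raise
# and abort the entire job, even if every other row is fine.
ds = ray.data.from_items([{"prompt": "hello"}, {"prompt": None}])
ds = processor(ds)
ds.write_parquet("/tmp/llm_outputs")
```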

Proposal:

  • An exception_handler_hook for the LLM stage so that users can customize error handling

Alternatively, support something similar to Gemini and OpenAI batch jobs, which write invalid/errored records to a separate file. A rough sketch of what either option could look like is below.
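Neither `exception_handler_hook` nor `failed_rows_path` exists in Ray today; the names and signature below are illustrative of the proposal only, reusing the `config` / preprocess / postprocess from the example above:

```python
# Hypothetical API sketch -- these arguments do not exist yet.
def on_row_error(row: dict, exc: Exception):
    # Option A: return a fallback record so the row stays in the output...
    return {"prompt": row.get("prompt"), "generated_text": None, "error": repr(exc)}
    # ...or return None to drop the row instead.

processor = build_llm_processor(
    config,
    preprocess=...,   # same preprocess/postprocess as before
    postprocess=...,
    exception_handler_hook=on_row_error,           # proposal 1: user-defined handler
    # failed_rows_path="s3://my-bucket/errors/",   # proposal 2: sidecar error output,
    #                                              #   similar to Gemini/OpenAI batch jobs
)
```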

Use case

When using build_llm_processor for batch inference, there is currently no way to skip invalid rows or rows that raise errors, so the whole pipeline can fail because of just a few bad rows. See the example error below:

2025-10-01 13:47:57,074 ERROR streaming_executor_state.py:553 -- An exception was raised from a task of operator "MapBatches(TokenizeUDF)". Dataset execution will now abort. To ignore this exception and continue, set DataContext.max_errored_blocks.

It’s hard to pre-check everything in the preprocess step, so it would be nice to handle these errors and capture the failed records somewhere instead of failing the whole pipeline (especially on a very large dataset).
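For completeness, the closest existing knob seems to be the one the error message points at, but it works at block granularity and doesn't record which rows failed or why:

```python
import ray

# Existing partial workaround: tolerate failed blocks instead of aborting.
# This drops whole blocks (not individual rows), and the failed records are
# not captured anywhere for later inspection or retry.
ctx = ray.data.DataContext.get_current()
ctx.max_errored_blocks = 10   # or -1 to tolerate any number of errored blocks
```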


Labels

data (Ray Data-related issues), enhancement (Request for new feature and/or capability), llm, stability, triage (Needs triage: priority, bug/not-bug, owning component), usability
