Training on instructions

Similar to #150.

When training in a completions/chat/assistant/instruction format, where a prompt is given and the model is trained only on the response, several errors occur.

Following the [HF tutorial](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only), I used a dataset in the tutorial's format with the following formatting function:

```
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts
```

This raises ```AttributeError: 'list' object has no attribute 'startswith'```.

After some digging, I found out that the error stems from ```patch_sft_trainer_tokenizer```, in [```tokenizer_utils.py:826```](https://github.com/unslothai/unsloth/blob/8d9bd0ea8bf662618ba96fe7fe3478c5b81d0dff/unsloth/tokenizer_utils.py#L826):

```
# tokenizer_utils.py, line 826
test_text = dataset[0][dataset_text_field] if (formatting_func is None or not use_formatting_func) else formatting_func(dataset[0])
# tokenizer_utils.py, line 829
test_text.startswith(tokenizer.bos_token)
```
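The failure can be reproduced without unsloth or TRL installed. Below is a minimal sketch, assuming a placeholder BOS token of `<s>` and a hand-written one-row batch; neither is taken from the library, they only stand in for ```tokenizer.bos_token``` and the dataset.

```python
def formatting_prompts_func(example):
    # Same formatting function as in the tutorial: returns a list of strings.
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

# Hypothetical one-row batch in the tutorial's column format.
batch = {"instruction": ["What is 2 + 2?"], "output": ["4"]}
bos_token = "<s>"  # placeholder BOS token, assumed for illustration

# Mirrors the assignment on line 826: the result is a list, not a str.
test_text = formatting_prompts_func(batch)

# Mirrors the check on line 829: lists have no .startswith method.
try:
    test_text.startswith(bos_token)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```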

If the function returns ```output_texts[0]``` instead of ```output_texts```, the ```AttributeError``` is resolved, but a ```ValueError``` is raised during training in the [TRL trainer:557](https://github.com/huggingface/trl/blob/b8b972fde183ec036885738e1439cd99877c2ad5/trl/trainer/sft_trainer.py#L557):
```
if not isinstance(formatting_func(element), list):
    raise ValueError
```

So, from what I gather, ```formatting_func(dataset[0])``` has to be both a list (for the TRL check) and a string (for the ```startswith``` call), which cannot both hold.

My solution was to change ```formatting_func(dataset[0])``` to ```formatting_func(dataset[0])[0]``` in ```patch_sft_trainer_tokenizer```, since ```formatting_func``` returns a list, as the HF tutorial and the TRL trainer expect.
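The idea behind that change can be sketched in isolation. The helper ```first_formatted_text``` below is hypothetical (it is not part of unsloth or TRL), and the batch and BOS token are placeholders. It takes the first element when the formatting function returns a list and passes a plain string through unchanged, so the ```startswith``` check works while the function itself keeps returning a list for TRL.

```python
def formatting_prompts_func(example):
    # Tutorial-style formatting function: returns a list of strings.
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

def first_formatted_text(formatting_func, example):
    # Hypothetical helper: return a single string from the formatter's output.
    result = formatting_func(example)
    if isinstance(result, (list, tuple)):
        return result[0]
    return result

# Hypothetical one-row batch and placeholder BOS token, assumed for illustration.
batch = {"instruction": ["What is 2 + 2?"], "output": ["4"]}
bos_token = "<s>"

text = first_formatted_text(formatting_prompts_func, batch)
starts_with_bos = text.startswith(bos_token)  # now a valid str method call

print(repr(text), starts_with_bos)
```

The sketch passes a batch-shaped dict on purpose. If the example is a single row, ```example['instruction']``` is a string and the loop in the formatting function iterates over its characters, so the first element would not be a full prompt. That may not matter for a BOS check, but it is worth keeping in mind.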

But even with this fixed, there are some other issues. I will probably submit a PR once those are taken care of as well.

***

Also, may I ask why the TRL version in use is not the latest? Is there a reason for using ```SFTTrainer(args = TrainingArguments)``` instead of ```SFTConfig```?

