The document extractor can't extract the text content from the Excel file. #15878

Open
5 tasks done
ywf opened this issue Mar 14, 2025 · 3 comments
Labels
🐞 bug Something isn't working

Comments


ywf commented Mar 14, 2025

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

1.0.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I'm creating a workflow in Dify. I want to use the document extractor to read the text content of an Excel file, but the "text" output it returns is empty.

[Image upload failed: "文档提取器读取excel失败.jpg" (filename translates to "document extractor fails to read Excel")]

✔️ Expected Behavior

The document extractor can be used to read the text content of an Excel file.

❌ Actual Behavior

No response


ywf commented Mar 14, 2025

Image


ywf commented Mar 14, 2025

Text can be read from Word and TXT files. However, for Excel files, it doesn't work even when I try other files.

@dosubot dosubot bot added the 🐞 bug Something isn't working label Mar 14, 2025

dosubot bot commented Mar 14, 2025

I found a similar unsolved discussion titled "File Extractor Import Excel Cannot Read is displayed as empty" (Help), which might be related to your issue [1]. Additionally, there was a closed issue titled "The LLM does not get the contents of the document extractor correctly", which was resolved by a pull request that fixed Unicode escape characters [2].

For your specific problem, ensure that the necessary dependencies for the ExcelExtractor are installed and updated. These include pandas, openpyxl, and xlrd [3]. If these are already in place, check if there are any configuration settings or conditions in your workflow that might be preventing the ExcelExtractor from executing, such as an empty variable_selector [4]. Additionally, review any logs for error messages that might provide insights into the failure [5].
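As a rough way to act on the dependency suggestion above, the following sketch checks whether the three named packages can be found in the current Python environment, and separately tries to read a workbook with pandas alone. It is only an illustration: `missing_packages` and `excel_to_text` are hypothetical helper names, not part of Dify, and the sketch assumes it is run in the same environment as the Dify API service. It does not reproduce how the document extractor itself reads files.

```python
import importlib.util

# Package names taken from the comment above; adjust if your deployment differs.
EXCEL_DEPENDENCIES = ["pandas", "openpyxl", "xlrd"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def excel_to_text(path):
    """Hypothetical helper: read every sheet with pandas and join them as CSV text."""
    import pandas as pd  # imported lazily so the dependency check works without pandas

    # sheet_name=None returns a dict mapping sheet name -> DataFrame
    sheets = pd.read_excel(path, sheet_name=None, header=None, dtype=str)
    return "\n\n".join(
        f"# {name}\n{frame.to_csv(index=False, header=False)}"
        for name, frame in sheets.items()
    )


if __name__ == "__main__":
    missing = missing_packages(EXCEL_DEPENDENCIES)
    print("missing packages:", missing or "none")
```

If nothing is reported missing and `excel_to_text("your_file.xlsx")` returns non-empty text, the file and the libraries are probably fine, and the workflow configuration (for example the variable selector mentioned above) or the logs would be the next place to look.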

