read_gbq: add mechanism to ensure BQ storage API usage #886

Open
calbach opened this issue Feb 28, 2025 · 2 comments
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

calbach commented Feb 28, 2025

Assumption: there is currently no supported way to force read_gbq to use the BQ storage API. I'd be happy to be corrected if I missed something!

Is your feature request related to a problem? Please describe.

I have cases where read_gbq's heuristic chooses the JSON API when I want the Storage API. This is most noticeable on medium-sized tables, which can take 5-20 seconds to load via the JSON API; the same tables loaded much faster via the Storage API. For interactive use in particular, I am very willing to pay the additional Storage API cost to make things more bearable.

Describe the solution you'd like

A parameter to read_gbq that forces use of the BQ Storage API (and raises an error if the necessary dependencies are not available). I won't try to be prescriptive about the details, though I'll note that this is the behavior I expected from use_bqstorage_api, based on its name. From my understanding of the current behavior, allow_bqstorage_api would be a more accurate name.
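
To illustrate the "raise an error if the necessary deps are not available" part, here is a minimal sketch of a fail-fast check. The helper name `ensure_bqstorage_available` is hypothetical and is not part of pandas-gbq; the only assumption about the real libraries is that the Storage API client is importable as `google.cloud.bigquery_storage`.

```python
import importlib.util


def ensure_bqstorage_available(
    module_name: str = "google.cloud.bigquery_storage",
) -> None:
    """Raise ImportError if the module needed for the Storage API is missing.

    Hypothetical helper: a "force" flag could call this up front instead of
    silently falling back to the JSON API.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # A parent package (for example "google.cloud") is not installed.
        spec = None
    if spec is None:
        raise ImportError(
            f"The BigQuery Storage API was requested, but {module_name!r} "
            "is not importable. Install google-cloud-bigquery-storage."
        )
```

Called with no arguments, it either returns quietly or raises before any query is run, so a missing dependency shows up as an error rather than as a slow download.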

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-pandas API. label Feb 28, 2025
@GarrettWu GarrettWu assigned TrevorBergeron and unassigned GarrettWu Mar 3, 2025
Collaborator

tswast commented Mar 19, 2025

I think you are correct. Basically, we are now calling query_and_wait, which might not create a destination table that we can read from with the BQ Storage API. Such a flag would have to force the use of query from google-cloud-bigquery.
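
As a rough sketch of the two call paths described above (not the actual pandas-gbq implementation): the function names below are hypothetical, and `client` is assumed to behave like a `google.cloud.bigquery.Client`. `create_bqstorage_client` is the existing `to_dataframe` argument in google-cloud-bigquery; passing `True` requests a Storage API client, though the library may still apply its own checks before using it.

```python
from typing import Any


def read_via_query(client: Any, sql: str) -> Any:
    """Run `sql` as a regular query job, then request a Storage API download."""
    job = client.query(sql)  # regular query job
    rows = job.result()      # wait for the job to finish
    # Ask for a BigQuery Storage client to be created for the download.
    return rows.to_dataframe(create_bqstorage_client=True)


def read_via_query_and_wait(client: Any, sql: str) -> Any:
    """Run `sql` via query_and_wait and never create a Storage API client."""
    rows = client.query_and_wait(sql)
    return rows.to_dataframe(create_bqstorage_client=False)
```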

@tswast tswast added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Mar 19, 2025
Collaborator

tswast commented Mar 19, 2025

It sounds like we need three values for use_bqstorage_api:

  • True: always use the BQ Storage API (call query)
  • False: never use it (call query_and_wait and disable BQ Storage client creation); I believe this already works today
  • "default" (or None): choose based on the heuristics in query_and_wait
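
The three values above could be mapped to a query method roughly as follows. This is only a sketch of the proposal: `DownloadPlan` and `plan_download` are hypothetical names, and using `None` to mean "leave `create_bqstorage_client` to the library's heuristics" is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class DownloadPlan:
    query_method: str  # "query" or "query_and_wait"
    create_bqstorage_client: Optional[bool]  # None: do not override


def plan_download(
    use_bqstorage_api: Union[bool, str, None] = "default",
) -> DownloadPlan:
    """Translate the proposed three-valued flag into a download plan."""
    if use_bqstorage_api is True:
        # Always use the Storage API: run a regular query job.
        return DownloadPlan("query", True)
    if use_bqstorage_api is False:
        # Never use the Storage API.
        return DownloadPlan("query_and_wait", False)
    if use_bqstorage_api is None or use_bqstorage_api == "default":
        # Let the heuristics in query_and_wait decide.
        return DownloadPlan("query_and_wait", None)
    raise ValueError(
        f"use_bqstorage_api must be True, False, None or 'default', "
        f"got {use_bqstorage_api!r}"
    )
```

Rejecting any other value keeps a typo such as "defualt" from silently selecting one of the three behaviors.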


4 participants