Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: introduce ai operator namespace and deprecated semantics #1511

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

sycai
Copy link
Contributor

@sycai sycai commented Mar 19, 2025

Semantic operator contents are copied over except for aggregate and cluster_by

@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Mar 19, 2025
@sycai sycai marked this pull request as ready for review March 24, 2025 19:57
@sycai sycai requested review from a team as code owners March 24, 2025 19:57
@sycai sycai requested a review from ZehaoXU March 24, 2025 19:57
@sycai sycai requested review from chelsea-lin and removed request for ZehaoXU March 24, 2025 19:57
@@ -41,6 +42,20 @@ def semantic_operators(self, value: bool):
warnings.warn(msg, category=bfe.PreviewWarning)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the experiment_options for semantic_operators be marked as deprecated as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -84,6 +84,9 @@ class ComputeOptions:
semantic_ops_confirmation_threshold: Optional[int] = 0
semantic_ops_threshold_autofail = False

ai_ops_confirmation_threshold: Optional[int] = 0
ai_ops_threshold_autofail = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also update the docstring as the semantic ops here. Also, mark semantic_ops as deprecated in the docstring too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.semantic_operators = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the options should be updated into ai_operators too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

bigframes.pandas.DataFrame: DataFrame filtered by the instruction.
Raises:
NotImplementedError: when the semantic operator experiment is off.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ai operator experiment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

score_column: Optional[str] = None,
):
"""
Performs semantic search on the DataFrame.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call it AI semantic search?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

THRESHOLD_OPTION,
1,
"compute.semantic_ops_threshold_autofail",
True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be ai_ops_threshold_autofail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Done

@@ -4574,4 +4575,12 @@ def _throw_if_null_index(self, opname: str):

@property
def semantics(self):
warnings.warn(
"The 'semantic' property will be removed. Please use 'ai' instead.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo.

Suggested change
"The 'semantic' property will be removed. Please use 'ai' instead.",
"The 'semantics' property will be removed. Please use 'ai' instead.",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return bigframes.operations.semantics.Semantics(self)

@property
def ai(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a docstring? Also, please add the AI class to the accessors section in the docs:

https://github.com/googleapis/python-bigquery-dataframes/blob/main/docs/reference/bigframes.pandas/frame.rst#accessors

as well as thhe table of contents

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.



@log_adapter.class_logger
class Ai:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python style says this should be AI not Ai.

https://peps.python.org/pep-0008/#descriptive-naming-styles

When using acronyms in CapWords, capitalize all the letters of the acronym. Thus HTTPServerError is better than HttpServerError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sycai
Copy link
Contributor Author

sycai commented Mar 24, 2025

Don't merge this until #1531 is merged, and copy over the change.

@sycai sycai requested review from tswast and chelsea-lin March 24, 2025 22:01
Copy link
Contributor

@chelsea-lin chelsea-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall with leaving few nit comments.

Guards against unexepcted processing of large amount of rows by semantic operators.
semantic_ops_confirmation_threshold (int, optional):
Semantics operators are deprecated. Please use AI operators instead.
[Deprecated] Guards against unexepcted processing of large amount of rows by semantic operators.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo for unexepcted.
Also use deprecated doc format, similar to
https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes._config.bigquery_options.BigQueryOptions#bigframes__config_bigquery_options_BigQueryOptions_use_regional_endpoints

.. deprecated:: 1.42.0
   Guards against unexpected processing of large amount of rows by semantic operators.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -4574,4 +4575,13 @@ def _throw_if_null_index(self, opname: str):

@property
def semantics(self):
warnings.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format the warning message as below:

msg = bfe.format_message("The 'semantics' property will be removed. Please use 'ai' instead.")
warnings.warn(msg,  category=bfe.PreviewWarning)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tswast for suggestions too.
Should we use @typing_extensions.deprecated as other LLM methods does?

@typing_extensions.deprecated
(
    "PaLM2TextGenerator is going to be deprecated. Use GeminiTextGenerator instead. ",
    category=exceptions.ApiDeprecationWarning,
)



@log_adapter.class_logger
class AI:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed it as AIAccessor to keep consistency with other accessors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: xl Pull request size is extra large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants