Commit 9b70ced

feat: add solutions to lc problem: No.3089 (doocs#2504)
No.3089.Find Bursty Behavior
1 parent ddc13d6 commit 9b70ced

4 files changed

+170
-2
lines changed

solution/3000-3099/3089.Find Bursty Behavior/README.md

+55-1
@@ -77,12 +77,66 @@ Each row of this table contains post_id, user_id, and post_date.

## Solutions

- ### Solution 1
+ ### Solution 1: Self-Join + Group Count
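
We self-join the `Posts` table, matching rows where `p1.user_id = p2.user_id` and `p2.post_date` falls between `p1.post_date` and `6` days after `p1.post_date`. Grouping the joined rows by `p1.user_id` and `p1.post_date` then gives, for each user and each posting date, the number of posts in the 7-day window starting on that date. We store this result in table `P`.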
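
Next, we compute each user's average weekly post count for February 2024. We select the records whose `post_date` is between `2024-02-01` and `2024-02-28`, group them by `user_id`, count the posts per user, and divide by `4` (the 28-day range spans exactly four weeks). We store this result in table `T`.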

Finally, we join tables `P` and `T` on `P.user_id = T.user_id` and group by `user_id` to find each user's maximum post count over any 7-day window. We keep only the users that satisfy `max_7day_posts >= avg_weekly_posts * 2` and sort the result by `user_id` in ascending order.

<!-- tabs:start -->

```sql
# Write your MySQL query statement below
WITH
    P AS (
        SELECT p1.user_id AS user_id, COUNT(1) AS cnt
        FROM
            Posts AS p1
            JOIN Posts AS p2
                ON p1.user_id = p2.user_id
                AND p2.post_date BETWEEN p1.post_date AND DATE_ADD(p1.post_date, INTERVAL 6 DAY)
        GROUP BY p1.user_id, p1.post_date
    ),
    T AS (
        SELECT user_id, COUNT(1) / 4 AS avg_weekly_posts
        FROM Posts
        WHERE post_date BETWEEN '2024-02-01' AND '2024-02-28'
        GROUP BY 1
    )
SELECT user_id, MAX(cnt) AS max_7day_posts, avg_weekly_posts
FROM
    P
    JOIN T USING (user_id)
GROUP BY 1
HAVING max_7day_posts >= avg_weekly_posts * 2
ORDER BY 1;
```

```python
import pandas as pd


def find_bursty_behavior(posts: pd.DataFrame) -> pd.DataFrame:
    # Count the posts each user made within each 7-day window
    p = posts.merge(posts, on='user_id')
    p = p[(p['post_date_y'] >= p['post_date_x']) &
          (p['post_date_y'] <= p['post_date_x'] + pd.Timedelta(days=6))]
    p_count = p.groupby(['user_id', 'post_date_x']).size().reset_index(name='cnt')

    # Compute each user's average weekly post count during February 2024
    t = posts[(posts['post_date'] >= '2024-02-01') &
              (posts['post_date'] <= '2024-02-28')]
    t_count = t.groupby('user_id').size().reset_index(name='count')
    t_count['avg_weekly_posts'] = t_count['count'] / 4

    # Join the two computed tables and keep the users that meet the criterion
    merged_df = p_count.merge(t_count, on='user_id')
    merged_df = merged_df.groupby('user_id').agg(max_7day_posts=('cnt', 'max'),
                                                 avg_weekly_posts=('avg_weekly_posts', 'first'))
    result_df = merged_df[merged_df['max_7day_posts'] >= merged_df['avg_weekly_posts'] * 2].reset_index()

    return result_df.sort_values('user_id')
```

<!-- tabs:end -->

solution/3000-3099/3089.Find Bursty Behavior/README_EN.md

+62-1
@@ -75,12 +75,73 @@ Each row of this table contains post_id, user_id, and post_date.

## Solutions

- ### Solution 1
+ ### Solution 1: Self-Join + Group Count

We self-join the `Posts` table, matching rows where `p1.user_id = p2.user_id` and `p2.post_date` falls between `p1.post_date` and 6 days after `p1.post_date`. Grouping the joined rows by `p1.user_id` and `p1.post_date` then gives, for each user and each posting date, the number of posts in the 7-day window starting on that date. We store this result in table `P`.
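
To make the windowed count concrete, here is a minimal pure-Python sketch of the same self-join and grouping, without pandas or MySQL. The function name `window_counts` and the `(user_id, post_date)` tuple layout are assumptions made for illustration and are not part of the commit. Like the join described above, it counts `(p1, p2)` pairs per `(user_id, p1.post_date)`, so if a user has several posts on the same date, the count for that date is multiplied by the number of same-day posts; grouping by post ID instead would avoid that.

```python
from collections import Counter
from datetime import date, timedelta


def window_counts(posts):
    """Mirror the self-join: count (p1, p2) pairs per (user_id, p1 date).

    `posts` is a list of (user_id, post_date) tuples (hypothetical layout).
    """
    counts = Counter()
    for user1, start in posts:
        end = start + timedelta(days=6)  # inclusive 7-day window
        for user2, other in posts:
            if user1 == user2 and start <= other <= end:
                counts[(user1, start)] += 1
    return counts


sample = [
    (1, date(2024, 2, 1)),
    (1, date(2024, 2, 3)),
    (1, date(2024, 2, 7)),
    (1, date(2024, 2, 8)),
    (2, date(2024, 2, 10)),
]
counts = window_counts(sample)
print(counts[(1, date(2024, 2, 1))])  # 3: Feb 1, Feb 3 and Feb 7 fall in Feb 1..Feb 7
```

The window is inclusive on both ends, which matches `BETWEEN ... AND DATE_ADD(..., INTERVAL 6 DAY)` in the SQL version.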

Next, we compute each user's average weekly post count for February 2024. We select the records whose `post_date` is between `2024-02-01` and `2024-02-28`, group them by `user_id`, count the posts per user, and divide by `4` (the 28-day range spans exactly four weeks). We store this result in table `T`.
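
The averaging step is plain arithmetic. A minimal sketch in pure Python, assuming posts arrive as `(user_id, post_date)` tuples; the helper name `avg_weekly_posts` and the constants `FEB_START` and `FEB_END` are hypothetical names chosen for this example:

```python
from collections import Counter
from datetime import date

FEB_START = date(2024, 2, 1)
FEB_END = date(2024, 2, 28)  # the solution analyses Feb 1..Feb 28, i.e. 4 weeks


def avg_weekly_posts(posts):
    """Return {user_id: number of posts in Feb 1..Feb 28 divided by 4}."""
    totals = Counter(
        user_id for user_id, post_date in posts if FEB_START <= post_date <= FEB_END
    )
    return {user_id: total / 4 for user_id, total in totals.items()}


sample = [
    (1, date(2024, 2, 1)),
    (1, date(2024, 2, 3)),
    (1, date(2024, 2, 7)),
    (1, date(2024, 2, 8)),
    (2, date(2024, 2, 10)),
    (2, date(2024, 3, 1)),  # outside the range, ignored
]
averages = avg_weekly_posts(sample)
print(averages)  # {1: 1.0, 2: 0.25}
```

Users with no posts in the range do not appear in the result at all, which is what makes the later inner join drop them.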

Finally, we join tables `P` and `T` on `P.user_id = T.user_id` and group by `user_id` to find each user's maximum post count over any 7-day window. We keep only the users that satisfy `max_7day_posts >= avg_weekly_posts * 2` and sort the result by `user_id` in ascending order.
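
The final join and filter can be sketched in pure Python as well. The inputs here are hypothetical precomputed values: `counts` maps `(user_id, window_start)` to a 7-day post count (the role of table `P`), and `averages` maps `user_id` to the weekly average (the role of table `T`). The function name `bursty_users` is likewise an assumption for illustration.

```python
def bursty_users(counts, averages):
    """Return (user_id, max_7day_posts, avg_weekly_posts) rows sorted by user_id."""
    # Per-user maximum over all 7-day windows.
    max_counts = {}
    for (user_id, _start), cnt in counts.items():
        max_counts[user_id] = max(cnt, max_counts.get(user_id, 0))

    rows = []
    for user_id in sorted(max_counts):
        # Inner join: users without a weekly average are skipped.
        if user_id not in averages:
            continue
        avg = averages[user_id]
        if max_counts[user_id] >= avg * 2:
            rows.append((user_id, max_counts[user_id], avg))
    return rows


counts = {
    (1, "2024-02-01"): 3,
    (1, "2024-02-07"): 2,
    (2, "2024-02-10"): 1,
    (3, "2024-02-05"): 2,
}
averages = {1: 1.0, 2: 0.25, 3: 1.5}
print(bursty_users(counts, averages))  # [(1, 3, 1.0), (2, 1, 0.25)]
```

User 3 is excluded because a maximum of 2 posts is below twice the weekly average of 1.5.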

<!-- tabs:start -->

```sql
# Write your MySQL query statement below
WITH
    P AS (
        SELECT p1.user_id AS user_id, COUNT(1) AS cnt
        FROM
            Posts AS p1
            JOIN Posts AS p2
                ON p1.user_id = p2.user_id
                AND p2.post_date BETWEEN p1.post_date AND DATE_ADD(p1.post_date, INTERVAL 6 DAY)
        GROUP BY p1.user_id, p1.post_date
    ),
    T AS (
        SELECT user_id, COUNT(1) / 4 AS avg_weekly_posts
        FROM Posts
        WHERE post_date BETWEEN '2024-02-01' AND '2024-02-28'
        GROUP BY 1
    )
SELECT user_id, MAX(cnt) AS max_7day_posts, avg_weekly_posts
FROM
    P
    JOIN T USING (user_id)
GROUP BY 1
HAVING max_7day_posts >= avg_weekly_posts * 2
ORDER BY 1;
```

```python
import pandas as pd


def find_bursty_behavior(posts: pd.DataFrame) -> pd.DataFrame:
    # Calculate the count of posts made by each user within a 7-day window
    p = posts.merge(posts, on="user_id")
    p = p[
        (p["post_date_y"] >= p["post_date_x"])
        & (p["post_date_y"] <= p["post_date_x"] + pd.Timedelta(days=6))
    ]
    p_count = p.groupby(["user_id", "post_date_x"]).size().reset_index(name="cnt")

    # Calculate the average weekly posts for each user in February 2024
    t = posts[
        (posts["post_date"] >= "2024-02-01") & (posts["post_date"] <= "2024-02-28")
    ]
    t_count = t.groupby("user_id").size().reset_index(name="count")
    t_count["avg_weekly_posts"] = t_count["count"] / 4

    # Joining the two calculated tables and filtering users meeting the criteria
    merged_df = p_count.merge(t_count, on="user_id")
    merged_df = merged_df.groupby("user_id").agg(
        max_7day_posts=("cnt", "max"), avg_weekly_posts=("avg_weekly_posts", "first")
    )
    result_df = merged_df[
        merged_df["max_7day_posts"] >= merged_df["avg_weekly_posts"] * 2
    ].reset_index()

    return result_df.sort_values("user_id")
```

<!-- tabs:end -->
@@ -0,0 +1,29 @@
import pandas as pd


def find_bursty_behavior(posts: pd.DataFrame) -> pd.DataFrame:
    # Calculate the count of posts made by each user within a 7-day window
    p = posts.merge(posts, on="user_id")
    p = p[
        (p["post_date_y"] >= p["post_date_x"])
        & (p["post_date_y"] <= p["post_date_x"] + pd.Timedelta(days=6))
    ]
    p_count = p.groupby(["user_id", "post_date_x"]).size().reset_index(name="cnt")

    # Calculate the average weekly posts for each user in February 2024
    t = posts[
        (posts["post_date"] >= "2024-02-01") & (posts["post_date"] <= "2024-02-28")
    ]
    t_count = t.groupby("user_id").size().reset_index(name="count")
    t_count["avg_weekly_posts"] = t_count["count"] / 4

    # Joining the two calculated tables and filtering users meeting the criteria
    merged_df = p_count.merge(t_count, on="user_id")
    merged_df = merged_df.groupby("user_id").agg(
        max_7day_posts=("cnt", "max"), avg_weekly_posts=("avg_weekly_posts", "first")
    )
    result_df = merged_df[
        merged_df["max_7day_posts"] >= merged_df["avg_weekly_posts"] * 2
    ].reset_index()

    return result_df.sort_values("user_id")
@@ -0,0 +1,24 @@
# Write your MySQL query statement below
WITH
    P AS (
        SELECT p1.user_id AS user_id, COUNT(1) AS cnt
        FROM
            Posts AS p1
            JOIN Posts AS p2
                ON p1.user_id = p2.user_id
                AND p2.post_date BETWEEN p1.post_date AND DATE_ADD(p1.post_date, INTERVAL 6 DAY)
        GROUP BY p1.user_id, p1.post_date
    ),
    T AS (
        SELECT user_id, COUNT(1) / 4 AS avg_weekly_posts
        FROM Posts
        WHERE post_date BETWEEN '2024-02-01' AND '2024-02-28'
        GROUP BY 1
    )
SELECT user_id, MAX(cnt) AS max_7day_posts, avg_weekly_posts
FROM
    P
    JOIN T USING (user_id)
GROUP BY 1
HAVING max_7day_posts >= avg_weekly_posts * 2
ORDER BY 1;
