
Commit ccfa2b9

Authored and committed by Sreeharsha Ramanavarapu
Bug #25331425: DISTINCT CLAUSE DOES NOT WORK IN GROUP_CONCAT

Issue:
------
The problem occurs when:
1) GROUP_CONCAT(DISTINCT ...) is used in the query.
2) The data size is greater than the value of the system variable tmp_table_size.

The result then contains non-unique values.

Root cause:
-----------
An in-memory structure is used to filter out non-unique values. When the data size exceeds tmp_table_size, the overflow is written to disk as a separate file. The expectation is that, once all such files are merged, the full set of unique values can be obtained.

But Item_func_group_concat::add is in a bit of a hurry. Even as it adds values to the tree, it decides whether each value is unique and writes it to the result buffer. This works fine if the configured maximum size is greater than the size of the data. But when tmp_table_size is set to a low value, the tree is smaller, so multiple copies have to be created on disk, and a value that has already been flushed to disk looks new to the in-memory tree. Item_func_group_concat currently has no mechanism to merge all the copies on disk before generating the result. This results in duplicate values.

Solution:
---------
In case of the DISTINCT clause, don't write to the result buffer immediately. Do the merge first, and only then put the unique values in the result buffer. This is done in Item_func_group_concat::val_str.

Note regarding result file changes:
-----------------------------------
Earlier, a unique value was dumped to the output as soon as Item_func_group_concat::add saw it, so the result was in the order in which the rows are stored in the storage engine (SE). With this fix, we wait until all the data is read and only then write the final set of unique values to the output buffer, so the data appears in sorted order.
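The root cause can be reproduced outside the server with a toy model. This is a hypothetical sketch, not MySQL code: `BoundedUniqueFilter`, `eager_distinct` and `deferred_distinct` are invented names, a `std::set` stands in for the in-memory tree, a vector of sorted runs stands in for the temporary files on disk, the `cap` argument plays the role of the limit derived from tmp_table_size, and the final merge uses sort-plus-unique instead of a k-way merge for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical bounded "unique filter": an in-memory sorted set that is
// flushed to a sorted run (standing in for a temp file) when it is full.
class BoundedUniqueFilter {
 public:
  explicit BoundedUniqueFilter(std::size_t max_in_memory)
      : max_in_memory_(max_in_memory) {}

  // Returns true if the value is new to the *current* in-memory set only;
  // values already flushed to earlier runs are invisible here.
  bool insert(const std::string& value) {
    if (tree_.size() >= max_in_memory_) flush();
    return tree_.insert(value).second;
  }

  // Combines the in-memory set with all spilled runs and drops duplicates
  // (simplified: sort + unique rather than a streaming k-way merge).
  std::vector<std::string> merged() const {
    std::vector<std::string> all(tree_.begin(), tree_.end());
    for (const auto& run : runs_) all.insert(all.end(), run.begin(), run.end());
    std::sort(all.begin(), all.end());
    all.erase(std::unique(all.begin(), all.end()), all.end());
    return all;
  }

 private:
  void flush() {
    runs_.emplace_back(tree_.begin(), tree_.end());
    tree_.clear();
  }

  std::size_t max_in_memory_;
  std::set<std::string> tree_;
  std::vector<std::vector<std::string>> runs_;
};

// Joins values with ',' like GROUP_CONCAT's default separator.
std::string join(const std::vector<std::string>& values) {
  std::string out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) out += ',';
    out += values[i];
  }
  return out;
}

// Old behaviour: append a value as soon as insert() reports it as new.
std::string eager_distinct(const std::vector<std::string>& rows,
                           std::size_t cap) {
  BoundedUniqueFilter filter(cap);
  std::vector<std::string> out;
  for (const auto& r : rows)
    if (filter.insert(r)) out.push_back(r);
  return join(out);
}

// Fixed behaviour: insert everything, merge, then write the result once.
std::string deferred_distinct(const std::vector<std::string>& rows,
                              std::size_t cap) {
  BoundedUniqueFilter filter(cap);
  for (const auto& r : rows) filter.insert(r);
  return join(filter.merged());
}
```

With the five rows from the new test case ('a', 'b', 'a', 'b', 'c') and a cap of two, `eager_distinct` returns `a,b,a,b,c`, reproducing the duplicates, while `deferred_distinct` returns `a,b,c`. With a large cap the eager version is correct but keeps scan order (`bb,ccc,a`), whereas the deferred one returns sorted output (`a,bb,ccc`), which is the ordering change visible in the result files.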
Parent: 25b352d


5 files changed: +87 -35 lines


mysql-test/r/func_gconcat.result

+46-20
@@ -367,8 +367,8 @@ bb,ccc,a,bb,ccc
 BB,CCC,A,BB,CCC
 select group_concat(distinct b) from t1 group by a;
 group_concat(distinct b)
-bb,ccc,a
-BB,CCC,A
+a,bb,ccc
+A,BB,CCC
 select group_concat(b order by b) from t1 group by a;
 group_concat(b order by b)
 a,bb,bb,ccc,ccc
@@ -387,11 +387,11 @@ Warning 1260 Row 2 was cut by GROUP_CONCAT()
 Warning 1260 Row 4 was cut by GROUP_CONCAT()
 select group_concat(distinct b) from t1 group by a;
 group_concat(distinct b)
-bb,c
-BB,C
+a,bb
+A,BB
 Warnings:
-Warning 1260 Row 2 was cut by GROUP_CONCAT()
-Warning 1260 Row 4 was cut by GROUP_CONCAT()
+Warning 1260 Row 3 was cut by GROUP_CONCAT()
+Warning 1260 Row 6 was cut by GROUP_CONCAT()
 select group_concat(b order by b) from t1 group by a;
 group_concat(b order by b)
 a,bb
@@ -417,8 +417,8 @@ bb,ccc,a,bb,ccc,1111111111111111111111111111111111111111111111111111111111111111
417417
BB,CCC,A,BB,CCC,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
418418
select group_concat(distinct b) from t1 group by a;
419419
group_concat(distinct b)
420-
bb,ccc,a,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
421-
BB,CCC,A,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
420+
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,a,bb,ccc
421+
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,A,BB,CCC
422422
select group_concat(b order by b) from t1 group by a;
423423
group_concat(b order by b)
424424
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,a,bb,bb,ccc,ccc
@@ -437,11 +437,11 @@ Warning 1260 Row 7 was cut by GROUP_CONCAT()
437437
Warning 1260 Row 14 was cut by GROUP_CONCAT()
438438
select group_concat(distinct b) from t1 group by a;
439439
group_concat(distinct b)
440-
bb,ccc,a,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
441-
BB,CCC,A,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
440+
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
441+
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
442442
Warnings:
443-
Warning 1260 Row 5 was cut by GROUP_CONCAT()
444-
Warning 1260 Row 10 was cut by GROUP_CONCAT()
443+
Warning 1260 Row 2 was cut by GROUP_CONCAT()
444+
Warning 1260 Row 4 was cut by GROUP_CONCAT()
445445
select group_concat(b order by b) from t1 group by a;
446446
group_concat(b order by b)
447447
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
@@ -526,9 +526,9 @@ a group_concat(b)
 NULL 3,4,2,1,2,7,3,3
 select a, group_concat(distinct b) from t1 group by a with rollup;
 a group_concat(distinct b)
-1 3,4,2,1
-2 7,3
-NULL 3,4,2,1,7
+1 1,2,3,4
+2 3,7
+NULL 1,2,3,4,7
 select a, group_concat(b order by b) from t1 group by a with rollup;
 a group_concat(b order by b)
 1 1,2,2,3,4
@@ -751,10 +751,10 @@ CREATE TABLE t1(a TEXT, b CHAR(20));
 INSERT INTO t1 VALUES ("one.1","one.1"),("two.2","two.2"),("one.3","one.3");
 SELECT GROUP_CONCAT(DISTINCT UCASE(a)) FROM t1;
 GROUP_CONCAT(DISTINCT UCASE(a))
-ONE.1,TWO.2,ONE.3
+ONE.1,ONE.3,TWO.2
 SELECT GROUP_CONCAT(DISTINCT UCASE(b)) FROM t1;
 GROUP_CONCAT(DISTINCT UCASE(b))
-ONE.1,TWO.2,ONE.3
+ONE.1,ONE.3,TWO.2
 DROP TABLE t1;
 CREATE TABLE t1( a VARCHAR( 10 ), b INT );
 INSERT INTO t1 VALUES ( repeat( 'a', 10 ), 1),
@@ -853,7 +853,7 @@ create table t1(a bit(2) not null);
 insert into t1 values (1), (0), (0), (3), (1);
 select group_concat(distinct a) from t1;
 group_concat(distinct a)
-1,0,3
+0,1,3
 select group_concat(distinct a order by a) from t1;
 group_concat(distinct a order by a)
 0,1,3
@@ -866,13 +866,13 @@ insert into t1 values (1, 'a', 0), (0, 'b', 1), (0, 'c', 0), (3, 'd', 1),
 (1, 'e', 1), (3, 'f', 1), (0, 'g', 1);
 select group_concat(distinct a, c) from t1;
 group_concat(distinct a, c)
-10,01,00,31,11
+00,01,10,11,31
 select group_concat(distinct a, c order by a) from t1;
 group_concat(distinct a, c order by a)
 00,01,11,10,31
 select group_concat(distinct a, c) from t1;
 group_concat(distinct a, c)
-10,01,00,31,11
+00,01,10,11,31
 select group_concat(distinct a, c order by a, c) from t1;
 group_concat(distinct a, c order by a, c)
 00,01,10,11,31
@@ -1176,3 +1176,29 @@ Warning 1260 Row 5 was cut by GROUP_CONCAT()
 DROP TABLE t1;
 SET group_concat_max_len= DEFAULT;
 End of 5.6 tests
+#
+# Bug #25331425: DISTINCT CLAUSE DOES NOT WORK IN GROUP_CONCAT
+#
+CREATE TABLE t1 (a VARCHAR(1000), b INT);
+INSERT INTO t1 VALUES ('a', 1), ('b', 2), ('a', 3), ('b', 5), ('c', 5);
+SELECT GROUP_CONCAT(DISTINCT a) FROM t1;
+GROUP_CONCAT(DISTINCT a)
+a,b,c
+SELECT GROUP_CONCAT(DISTINCT a), b FROM t1 GROUP BY b;
+GROUP_CONCAT(DISTINCT a) b
+a 1
+b 2
+a 3
+b,c 5
+SET @@tmp_table_size=1024;
+SELECT GROUP_CONCAT(DISTINCT a) FROM t1;
+GROUP_CONCAT(DISTINCT a)
+a,b,c
+SELECT GROUP_CONCAT(DISTINCT a), b FROM t1 GROUP BY b;
+GROUP_CONCAT(DISTINCT a) b
+a 1
+b 2
+a 3
+b,c 5
+SET @@tmp_table_size=default;
+DROP TABLE t1;
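The reordering in these expected results follows from where the output is produced: the old code kept the first occurrence of each value in scan order, while walking the unique filter emits values in key order. The sketch below contrasts the two orders; `first_seen_distinct`, `sorted_distinct` and `join_csv` are hypothetical helpers, and it assumes plain string comparison, which agrees with the expected output for the bit(2), (a, c) pair and UCASE examples above, although the server compares keys according to column type and collation.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Joins values with ',' (hypothetical helper).
std::string join_csv(const std::vector<std::string>& values) {
  std::string out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) out += ',';
    out += values[i];
  }
  return out;
}

// Old visible order: first occurrence of each value, in scan order.
std::string first_seen_distinct(const std::vector<std::string>& rows) {
  std::set<std::string> seen;
  std::vector<std::string> out;
  for (const auto& r : rows)
    if (seen.insert(r).second) out.push_back(r);  // keep only first sighting
  return join_csv(out);
}

// New visible order: distinct values in sorted (key) order.
std::string sorted_distinct(const std::vector<std::string>& rows) {
  std::set<std::string> uniq(rows.begin(), rows.end());
  return join_csv(std::vector<std::string>(uniq.begin(), uniq.end()));
}
```

For the bit(2) rows 1, 0, 0, 3, 1 these give `1,0,3` and `0,1,3`, and for the concatenated (a, c) pairs 10, 01, 00, 31, 11, 31, 01 they give `10,01,00,31,11` and `00,01,10,11,31`, matching the removed and added lines respectively.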

mysql-test/suite/json/r/json_group_concat_innodb.result

+2-2
@@ -84,12 +84,12 @@ insert into t values (cast(7 as json), '7'), (cast(2 as json), '2');
 insert into t values (cast(7 as json), '7'), (cast(2 as json), '2');
 select group_concat(j), group_concat(distinct j), group_concat(c) from t;
 group_concat(j) group_concat(distinct j) group_concat(c)
-[1, 2, 3],7,2,7,2 [1, 2, 3],7,2 [a,b,c],7,2,7,2
+[1, 2, 3],7,2,7,2 [1, 2, 3],2,7 [a,b,c],7,2,7,2
 select group_concat(j order by j), group_concat(distinct j order by j), group_concat(c order by c) from t;
 group_concat(j order by j) group_concat(distinct j order by j) group_concat(c order by c)
 [1, 2, 3],2,2,7,7 [1, 2, 3],2,7 2,2,7,7,[a,b,c]
 insert into t values (NULL, NULL);
 select group_concat(j), group_concat(distinct j), group_concat(c) from t;
 group_concat(j) group_concat(distinct j) group_concat(c)
-[1, 2, 3],7,2,7,2 [1, 2, 3],7,2 [a,b,c],7,2,7,2
+[1, 2, 3],7,2,7,2 ,, [a,b,c],7,2,7,2
 drop table t;

mysql-test/t/func_gconcat.test

+20
@@ -870,3 +870,23 @@ DROP TABLE t1;
 SET group_concat_max_len= DEFAULT;
 
 --echo End of 5.6 tests
+
+--echo #
+--echo # Bug #25331425: DISTINCT CLAUSE DOES NOT WORK IN GROUP_CONCAT
+--echo #
+
+CREATE TABLE t1 (a VARCHAR(1000), b INT);
+INSERT INTO t1 VALUES ('a', 1), ('b', 2), ('a', 3), ('b', 5), ('c', 5);
+
+let query1= SELECT GROUP_CONCAT(DISTINCT a) FROM t1;
+let query2= SELECT GROUP_CONCAT(DISTINCT a), b FROM t1 GROUP BY b;
+
+eval $query1;
+eval $query2;
+
+SET @@tmp_table_size=1024;
+eval $query1;
+eval $query2;
+
+SET @@tmp_table_size=default;
+DROP TABLE t1;

sql/item_sum.cc

+16-11
@@ -3189,8 +3189,8 @@ int dump_leaf_key(void* key_arg, element_count count MY_ATTRIBUTE((unused)),
   Item **arg= item->args, **arg_end= item->args + item->arg_count_field;
   size_t old_length= result->length();
 
-  if (item->no_appended)
-    item->no_appended= FALSE;
+  if (!item->m_result_finalized)
+    item->m_result_finalized= true;
   else
     result->append(*item->separator);
 
@@ -3467,7 +3467,7 @@ void Item_func_group_concat::clear()
   result.copy();
   null_value= TRUE;
   warning_for_row= FALSE;
-  no_appended= TRUE;
+  m_result_finalized= false;
   if (tree)
     reset_tree(tree);
   if (unique_filter)
@@ -3524,12 +3524,10 @@ bool Item_func_group_concat::add()
       return 1;
   }
   /*
-    If the row is not a duplicate (el->count == 1)
-    we can dump the row here in case of GROUP_CONCAT(DISTINCT...)
-    instead of doing tree traverse later.
+    In case of GROUP_CONCAT with DISTINCT or ORDER BY (or both) don't dump the
+    row to the output buffer here. That will be done in val_str.
   */
-  if (row_eligible && !warning_for_row &&
-      (!tree || (el->count == 1 && distinct && !arg_count_order)))
+  if (row_eligible && !warning_for_row && tree == NULL && !distinct)
     dump_leaf_key(table->record[0] + table->s->null_bytes, 1, this);
 
   return 0;
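The change to the eager-dump condition in add() can be read as a pair of predicates. In this hypothetical sketch the function and parameter names are invented: `has_tree` stands for `tree != NULL` (which, per the comment in val_str, means ORDER BY is present), `has_order` for a non-zero `arg_count_order`, and `el_count` for `el->count`.

```cpp
#include <cassert>

// Old condition from Item_func_group_concat::add(), modelled as booleans.
bool old_dumps_in_add(bool row_eligible, bool warning_for_row, bool has_tree,
                      bool distinct, bool has_order, unsigned el_count) {
  return row_eligible && !warning_for_row &&
         (!has_tree || (el_count == 1 && distinct && !has_order));
}

// New condition: dump eagerly only for plain GROUP_CONCAT (no ORDER BY
// tree and no DISTINCT); every other case is left to val_str().
bool new_dumps_in_add(bool row_eligible, bool warning_for_row, bool has_tree,
                      bool distinct) {
  return row_eligible && !warning_for_row && !has_tree && !distinct;
}
```

Assuming a tree exists only when ORDER BY is present, the case that flips is DISTINCT without ORDER BY: the old predicate dumped such rows during add(), trusting a uniqueness check that cannot see values already spilled to disk, while the new one defers them to val_str().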
@@ -3739,9 +3737,16 @@ String* Item_func_group_concat::val_str(String* str)
   DBUG_ASSERT(fixed == 1);
   if (null_value)
     return 0;
-  if (no_appended && tree)
-    /* Tree is used for sorting as in ORDER BY */
-    tree_walk(tree, &dump_leaf_key, this, left_root_right);
+
+  if (!m_result_finalized) // Result yet to be written.
+  {
+    if (tree != NULL) // order by
+      tree_walk(tree, &dump_leaf_key, this, left_root_right);
+    else if (distinct) // distinct (and no order by).
+      unique_filter->walk(&dump_leaf_key, this);
+    else
+      DBUG_ASSERT(false); // Can't happen
+  }
 
   if (table && table->blob_storage &&
       table->blob_storage->is_truncated_value())
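The deferred finalization in val_str(), together with the reused flag in dump_leaf_key(), can be modelled in a few lines. This is a hypothetical sketch, not the server code: `GroupConcatModel` and its members are invented names, a `std::vector` stands in for the ORDER BY tree, a `std::set` stands in for the unique filter, and the `DBUG_ASSERT(false)` branch is omitted.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of Item_func_group_concat's deferred output.
struct GroupConcatModel {
  bool distinct = false;
  bool has_order_tree = false;
  std::vector<std::string> ordered_rows;  // assumed already in ORDER BY order
  std::set<std::string> distinct_values;  // std::set iterates in sorted order
  std::string separator = ",";
  std::string result;
  bool result_finalized = false;  // role of m_result_finalized

  // Like dump_leaf_key(): the first value sets the flag instead of writing
  // a separator, so the flag also records that output has started.
  void dump_leaf_key(const std::string& value) {
    if (!result_finalized)
      result_finalized = true;
    else
      result += separator;
    result += value;
  }

  // Like clear(): reset the per-group state.
  void clear() {
    result.clear();
    result_finalized = false;
    ordered_rows.clear();
    distinct_values.clear();
  }

  // Like val_str(): write the result only if add() has not already done so.
  const std::string& val_str() {
    if (!result_finalized) {
      if (has_order_tree) {
        for (const auto& v : ordered_rows) dump_leaf_key(v);
      } else if (distinct) {
        for (const auto& v : distinct_values) dump_leaf_key(v);
      }
    }
    return result;
  }
};
```

Calling `val_str()` twice returns the same string because the first walk sets `result_finalized`; after `clear()` the next group starts with an empty buffer and no leading separator. Reusing one flag for both "separator needed" and "result already written" is what lets the plain GROUP_CONCAT path, which still dumps rows in add(), skip the walk entirely.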

sql/item_sum.h

+3-2
@@ -1,7 +1,7 @@
 #ifndef ITEM_SUM_INCLUDED
 #define ITEM_SUM_INCLUDED
 
-/* Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
+/* Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -1433,7 +1433,8 @@ class Item_func_group_concat : public Item_sum
   bool warning_for_row;
   bool always_null;
   bool force_copy_fields;
-  bool no_appended;
+  /** True if result has been written to output buffer. */
+  bool m_result_finalized;
   /*
     Following is 0 normal object and pointer to original one for copy
     (to correctly free resources)
