I want to create a user-defined function (CREATE TEMPORARY FUNCTION) in BigQuery Standard SQL that accepts values aggregated from a group of rows.
My schema and table are similar to this:
| c1 | c2 | c3 | c4 |
|=======|=======|=======|=======|
| 1 | 1-1 | 3A | 4A |
| 1 | 1-1 | 3B | 4B |
| 1 | 1-1 | 3C | 4C |
| 1 | 1-2 | 3D | 4D |
| 2 | 2-1 | 3E | 4E |
| 2 | 2-1 | 3F | 4F |
| 2 | 2-2 | 3G | 4G |
| 2 | 2-2 | 3H | 4H |
I can't change the original schema to be made of nested or ARRAY fields.
I want to group by c1 and c2 and pass the values of c3 and c4 to a function, while still being able to match the c3 and c4 values that came from the same row.
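One way of doing so is to use ARRAY_AGG and pass the values as arrays, but the order of elements in the output of ARRAY_AGG is non-deterministic, so the values from c3 and c4 might come back in a different order than in the source table, and the value at a given index of one array would no longer be guaranteed to belong to the same row as the value at that index of the other.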
Example:
CREATE TEMPORARY FUNCTION tempConcatStrFunction(c3 ARRAY<STRING>, c4 ARRAY<STRING>)
RETURNS STRING
LANGUAGE js AS """
  // Pair each c3 value with the c4 value at the same index.
  return c3
    .map((item, index) => [item, c4[index]].join(','))
    .join(',');
""";
WITH T AS (
  SELECT c1, c2, ARRAY_AGG(c3) AS c3, ARRAY_AGG(c4) AS c4
  FROM my_table  -- placeholder name for the source table shown above
  GROUP BY c1, c2
)
SELECT c1, c2, tempConcatStrFunction(c3, c4) AS str FROM T
The result should be:
| c1 | c2 | str |
|=======|=======|======================|
| 1 | 1-1 | 3A,4A,3B,4B,3C,4C |
| 1 | 1-2 | 3D,4D |
| 2 | 2-1 | 3E,4E,3F,4F |
| 2 | 2-2 | 3G,4G,3H,4H |
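To make the concern concrete, here is a rough sketch that runs the body of the UDF as plain JavaScript outside BigQuery. The local function is only a stand-in for the UDF, and the second call uses a made-up ordering of c4 to illustrate what could happen if the two arrays were not aggregated in the same order; it is not output observed from BigQuery.

```javascript
// Local stand-in for the UDF body; this is plain JavaScript, not BigQuery.
function tempConcatStrFunction(c3, c4) {
  // Pair each c3 value with the c4 value at the same index.
  return c3
    .map((item, index) => [item, c4[index]].join(','))
    .join(',');
}

// Values for the group c1 = 1, c2 = '1-1', in source-table order.
const inOrder = tempConcatStrFunction(['3A', '3B', '3C'], ['4A', '4B', '4C']);
console.log(inOrder); // 3A,4A,3B,4B,3C,4C

// Hypothetical case: c4 arrives in a different order than c3,
// so the index-based pairing no longer matches the source rows.
const mismatched = tempConcatStrFunction(['3A', '3B', '3C'], ['4C', '4A', '4B']);
console.log(mismatched); // 3A,4C,3B,4A,3C,4B
```

The first call gives the string I expect for that group; the second shows why matching by index only works if both arrays keep the same row order.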
Any ideas on how to achieve this result?