BigQuery SQL: passing values from multiple rows and multiple columns to a user-defined function

I want to create a user-defined function (CREATE TEMPORARY FUNCTION) in BigQuery Standard SQL that accepts values aggregated from a group of rows.

My schema and table are similar to this:

| c1    | c2    | c3    | c4    |
|=======|=======|=======|=======|
| 1     | 1-1   | 3A    | 4A    |
| 1     | 1-1   | 3B    | 4B    |
| 1     | 1-1   | 3C    | 4C    |
| 1     | 1-2   | 3D    | 4D    |
| 2     | 2-1   | 3E    | 4E    |
| 2     | 2-1   | 3F    | 4F    |
| 2     | 2-2   | 3G    | 4G    |
| 2     | 2-2   | 3H    | 4H    |

I can't change the original schema to use nested or ARRAY fields.

I want to group by c1 and c2 and pass the values of c3 and c4 to a function, while still being able to match each c3 value with the c4 value from the same row. One way to do this is to use ARRAY_AGG and pass the values as arrays, but the element order produced by ARRAY_AGG is non-deterministic, so the c3 and c4 arrays might come back in a different order than the source table and no longer line up with each other. Example:
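The ordering concern can be illustrated outside BigQuery. This plain Node.js sketch pairs two parallel arrays by index, the way the UDF body below does, and shows that if one array arrives reordered the pairs are silently mismatched, whereas an array of { c3, c4 } objects keeps each row's values together. The object shape is an assumption, meant to mirror what aggregating both columns as a STRUCT (ARRAY_AGG(STRUCT(c3, c4))) would hand to a JavaScript UDF; the helper names are hypothetical.

```javascript
// Pair two parallel arrays by index, as the UDF's .map((item, index) => ...) does.
function pairByIndex(c3, c4) {
  return c3.map((item, index) => [item, c4[index]].join(',')).join(',');
}

// Concatenate an array of row objects; each object carries its own c3/c4 pair.
function pairFromRows(rows) {
  return rows.map((row) => [row.c3, row.c4].join(',')).join(',');
}

// Group (1, '1-1') from the sample table, with both arrays in source order.
const c3 = ['3A', '3B', '3C'];
const c4 = ['4A', '4B', '4C'];
console.log(pairByIndex(c3, c4)); // 3A,4A,3B,4B,3C,4C

// If c4 arrived in a different order, index pairing mismatches the values.
const c4Shuffled = ['4B', '4A', '4C'];
console.log(pairByIndex(c3, c4Shuffled)); // 3A,4B,3B,4A,3C,4C

// With one object per row, reordering the rows moves each pair as a unit.
const rows = [
  { c3: '3B', c4: '4B' },
  { c3: '3A', c4: '4A' },
  { c3: '3C', c4: '4C' },
];
console.log(pairFromRows(rows)); // 3B,4B,3A,4A,3C,4C
```

Note that the object form only keeps c3 and c4 matched; the order of the pairs within a group still follows whatever order the rows arrive in.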

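CREATE TEMPORARY FUNCTION
    tempConcatStrFunction(c3 ARRAY<STRING>, c4 ARRAY<STRING>)
    RETURNS STRING
    LANGUAGE js AS """
      // "return c3" must stay on one line: a bare "return" followed by a
      // newline returns undefined in JavaScript.
      return c3
        .map((item, index) => [item, c4[index]].join(','))
        .join(',');
    """;
WITH T AS (
    SELECT c1, c2, ARRAY_AGG(c3) AS c3, ARRAY_AGG(c4) AS c4
    FROM my_dataset.my_table  -- hypothetical table name; the query needs a FROM clause
    GROUP BY c1, c2
)
SELECT c1, c2, tempConcatStrFunction(c3, c4) AS str FROM T;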
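As a side check on the UDF body itself (not on the query), it can be run as an ordinary Node.js function. This sketch assumes BigQuery passes ARRAY<STRING> arguments as JavaScript arrays of strings. It also contrasts the layout in which "return" sits alone on its line: JavaScript's automatic semicolon insertion ends the statement right after "return", so that version returns undefined instead of the concatenated string. The name brokenConcat is hypothetical.

```javascript
// Same logic as the UDF body, with "return c3" kept on one line.
function tempConcatStrFunction(c3, c4) {
  return c3
    .map((item, index) => [item, c4[index]].join(','))
    .join(',');
}

// The layout with a bare "return" on its own line: the code after it never runs.
function brokenConcat(c3, c4) {
  return
  c3.map((item, index) => [item, c4[index]].join(',')).join(',');
}

console.log(tempConcatStrFunction(['3A', '3B', '3C'], ['4A', '4B', '4C'])); // 3A,4A,3B,4B,3C,4C
console.log(brokenConcat(['3A'], ['4A'])); // undefined
```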

The result should be:

| c1    | c2    | str                  |
|=======|=======|======================|
| 1     | 1-1   | 3A,4A,3B,4B,3C,4C    |
| 1     | 1-2   | 3D,4D                |
| 2     | 2-1   | 3E,4E,3F,4F          |
| 2     | 2-2   | 3G,4G,3H,4H          |
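The expected result can be reproduced outside BigQuery by grouping the sample rows in their listed order. This Node.js sketch is only a reference for the intended semantics: the rows are copied from the sample table and concatByGroup is a hypothetical helper. It relies on a JavaScript Map iterating its keys in insertion order, which is exactly the kind of ordering guarantee the question says ARRAY_AGG alone does not give.

```javascript
// Sample rows copied from the table above, in the listed order.
const sampleRows = [
  { c1: 1, c2: '1-1', c3: '3A', c4: '4A' },
  { c1: 1, c2: '1-1', c3: '3B', c4: '4B' },
  { c1: 1, c2: '1-1', c3: '3C', c4: '4C' },
  { c1: 1, c2: '1-2', c3: '3D', c4: '4D' },
  { c1: 2, c2: '2-1', c3: '3E', c4: '4E' },
  { c1: 2, c2: '2-1', c3: '3F', c4: '4F' },
  { c1: 2, c2: '2-2', c3: '3G', c4: '4G' },
  { c1: 2, c2: '2-2', c3: '3H', c4: '4H' },
];

// Group by (c1, c2), appending each row's c3 and c4 together so they stay matched.
function concatByGroup(rows) {
  const groups = new Map(); // a Map iterates its keys in insertion order
  for (const { c1, c2, c3, c4 } of rows) {
    const key = `${c1}|${c2}`;
    if (!groups.has(key)) {
      groups.set(key, { c1, c2, parts: [] });
    }
    groups.get(key).parts.push(c3, c4);
  }
  return [...groups.values()].map(({ c1, c2, parts }) => ({ c1, c2, str: parts.join(',') }));
}

for (const { c1, c2, str } of concatByGroup(sampleRows)) {
  console.log(c1, c2, str); // first line: 1 1-1 3A,4A,3B,4B,3C,4C
}
```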

Any ideas on how to achieve this result?
