BigQuery
Quick Tips
Command-Line Tool
For help on a specific command.
bq help $COMMAND
, e.g. bq help extract
Export a table as a TSV file into GS
bq extract --noprint_header -F "\t" sohappy_tmp.tmp gs://${BUCKET_NAME}/${FILENAME}
Execute a query and save result into another table.
bq query --allow_large_results -n 0 --replace=true --destination_table 'tmp.tmp' 'select logdt,logbody from glog.apir_tdagent_20170510'
Load data from TSV file with Command-Line Tool
- Tips - It is efficiently to Load TSV from GS instead of local file.
bq load -F '\t' ${DATASET}.${TABLE} gs://${FILE_FULL_PATH} ${FIELD}:${TYPE},...
Ex.
bq load -F '\t' sohappy_tmp.gid2terms gs://i2terms.out gid:string,term:string,position:integer,score:float
Convert a date string to a TIMESTAMP data type
SELECT TIMESTAMP("2013-10-26 21:08:14");
returns
2013-10-26 21:08:14 UTC
Split a string into mutiple columns
SELECT
NTH(1, SPLIT(reclog, '_')),
NTH(2, SPLIT(reclog, '_')),
NTH(3, SPLIT(reclog, '_'))
FROM
(SELECT "recomd-api-nmq1_1454574955_25861" as reclog)
;
Ex.
Export/Write table to GS with TSV format
bq extract -F '\t' nono.cap_gte_20160301 gs://ven-cust-nono/cap_gte_20160301.tsv
Extract data from JSON
SELECT
JSON_EXTRACT(json, '$.action') as act,
JSON_EXTRACT(json, '$.uid') as uid,
JSON_EXTRACT(json, '$.autosend[1]') as send
FROM (
SELECT '{"action":"pageload","autosend":[false, true],"uid":"3SuhP6ctjUY="}' as json
);
ex.
Resources exceeded during query execution (avoid SELECT * )
Symptom
Solution
- Separate query to the other sub table, e.g. t1.
- Join back to major table with primary key.
JOIN (including semi-join) and UNION ALL (comma, date range) may not be combined in a single SELECT ...
Move table range query (UNION ALL) to an inner query
SELECT
*
FROM (
SELECT
*
FROM
TABLE_DATE_RANGE(weblog_, TIMESTAMP('2017-01-01'), TIMESTAMP('2017-01-16')))
WHERE
ven_session IN (
SELECT
ven_session
FROM
TABLE_DATE_RANGE(weblog_, TIMESTAMP('2017-01-01'), TIMESTAMP('2017-01-16'))
WHERE
gid = '4944951'
GROUP BY
ven_session
LIMIT
1000 )
instead of
SELECT
*
FROM
TABLE_DATE_RANGE(weblog_, TIMESTAMP('2017-01-01'), TIMESTAMP('2017-01-16'))
WHERE
ven_session IN (
SELECT
ven_session
FROM
TABLE_DATE_RANGE(weblog_, TIMESTAMP('2017-01-01'), TIMESTAMP('2017-01-16'))
WHERE
gid = '4944951'
GROUP BY
ven_session
LIMIT
1000 )
Error: Reference to field [$FIELD_NAME] with conflicting types
Symptom
SELECT
*
FROM
TABLE_DATE_RANGE(...)
LIMIT
1000
Solution
Adjust table query range within the tables of identical schema.
Cannot join on repeated field
- Flatten field for join operation