BigQuery

Quick Tips

Command-Line Tool

For help on a specific command. bq help $COMMAND, e.g. bq help extract

Export a table as a TSV file into GS
bq extract --noprint_header -F "\t" sohappy_tmp.tmp gs://${BUCKET_NAME}/${FILENAME}
Execute a query and save result into another table.
bq query --allow_large_results -n 0 --replace=true --destination_table 'tmp.tmp' 'select logdt,logbody from glog.apir_tdagent_20170510'
Load data from TSV file with Command-Line Tool
  • Tips - It is efficiently to Load TSV from GS instead of local file.
bq load -F '\t' ${DATASET}.${TABLE} gs://${FILE_FULL_PATH} ${FIELD}:${TYPE},...

Ex.

bq load -F '\t' sohappy_tmp.gid2terms gs://i2terms.out gid:string,term:string,position:integer,score:float

Convert a date string to a TIMESTAMP data type

SELECT TIMESTAMP("2013-10-26 21:08:14");

returns 2013-10-26 21:08:14 UTC

Split a string into mutiple columns

SELECT 
  NTH(1, SPLIT(reclog, '_')),
  NTH(2, SPLIT(reclog, '_')),
  NTH(3, SPLIT(reclog, '_')) 
FROM 
  (SELECT "recomd-api-nmq1_1454574955_25861" as reclog)
;

Ex.

Export/Write table to GS with TSV format

bq extract -F '\t' nono.cap_gte_20160301 gs://ven-cust-nono/cap_gte_20160301.tsv

Extract data from JSON

SELECT   
  JSON_EXTRACT(json, '$.action') as act,
  JSON_EXTRACT(json, '$.uid') as uid,
  JSON_EXTRACT(json, '$.autosend[1]') as send
FROM (
  SELECT '{"action":"pageload","autosend":[false, true],"uid":"3SuhP6ctjUY="}' as json
);

ex.

Resources exceeded during query execution (avoid SELECT * )

Symptom

Solution
  1. Separate query to the other sub table, e.g. t1.
  2. Join back to major table with primary key.

JOIN (including semi-join) and UNION ALL (comma, date range) may not be combined in a single SELECT ...

Move table range query (UNION ALL) to an inner query

SELECT
  *
FROM (
  SELECT
    *
  FROM
    TABLE_DATE_RANGE(weblog_, TIMESTAMP('2017-01-01'), TIMESTAMP('2017-01-16')))
WHERE
  ven_session IN (
  SELECT
    ven_session
  FROM
    TABLE_DATE_RANGE(weblog_, TIMESTAMP('2017-01-01'), TIMESTAMP('2017-01-16'))
  WHERE
    gid = '4944951'
  GROUP BY
    ven_session
  LIMIT
    1000 )

instead of

SELECT
  *
FROM
  TABLE_DATE_RANGE(weblog_, TIMESTAMP('2017-01-01'), TIMESTAMP('2017-01-16'))
WHERE
  ven_session IN (
  SELECT
    ven_session
  FROM
    TABLE_DATE_RANGE(weblog_, TIMESTAMP('2017-01-01'), TIMESTAMP('2017-01-16'))
  WHERE
    gid = '4944951'
  GROUP BY
    ven_session
  LIMIT
    1000 )

Error: Reference to field [$FIELD_NAME] with conflicting types

Symptom
SELECT
  *
FROM
  TABLE_DATE_RANGE(...)
LIMIT
  1000

Solution

Adjust table query range within the tables of identical schema.

Cannot join on repeated field

Reference

results matching ""

    No results matching ""