
Generating Databricks token





Token-based authentication is enabled by default for all Databricks accounts launched after January 2018. If it is disabled, your administrator must enable it before you can perform the tasks described in this topic. See Enable Token-based Authentication.

Generate a token

This section describes how to generate a personal access token in the Databricks UI. You can also generate and revoke tokens using the Token API.
  1. Click the user profile icon in the upper right corner of your Databricks workspace.
  2. Click User Settings.
  3. Go to the Access Tokens tab.
  4. Click the Generate New Token button.
  5. Optionally enter a description (comment) and expiration period.
  6. Click the Generate button.
  7. Copy the generated token and store it in a secure location.
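
The Token API route mentioned above can be sketched in Python. This is a minimal sketch, not a definitive client: it assumes the 2.0 endpoint `/api/2.0/token/create` accepts `comment` and `lifetime_seconds` in a JSON body, and that you already hold a valid token to authenticate the call. The helper name `build_create_token_request`, the instance name, and the token value are hypothetical.

```python
import json
import urllib.request


def build_create_token_request(instance, auth_token, comment, lifetime_seconds):
    """Build (but do not send) a POST request to the token/create endpoint."""
    # Assumed request body fields: comment and lifetime_seconds.
    body = {"comment": comment, "lifetime_seconds": lifetime_seconds}
    return urllib.request.Request(
        url=f"https://{instance}/api/2.0/token/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical instance and token; substitute your own values.
create_request = build_create_token_request(
    "example.cloud.databricks.com", "dapi-hypothetical-token", "ci token", 3600
)
# To send it, pass create_request to urllib.request.urlopen() and parse the
# JSON response, which should contain the new token value.
```

As with the UI flow, the token value is best copied from the response once and stored securely.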

Revoke a token

This section describes how to revoke personal access tokens using the Databricks UI. You can also generate and revoke access tokens using the Token API.
  1. Click the user profile icon in the upper right corner of your Databricks workspace.
  2. Click User Settings.
  3. Go to the Access Tokens tab.
  4. Click x for the token you want to revoke.
  5. On the Revoke Token dialog, click the Revoke Token button.
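
Revoking through the Token API might look like the following Python sketch. It assumes the 2.0 endpoint `/api/2.0/token/delete` takes a `token_id` field in a JSON body (the ID, not the secret token value, as returned by `token/list`). The helper name `build_revoke_token_request` and all values shown are hypothetical.

```python
import json
import urllib.request


def build_revoke_token_request(instance, auth_token, token_id):
    """Build (but do not send) a POST request to the token/delete endpoint."""
    # Assumed request body: the ID of the token to revoke.
    body = {"token_id": token_id}
    return urllib.request.Request(
        url=f"https://{instance}/api/2.0/token/delete",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical instance, token, and token ID; substitute your own values.
revoke_request = build_revoke_token_request(
    "example.cloud.databricks.com", "dapi-hypothetical-token", "hypothetical-token-id"
)
# To send it, pass revoke_request to urllib.request.urlopen().
```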

Use tokens for API authentication

Store the token in a .netrc file and use it in curl

Create a .netrc file with machine, login, and password properties:
machine <databricks-instance>
login token
password <personal-access-token-value>
Replace <databricks-instance> with the <account>.cloud.databricks.com domain name of your Databricks deployment. Replace <personal-access-token-value> with the value of your personal access token.
Important
You can optionally set login to your Databricks username and password to your Databricks password. However, we recommend that you use a personal access token to authenticate to an API endpoint. If you choose to use a username and password, do not use -u to pass your credentials. In other words, do not use curl -u <your-username>:<your-password> -X GET https://<databricks-instance>/api/2.0/token/list.
To invoke the .netrc file, use -n in your curl command:
curl -n -X GET https://<databricks-instance>/api/2.0/token/list
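
The .netrc entry described above can also be written and checked programmatically. The following Python sketch writes the three properties to a file with owner-only permissions and reads them back with the standard-library `netrc` module. It uses a temporary file rather than your real `~/.netrc`; the helper name `write_netrc`, the instance name, and the token value are hypothetical.

```python
import netrc
import os
import tempfile


def write_netrc(path, instance, token):
    """Write a one-entry .netrc file readable only by the current user."""
    content = f"machine {instance}\nlogin token\npassword {token}\n"
    # Create the file with mode 0600 so other users cannot read the token.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)


# Demonstrate against a temporary file, not the real ~/.netrc.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".netrc")
    write_netrc(demo_path, "example.cloud.databricks.com", "dapi-hypothetical-token")
    # authenticators() returns a (login, account, password) tuple for the host.
    login, _, password = netrc.netrc(demo_path).authenticators(
        "example.cloud.databricks.com"
    )
    print(login, password)
```

Restricting the file to its owner matters because the token is stored in plain text.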

Pass the token using Bearer authentication

You can include the token in the header using Bearer authentication. You can use this approach with curl or any client that you build.
curl 'https://<databricks-instance>/api/2.0/token/list' -X GET -H "Authorization: Bearer <personal-access-token-value>"
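
For a client you build yourself, the same header can be set in code. The Python sketch below uses only the standard library to mirror the curl command above. The helper name `build_list_tokens_request`, the instance name, and the token value are hypothetical, and the request is built without being sent.

```python
import urllib.request


def build_list_tokens_request(instance, token):
    """Build a GET request for token/list with a Bearer Authorization header."""
    return urllib.request.Request(
        url=f"https://{instance}/api/2.0/token/list",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Hypothetical instance and token; substitute your own values.
list_request = build_list_tokens_request(
    "example.cloud.databricks.com", "dapi-hypothetical-token"
)
# To send it:
# with urllib.request.urlopen(list_request) as response:
#     print(response.read().decode("utf-8"))
```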
