# DataFrame FAQs

This FAQ addresses common use cases and example usage of the available APIs. For more detailed API descriptions, see the PySpark documentation.

## How can I get better performance with DataFrame UDFs?

If the functionality exists in the available built-in functions, use those instead of a UDF: they perform better because Spark evaluates them natively rather than shipping each row out to a Python process. See also the pyspark.sql.functions documentation.

The example below uses built-in functions and the `withColumn()` API to add new columns. We could also have used `withColumnRenamed()` to rename a column after the transformation.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import *

# Build an example DataFrame dataset to work with.
dbutils.fs.rm("/tmp/dataframe_sample.csv", True)
dbutils.fs.put("/tmp/dataframe_sample.csv", """id|end_date|start_date|location
1|2015-10-14 00:00:00|2015-09-14 00:00:00|CA-SF
2|2015-10-15 01:00:20|2015-08-14 00:00:00|CA-SD
3...
```