Datetime Functions
- Current Date/Timestamp
- Cast to timestamp
- Format datetimes
- Extract from timestamp
- Convert to date
- Manipulate datetimes
Methods
- Column: cast
- Built-In Functions: date_format, to_date, date_add, year, month, dayofweek, minute, second
Current Date/Timestamp
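current_date, current_timestamp
- current_date() returns the current date (year, month, day).
- current_timestamp() returns the current timestamp (year, month, day, hour, minute, second, and fractional seconds).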
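The difference between the two functions is only how much of the current instant they keep. The sketch below is a plain-Python analogue (standard library only, no Spark session needed), and the variable names are illustrative. It takes one reading of the clock and derives both values from it, which mirrors how Spark evaluates these functions once per query so that every row sees the same value.

```python
from datetime import datetime, timezone

# Read the clock once so the date and the timestamp are consistent with each other
now = datetime.now(timezone.utc)

current_ts = now            # analogue of current_timestamp(): date plus time of day
current_day = now.date()    # analogue of current_date(): year, month, day only

print(current_ts.isoformat())
print(current_day.isoformat())
```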
Cast to Timestamp
cast
Casts column to a different data type, specified using string representation or DataType.
int
from pyspark.sql.functions import col

df = spark.read.parquet(eventsPath)\
    .select("user_id", col("event_timestamp").alias("timestamp"))
display(df)
timestamp
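# Dividing by 1e6 converts the raw value to seconds since the epoch
# (this implies the column is stored in microseconds); a numeric value
# cast to "timestamp" is interpreted as seconds.
timestampDF = df\
    .withColumn("timestamp",\
        (col("timestamp") / 1e6).cast("timestamp"))
display(timestampDF)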
TimestampType (recommended)
from pyspark.sql.types import TimestampType

timestampDF = df.withColumn("timestamp", \
    (col("timestamp") / 1e6).cast(TimestampType()))
display(timestampDF)
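The division by 1e6 matters because Spark treats a number cast to a timestamp as seconds since the Unix epoch. The sketch below repeats the conversion on a single value in plain Python (standard library only). It assumes the raw column holds microseconds, as the division suggests. The helper name `micros_to_datetime` and the sample value are made up for illustration and are not taken from the dataset.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def micros_to_datetime(micros: int) -> datetime:
    """Interpret an integer as microseconds since the Unix epoch (UTC)."""
    # Integer arithmetic via timedelta avoids float rounding on large values
    return EPOCH + timedelta(microseconds=micros)

# Hypothetical raw value for illustration only
sample_micros = 1593878946592107
converted = micros_to_datetime(sample_micros)
print(converted.isoformat())  # 2020-07-04T16:09:06.592107+00:00
```

The whole part of `sample_micros / 1e6` is the number of seconds, and the remainder becomes the fractional seconds of the timestamp.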
Format date
date_format: converts a date or timestamp to a string in the given format.
from pyspark.sql.functions import date_format

formattedDF = (timestampDF
    .withColumn("date string", date_format("timestamp", "yyyy-MM-dd"))
    .withColumn("time string", date_format("timestamp", "HH:mm:ss.SSSSSS"))
)
display(formattedDF)
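Spark's pattern letters are case-sensitive: `MM` is the month while `mm` is the minute, and `SSSSSS` is six digits of fractional seconds. As a rough cross-check, the sketch below formats a single hypothetical timestamp in plain Python, using `strftime` codes that produce the same text as the two patterns above. The sample value is an assumption chosen for illustration.

```python
from datetime import datetime, timezone

# Hypothetical sample timestamp for illustration only
ts = datetime(2020, 7, 4, 16, 9, 6, 592107, tzinfo=timezone.utc)

# strftime codes that produce the same text as the Spark patterns
date_string = ts.strftime("%Y-%m-%d")      # Spark pattern: "yyyy-MM-dd"
time_string = ts.strftime("%H:%M:%S.%f")   # Spark pattern: "HH:mm:ss.SSSSSS"

print(date_string)  # 2020-07-04
print(time_string)  # 16:09:06.592107
```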
Convert to Date
to_date: converts a column to a date (DateType).
from pyspark.sql.functions import to_date
dateDF = timestampDF.withColumn("date", to_date(col("timestamp")))
display(dateDF)
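Converting to a date drops the time of day, so two events on the same calendar day end up with the same value. A minimal plain-Python analogue (standard library only, with made-up sample timestamps):

```python
from datetime import datetime, timezone

# Two hypothetical events on the same day, a few hours apart
morning = datetime(2020, 7, 4, 8, 30, 0, tzinfo=timezone.utc)
evening = datetime(2020, 7, 4, 21, 45, 12, tzinfo=timezone.utc)

# .date() keeps year, month, and day only, like to_date on a timestamp column
morning_date = morning.date()
evening_date = evening.date()

print(morning_date, evening_date, morning_date == evening_date)
```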
Extract datetime attribute from timestamp
year, month, dayofweek, dayofmonth, dayofyear, hour, minute, second, etc.
from pyspark.sql.functions import year, month, dayofweek, hour, minute, second

# Each function extracts one component of the timestamp into its own column
datetimeDF = timestampDF.withColumn("year", year(col("timestamp")))\
    .withColumn("month", month(col("timestamp")))\
    .withColumn("dayofweek", dayofweek(col("timestamp")))\
    .withColumn("hour", hour(col("timestamp")))\
    .withColumn("minute", minute(col("timestamp")))\
    .withColumn("second", second(col("timestamp")))
display(datetimeDF)
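One detail worth checking is the numbering of `dayofweek`: Spark counts from Sunday (1 = Sunday, ..., 7 = Saturday), whereas Python's `isoweekday()` counts from Monday. The plain-Python sketch below extracts the same fields from a single hypothetical timestamp. The helper `spark_style_dayofweek` is a made-up name that maps Python's numbering onto Spark's.

```python
from datetime import datetime, timezone

def spark_style_dayofweek(ts: datetime) -> int:
    """Map isoweekday (Mon=1..Sun=7) to Spark's dayofweek (Sun=1..Sat=7)."""
    return ts.isoweekday() % 7 + 1

# Hypothetical sample timestamp; 2020-07-04 was a Saturday
ts = datetime(2020, 7, 4, 16, 9, 6, tzinfo=timezone.utc)

parts = {
    "year": ts.year,
    "month": ts.month,
    "dayofweek": spark_style_dayofweek(ts),
    "hour": ts.hour,
    "minute": ts.minute,
    "second": ts.second,
}
print(parts)
```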
Manipulate Datetimes
date_add, add_months, date_sub, datediff, etc.
date_add: returns the date that is the given number of days after the start date.
from pyspark.sql.functions import date_add

# date_add returns a date, so the time of day in "timestamp" is dropped
plus2DF = timestampDF.withColumn("plus_two_days", date_add(col("timestamp"), 2))
display(plus2DF)
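The arithmetic behind `date_add`, `date_sub`, and `datediff` works in whole days. A plain-Python analogue (standard library only, with a made-up sample timestamp) might look like this:

```python
from datetime import datetime, timedelta

# Hypothetical sample timestamp for illustration only
ts = datetime(2020, 7, 4, 16, 9, 6)

start = ts.date()                             # the result is a date, so the time is dropped
plus_two_days = start + timedelta(days=2)     # like date_add(col, 2)
minus_two_days = start - timedelta(days=2)    # like date_sub(col, 2)

# like datediff(end, start): number of days from start to end
days_between = (plus_two_days - minus_two_days).days

print(plus_two_days, minus_two_days, days_between)
```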