As the title says: the original timestamp is a STRING column in the format (%d-%b-%Y %H.%M.%S.%f %p), e.g. 03-DEC-15 12.00.00.000000 AM. For ordinary type conversions I can use cast('double') inside withColumn, but how do I convert this kind of string to a DATETIME (timestamp) type, or to an INT?

Solutions »

  1.   

    scala> spark.version
    res11: String = 2.0.2

    scala> val df = sc.parallelize(Seq("03-DEC-15 12.00.00.000000 AM")).toDF
    df: org.apache.spark.sql.DataFrame = [value: string]

    scala> df.show(false)
    +----------------------------+
    |value                       |
    +----------------------------+
    |03-DEC-15 12.00.00.000000 AM|
    +----------------------------+

    scala> val df2 = df.withColumn("dateType", unix_timestamp($"value", "dd-MMM-yy hh.mm.ss.SSSSSS a"))
    df2: org.apache.spark.sql.DataFrame = [value: string, dateType: bigint]

    scala> df2.show(false)
    +----------------------------+----------+
    |value                       |dateType  |
    +----------------------------+----------+
    |03-DEC-15 12.00.00.000000 AM|1449118800|
    +----------------------------+----------+

    scala> df2.withColumn("newFormat", from_unixtime($"dateType")).show(false)
    +----------------------------+----------+-------------------+
    |value                       |dateType  |newFormat          |
    +----------------------------+----------+-------------------+
    |03-DEC-15 12.00.00.000000 AM|1449118800|2015-12-03 00:00:00|
    +----------------------------+----------+-------------------+see the Java simpleDateFormatAnd Spark unix_timestamp UDF
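Since Spark 2.x's `unix_timestamp` delegates pattern parsing to Java's `SimpleDateFormat`, you can sanity-check the pattern string in plain Java before using it in Spark. The sketch below uses hypothetical names (`OracleTimestampDemo`, `toSqlDatetime`) for illustration; note that in `SimpleDateFormat`, `S` means milliseconds, so `SSSSSS` only happens to work here because the fractional part is all zeros:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class OracleTimestampDemo {

    // Parse an Oracle-style timestamp string such as "03-DEC-15 12.00.00.000000 AM"
    // and reformat it as a SQL-style "yyyy-MM-dd HH:mm:ss" string.
    static String toSqlDatetime(String oracleTs) {
        // Same pattern string as the Spark unix_timestamp call above;
        // Locale.ENGLISH guarantees "DEC" matches December on any JVM
        // (SimpleDateFormat matches month names case-insensitively).
        SimpleDateFormat in = new SimpleDateFormat("dd-MMM-yy hh.mm.ss.SSSSSS a", Locale.ENGLISH);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        try {
            Date d = in.parse(oracleTs); // hh=12 with AM correctly maps to midnight
            return out.format(d);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + oracleTs, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSqlDatetime("03-DEC-15 12.00.00.000000 AM"));
        // prints 2015-12-03 00:00:00
    }
}
```

Once the pattern is verified, getting an actual timestamp (rather than a bigint of epoch seconds) in Spark is just a cast on top of the answer above, e.g. `df2.withColumn("ts", $"dateType".cast("timestamp"))`.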