Error when saving a dataframe to Redshift (java.lang.ArrayStoreException: java.lang.invoke.SerializedLambda) #459

@marek-babic

Description

Hi there

I'm using the package io.github.spark-redshift-community:spark-redshift_2.12:4.2.0 as a dependency in an AWS EMR job that tries to save a dataframe to Redshift.

Sadly, the attempt fails with the following stacktrace:
https://gist.github.com/marek-babic/0110160bdd0ba11533b6f425559d2f1c

I know the dataframe is in a healthy state, as show() and printSchema() output what I expect, and the schema matches that of the Redshift table.

The code looks like this (the uppercase variables are set appropriately):

df.write \
  .format("io.github.spark_redshift_community.spark.redshift") \
  .option("url", "jdbc:redshift://" + HOST_URL + ":5439/" + DATABASE_NAME) \
  .option("user", USERNAME) \
  .option("password", PASSWORD) \
  .option("dbtable", TABLE_NAME) \
  .option("aws_region", REGION) \
  .option("aws_iam_role", IAM_ROLE) \
  .option("tempdir", TMP_PATH) \
  .option("tempformat", "CSV") \
  .mode("overwrite") \
  .save()

I tried to save the dataframe to S3 just by running:

df.write.format("csv").save(TMP_PATH + "/test1")

which worked, so the permissions in AWS are correct.
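In case it helps, I read that java.lang.ArrayStoreException: java.lang.invoke.SerializedLambda often points at a Scala binary-version mismatch between a library and the Spark runtime. As a sanity check (a minimal sketch; the helper below is hypothetical, not part of the connector), this reads the Scala suffix out of the Maven coordinate I'm passing to the job, so it can be compared against the Scala version the cluster's Spark build reports:

```python
# Hypothetical helper (not part of spark-redshift): extract the Scala binary
# version a Maven artifact was cross-built for, from its "_2.12"-style suffix.
def scala_binary_version(coordinate: str) -> str:
    artifact = coordinate.split(":")[1]   # e.g. "spark-redshift_2.12"
    return artifact.rsplit("_", 1)[1]     # e.g. "2.12"

# The connector used in this job:
print(scala_binary_version(
    "io.github.spark-redshift-community:spark-redshift_2.12:4.2.0"))  # -> 2.12
```

If the EMR cluster's Spark is built against a different Scala binary version (for example 2.11 on older EMR releases), lambdas serialized by one version cannot be deserialized by the other, which would match this exception.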

Any ideas why this could be happening?
Thanks
Marek
