Databricks read from Redshift: first selected column returns wrong value if quotes are used  #467

@HansHerrlich

Description

Hi folks,
when issuing a query against Redshift from Databricks, there seems to be an issue with how quotes inside an SQL string are handled.

The following setup on Redshift:

CREATE TABLE IF NOT EXISTS tmp_test
(
	my_new_varchar VARCHAR(128)   ENCODE lzo
	,some_text VARCHAR(256)   ENCODE lzo
	,some_decimal NUMERIC(12,6)   ENCODE az64
	,int_col INTEGER   ENCODE az64
	,real_col REAL   ENCODE RAW
	,double_col DOUBLE PRECISION   ENCODE RAW
	,smal_int_col SMALLINT   ENCODE az64
	,t_with_timezone TIMESTAMP WITH TIME ZONE   ENCODE az64
	,t_without_timezone TIMESTAMP WITHOUT TIME ZONE   ENCODE az64
	,dwh_created_at TIMESTAMP WITHOUT TIME ZONE  DEFAULT now() ENCODE az64
)
DISTSTYLE AUTO;


INSERT INTO tmp_test
(my_new_varchar, some_text, some_decimal, int_col, real_col, double_col, smal_int_col, t_with_timezone, t_without_timezone, dwh_created_at)
VALUES('first_val', 'second_val', 123.123, 123456, 12.12, 0.233234, 123, null, null, null);

commit;

On Databricks, I then run the following command:

val query = """
 Select max('my_new_varchar') as column_name, max(coalesce(length(my_new_varchar),0)) as length_or_numeric_precision, max(0) as numeric_scale FROM tmp_test  union 
 Select max('some_text') as column_name, max(coalesce(length(some_text),0)) as length_or_numeric_precision, max(0) as numeric_scale FROM tmp_test  union 
 Select max('some_decimal') as column_name ,max(length(cast(some_decimal as varchar))) as length_or_numeric_precision, max(position('.' in some_decimal))  as numeric_scale FROM tmp_test  union 
 Select max('int_col') as column_name ,max(length(cast(int_col as varchar))) as length_or_numeric_precision, max(position('.' in int_col))  as numeric_scale FROM tmp_test  union 
 Select max('real_col') as column_name ,max(length(cast(real_col as varchar))) as length_or_numeric_precision, max(position('.' in real_col))  as numeric_scale FROM tmp_test  union 
 Select max('double_col') as column_name ,max(length(cast(double_col as varchar))) as length_or_numeric_precision, max(position('.' in double_col))  as numeric_scale FROM tmp_test  union 
 Select max('smal_int_col') as column_name ,max(length(cast(smal_int_col as varchar))) as length_or_numeric_precision, max(position('.' in smal_int_col))  as numeric_scale FROM tmp_test
""".stripMargin

val maxRedDataDF = spark.read
       .format("com.databricks.spark.redshift")
       .option("url", redUrl)
       .option("tempdir", tempdir)
       .option("forward_spark_s3_credentials", "true")
       .option("autoenablessl", "false")
       .option("query", query)
       .load()

display(maxRedDataDF)

As you can see, for the column my_new_varchar the query does not return the column_name literal as it does for the rest of the columns; instead it returns the max value inside that column, e.g. first_val.

[screenshot: the displayed result, where the my_new_varchar row shows first_val instead of the column name]
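
For what it's worth, one plausible place for the quotes to get lost is the UNLOAD statement the connector builds around the user query when transferring data via S3: the whole query ends up inside a single-quoted string, so every single-quoted literal inside it has to be escaped. The sketch below uses hypothetical helper names, not the connector's actual code; it only illustrates that wrapping and why a mis-escaped literal in the first SELECT is exactly where the statement would break.

// Hypothetical sketch, not the connector's actual code: Redshift UNLOAD
// takes the query as a single-quoted string, so quotes inside the query
// must be escaped before wrapping.
object UnloadSketch {
  // Escape backslashes and single quotes for use inside UNLOAD('...');
  // an unescaped literal like 'my_new_varchar' would otherwise terminate
  // the string early.
  def escapeQuery(q: String): String =
    q.replace("\\", "\\\\").replace("'", "\\'")

  def buildUnloadStmt(userQuery: String, s3Path: String): String =
    s"UNLOAD ('${escapeQuery(userQuery)}') TO '$s3Path' ESCAPE"

  def main(args: Array[String]): Unit = {
    val q = "Select max('my_new_varchar') as column_name FROM tmp_test"
    // If this escaping were skipped (or applied twice), the first quoted
    // literal in the query is the first place the statement would corrupt.
    println(buildUnloadStmt(q, "s3://my-bucket/tmp/"))
  }
}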

A quick-and-dirty fix is to modify the query by prepending the following line:

Select max('''tester''') as column_name, max(0) as length_or_numeric_precision, max(0) as numeric_scale  union

Now the result for the column my_new_varchar is returned as expected. However, the column_name tester
is now returned as tester' (note the single quotation mark at the end of the string). If I use single
quotation marks around tester instead of triple ones, Redshift reports a column-not-found error; with double quotation marks, Databricks throws an SQL exception.
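
Below is a minimal sketch of how the workaround can be applied end-to-end, reusing the query and connection values from above and filtering the sacrificial row back out on the Spark side. Note the filter matches tester' with the stray trailing quote described above.

import org.apache.spark.sql.functions.col

// Prepend the sacrificial row so the quoted literal in the first real
// SELECT is no longer the first one affected by the bug.
val patchedQuery = s"""
 Select max('''tester''') as column_name, max(0) as length_or_numeric_precision, max(0) as numeric_scale  union
 $query
"""

val cleanedDF = spark.read
       .format("com.databricks.spark.redshift")
       .option("url", redUrl)
       .option("tempdir", tempdir)
       .option("forward_spark_s3_credentials", "true")
       .option("autoenablessl", "false")
       .option("query", patchedQuery)
       .load()
       // drop the dummy row; the bug appends a stray quote to "tester"
       .filter(col("column_name") =!= "tester'")

display(cleanedDF)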

I hope the explanation is detailed enough; if not, please let me know.
Regards,
Hans
