Commit fb1c820

Update README.md
1 parent da6796e commit fb1c820

1 file changed: +54 -2 lines changed

README.md

@@ -1,4 +1,56 @@
-# spark-dataframe-gz-csv-read-issue
-Issue reading csv gz file Spark DataFrame
+## Issue reading csv gz file Spark DataFrame
 
 https://github.com/databricks/spark-csv/issues/436
+
+
+### Workaround is to rename the column
+(inspired by https://stackoverflow.com/questions/34077353/how-to-change-dataframe-column-names-in-pyspark)
+
+```python
+%pyspark
+
+from functools import reduce
+
+data = spark.createDataFrame([("Alberto", 2), ("Dakota", 2)],
+                              ["Name", "askdaosdka"])
+data.show()
+data.printSchema()
+
+oldColumns = data.schema.names
+print(oldColumns)
+
+newColumns = oldColumns[:]
+newColumns[1] = "Age"
+
+print(newColumns)
+
+df = reduce(lambda data, idx: data.withColumnRenamed(oldColumns[idx], newColumns[idx]), range(len(oldColumns)), data)
+df.printSchema()
+df.show()
+```
+
+```sh
++-------+----------+
+|   Name|askdaosdka|
++-------+----------+
+|Alberto|         2|
+| Dakota|         2|
++-------+----------+
+
+root
+ |-- Name: string (nullable = true)
+ |-- askdaosdka: long (nullable = true)
+
+['Name', 'askdaosdka']
+['Name', 'Age']
+root
+ |-- Name: string (nullable = true)
+ |-- Age: long (nullable = true)
+
++-------+---+
+|   Name|Age|
++-------+---+
+|Alberto|  2|
+| Dakota|  2|
++-------+---+
+```
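
The `reduce` call in the workaround applies one `withColumnRenamed` per column, including columns whose name does not change. A minimal sketch of the same idea as a reusable helper is shown here. The name `rename_columns` and its `mapping` argument are hypothetical and not part of this commit; the helper only assumes an object that exposes `columns` and `withColumnRenamed(existing, new)`, as PySpark DataFrames do.

```python
from functools import reduce


def rename_columns(df, mapping):
    """Return df with columns renamed according to mapping (old name -> new name).

    Hypothetical helper, not part of the original README. It relies only on
    df.columns and df.withColumnRenamed(existing, new).
    """
    # Keep only the pairs that refer to an existing column and change its name.
    pairs = [(old, new) for old, new in mapping.items()
             if old in df.columns and old != new]
    # Thread the DataFrame through one withColumnRenamed call per pair.
    return reduce(lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
                  pairs, df)
```

With the `data` DataFrame from the README, `rename_columns(data, {"askdaosdka": "Age"})` should produce the same schema as `df`. When every column is renamed by position, `data.toDF(*newColumns)` is a shorter alternative to the `reduce` loop.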
