apache spark - How would I perform a join in Scala based on whether one column OR another matches?

Say I have df1 as this:

var df1 = Seq("a","b","c","d").toDF("letter")

+------+
|letter|
+------+
|     a|
|     b|
|     c|
|     d|
+------+

and df2 as this:

var df2 = Seq(("a","1"),("q","2"),("x","c"),("d","z")).toDF("col1","col2")

+----+----+
|col1|col2|
+----+----+
|   a|   1|
|   q|   2|
|   x|   c|
|   d|   z|
+----+----+

I want to join the two so that the result contains the rows of df2 where EITHER col1 or col2 matches a value in df1's letter column.

So the resulting dataframe should look like this (row 2 of df2, ("q","2"), is excluded because neither of its values appears in df1):

+----+----+
|col1|col2|
+----+----+
|   a|   1|
|   x|   c|
|   d|   z|
+----+----+

Thanks so much, have a great day and a happy new year!



1 Answer


You can use a semi join: it returns the rows of df2 that satisfy the join condition, without pulling in any columns from df1.

// keep the rows of df2 where col1 OR col2 matches a letter in df1
val result = df2.join(
    df1,
    (df2.col("col1") === df1.col("letter")) ||
    (df2.col("col2") === df1.col("letter")),
    "semi"
)

result.show
+----+----+
|col1|col2|
+----+----+
|   a|   1|
|   x|   c|
|   d|   z|
+----+----+
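
If you want to reproduce this outside the spark-shell, here is a minimal, self-contained sketch. It assumes Spark 3.x, where "semi" is accepted as a join-type string (on older versions use "left_semi"); the object name, app name, and local[*] master are placeholders for local testing, not part of the original question.

import org.apache.spark.sql.SparkSession

object SemiJoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SemiJoinExample")  // hypothetical app name
      .master("local[*]")          // local mode, for testing only
      .getOrCreate()
    import spark.implicits._       // required for .toDF on local Seqs

    val df1 = Seq("a", "b", "c", "d").toDF("letter")
    val df2 = Seq(("a", "1"), ("q", "2"), ("x", "c"), ("d", "z"))
      .toDF("col1", "col2")

    // keep the rows of df2 where col1 OR col2 matches a letter in df1
    val result = df2.join(
      df1,
      df2("col1") === df1("letter") || df2("col2") === df1("letter"),
      "semi"
    )
    result.show()

    spark.stop()
  }
}

A semi join is also safer here than an inner join followed by dropDuplicates: it never duplicates a df2 row even when both col1 and col2 match letters in df1, and it keeps only df2's columns.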
