Spark – join transformation

join—Equivalent to an inner join in RDBMS, this returns a new pair RDD with the elements (K, (V, W)) containing all possible pairs of values from the first and second RDDs that have the same keys. For the keys that exist in only one of the two RDDs, the resulting RDD will have no elements. 


leftOuterJoin—Instead of (K, (V, W)), this returns elements of type (K, (V, Option(W))). The resulting RDD will also contain the elements (key, (v, None)) for those keys that don’t exist in the second RDD. Keys that exist only in the second RDD will have no matching elements in the new RDD. 


rightOuterJoin—This returns elements of type (K, (Option(V), W)); the resulting RDD will also contain the elements (key, (None, w)) for those keys that don’t exist in the first RDD. Keys that exist only in the first RDD will have no matching elements in the new RDD. 


fullOuterJoin—This returns elements of type (K, (Option(V), Option(W)); the resulting RDD will contain both (key, (v, None)) and (key, (None, w)) elements for those keys that exist in only one of the two RDDs. 


Advertisements
This entry was posted in Programming, Spark. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s