pyspark.sql.DataFrameReader.jdbc

DataFrameReader.jdbc(url, table, column=None, lowerBound=None, upperBound=None, numPartitions=None, predicates=None, properties=None)

Construct a DataFrame representing the database table named ``table``, accessible via JDBC URL ``url`` and connection ``properties``.

Partitions of the table will be retrieved in parallel if either ``column`` or ``predicates`` is specified. ``lowerBound``, ``upperBound`` and ``numPartitions`` are needed when ``column`` is specified.

If both ``column`` and ``predicates`` are specified, ``column`` will be used.

Note: Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems.
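A minimal usage sketch (not part of the original reference text): reading a whole table over JDBC in a single partition. The database URL, table name, and credentials are hypothetical, and the matching JDBC driver JAR is assumed to be on Spark's classpath.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-read").getOrCreate()

    # Read the entire (hypothetical) "people" table; with no partitioning
    # options, the whole table is fetched through one connection.
    df = spark.read.jdbc(
        url="jdbc:postgresql://localhost:5432/mydb",
        table="people",
        properties={"user": "SYSTEM", "password": "mypassword"},
    )
    df.printSchema()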
- Parameters

  - url – a JDBC URL of the form ``jdbc:subprotocol:subname``
  - table – the name of the table
  - column – the name of a column of numeric, date, or timestamp type that will be used for partitioning; if this parameter is specified, then ``numPartitions``, ``lowerBound`` (inclusive), and ``upperBound`` (exclusive) will form partition strides for generated WHERE clause expressions used to split the column ``column`` evenly (see the sketch after this list)
  - lowerBound – the minimum value of ``column`` used to decide partition stride
  - upperBound – the maximum value of ``column`` used to decide partition stride
  - numPartitions – the number of partitions
  - predicates – a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the DataFrame (a sketch appears at the end of this section)
  - properties – a dictionary of JDBC database connection arguments; normally at least the "user" and "password" keys with their corresponding values, for example {'user': 'SYSTEM', 'password': 'mypassword'}
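As referenced in the ``column`` entry above, a sketch of stride-based partitioning, assuming the hypothetical "people" table has a numeric ``id`` column. Note that ``lowerBound`` and ``upperBound`` only shape the partition strides; they do not filter out rows.

    # Split the read into 4 partitions on the numeric "id" column.
    # With lowerBound=0, upperBound=10000, numPartitions=4, Spark generates
    # WHERE clauses roughly like:
    #   id < 2500 (or id IS NULL), 2500 <= id < 5000,
    #   5000 <= id < 7500, and id >= 7500
    df = spark.read.jdbc(
        url="jdbc:postgresql://localhost:5432/mydb",
        table="people",
        column="id",
        lowerBound=0,
        upperBound=10000,
        numPartitions=4,
        properties={"user": "SYSTEM", "password": "mypassword"},
    )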
- Returns

  a DataFrame
New in version 1.4.
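Finally, a sketch of ``predicates``-based partitioning, where each expression becomes the WHERE clause of exactly one partition; the ``country`` column is hypothetical.

    # This read produces a 3-partition DataFrame. Predicates should be
    # mutually exclusive and jointly exhaustive, otherwise rows may be
    # duplicated across partitions or dropped entirely.
    df = spark.read.jdbc(
        url="jdbc:postgresql://localhost:5432/mydb",
        table="people",
        predicates=[
            "country = 'US'",
            "country = 'DE'",
            "country IS NULL OR country NOT IN ('US', 'DE')",
        ],
        properties={"user": "SYSTEM", "password": "mypassword"},
    )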