Utils

get_length

get_length(iterable)

Get the length of an Iterable object.

This function first tries the built-in len(). If len() is not supported, it falls back to counting the items by iterating over iterable; note that this fallback consumes single-use iterators, leaving them exhausted.

Parameters:

    iterable (Iterable | Sized): The iterable object for which the length is to be determined. Required.

Returns:

    int: The length of the iterable object.

Raises:

    TypeError: If the length of the iterable object cannot be determined.

Examples:

>>> get_length([1, 2, 3, 4, 5])
5
>>> get_length('Hello, World!')
13
>>> get_length({'a': 1, 'b': 2, 'c': 3})
3
>>> get_length(range(4))
4
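The try-len-then-count behavior described above can be sketched as follows (a minimal sketch, not necessarily the library's actual implementation):

```python
from typing import Any


def get_length(iterable: Any) -> int:
    """Return the length of iterable, counting items if len() is unavailable."""
    try:
        # Fast path: objects implementing __len__ (lists, dicts, strings, ranges).
        return len(iterable)
    except TypeError:
        # Fallback: count by iterating; this consumes single-use iterators.
        return sum(1 for _ in iterable)
```

Passing a non-iterable (e.g. an int) still raises TypeError, since the fallback iteration fails as well.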

get_names_by_dtype

get_names_by_dtype(df, dtype)

Get the list of column names that match the specified data type.

Parameters:

    df (DataFrame): The input pyspark.sql.DataFrame. Required.
    dtype (str): The data type to filter the column names by, e.g. 'bigint'. Required.

Returns:

    list[str]: A list of column names that match the specified data type.

Examples:

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> rows = [(1, 3.4, 1), (3, 4.5, 2)]
>>> df = spark.createDataFrame(rows, schema=['col1', 'col2', 'col3'])
>>> get_names_by_dtype(df, 'bigint')
['col1', 'col3']
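In pyspark, `DataFrame.dtypes` returns a list of `(column_name, dtype_string)` pairs, so this function can be sketched as a simple filter over that attribute (a sketch relying only on `dtypes`, not necessarily the library's implementation):

```python
def get_names_by_dtype(df, dtype):
    """Return the names of the columns of df whose data type equals dtype."""
    # df.dtypes yields (name, dtype) string pairs, e.g. ('col1', 'bigint').
    return [name for name, dt in df.dtypes if dt == dtype]
```

Because only `df.dtypes` is touched, no Spark job is triggered; the schema alone is inspected.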

is_nan_scalar

is_nan_scalar(x)

Check if the given value is a scalar NaN (Not a Number).

Parameters:

    x (Any): The value to be checked. Required.

Returns:

    bool: True if the value is a scalar NaN, False otherwise.

Examples:

>>> is_nan_scalar(5)
False
>>> is_nan_scalar(float('nan'))
True
>>> is_nan_scalar('hello')
False
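One way to implement this check is with math.isnan, treating any value it rejects (strings, lists, None, ...) as not-NaN (a minimal sketch, not necessarily the library's implementation):

```python
import math


def is_nan_scalar(x):
    """Return True only if x is a scalar NaN value."""
    try:
        return math.isnan(x)
    except TypeError:
        # Non-numeric values cannot be NaN.
        return False
```

Numeric non-NaN scalars such as ints and ordinary floats fall through math.isnan and correctly return False.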