Utils

get_length

get_length(iterable)

Get the length of an Iterable object.

This function first tries the built-in len(). If len() is not supported, it falls back to counting the items by iterating over iterable; note that this fallback consumes single-use iterators, leaving them exhausted.

Parameters:

    iterable (Iterable | Sized): The iterable object for which the length is to be determined. Required.

Returns:

    int: The length of the iterable object.

Raises:

    TypeError: If the length of the iterable object cannot be determined.

Examples:

>>> get_length([1, 2, 3, 4, 5])
5
>>> get_length('Hello, World!')
13
>>> get_length({'a': 1, 'b': 2, 'c': 3})
3
>>> get_length(range(4))
4
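The try-len-then-count behavior described above can be sketched as follows (a minimal sketch, not necessarily the library's actual implementation):

```python
from typing import Any


def get_length(iterable: Any) -> int:
    """Return the length of iterable, counting items if len() is unavailable."""
    try:
        # Fast path: objects implementing __len__ (lists, dicts, strings, ranges).
        return len(iterable)
    except TypeError:
        # Fallback: count by iterating; this consumes single-use iterators.
        return sum(1 for _ in iterable)
```

Passing a non-iterable (e.g. an int) still raises TypeError, since the fallback iteration fails as well.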

get_names_by_dtype

get_names_by_dtype(df, dtype)

Get the list of column names that match the specified data type.

Parameters:

    df (DataFrame): The input pyspark.sql.DataFrame. Required.
    dtype (str): The data type to filter the column names by, e.g. 'bigint'. Required.

Returns:

    list[str]: A list of column names that match the specified data type.

Examples:

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> rows = [(1, 3.4, 1), (3, 4.5, 2)]
>>> df = spark.createDataFrame(rows, schema=['col1', 'col2', 'col3'])
>>> get_names_by_dtype(df, 'bigint')
['col1', 'col3']
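In pyspark, `DataFrame.dtypes` returns a list of `(column_name, dtype_string)` pairs, so this function can be sketched as a simple filter over that attribute (a sketch relying only on `dtypes`, not necessarily the library's implementation):

```python
def get_names_by_dtype(df, dtype):
    """Return the names of the columns of df whose data type equals dtype."""
    # df.dtypes yields (name, dtype) string pairs, e.g. ('col1', 'bigint').
    return [name for name, dt in df.dtypes if dt == dtype]
```

Because only `df.dtypes` is touched, no Spark job is triggered; the schema alone is inspected.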

is_nan_scalar

is_nan_scalar(x)

Check if the given value is a scalar NaN (Not a Number).

Parameters:

    x (Any): The value to be checked. Required.

Returns:

    bool: True if the value is a scalar NaN, False otherwise.

Examples:

>>> is_nan_scalar(5)
False
>>> is_nan_scalar(float('nan'))
True
>>> is_nan_scalar('hello')
False
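One way to implement this check is with math.isnan, treating any value it rejects (strings, lists, None, ...) as not-NaN (a minimal sketch, not necessarily the library's implementation):

```python
import math


def is_nan_scalar(x):
    """Return True only if x is a scalar NaN value."""
    try:
        return math.isnan(x)
    except TypeError:
        # Non-numeric values cannot be NaN.
        return False
```

Numeric non-NaN scalars such as ints and ordinary floats fall through math.isnan and correctly return False.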