Utils
get_length ¶
get_length(iterable)
Get the length of an Iterable object.

This function first attempts to use len(). If len() is not available, it counts the number of items by iterating over iterable. Note that this fallback consumes single-use Iterators.
Parameters:

Name | Type | Description | Default
---|---|---|---
`iterable` | `Iterable \| Sized` | The iterable object for which the length is to be determined. | required
Returns:

Type | Description
---|---
`int` | The length of the iterable object.
Raises:

Type | Description
---|---
`TypeError` | If the length of the iterable object cannot be determined.
Examples:
>>> get_length([1, 2, 3, 4, 5])
5
>>> get_length('Hello, World!')
13
>>> get_length({'a': 1, 'b': 2, 'c': 3})
3
>>> get_length(range(4))
4
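The try-len-then-iterate behavior described above can be sketched as follows (the helper name is illustrative, not the library's actual implementation):

```python
def get_length_sketch(iterable):
    """Return len(iterable) when available, else count by iterating."""
    try:
        return len(iterable)
    except TypeError:
        # Fallback for objects without __len__; this consumes
        # single-use iterators, as the warning above notes.
        return sum(1 for _ in iterable)

print(get_length_sketch([1, 2, 3, 4, 5]))       # 5
print(get_length_sketch(x * x for x in range(4)))  # 4 (generator has no len())
```

The TypeError branch is what makes generators and other single-use iterators work at all, at the cost of exhausting them.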
get_names_by_dtype ¶
get_names_by_dtype(df, dtype)
Get the list of column names that match the specified data type.
Parameters:

Name | Type | Description | Default
---|---|---|---
`df` | `DataFrame` | The input DataFrame. | required
`dtype` | `str` | The data type to filter the column names. Example: `'bigint'`. | required
Returns:

Type | Description
---|---
`list[str]` | A list of column names that match the specified data type.
Examples:
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> rows = [(1, 3.4, 1), (3, 4.5, 2)]
>>> df = spark.createDataFrame(rows, schema=['col1', 'col2', 'col3'])
>>> get_names_by_dtype(df, 'bigint')
['col1', 'col3']
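The filtering itself can be sketched in plain Python: PySpark's `DataFrame.dtypes` property yields `(column name, dtype string)` pairs, which is all the function needs (the helper name and sample pairs here are illustrative):

```python
def names_by_dtype(dtypes, dtype):
    """Filter (name, dtype) pairs such as those from DataFrame.dtypes."""
    return [name for name, dt in dtypes if dt == dtype]

# Mirrors the schema in the doctest above: col1 and col3 are bigint.
pairs = [('col1', 'bigint'), ('col2', 'double'), ('col3', 'bigint')]
print(names_by_dtype(pairs, 'bigint'))  # ['col1', 'col3']
```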
is_nan_scalar ¶
is_nan_scalar(x)
Check if the given value is a scalar NaN (Not a Number).
Parameters:

Name | Type | Description | Default
---|---|---|---
`x` | `Any` | The value to be checked. | required
Returns:

Type | Description
---|---
`bool` | True if the value is a scalar NaN, False otherwise.
Examples:
>>> is_nan_scalar(5)
False
>>> is_nan_scalar(float('nan'))
True
>>> is_nan_scalar('hello')
False
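A check like this is commonly built on math.isnan, treating non-numeric inputs as not-NaN; a minimal sketch (not necessarily the library's exact implementation):

```python
import math

def is_nan_scalar_sketch(x):
    """True only for a scalar NaN; non-numeric values return False."""
    try:
        return bool(math.isnan(x))
    except TypeError:
        # math.isnan rejects non-numeric values such as strings.
        return False

print(is_nan_scalar_sketch(float('nan')))  # True
print(is_nan_scalar_sketch(5))             # False
print(is_nan_scalar_sketch('hello'))       # False
```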