channelpack API Reference

All objects and functions documented below are available in the channelpack namespace after:

import channelpack

ChannelPack object

class channelpack.ChannelPack(data=None, names=None)

Callable collection of data.

Hold a dict of data (numpy 1d arrays) and make it possible to refer to the arrays by calling this object, pack(ch). A boolean mask is kept with the pack and is used to optionally filter out sections of data in calls.

data

The dict is not supposed to be consulted directly; call the ChannelPack object to refer to arrays. Keys are integers representing column numbers. Setting this attribute to a new dict of data will convert values to numpy arrays and call mask_reset() automatically.

Type: dict
mask

A boolean array of the same size as the data arrays. Initially all True.

Type: numpy.ndarray
nof

‘nan’, ‘filter’ or None. In calls to the object, this attribute is consulted to determine how to return data arrays. If None, arrays are returned as is (the default). If ‘nan’, elements in the returned array with a corresponding False element in mask are replaced with numpy.nan or None, equivalent to np.where(mask, array, np.full(len(array), np.nan)). ‘filter’ yields the equivalent of array[mask]: the array is stripped down to the elements with corresponding True elements in mask. The effect of this attribute can be overridden in calls of the object.

Type: str or None
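
The three behaviours of nof can be illustrated with plain numpy, without a pack. This is only a sketch of the documented equivalents; the array and mask values are made up for illustration, and the names as_nan, as_filter and as_is are not part of channelpack.

```python
import numpy as np

# Hypothetical data array and mask, for illustration only.
array = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mask = np.array([True, False, True, True, False])

# nof='nan': keep the length, replace masked-out elements with nan.
as_nan = np.where(mask, array, np.full(len(array), np.nan))

# nof='filter': keep only the elements where the mask is True.
as_filter = array[mask]

# nof=None: the array is returned as is.
as_is = array

print(as_nan)     # length 5, nan at index 1 and 4
print(as_filter)  # length 3
```

Note that ‘nan’ preserves the alignment with other arrays in the pack, while ‘filter’ changes the length of the result.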
names

Keys are integers representing column numbers (like in data), values are strings, the field names. Aligning the keys in names with the keys in data makes it possible to refer to arrays by field names. This alignment is not enforced.

Type: dict
mindur

Like the method duration (which see) but with a persistent effect. Any time the mask is updated, this attribute is consulted to falsify any true part in the mask that is not long enough. The value refers to the required number of elements in a true section.

Setting this attribute to a value (not None) updates the mask without first resetting it.

Type: int or None
FALLBACK_PREFIX

Defaults to ‘ch’. This can be used in calls of the pack in place of a “proper” name. If 4 is a key in the data dict, pack(‘ch4’) can be used to get at that data. It is also used, when requested, in calls to the records method. Everything after this prefix is assumed to be a number. The prefix should be a valid Python variable name.

Type: str
fn

File name of a possible source data file. After initialization it is up to the caller to set this attribute, else it is the empty string.

Type: str
filenames

Maintained by the pack when setting fn. Extended with other.filenames in calls to append_pack(other). A list of one or more empty strings if fn is not set.

Type: list of str
__init__(data=None, names=None)

Initialize a ChannelPack.

Convert given sequences in data to numpy arrays if necessary.

Parameters:
  • data (dict) – Keys are integers representing column numbers, values are sequences representing column data.
  • names (dict) – Keys are integers representing column numbers (like in data), values are strings, the field names.
__call__(ch, part=None, nof=None)

Return data from “channel” ch.

If part is not given, return the array for ch respecting the setting of attribute nof. See the class attributes description in ChannelPack for the meaning of nof.

Parameters:
  • ch (str or int) – The channel key, name or fallback string. The lookup order is keys in the data dict, names in the names dict and finally if ch matches a fallback string.
  • part (int) – The 0-based enumeration of a True part to return. Overrides the effect of attribute or argument nof.
  • nof (str) – One of ‘nan’, ‘filter’ or ‘ignore’. Providing this argument overrides any setting of the corresponding attribute nof, and has the same effect on the returned data as the attribute nof. The value ‘ignore’ can be used to get the full array despite a setting of the attribute nof.
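
The lookup order described for ch can be sketched as a small stand-alone function. This is not the library's implementation: resolve_key is a hypothetical helper, the data and names dicts are made up, and raising KeyError for an unknown channel is an assumption.

```python
import re

FALLBACK_PREFIX = 'ch'  # the documented default prefix


def resolve_key(ch, data, names):
    """Return the integer data key for ch (hypothetical helper)."""
    # 1. A key in the data dict.
    if ch in data:
        return ch
    # 2. A field name in the names dict.
    for key, name in names.items():
        if name == ch:
            return key
    # 3. A fallback string: the prefix followed by a number.
    if isinstance(ch, str):
        match = re.fullmatch(re.escape(FALLBACK_PREFIX) + r'(\d+)', ch)
        if match and int(match.group(1)) in data:
            return int(match.group(1))
    # Assumption: an unknown channel is reported with KeyError.
    raise KeyError(ch)


# Made-up pack content for illustration.
data = {0: [1, 2, 3], 4: [7, 8, 9]}
names = {0: 'time', 4: 'speed'}

print(resolve_key(4, data, names))        # 4
print(resolve_key('speed', data, names))  # 4
print(resolve_key('ch4', data, names))    # 4
```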
append_pack(other)

Append data from other into this pack.

If this pack has data (attribute data is non-empty), it has to have the same set of keys as other.data (if that is non-empty). The same is true for the attribute names.

The resulting array dtypes in pack.data are whatever the numpy append function produces.

Extend filenames with other.filenames.

mask_reset is called after the append.

Parameters: other (ChannelPack instance) – The other pack.
Raises: ValueError – If non-empty dicts in packs do not align.
mask_reset()

Set the mask attribute to an all-True array with the same length as the data arrays.

If this pack’s data dict is empty, set mask to an empty array. Size of the mask is based on the array with the lowest key in data.

duration(duration, samplerate=1, mindur=True)

Require each true part to be at least duration long.

Make false any true part in the mask attribute that is not at least duration long.

Parameters:
  • duration (int or float) –
  • samplerate (int or float) – If samplerate is 10 and duration is 1, a True part of minimum 10 elements is required.
  • mindur (bool) – If False, require parts to be at most duration long instead.
Returns:

The possibly altered mask.

Return type:

array
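
The effect of duration, samplerate and mindur on a mask can be sketched with plain Python lists. This is a simplified reading of the description above, not the library's code; require_duration is a hypothetical name and the mask is made up.

```python
from itertools import groupby


def require_duration(mask, duration, samplerate=1, mindur=True):
    """Return a new mask where True runs failing the length test are False."""
    required = duration * samplerate  # required number of elements
    result = []
    for value, run in groupby(mask):
        run = list(run)
        if value:
            # mindur=True: at least `required` long; else at most.
            keep = len(run) >= required if mindur else len(run) <= required
            result.extend([keep] * len(run))
        else:
            result.extend(run)
    return result


# True runs of length 2, 4 and 1.
mask = [True, True, False, True, True, True, True, False, True]

print(require_duration(mask, 3))
# [False, False, False, True, True, True, True, False, False]
print(require_duration(mask, 3, mindur=False))
# [True, True, False, False, False, False, False, False, True]
```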

startstop(startb, stopb, apply=True)

Start and stop trigger masking.

Elements in startb and stopb are start and stop triggers for masking. A true stop dominates a true start.

Parameters:
  • startb (sequence) –
  • stopb (sequence) – Elements are tested with if el…
  • apply (bool) – If True, apply the result of this method to the mask attribute by and-ing it with the mask (mask &= result).
Returns:

A bool ndarray, the result of this method.

Return type:

array

Example

One descent:

height: 1 2 3 4 5 4 3 2 1
startb: F F F F T F F F F (height == 5)
stopb:  T F F F F F F F T (height == 1)
result: F F F F T T T T F
-> height:      5 4 3 2
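
The descent example can be reproduced with a short state machine over plain lists. This is one reading that is consistent with the example above, not the library's implementation; startstop_mask is a hypothetical name.

```python
def startstop_mask(startb, stopb):
    """Return a list of bools, True from a start trigger until a stop trigger."""
    result = []
    active = False
    for start, stop in zip(startb, stopb):
        if stop:      # a true stop dominates a true start
            active = False
        elif start:
            active = True
        result.append(active)
    return result


height = [1, 2, 3, 4, 5, 4, 3, 2, 1]
startb = [h == 5 for h in height]
stopb = [h == 1 for h in height]

result = startstop_mask(startb, stopb)
selected = [h for h, keep in zip(height, result) if keep]
print(selected)  # [5, 4, 3, 2]
```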
parts()

Return the enumeration of the True parts.

The list is always consecutive or empty. Each index in the returned list can be used to refer to a True part in the mask attribute.
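
What a True part is, and how its enumeration relates to the mask, can be sketched as follows. The helper true_parts and the mask are made up for illustration; the sketch assumes that parts are numbered in the order they appear in the mask.

```python
from itertools import groupby


def true_parts(mask):
    """Return (start, stop) index pairs, one per run of True in mask."""
    parts = []
    index = 0
    for value, run in groupby(mask):
        length = len(list(run))
        if value:
            parts.append((index, index + length))
        index += length
    return parts


mask = [False, True, True, False, False, True, False, True, True, True]
slices = true_parts(mask)
enumeration = list(range(len(slices)))

print(slices)       # [(1, 3), (5, 6), (7, 10)]
print(enumeration)  # [0, 1, 2]
```

Under that assumption, part 2 corresponds to the elements at index 7 through 9.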

records(part=None, nof=None, fallback=False)

Return a generator producing records of the pack.

Each record is provided as a collections.namedtuple with the pack’s names as field names. This is useful if each record makes a meaningful data set on its own.

Parameters:
  • part (int) – The 0-based enumeration of a True part to return. Overrides the effect of attribute or argument nof.
  • nof (str) – One of ‘nan’, ‘filter’ or ‘ignore’. Providing this argument overrides any setting of the corresponding attribute nof, and has the same effect on the returned data as the attribute nof. The value ‘ignore’ can be used to get all the records despite a setting of the attribute nof.
  • fallback (bool) – A named tuple requires field names that are valid Python identifiers. If fallback is False, ValueError is raised if any of the names in names is an invalid identifier. fallback=True will use FALLBACK_PREFIX to produce names.
Raises:

ValueError – In iteration of the generator, if any of the names used for the namedtuple is an invalid Python identifier.

Note

Either there must be names defined in the pack or argument fallback must be True, else there will be no records.
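
Why fallback exists can be seen with collections.namedtuple directly. The sketch below is not the library's code: iter_records is a hypothetical generator over made-up data, and it ignores part and nof entirely.

```python
from collections import namedtuple

FALLBACK_PREFIX = 'ch'  # the documented default prefix


def iter_records(data, names, fallback=False):
    """Yield one namedtuple per row of the column dict data (sketch)."""
    keys = sorted(data)
    if fallback:
        fields = [FALLBACK_PREFIX + str(key) for key in keys]
    else:
        fields = [names[key] for key in keys]
    # namedtuple raises ValueError for invalid identifiers; since this is
    # a generator, the error surfaces when iteration starts.
    Record = namedtuple('Record', fields)
    for values in zip(*(data[key] for key in keys)):
        yield Record(*values)


data = {0: [0.0, 0.5, 1.0], 1: [10, 20, 30]}
names = {0: 'time', 1: 'speed [km/h]'}  # second name is not an identifier

for record in iter_records(data, names, fallback=True):
    print(record.ch0, record.ch1)
```

Calling iter_records(data, names) without fallback raises ValueError on the first iteration, because ‘speed [km/h]’ is not a valid field name.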

name(ch, firstwordonly=False, fallback=False)

Return a name string for channel ch in names.

A helper method to get a name string, possibly modified according to arguments. Succeeds only if ch corresponds to a key in data.

Parameters:
  • ch (int or str) – The channel key or name. An integer key has precedence.
  • firstwordonly (bool or str) – If True, return only the first space-stripped word in the name. If a string, use as a regex pattern with re.findall on the name string and return the first element found.
  • fallback (bool) – If True, return the fallback string <FALLBACK_PREFIX><N>, where N corresponds to the data key. Ignore the firstwordonly argument.
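
The two forms of firstwordonly can be illustrated on a bare name string. short_name is a hypothetical helper and the name ‘speed [km/h]’ is made up; the sketch only mirrors the description above.

```python
import re


def short_name(name, firstwordonly=False):
    """Return name, possibly shortened according to firstwordonly (sketch)."""
    if firstwordonly is True:
        # First whitespace-separated word.
        return name.split()[0]
    if isinstance(firstwordonly, str):
        # Treat the string as a regex pattern, return the first match.
        return re.findall(firstwordonly, name)[0]
    return name


print(short_name('speed [km/h]'))                # speed [km/h]
print(short_name('speed [km/h]', True))          # speed
print(short_name('speed [km/h]', r'\[(.+)\]'))   # km/h
```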

Functions to get a pack from data files

Text

Data stored in readable text files in the form of delimited data fields, (csv, txt). Fields might be numbers or text:

channelpack.textpack(fname, names=None, delimiter=None, skiprows=0, usecols=None, hasnames=False, encoding=None, converters=None, stripstrings=False, debug=False)

Make a ChannelPack from delimited text data.

First line of data is the line following skiprows.

First line of data determines which fields (split by delimiter) can be converted to a float. Fields that can’t be converted to float will be treated as strings. Converters in converters are used if given.

Numeric fields with decimal comma are understood as numeric (besides numerics with decimal point). If the delimiter is a comma, it is therefore important to specify it explicitly.

Parameters:
  • fname (str, file or io stream) –
  • names (dict) – Keys are integers (0-based column numbers) and values are field names. If provided it will be set in the pack and is mutually exclusive with the usecols argument.
  • delimiter (str or bytes) – If not given, any white space is assumed. If fname is a stream of bytes, delimiter must be bytes if not None.
  • skiprows (int) – The number of lines to ignore in the top of fname. First line following skiprows is data.
  • usecols (sequence or int) – The columns to read. A single integer means read that one column. Ignored if names is given.
  • hasnames (bool) – If True, the last line of skiprows is assumed to be field names and will be used to set names in the pack. Ignored if names is given.
  • encoding (str) – Use encoding to open fname. If None, use default encoding with io.open. Valid when fname is a string. If fname is a stream of bytes and encoding is given, use encoding to decode bytes in text fields.
  • converters (dict) – A mapping of column numbers and functions. Each function takes one string argument and returns a value.
  • stripstrings (bool) – For string fields, strip off leading and trailing whitespace resulting from whitespace around the delimiter.
  • debug (bool) – If true, output the functions used on fields and the last successful line number read, before an exception is raised.
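
The note about decimal commas and the delimiter can be made concrete with a small sketch of first-line field detection. The helpers to_float and field_kinds are hypothetical and only approximate the behaviour described above.

```python
def to_float(field):
    """Convert a numeric string with decimal point or decimal comma."""
    return float(field.replace(',', '.'))


def field_kinds(line, delimiter=None):
    """Classify each field of line as 'float' or 'str' (sketch)."""
    kinds = []
    for field in line.split(delimiter):
        try:
            to_float(field)
            kinds.append('float')
        except ValueError:
            kinds.append('str')
    return kinds


print(field_kinds('1,5;abc;2.25', delimiter=';'))  # ['float', 'str', 'float']

# Without a delimiter, the comma-separated line is one whitespace-delimited
# field, and '1,5,2,5' is not a number.
print(field_kinds('1,5,2,5'))                      # ['str']
print(field_kinds('1,5,2,5', delimiter=','))       # four 'float' fields
```

In a converters dict, a function like to_float would be mapped to the column numbers that hold decimal-comma data.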

If data is numeric only, a lazy variant is available:

channelpack.lazy_textpack(fname, parselines=25, **textkwargs)

Return a ChannelPack instance using the textpack function.

Try to automatically derive values for the textpack keyword arguments ‘delimiter’, ‘skiprows’ and ‘converters’. Also try to parse out the field names.

Works with numerical data files, which might have a header with extra information to ignore. Each derived converter is either float or a function that converts numbers with a decimal comma to a float.

Keyword arguments provided to this function override any derived equivalents.

Parameters:
  • fname (file, str) – Encoding given in textkwargs is respected.
  • parselines (int) – The number of lines to preparse. For a successful preparse it must include at least one line of numeric data.
  • **textkwargs – Other keyword arguments accepted by textpack. They override derived keyword arguments if duplicated.

Spread sheet

Code from the xlrd library is used; xls and xlsx spread sheets are supported:

channelpack.sheetpack(fname, sheet=0, header=True, startcell=None, stopcell=None, usecols=None)

Return a ChannelPack instance loaded from a spread sheet file.

Parameters:
  • fname (str) – The file name to read from.
  • sheet (int or str) – Sheet enumeration or name string.
  • header (bool or str) – True means the data range includes field names (top record). False means the whole range is data. A string can be used to specify the startcell of the header row, like “C1”.
  • startcell (str) – Spread sheet style notation of the upper left cell of the data range, like “C3”.
  • stopcell (str) – Spread sheet style notation of the lower right cell of the data range, like “H10”.
  • usecols (str or sequence of ints) – The columns to use, 0-based. 0 is the spread sheet column “A”. Can also be given as a string, such as ‘C:E, H’ for columns C, D, E and H.
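
How a usecols string maps to 0-based column numbers can be sketched as below. parse_usecols and column_index are hypothetical helpers written for illustration; the library's own parsing may differ in details such as error handling.

```python
def column_index(letters):
    """Convert spread sheet column letters to a 0-based index ('A' -> 0)."""
    index = 0
    for char in letters.strip().upper():
        index = index * 26 + (ord(char) - ord('A') + 1)
    return index - 1


def parse_usecols(spec):
    """Expand a string like 'C:E, H' to a list of 0-based column numbers."""
    columns = []
    for token in spec.split(','):
        if ':' in token:
            first, last = token.split(':')
            columns.extend(range(column_index(first), column_index(last) + 1))
        else:
            columns.append(column_index(token))
    return columns


print(parse_usecols('C:E, H'))  # [2, 3, 4, 7]
```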

About code from the xlrd project

channelpack includes code from the xlrd project, copied from a checkout of commit d470bc9374ee3a1cf149c2bab0684e63c1dcc575, and is thereby not dependent on the xlrd project.

With the release of version 2.0.0 of xlrd, support for the xlsx format was removed. A main reason seems to have been that nobody was willing to maintain it (the xlrd project does not discourage using xlrd for xls files). Concerns about possible vulnerabilities in the xml parsing were also raised, and since channelpack now includes the code that was removed from xlrd, those concerns are summarized here so that a potential user of channelpack can make an informed choice.

The announcement about the xlrd 2.x series and the deprecation of xlsx support can be read here:

https://groups.google.com/g/python-excel/c/IRa8IWq_4zk/m/Af8-hrRnAgAJ

One issue alleged was that defusedxml and xlrd in combination do not work well with Python 3.9. The defusedxml project readme discusses the xml vulnerabilities it addresses. Those vulnerabilities are also discussed in the Python docs and in a thread on the Python bug tracker, “XML vulnerabilities in Python”, which discusses whether they should be addressed by the Python xml libraries.

In short, it is possible to craft xml files so that they might cause harm or disturbance when parsed with a parser that takes no precautions against the risk. The code from the xlrd project included in channelpack uses defusedxml if available.
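
The "use defusedxml if available" pattern generally looks like the sketch below. It shows the common optional-import idiom, not the actual code included in channelpack; the flag USING_DEFUSEDXML and the xml snippet are made up for illustration.

```python
try:
    # Hardened parser, used when the defusedxml package is installed.
    import defusedxml.ElementTree as ET
    USING_DEFUSEDXML = True
except ImportError:
    # Standard library parser, without the extra protections.
    import xml.etree.ElementTree as ET
    USING_DEFUSEDXML = False

# Either module can parse a small, harmless document the same way.
root = ET.fromstring('<sheet><row>1</row></sheet>')
print(root.tag, USING_DEFUSEDXML)
```

A user who wants the protection can check that defusedxml is installed in the same environment as channelpack before reading xlsx files from untrusted sources.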

Early xlrd includes software developed by David Giffin <david@giffin.org>.

Xbase DBF format

A legacy database format:

channelpack.dbfpack(dbf, names=None)

Make a ChannelPack from a dbf data file.

Parameters:
  • dbf (str or file) – If a file, it should be opened for binary reading.
  • names (list of str) – A sequence of names to read. If not provided read all.