lynguine.util

The util module provides various utility functions for working with data, files, and text.

DataFrame Utilities

lynguine.util.dataframe.convert_datetime_to_str(df)[source]

Convert datetime columns to strings in isoformat for ease of writing.

Parameters:

df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The DataFrame to convert.

Returns:

The converted DataFrame.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.reorder_dataframe(df, order)[source]

Reorder the columns of the given DataFrame: columns listed in order come first, in that order, and any remaining columns follow alphabetically.

Parameters:
  • df (pd.DataFrame or lynguine.data.CustomDataFrame) – The DataFrame to reorder.

  • order (list) – The columns to place first, in the given order.
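
The ordering rule described for reorder_dataframe can be sketched on a plain list of column names. This is an illustration only, not lynguine's implementation: the helper name `reordered_columns` is hypothetical, and skipping names in `order` that are absent from the frame is an assumption.

```python
def reordered_columns(columns, order):
    """Return column names with those in `order` first, the rest alphabetical."""
    # Assumption: requested names that do not exist are silently skipped.
    first = [name for name in order if name in columns]
    # Everything not requested follows in alphabetical order.
    rest = sorted(name for name in columns if name not in first)
    return first + rest


columns = ["title", "year", "author", "abstract"]
new_order = reordered_columns(columns, ["year", "author"])
```

With pandas, indexing a frame as `df[new_order]` would then select the columns in that order.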

lynguine.util.dataframe.convert_datetime(df, columns)[source]

Preprocessor to set datetime type on columns.

lynguine.util.dataframe.convert_int(df, columns)[source]

Preprocessor to set integer type on columns.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be converted.

  • columns (list) – The columns to be converted.

Returns:

The converted dataframe.

Return type:

pandas.DataFrame

lynguine.util.dataframe.convert_string(df, columns)[source]

Preprocessor to set string type on columns.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be converted.

  • columns (list) – The columns to be converted.

Returns:

The converted dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.convert_year_iso(df, column='year', month=1, day=1)[source]

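Preprocessor to convert a year column to ISO-format dates, using the given month and day.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be converted.

  • column (str) – The column containing the year to be converted.

  • month (int) – The month to use in the converted date.

  • day (int) – The day to use in the converted date.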

Returns:

The converted dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.addmonth(df, source='date')[source]

Add month column based on source date field.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be augmented.

  • source (str) – The source column to be used.

Returns:

The augmented dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

Raises:
  • KeyError – If the source column is not in the dataframe

  • TypeError – If the source column is not of type datetime.date

  • ValueError – If the source column is not a valid date

lynguine.util.dataframe.addyear(df, source='date')[source]

Add year column based on source date field.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be augmented.

  • source (str) – The source column to be used.

Returns:

The augmented dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.augmentmonth(df, destination='month', source='date')[source]

Augment the month column based on source date field.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be augmented.

  • destination (str) – The destination column to be used.

  • source (str) – The source column to be used.

Returns:

The augmented dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.augmentyear(df, destination='year', source='date')[source]

Augment the year column based on source date field.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be augmented.

  • destination (str) – The destination column to be used.

  • source (str) – The source column to be used.

Returns:

The augmented dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.augmentcurrency(df, source='amount', sf=0)[source]

Preprocessor to augment the dataframe with a currency representation of the source column.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be converted.

  • source (str) – The source column to be used.

  • sf (int)

Returns:

The converted dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.fillna(df, column, value)[source]

Fill missing values in a column with a given value.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be converted.

  • column (str) – The column to be converted.

  • value (str) – The value to be used to fill missing values.

Returns:

The converted dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.ascending(df, by)[source]

Sort dataframe in ascending order.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be sorted.

  • by (str) – The column to sort by.

Returns:

The sorted dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.descending(df, by)[source]

Sort dataframe in descending order.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be sorted.

  • by (str) – The column to sort by.

Returns:

The sorted dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.recent(df, column='year', since_year=2000)[source]

Filter on whether an item is recent.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be filtered.

  • column (str) – The column to be filtered on.

  • since_year (int) – The year from which items count as recent.

Returns:

The filtered dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.current(df, start='start', end='end', current=None, today=None)[source]

Filter on whether the row is current as given by start and end dates. If current is given then it is used instead of the range check. If today is given then it is used instead of the current date.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be filtered.

  • start (str) – The start date of the entry.

  • end (str) – The end date of the entry.

  • current (str) – Column of true/false current entries.

  • today (datetime.date) – The date to be used as today.

Returns:

The filtered dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame
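
The decision rule described for current can be sketched for a single row held as a dictionary. This is one reading of the description, not lynguine's code: the helper name `is_current` is hypothetical, and treating a missing end date as "still open" is an assumption.

```python
import datetime


def is_current(row, start="start", end="end", current=None, today=None):
    """Decide whether one row (a dict) counts as current."""
    if current is not None:
        # An explicit true/false column replaces the range check.
        return bool(row[current])
    if today is None:
        today = datetime.date.today()
    # Assumption: an end date of None means the entry has not ended.
    ended = row[end] is not None and row[end] < today
    return row[start] <= today and not ended


rows = [
    {"name": "a", "start": datetime.date(2020, 1, 1), "end": datetime.date(2021, 1, 1)},
    {"name": "b", "start": datetime.date(2020, 1, 1), "end": None},
]
today = datetime.date(2022, 6, 1)
kept = [row["name"] for row in rows if is_current(row, today=today)]
```

Passing `today` explicitly, as here, keeps the result reproducible in tests.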

lynguine.util.dataframe.former(df, end='end')[source]

Filter on whether item is former.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be filtered.

  • end (str) – The end date of the entry.

Returns:

The filtered dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.onbool(df, column='current', invert=False)[source]

Filter on whether column is positive (or negative if inverted).

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be filtered.

  • column (str) – The column to be filtered on.

  • invert (bool) – Whether to invert the filter.

Returns:

The filtered dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.columnis(df, column, value)[source]

Filter on whether a given column is equal to a given value.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be filtered.

  • column (str) – The column to be filtered on.

  • value – The value to be used to filter.

Returns:

The filtered dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

lynguine.util.dataframe.columncontains(df, column, value)[source]

Filter on whether column contains a given value.

Parameters:
  • df (pandas.DataFrame or lynguine.data.CustomDataFrame) – The dataframe to be filtered.

  • column (str) – The column to be filtered on.

  • value – The value to be used to filter.

Returns:

The filtered dataframe.

Return type:

pandas.DataFrame or lynguine.data.CustomDataFrame

File Utilities

lynguine.util.files.get_cvs_version(filename, full_path)[source]

Get the CVS version of a file.

Parameters:
  • filename (str) – The name of the file to get the version of.

  • full_path (str) – The full path to the file.

Returns:

The CVS version of the file.

Return type:

str

lynguine.util.files.get_svn_version(filename, full_path)[source]

Get the SVN version of a file.

Parameters:
  • filename (str) – The name of the file to get the version of.

  • full_path (str) – The full path to the file.

Returns:

The SVN version of the file.

Return type:

str

lynguine.util.files.get_git_version(filename, full_path, git_path)[source]

Get the latest Git version (commit hash) of a file.

Parameters:
  • filename (str) – The name of the file to get the version of.

  • full_path (str) – The full path to the file.

  • git_path (str) – The path to the git repository.

Returns:

The latest Git commit hash of the file.

Return type:

str

lynguine.util.files.read_txt_file(filename, dir_name='.', comment_char='#')[source]

Read in a text file ignoring lines that start with a comment character.

Parameters:
  • filename (str) – The name of the file to read in.

  • dir_name (str) – The directory to read the file from.

  • comment_char (str) – The character to use for comments.

Returns:

The contents of the file.

Return type:

str

lynguine.util.files.extract_file_details(filename, dir_name='.', comment_char='#', seperator=',')[source]

Read a CSV file, ignoring empty lines and lines that start with a comment character.

Parameters:
  • filename (str) – The name of the file to read in.

  • dir_name (str) – The directory to read the file from.

  • comment_char (str) – The character to use for comments.

  • seperator (str) – The character to use for separating fields.

Returns:

The details of the file.

Return type:

list of lists
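
The filtering and splitting described for extract_file_details can be sketched on lines already held in memory. The helper name `parse_details` is hypothetical, and stripping whitespace around each field is an assumption; the parameter spelling `seperator` follows the documented signature.

```python
def parse_details(lines, comment_char="#", seperator=","):
    """Split non-empty, non-comment lines into lists of fields."""
    details = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and lines starting with the comment character.
        if not stripped or stripped.startswith(comment_char):
            continue
        # Assumption: surrounding whitespace is removed from each field.
        details.append([field.strip() for field in stripped.split(seperator)])
    return details


sample = ["# name, size", "", "a.txt, 10", "b.txt, 20"]
parsed = parse_details(sample)
```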

YAML Utilities

exception lynguine.util.yaml.FileFormatError(ind, msg=None, field=None)[source]

Bases: Exception

Exception raised for errors in the file format.

add_note()

Exception.add_note(note) – add a note to the exception

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

lynguine.util.yaml.update_from_file(dictionary, filename)[source]

Update a given dictionary with the fields from a specified file.

Parameters:
  • dictionary (dict) – The dictionary to be updated.

  • filename (str) – The name of the file to be read in.

Returns:

The updated dictionary.

lynguine.util.yaml.header_field(field, fields, user_file=['_config.yml'])[source]

Return one field from yaml header fields.

Parameters:
  • field (str) – The field to be returned.

  • fields (dict) – The fields to be searched.

  • user_file (list) – The user files to be searched.

lynguine.util.yaml.header_fields(filename)[source]

Extract headers from a talk file.

Parameters:

filename (str) – The name of the file to be read in.

Returns:

The headers.

Return type:

dict

lynguine.util.yaml.extract_header_body(filename)[source]

Extract the text of the headers and body from a yaml headed file.

Parameters:

filename (str) – The name of the file to be read in.

Returns:

The headers and body.

Return type:

tuple
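
A minimal sketch of splitting a YAML-headed text into header and body follows. It assumes the header is delimited by `---` lines at the top of the file (the common front-matter convention); the source does not state which delimiter lynguine expects, and the helper name `split_header_body` is hypothetical.

```python
def split_header_body(text):
    """Split text with a leading '---' delimited header into (header, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        # No opening delimiter: everything is body.
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            header = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return header, body
    # Opening delimiter without a closing one: treat it all as body.
    return "", text


header, body = split_header_body("---\ntitle: A talk\n---\nHello\nworld")
```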

Liquid Template Utilities

lynguine.util.liquid.load_template_env(ext='.md', template_dir=None)[source]

Load in the templates to be used for lists.

Parameters:
  • ext (str) – The extension of the templates to be loaded, default is Markdown.

  • template_dir (str or None) – Optional custom directory for templates.

Returns:

The template environment.

Return type:

liquid.Environment

lynguine.util.liquid.url_escape(string)[source]

Filter to escape URLs for liquid.

Parameters:

string (str) – The string to be escaped.

Returns:

The escaped string.

Return type:

str
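
For orientation, the standard library's `urllib.parse.quote` performs this kind of escaping. Whether url_escape uses it internally is an assumption; the snippet only shows what percent-escaping a string looks like.

```python
from urllib.parse import quote

# Spaces become %20; "/" is left alone by default.
escaped = quote("my file.html")
escaped_path = quote("docs/my file.html")
```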

lynguine.util.liquid.markdownify(string)[source]

Filter to convert markdown to HTML for liquid.

Parameters:

string (str) – The string to be converted to html.

Returns:

The html.

Return type:

str

lynguine.util.liquid.relative_url(string)[source]

Filter to convert a string to a relative URL for a Jupyter notebook under liquid.

Parameters:

string (str) – The string to be converted to a relative url.

Returns:

The relative url.

Return type:

str

lynguine.util.liquid.absolute_url(string)[source]

Filter to convert a string to an absolute URL for a Jupyter notebook under liquid.

Parameters:

string (str) – The string to be converted to an absolute url.

Returns:

The absolute url.

Return type:

str

lynguine.util.liquid.to_i(string)[source]

Filter to convert the liquid entry to an integer under liquid.

Parameters:

string (str) – The string to be converted to an integer.

Returns:

The integer value.

Return type:

int

Miscellaneous Utilities

lynguine.util.misc.iskeyword()

x.__contains__(y) <==> y in x.

lynguine.util.misc.log = <lynguine.log.Logger object>

Utility functions for helping, e.g. to create the relevant yaml files quickly.

lynguine.util.misc.reorder_dictionary(dictionary, order, sort_remaining=True)[source]

Reorder a dictionary according to a given order.

Parameters:
  • dictionary (dict) – The dictionary to be reordered.

  • order (list) – The order to be used for reordering.

  • sort_remaining (bool)

Returns:

The reordered dictionary.

Return type:

dict
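
The reordering can be sketched with a plain dictionary, relying on Python dictionaries preserving insertion order. The helper name `reordered` is hypothetical, and the reading of `sort_remaining` as "sort the leftover keys alphabetically" is an assumption, since the parameter is undocumented here.

```python
def reordered(dictionary, order, sort_remaining=True):
    """Return a new dict with keys in `order` first, then the remaining keys."""
    # Keys named in `order` come first; names not present are skipped.
    result = {key: dictionary[key] for key in order if key in dictionary}
    remaining = [key for key in dictionary if key not in result]
    if sort_remaining:
        # Assumption: leftover keys are sorted alphabetically.
        remaining = sorted(remaining)
    for key in remaining:
        result[key] = dictionary[key]
    return result


result = reordered({"c": 3, "a": 1, "b": 2}, ["b"])
unsorted = reordered({"c": 3, "a": 1, "b": 2}, ["b"], sort_remaining=False)
```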

lynguine.util.misc.extract_full_filename(details)[source]

Return the filename from the details of directory and filename.

Parameters:

details (dict) – The details of the file to be extracted.

Returns:

The filename.

Return type:

str

lynguine.util.misc.extract_root_directory(directory, environs=['HOME', 'USERPROFILE', 'TEMP', 'TMPDIR', 'TMP'])[source]

Extract a root directory and a subdirectory from a given directory string.

Parameters:
  • directory (str) – The directory to extract from.

  • environs (list[str])

Returns:

The root directory and subdirectory.

Return type:

tuple

lynguine.util.misc.extract_file_type(filename)[source]

Return a standardised file type.

Parameters:

filename (str) – The filename to be extracted.

Returns:

The standardised file type.

Return type:

str

lynguine.util.misc.extract_abs_filename(details)[source]

Return the absolute filename, adding the current directory if it is not present.

Parameters:

details (dict) – The details of the file to be extracted.

Returns:

The absolute filename.

Return type:

str

lynguine.util.misc.camel_capitalize(word)[source]

Capitalize the word in camel case.

Parameters:

word (str) – The word to be capitalized.

Returns:

The capitalized word.

Return type:

str

lynguine.util.misc.remove_nan(dictionary)[source]

Delete missing entries from a dictionary.

Parameters:

dictionary (dict) – The dictionary to be cleaned.

Returns:

The dictionary with missing entries removed.

Return type:

dict
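
A small sketch of dropping missing entries from a dictionary follows. What counts as "missing" in lynguine is not stated here; this version assumes `None` and floating-point NaN, and both helper names are hypothetical.

```python
import math


def is_missing(entry):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return entry is None or (isinstance(entry, float) and math.isnan(entry))


def without_missing(dictionary):
    """Return a copy of the dictionary without missing values."""
    return {key: value for key, value in dictionary.items() if not is_missing(value)}


cleaned = without_missing({"title": "A", "year": float("nan"), "note": None})
```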

lynguine.util.misc.isna(entry)[source]

Check if an entry is missing.

Parameters:

entry – The entry to be checked.

Returns:

True if the entry is missing, False otherwise.

Return type:

bool

lynguine.util.misc.is_valid_var(variable)[source]

Test if a variable name is valid.

Parameters:

variable (str) – The variable name to be tested.

Returns:

True if the variable name is valid, False otherwise.

Return type:

bool

lynguine.util.misc.to_valid_var(variable)[source]

Convert a given input (scalar or string) to a valid Python variable name. Replaces invalid characters with underscores and ensures the name is not a Python keyword.

Parameters:

variable (int, float, or str) – The input to be converted to a valid variable name.

Returns:

The input converted to a valid variable name.

Return type:

str
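
The conversion described for to_valid_var can be sketched with the standard `re` and `keyword` modules. This is not lynguine's implementation: the helper name `valid_name` is hypothetical, and the way leading digits and keywords are repaired (with an added underscore) is an assumption.

```python
import keyword
import re


def valid_name(variable):
    """Turn a scalar or string into a valid Python identifier."""
    # Replace every character that cannot appear in an identifier.
    name = re.sub(r"\W", "_", str(variable))
    # Assumption: an empty result or a leading digit gets an underscore prefix.
    if not name or name[0].isdigit():
        name = "_" + name
    # Assumption: keywords are made safe with an underscore suffix.
    if keyword.iskeyword(name):
        name = name + "_"
    return name


examples = [valid_name("my var"), valid_name(3), valid_name("class")]
```

Each result can be checked with `str.isidentifier()` together with `keyword.iskeyword()`, which is one way a test like is_valid_var could be phrased.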

lynguine.util.misc.to_camel_case(text)[source]

Remove non alpha-numeric characters and convert to camel case.

Parameters:

text (str) – The text to be converted.

Returns:

The text converted to camel case.

Return type:

str

Raises:

ValueError – If the text is empty.
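
A sketch of the camel-case conversion, including the documented ValueError for empty text, is given below. The helper name `camel_case` is hypothetical, and the choice of lower camel case (first word kept lowercase) is an assumption; the source does not say which variant is produced.

```python
import re


def camel_case(text):
    """Drop non-alphanumeric characters and join the words in camel case."""
    # Split on runs of non-alphanumeric characters and discard empty pieces.
    words = [word for word in re.split(r"[^A-Za-z0-9]+", text) if word]
    if not words:
        raise ValueError("text must contain at least one alphanumeric character")
    # Assumption: lower camel case, so the first word stays lowercase.
    head, tail = words[0].lower(), words[1:]
    return head + "".join(word.capitalize() for word in tail)


converted = camel_case("hello world-example")
```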

lynguine.util.misc.sub_path_environment(path, environs=['HOME', 'USERPROFILE', 'TEMP', 'TMPDIR', 'TMP', 'BASE'])[source]

Replace a path with values from environment variables.

Parameters:
  • path (str) – The path to be replaced.

  • environs (list) – The environment variables to be replaced.

Returns:

The path with environment variables replaced.

Return type:

str
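
One way to picture this substitution is replacing the value of an environment variable, where it occurs in a path, with a reference to the variable's name. The sketch below assumes a `$NAME` placeholder style, which the source does not confirm; the helper name `with_env_names` and its extra `environ` argument (used to keep the example deterministic) are hypothetical.

```python
import os

DEFAULT_ENVIRONS = ("HOME", "USERPROFILE", "TEMP", "TMPDIR", "TMP", "BASE")


def with_env_names(path, environs=DEFAULT_ENVIRONS, environ=None):
    """Replace environment variable values found in `path` with $NAME."""
    environ = os.environ if environ is None else environ
    for var in environs:
        value = environ.get(var)
        # Only substitute variables that are set and occur in the path.
        if value and value in path:
            path = path.replace(value, "$" + var)
    return path


demo_env = {"HOME": "/home/ada"}
shortened = with_env_names("/home/ada/data/file.csv", environ=demo_env)
```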

lynguine.util.misc.get_path_env(environs=['HOME', 'USERPROFILE', 'TEMP', 'TMPDIR', 'TMP', 'BASE'])[source]

Return the current path with environment variables.

Returns:

The current path with environment variables substituted.

Return type:

str

Parameters:

environs (list[str])

lynguine.util.misc.get_url_file(url, directory=None, filename=None, ext=None)[source]

Download a file from a url and save it to disk.

Parameters:
  • url (str) – The url of the file to be downloaded.

  • directory (str) – The directory to save the file to.

  • filename (str) – The filename to save the file as.

  • ext (str) – The extension to use for the saved file.

Returns:

The filename of the downloaded file.

Return type:

str

lynguine.util.misc.prompt_stdin(prompt)[source]

Ask the user to agree to data set licenses.

Parameters:

prompt (str) – The prompt message to display to the user.

Returns:

True if the user agrees, False otherwise.

Return type:

bool

lynguine.util.misc.markdown2html(text)[source]

Convert markdown to HTML.

Parameters:

text (str) – The markdown text to be converted.

Returns:

The HTML.

Return type:

str

lynguine.util.misc.html2markdown(text, **args)[source]

Convert HTML to markdown.

Parameters:

text (str) – The HTML text to be converted.

Returns:

The markdown.

Return type:

str

Text Utilities

lynguine.util.text.render_liquid(compute, template, **kwargs)[source]

Wrapper to liquid renderer.

Parameters:

template (str) – The template to be rendered.

Returns:

The rendered template.

Return type:

str

TeX Utilities

lynguine.util.tex.extract_bib_files(text)[source]

Extract all the bib files listed in the file lines.

Parameters:

text (str) – The text of the file to be parsed.

Returns:

The list of bib files.

Return type:

list

lynguine.util.tex.substitute_inputs(filename, directories=None)[source]

Take the base file and substitute in any input and include files.

Parameters:
  • filename (str) – The filename to be substituted.

  • directories (list) – The directories to search for the input files.

Returns:

The substituted file.

Return type:

str

lynguine.util.tex.input_file_name(filename, extension='.tex')[source]

Return the filename with the extension if it exists.

Parameters:
  • filename (str) – The filename to be checked.

  • extension (str) – The extension to be checked.

Returns:

The filename with the extension if it exists.

Return type:

str

lynguine.util.tex.process_file(filename, extension='.tex')[source]

Process a file and return the lines.

Parameters:
  • filename (str) – The filename to be processed.

  • extension (str) – The extension to be processed.

Returns:

The lines of the processed file.

Return type:

list

lynguine.util.tex.extract_inputs(text)[source]

Extract latex file dependencies.

Parameters:

text (str or list of str (for backwards compatibility)) – The text of the file to be processed.

Returns:

The list of files.

Return type:

list
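
Dependency extraction of this kind is typically a regular-expression scan for `\input{...}` and `\include{...}`. The sketch below is an assumption about the approach, not lynguine's code, and the helper name `find_inputs` is hypothetical.

```python
import re

# Match \input{name} or \include{name}; \includegraphics{...} is not matched
# because the brace must follow the command name directly.
INPUT_PATTERN = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def find_inputs(text):
    """Return the file names referenced by \\input and \\include."""
    return INPUT_PATTERN.findall(text)


tex = "\\input{intro}\n\\includegraphics{fig}\n\\include{chapters/method}"
files = find_inputs(tex)
```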

lynguine.util.tex.extract_diagrams(lines, type='all')[source]

Extract all the diagrams listed in the file.

Parameters:
  • lines (list) – The lines of the file to be processed.

  • type (str) – The type of diagrams to be extracted.

Returns:

The list of diagrams.

Return type:

list

lynguine.util.tex.extract_citations(lines)[source]

Extract all the citations listed in the file lines.

Parameters:

lines (list) – The lines of the file to be processed.

Returns:

The list of citations.

Return type:

list
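
Citation keys can be collected from lines of LaTeX with a regular expression over the `\cite` family of commands. The following is a sketch under that assumption; the helper name `find_citations` is hypothetical, and which commands lynguine recognises is not stated in the source.

```python
import re

# \cite, \citep, \citet, ... with optional star and optional [..] arguments.
CITE_PATTERN = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]+)\}")


def find_citations(lines):
    """Return citation keys in order of appearance (duplicates kept)."""
    keys = []
    for line in lines:
        for group in CITE_PATTERN.findall(line):
            # One command may cite several comma-separated keys.
            keys.extend(key.strip() for key in group.split(","))
    return keys


lines = ["See \\cite{knuth84, lamport94}.", "Also \\citep[p.~3]{turing50}."]
citations = find_citations(lines)
```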

lynguine.util.tex.make_bib_file(citations_list, bib_files)[source]

Create a new bibliography file for a given list of citations.

Parameters:
  • citations_list (list) – The list of citations.

  • bib_files (list) – The list of bib files.

Returns:

The new bibliography file.

Return type:

str

lynguine.util.tex.get_bib_strings(string_list, bib_files)[source]

Create a new bibliography file for a given list of bibtex strings.

Parameters:
  • string_list (list) – The list of bibtex strings.

  • bib_files (list) – The list of bib files.

Returns:

The new bibliography file.

Return type:

str

lynguine.util.tex.get_bib_cross_refs(string_list, bib_files)[source]

Create a new bibliography file for a given list of cross references.

Parameters:
  • string_list (list) – The list of cross references.

  • bib_files (list) – The list of bib files.

Returns:

The new bibliography file.

Return type:

str

lynguine.util.tex.create_bib_file_given_tex(lines)[source]

Create a new bibliography file for a given latex file.

Parameters:

lines (list) – The lines of the file to be processed.

Returns:

The new bibliography file.

Return type:

str

Talk Utilities

lynguine.util.talk.talk_field(field, filename, user_file=['_config.yml'])[source]

Return one field from a talk.

Parameters:
  • field (str) – The field to return.

  • filename (str) – The filename of the talk.

lynguine.util.talk.extract_bibinputs(filename)[source]

Extract bibinput files from a talk.

Parameters:

filename (str) – The filename of the talk.

Returns:

The bibinput files.

Return type:

list

lynguine.util.talk.extract_all(filename, user_file=['_config.yml'])[source]

List the different files the talk file creates.

Parameters:
  • filename (str) – The filename of the talk.

  • user_file (list) – The user file to use.

Returns:

The list of files.

Return type:

list

lynguine.util.talk.extract_inputs(filename, snippets_path='..')[source]

Extract input and include files from a talk.

Parameters:
  • filename (str) – The filename of the talk.

  • snippets_path (str) – The snippets path.

Returns:

The list of files.

Return type:

list

lynguine.util.talk.extract_diagrams(filename, absolute_path=True, diagram_exts=['svg', 'png', 'emf', 'pdf'], diagrams_dir=None, snippets_path=None)[source]

Extract diagrams from a talk.

Parameters:
  • filename (str) – The filename of the talk.

  • absolute_path (bool) – Whether to use absolute paths.

  • diagram_exts (list) – The diagram extensions.

  • diagrams_dir (str) – The diagrams directory.

  • snippets_path (str) – The snippets path.

Returns:

The list of diagrams.

Return type:

list

Fake Data Generation

lynguine.util.fake.prefix(name)[source]

Checks if name contains a prefix. If so, returns the prefix and the name without the prefix. Otherwise returns None and the name.

Returns:

A tuple containing the prefix and the name without the prefix.

Return type:

tuple

lynguine.util.fake.suffix(name)[source]

Returns a random suffix.

Returns:

A random suffix.

Return type:

str

lynguine.util.fake.author_editor()[source]

Returns a random author or editor name.

Returns:

A random author or editor.

Return type:

str

lynguine.util.fake.row()[source]

lynguine.util.fake.entry_update(entry, **kwargs)[source]

Update an entry with additional fields.

Parameters:
  • entry (dict) – The entry to be updated.

  • kwargs (dict) – The additional fields to be added.

Returns:

The updated entry.

Return type:

dict

lynguine.util.fake.random_entry_type()[source]

Returns a random entry type.

Returns:

A random entry type.

Return type:

str

lynguine.util.fake.bibliography_entry()[source]

Returns a random bibliography entry.

Returns:

A random bibliography entry.

Return type:

dict

lynguine.util.fake.rows(num_rows, row_type=<function row>)[source]

Returns a list of random rows.

Parameters:
  • num_rows (int) – The number of rows to be returned.

  • row_type (function) – The type of row to be returned.

Returns:

A list of random rows.

Return type:

list

lynguine.util.fake.to_bibtex_author(entry, translate_unicode=True, author_type='author')[source]

Convert a citeproc author/editor bibliography entry to bibtex format.

Citeproc separates authors into family, given, prefix. BibTeX combines them into a single author field. This function converts the citeproc format to the BibTeX format using liquid syntax.

Parameters:

entry (dict) – The entry to be converted.

Returns:

The converted entry.

Return type:

str
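
The combination of citeproc name parts into one BibTeX author field can be sketched in plain Python (the documented function uses liquid syntax instead). The "prefix family, given" layout joined by " and " follows the usual BibTeX convention and is an assumption about the output; the helper name `bibtex_authors` is hypothetical.

```python
def bibtex_authors(entry, author_type="author"):
    """Join citeproc name dictionaries into a single BibTeX author string."""
    names = []
    for person in entry.get(author_type, []):
        family = person["family"]
        # A prefix such as "von" is placed before the family name.
        if person.get("prefix"):
            family = person["prefix"] + " " + family
        names.append(family + ", " + person["given"])
    # BibTeX separates multiple names with the word "and".
    return " and ".join(names)


entry = {
    "author": [
        {"family": "Lovelace", "given": "Ada"},
        {"family": "Neumann", "given": "John", "prefix": "von"},
    ]
}
authors = bibtex_authors(entry)
```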

lynguine.util.fake.to_bibtex(entry, translate_unicode=True)[source]

Convert a citeproc bibliography entry to bibtex format.

Parameters:

entry (dict) – The entry to be converted.

Returns:

The converted entry as a dictionary.

Return type:

dict

lynguine.util.fake.row_allocation_additional_scores_series(num_ruows)[source]

lynguine.util.fake.DataFrame(num_rows)[source]

class lynguine.util.fake.Generate[source]

Bases: object

person = <mimesis.providers.person.Person object>
classmethod givenName()[source]
classmethod familyName()[source]
classmethod prefix()[source]
classmethod suffix()[source]
classmethod name()[source]
classmethod city()[source]
classmethod state()[source]
classmethod address()[source]
classmethod email()[source]
classmethod date()[source]

HTML Utilities

lynguine.util.html.get_reference(key_name)[source]

Gets a reference from the web.

File no longer implemented as the web page no longer exists.

Parameters:

key_name (string) – the key name of the reference

Returns:

the reference

Return type:

string

lynguine.util.html.write_to_file(file, string, style='', title='', header='', footer='', navigation='')[source]

lynguine.util.html.md_write_to_file(file, string, style='', title='', header='', footer='', navigation='')[source]