Column Management in CustomDataFrame
=====================================

This guide covers how to work with columns in ``CustomDataFrame``, including adding, removing,
and managing column types.

Understanding Column Specifications (colspecs)
----------------------------------------------

CustomDataFrame organizes columns into different types based on their purpose:

**Input Types** (Immutable)
    Read-only data that shouldn't be modified after loading.
    Types: ``input``, ``data``, ``constants``, ``global_consts``

**Output Types** (Mutable, Written to Files)
    Data that will be written to output files.
    Types: ``output``, ``writedata``, ``writeseries``, ``parameters``, ``globals``

**Cache Types** (Mutable, Temporary)
    Intermediate calculations that don't need to be saved.
    Types: ``cache``, ``series_cache``, ``parameter_cache``, ``global_cache``

**Parameter Types** (Global Values)
    Values that are globally valid without an index.
    Types: ``parameters``, ``globals``, ``parameter_cache``, ``global_cache``

**Series Types** (Multiple Rows Per Index)
    Data that may have multiple rows with the same index value.
    Types: ``series``, ``writeseries``, ``series_cache``

Adding Columns
--------------

Method 1: Direct Assignment (Recommended for Quick Operations)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The simplest way to add a column is through direct assignment:

.. code-block:: python

    import pandas as pd
    from lynguine.assess.data import CustomDataFrame
    
    # Create a dataframe
    df = CustomDataFrame(pd.DataFrame({'A': [1, 2, 3]}))
    
    # Add a column using direct assignment (goes to 'cache' by default)
    df['B'] = [4, 5, 6]
    
    # You can also use pandas Series
    df['C'] = pd.Series([7, 8, 9], index=df.index)

Direct assignment automatically adds the column to the ``cache`` colspec type,
making it mutable but not saved to output files.

Method 2: add_column() Method (Recommended for Explicit Type Control)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use ``add_column()`` when you need explicit control over the column type:

.. code-block:: python

    # Add a column with default type (cache)
    df.add_column('new_col', [10, 11, 12])
    
    # Add a column as output type (will be written to files)
    df.add_column('result', [13, 14, 15], colspec='output')
    
    # Add a parameter (global value)
    df.add_column('threshold', [0.5, 0.5, 0.5], colspec='parameters')

**Advantages of add_column():**

* Explicit type specification
* Validation that column doesn't already exist
* Validation that colspec type is valid
* Self-documenting code

Removing Columns
----------------

Use the ``drop_column()`` method to remove columns:

.. code-block:: python

    # Drop a single column
    df.drop_column('unwanted_col')
    
    # The column is removed from both the data and column specifications

**Note:** This permanently removes the column. There is no undo.

Common Patterns
---------------

Pattern 1: Pre-creating Columns for Compute Operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some compute operations expect columns to exist:

.. code-block:: python

    # Pre-create columns that compute operations will populate
    df['word_count'] = None
    df['sentiment_score'] = None
    
    # Then run compute operations that fill these columns
    df.compute.run()

Pattern 2: Converting Column Types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To move a column to a different colspec type:

.. code-block:: python

    # Save the data
    col_data = df['temp_col']
    
    # Drop from current location
    df.drop_column('temp_col')
    
    # Re-add with new type
    df.add_column('temp_col', col_data, colspec='output')

Pattern 3: Working with Series Data (Multiple Rows Per Index)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For data with multiple rows per index value:

.. code-block:: python

    # Create a series with repeated indices
    series_data = pd.Series([1, 2, 3, 4], index=['a', 'a', 'b', 'b'])
    
    # This automatically goes to a series-type colspec
    df['multi_value_col'] = series_data

Best Practices
--------------

1. **Use add_column() for clarity**: When the column type matters, use ``add_column()``
   with an explicit ``colspec`` parameter.

2. **Use direct assignment for quick work**: For temporary calculations or when working
   interactively, direct assignment (``df['col'] = value``) is more concise.

3. **Choose the right colspec type**:
   
   * Use ``input`` for source data that shouldn't change
   * Use ``output`` for results you want to save
   * Use ``cache`` for intermediate calculations
   * Use ``parameters`` for global configuration values

4. **Validate before dropping**: Check if a column exists before dropping to avoid errors:

   .. code-block:: python

       if 'col_name' in df.columns:
           df.drop_column('col_name')

Common Errors and Solutions
----------------------------

Error: Column already exists
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    # ❌ This raises ValueError
    df.add_column('existing_col', [1, 2, 3])

**Solution**: Use direct assignment to replace values, or drop first:

.. code-block:: python

    # ✅ Replace values
    df['existing_col'] = [1, 2, 3]
    
    # ✅ Or drop and re-add
    df.drop_column('existing_col')
    df.add_column('existing_col', [1, 2, 3], colspec='output')

Error: Column not found
^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    # ❌ This raises KeyError
    df.drop_column('nonexistent_col')

**Solution**: Check if column exists first:

.. code-block:: python

    # ✅ Check first
    if 'nonexistent_col' in df.columns:
        df.drop_column('nonexistent_col')

Error: Invalid colspec
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    # ❌ This raises ValueError
    df.add_column('col', [1, 2, 3], colspec='invalid_type')

**Solution**: Use one of the valid colspec types listed at the top of this guide.

Related Documentation
---------------------

* :class:`lynguine.assess.data.CustomDataFrame` - Full API documentation
* :meth:`lynguine.assess.data.CustomDataFrame.add_column` - Add column method
* :meth:`lynguine.assess.data.CustomDataFrame.drop_column` - Drop column method
* :doc:`compute_framework` - Using columns with compute operations