I have always tried to mirror my package creations in both R & Python whenever I set out to solidify overarching scripts for the rest of my team. When working with R and Python, one major difference that quickly becomes apparent is their approach to packages and file structures. Understanding these differences helps manage your projects more effectively and improves collaboration within teams.
R Packages and File Structure
In R, a package is essentially a collection of functions, data, documentation, and tests bundled together. Packages follow a clearly defined structure that typically looks like this: (Admittedly - best practice requires there should also be a man & tests folder as well - but I have never had the need for them.)
addotnet_functions_r/
├── DESCRIPTION
├── NAMESPACE
├── R/
│ ├── clickhouse_functions.R
│ └── ChatGPT_Interaction.R
│ └── addotnet_rev_numbers.R
├── data/
│ └── dataset.rda
└── vignettes/
└── intro-to-myRpackage.Rmd
- DESCRIPTION file: Specifies metadata like the package name, version, author, and dependencies.
- NAMESPACE file: Lists functions exposed to users.
- R directory: Contains the actual R scripts with functions.
- man directory: Holds documentation files generated using R's documentation system.
- tests directory: Includes unit tests, usually leveraging packages like testthat.
- data directory: Optional, contains example datasets.
- vignettes directory: Stores tutorials or detailed explanations in R Markdown.
R packages are installed either from CRAN, GitHub, or locally, providing reproducibility and ease of sharing within the R community.
Python Packages and File Structure
Python package structures are more flexible but commonly follow standardized conventions, often looking like this: (Borrowing the structure from my py_toolkit project)
py_toolkit/
├── README.md
├── google_oauth_token.json
├── google_oauth_access_token.json
├── google_service_account.json
├── pyproject.toml
├── requirements.txt
├── src
│ ├── __init__.py
│ └── utility_functions
│ ├── __init__.py
│ ├── genesys_utility
│ │ ├── __init__.py
│ │ ├── auth.py
│ │ ├── common.py
│ │ ├── conversation.py
│ │ ├── conversation_details_query.py
│ │ ├── transformations.py
│ │ └── users.py
│ ├── gmail_utility
│ │ ├── __init__.py
│ │ ├── auth.py
│ │ ├── common.py
│ │ ├── file_handling.py
│ │ ├── inbox.py
│ │ └── send.py
│ ├── google_sheets_utility
│ │ ├── __init__.py
│ │ ├── common.py
│ │ └── sheets.py
│ ├── salesforce_utility
│ │ ├── __init__.py
│ │ ├── auth.py
│ │ ├── common.py
│ │ ├── query.py
│ │ ├── reporting.py
│ │ └── transformations.py
│ └── sharepoint_utility
│ ├── __init__.py
│ ├── auth.py
│ ├── common.py
│ ├── explorer.py
│ └── uploader.py
└── tests
├── test_genesys_utility.py
├── test_gmail_utility.py
├── test_google_sheets_utility.py
├── test_salesforce_utility.py
└── test_sharepoint_utility.py
- setup.py: Specifies metadata and dependencies for package installation. Poetry may well deprecate this approach.
- requirements.txt: Lists all dependencies explicitly for easy environment setup.
- init.py: Signals Python that the directory is a package and allows for importing modules easily.
- README.md and LICENSE: Commonly used for providing basic package information and licensing.
- tests: Typically uses pytest or unittest frameworks.
- docs: Frequently contains user documentation or instructions in markdown format.
Python packages are typically installed from PyPI or GitHub using tools like pip, enabling efficient dependency management and distribution.
Key Differences
Standardization: R has stricter and more clearly standardized structures required by CRAN. Python structures are more flexible and less enforced by PyPI.
Documentation: R uses the built-in documentation system with .Rd files, whereas Python commonly leverages markdown-based docs or tools like Sphinx.
Dependency Management: R specifies dependencies explicitly in the DESCRIPTION file, while Python commonly utilizes requirements.txt or setup.py.
Choosing the Right Structure
The choice between R and Python packaging structures depends on your project goals:
- Use R for clearly defined, statistical packages with built-in, structured documentation and standardized distribution.
- Use Python if your workflow benefits from flexibility, a broader range of applications beyond data analysis, and easily manageable environments.
In my experience - if you're collaborating with another team (usually some form of engineering), python workflows for package creation is by far the preferred method. It mirrors object oriented programming practices enough and is in a syntax that they will find more comforting. I, personally, find the doc string syntax more pleasing on the eye for when help needs to be called versus the roxygen2 syntax which I find rather bulky.
As I also come from a scripting coding background - I have largely found the tests required as best practice to be largely a waste of time and an annoyance - however both languages do allow this measure to be skipped.
As such, given that the main goal in package creation is for shared use I have largely stopped creating R packages unless for my personal.