I’m trying to figure out a way to include metadata for columns in SQL DDL commands. This should work for different SQL flavors like PostgreSQL and Snowflake. Here’s what I’m aiming for:
CREATE TABLE example (
column1 INT `metadata_info`,
column2 VARCHAR(50) `other_metadata`
);
I want to make sure every column has this metadata. The tricky part is I need to be able to remove these annotations later and create a separate metadata map. Then I should be able to get back the original SQL without the annotations.
I’ve thought about two approaches:
- Modify the SQL grammar using ANTLR and work with the parse tree.
- Use regex to handle this, since there’s no standard grammar for all SQL dialects.
Has anyone tackled something similar? What’s the best way to approach this problem? I’m open to other ideas too if you have any suggestions.
Considering the complexity of SQL dialects, I’d suggest using a structured comment approach. You could embed JSON or YAML within block comments for each column. For example:
CREATE TABLE example (
column1 INT /*
{
"metadata_info": "value",
"additional_data": "something"
}
*/,
column2 VARCHAR(50) /*
{
"other_metadata": "another_value"
}
*/
);
This method allows for rich metadata, is easily parseable, and doesn’t interfere with SQL syntax. You can extract this information using regex patterns targeting the comment blocks, then process the JSON within. To remove, simply strip out the comment blocks. This approach offers flexibility and maintainability across various SQL flavors.
have you considered using comments for metadata? Like this:
CREATE TABLE example (
column1 INT, – metadata_info
column2 VARCHAR(50) – other_metadata
);
This way, you can easily parse and remove comments later. it’s also supported across different SQL flavors. might be simpler than modifying grammar or using complex regex.
hmm, intresting ideas! have u thought about using custom SQL extensions? some databases let u define ur own syntax. like:
CREATE TABLE example (
column1 INT METADATA(‘info=value’),
column2 VARCHAR(50) METADATA(‘other=data’)
);
this cud work across dialects with some tweaking. what do u think? any drawbacks to this approach?