Parse Debian apt sources.
This Artifact searches for all apt sources files and parses all
fields in both one–line *.list
files and *.sources
files
(deb822-style format). The results are presented both in a readable
table and a flattened version for parsing.
*.list
files contains lines of the form
deb http://us.archive.ubuntu.com/ubuntu/ bionic main restricted
deb-src [arch=amd64,i386 signed-by=/usr/share/keyrings/foo.gpg] https://foo.bar.baz/ubuntu/main jammy main restricted universe multiverse # Comment
deb indicates a source for binary packages, and deb-src instructs APT where to find source code for packages.
*.sources
files (deb822-style format) are in the form of key–value
lines, and as opposed to the one–line format, they may contain
multiple URIs, components and types (deb/deb-src), along with
embedded GPG keys. Example:
Types: deb deb-src
URIs: file:/home/apt/debian http://foo.bar.baz/main
Suites: unstable
Components: main contrib non-free
The exported function parse_aptsources(OSPath, flatten)
parses
both formats and returns an (optionally flattened) table with
Any option is added to an individual column. The most common options are
All known option names are transformed to the plural PascalCase variants as listed in the sources.list man page. Any undocumented options will still be included in the results, with names unchanged. Options in the one-line format of the form “lang+=de”/“arch-=i386” will be put in columns like “Languages-Add”/“Architectures-Remove”, matching the option names having the same effect in deb822.
Entries in deb822 sources files may be disabled by including “Enabled: no” instead of commenting out all lines. If this field is not present with a falsly value, the entry is enabled. Use the exported functions DebTrue()/DebFalse() to correctly parse all accepted true/false strings, or use the VQL suggestion “Only enabled sources” to filter on this column (true), if present.
If the GPG key is embedded in a .sources file, the whole GPG key will be included in the cell. Otherwise the value will be a file path. Use the VQL suggestion “Hide embedded GPG keys” to replace embedded GPG keys with “(embedded)” in the results. In order to inspect the keys themselves (files or embedded data), use the exchange artifact Linux.Debian.GPGKeys.
If the function parameter “flatten” is False, multi–value fields (like Components) will be combined in a single space-separated string in each row.
In addition to the two apt sources tables, a third table correlates information from InRelease and Release files to provide additional metadata. The modification timestamps may tell when the package lists where last updated.
name: Linux.Debian.AptSources
description: |
Parse Debian apt sources.
This Artifact searches for all apt sources files and parses all
fields in both one–line `*.list` files and `*.sources` files
(deb822-style format). The results are presented both in a readable
table and a flattened version for parsing.
`*.list` files contains lines of the form
```
deb http://us.archive.ubuntu.com/ubuntu/ bionic main restricted
deb-src [arch=amd64,i386 signed-by=/usr/share/keyrings/foo.gpg] https://foo.bar.baz/ubuntu/main jammy main restricted universe multiverse # Comment
```
deb indicates a source for binary packages, and deb-src instructs APT where
to find source code for packages.
`*.sources` files (deb822-style format) are in the form of key–value
lines, and as opposed to the one–line format, they may contain
multiple URIs, components and types (deb/deb-src), along with
embedded GPG keys. Example:
```
Types: deb deb-src
URIs: file:/home/apt/debian http://foo.bar.baz/main
Suites: unstable
Components: main contrib non-free
```
The exported function `parse_aptsources(OSPath, flatten)` parses
both formats and returns an (optionally flattened) table with
- OSPath
- Types (deb/deb-src)
- Components (e.g. main/contrib/non-free/restricted,universe)
- Suites (e.g. unstable/bookworm/jammy)
- _URIBase (.e.g us.archive.ubuntu.com/ubuntu/)
- _Transport (e.g. http/https/file/cdrom/ftp)
- URIs (e.g. http://us.archive.ubuntu.com/ubuntu/)
Any option is added to an individual column. The most common options
are
- Architectures (e.g. amd64/i386/armel)
- Signed-By (e.g. /usr/share/keyrings/osquery.gpg)
All known option names are transformed to the plural PascalCase
variants as listed in the sources.list man page. Any undocumented
options will still be included in the results, with names unchanged.
Options in the one-line format of the form "lang+=de"/"arch-=i386"
will be put in columns like "Languages-Add"/"Architectures-Remove",
matching the option names having the same effect in deb822.
Entries in deb822 sources files may be disabled by including
"Enabled: no" instead of commenting out all lines. If this field
is not present with a falsly value, the entry is enabled. Use the
exported functions DebTrue()/DebFalse() to correctly parse all
accepted true/false strings, or use the VQL suggestion "Only enabled
sources" to filter on this column (true), if present.
If the GPG key is embedded in a .sources file, the whole GPG key
will be included in the cell. Otherwise the value will be a file
path. Use the VQL suggestion "Hide embedded GPG keys" to replace
embedded GPG keys with "(embedded)" in the results. In order to
inspect the keys themselves (files or embedded data), use the
exchange artifact Linux.Debian.GPGKeys.
If the function parameter "flatten" is False, multi–value fields
(like Components) will be combined in a single space-separated
string in each row.
In addition to the two apt sources tables, a third table correlates
information from InRelease and Release files to provide additional
metadata. The modification timestamps may tell when the package
lists where last updated.
reference:
- https://manpages.debian.org/bookworm/apt/sources.list.5.en.html
- https://manpages.debian.org/bookworm/dpkg-dev/deb822.5.en.html
- https://salsa.debian.org/apt-team/apt/-/blob/main/apt-pkg/sourcelist.cc
- https://wiki.debian.org/DebianRepository/Format#A.22Release.22_files
export: |
/* Remove whitespace from the beginning and end of a string: */
LET Trim(string) = regex_transform(source=string, map=dict(
`(?m)^\\s+`='',
`(?m)\\s+$`=''
))
/* Replace any repeating whitespace with a single space: */
LET Simplify(string) = regex_replace(source=string, re='''\s+''', replace=' ')
/* The syntax in lists (deb822) and sources (one-line) files varies a bit,
and deb822 is case-insensitive. Normalise all known fields (as per
the man page): */
LET NormaliseOpts(string) = regex_transform(source=string, map=dict(
`(?i)types|type`='Types',
`(?i)uris|uri`='URIs',
`(?i)suites|suite`='Suites',
`(?i)components|component`='Components',
`(?i)architectures$|arch$`='Architectures',
`(?i)architectures-add`='Architectures-Add',
`(?i)architectures-remove`='Architectures-Remove',
`(?i)languages$|lang$`='Languages',
`(?i)languages-add`='Languages-Add',
`(?i)languages-remove`='Languages-Remove',
`(?i)targets$|target$`='Targets',
`(?i)targets-add`='Targets-Add',
`(?i)targets-remove`='Targets-Remove',
`(?i)pdiffs`='PDiffs',
`(?i)by-hash`='By-Hash',
`(?i)allow-insecure`='Allow-Insecure',
`(?i)allow-weak`='Allow-Weak',
`(?i)allow-downgrade-to-insecure`='Allow-Downgrade-To-Insecure',
`(?i)trusted`='Trusted',
`(?i)signed-by`='Signed-By',
`(?i)check-valid-until`='Check-Valid-Until',
`(?i)valid-until-min`='Valid-Until-Min',
`(?i)valid-until-max`='Valid-Until-Max',
`(?i)check-date`='Check-Date',
`(?i)date-max-future`='Date-Max-Future',
`(?i)inrelease-path`='InRelease-Path',
`(?i)enabled`='Enabled'
))
LET DebTrue(string) = if(
condition=string=~'(?i)^(?:yes|true|with|on|enable)$',
then=true, else=false)
LET DebFalse(string) = if(
condition=string=~'(?i)^(?:no|false|without|off|disable)$',
then=true, else=false)
/* Extract Key–Value pairs from option string. If assignment is -=/+=,
the -/+ operator is captured in Op: */
LET OptStringToKeyValues__(string) = SELECT *
FROM parse_records_with_regex(
regex='''(?P<Key>[^ ]+?)(?P<Op>-|\+)?=(?P<Value>[^ ]+)''',
accessor='data', file=string
)
/* Since option values may have multiple words, split them and flatten
the results for further processing: */
LET OptStringToKeyValues_(string) = SELECT *
FROM flatten(query={
SELECT Key,
Op,
split(sep_string=',', string=Value) AS Value
FROM OptStringToKeyValues__(string=string)
})
/* Since options may be repeated, enumerate and group all values
per key and operation: */
LET OptStringToKeyValues(string) = SELECT Key,
Op,
enumerate(items=Value) AS Value
FROM OptStringToKeyValues_(string=string)
GROUP BY Key, Op
/* When an option is specified with +/-, represent this by appending
-Add/-Remove to the option name. These names match the syntax in
the deb822 format (i.e. "arch-=i386" == "Arhitectures-Remove: i386").
The purpose of these assignments is to keep the default values
(rather than overriding them), but add or remove one or several
values: */
LET OpName(op) = if(condition=op='+',then='-Add',else=
if(condition=op='-',then='-Remove',else=''))
/* Convert a string of key–value pairs to a dict, and use consistent
option names: */
LET OptStringToDict(string, flatten) = to_dict(item={
SELECT NormaliseOpts(string=Key)+OpName(op=Op) AS _key,
if(condition=flatten, then=Value,
else=join(array=Value, sep=' ')) AS _value
FROM OptStringToKeyValues(string=string)
})
/* Parse a one-line deb sources.list file with options as a single string: */
LET DebOneLine_Opts(OSPath) = SELECT OSPath, Type AS Types,
Simplify(string=Options) AS Options, URI AS URIs,
Transport AS _Transport, URIBase AS _URIBase, Suite AS Suites,
Simplify(string=Trim(string=Components)) AS Components
FROM parse_records_with_regex(
file=OSPath,
/* This regex attemps to cover most of the ways a sources
line can be written without being overly complex. Quotes
("" and []) are actually allowed to certain degree by the
apt source code, but this is considered obscure syntax and
is not expected to be found in the wild. The exception is
"cdrom:[word word…]", which is capture correctly in order
to not end up with incorrectly captured words: */
regex='''(?m)^\s*(?P<Type>deb(-src)?)(?:\s+\[(?P<Options>[^\]#]+)(?:#[^\]]+)?\])?\s+"?(?P<URI>(?P<Transport>[^:]+):(?://)?(?P<URIBase>\[.+?\]|\S+?))"?\s+(?P<Suite>\S+)\s+(?P<Components>[^\n#]+)'''
)
/* Parse a one-line deb sources.list file and output a dict: */
LET DebOneLine_Dict(OSPath, flatten) = SELECT OSPath, *
FROM foreach(row=DebOneLine_Opts(OSPath=OSPath),
query={SELECT _value +
OptStringToDict(string=Options, flatten=flatten) AS Contents
FROM items(item={SELECT Types, URIs, _Transport, _URIBase, Suites,
if(condition=flatten, then=split(sep_string=' ',
string=Components), else=Components) AS Components
FROM scope()
})
})
/* Parse a one-line deb sources.list file with options in individual columns: */
LET DebOneLine(OSPath) = SELECT OSPath, * FROM foreach(
row=DebOneLine_Dict(OSPath=OSPath, flatten=false),
column='Contents'
)
/* Parse a one-line deb sources.list file with options in individual
columns and flatten: */
LET DebOneLine_Flattened(OSPath) = SELECT OSPath, * FROM flatten(
query={SELECT * FROM foreach(
row=DebOneLine_Dict(OSPath=OSPath, flatten=true),
column='Contents'
)
})
/* Extract the transport/protocol and base from a URI: */
LET URIComponents(URI) = parse_string_with_regex(
regex='''(?P<Transport>[^:]+):(?://)?(?P<URIBase>[^\s]+)''',
string=URI
)
/* Although the documentation says to use whitespace and not comma
for multi-values in deb822, comma still appears to be supported,
and this use is seen in the wild. Treat these values correctly.
Note that this does not affect all keys, like suites and
components:
*/
LET MaybeReplaceComma(key, value) = if(
condition=key=~'(?i)^(?:arch|lang|targets)',
then=regex_replace(re='\s*,\s*', source=value, replace=' '),
else=value)
/* Parse a deb822 sources file section into a series of key–value pairs.
Notes about the format:
- Keys must be at the beginning of the line (no whitespace allowed)
- Keys are case-insensitive
- Keys may be repeated. Values are not overridden, but combined
- Special keys that end in -Add/-Remove uses the default values,
but add or remove individual values. These keys are treated as
individual option names.
- Comments may only appear at the beginning of the line
- Multiple values are separated by whitespace, not comma. However,
some multi-value fields separated by comma are still split, even
if this is not mentioned in the documentation.
- Values may be multi-line (like when containing an embedded GPG key),
but following lines must be prefixed by whitespace. Multilines
may contain comments (prefixed by whitespace or not). Empty lines
part of a multi-line value must be prefixed by whitespace and "."
- A file may contain multiple entries, separated by empty lines.
A file must be split into sections, fed individually to this function
*/
LET Deb822_KeyValues___(section) = SELECT Key,
/* Signed-By is special (it could be an embedded GPG key),and
shouldn't be split: */
if(condition=NormaliseOpts(string=Key)!='Signed-By',
then=split(sep_string=' ',
string=MaybeReplaceComma(key=Key,
value=Simplify(string=Trim(string=Value)))),
else=Value) AS Value
FROM parse_records_with_regex(
accessor='data',
/* A key is anything but whitespace up to a colon
Values can continue on several lines, but only if the following
lines are indented with whitespace
*/
regex='''(?m)^(?P<Key>[^#:\s]+)\s*:[^\S\n]*(?P<Value>[^\n]*(?:\n[^\S\n]+[^\n]+)*)''',
/* Before parsing the key–values, remove all comments from the file
(otherwise forming a regex without lookarounds would be very
difficult, if not impossible), Luckily, comments follow strict
rules and must start with ^#.
*/
file=regex_replace(
re='''(?m)^#.+\n''',
source=section
)
)
LET Deb822_KeyValues__(section) = SELECT * FROM flatten(query={
SELECT * FROM Deb822_KeyValues___(section=section)
})
LET Deb822_KeyValues_(section) = SELECT Key,
enumerate(items=Value) AS Value
FROM Deb822_KeyValues__(section=section)
GROUP BY Key
/* Parse a deb822 sources file section into a dict with consistent option
names: */
LET Deb822_KeyValues(section, flatten) = SELECT to_dict(
item={
SELECT NormaliseOpts(string=Key) as _key,
if(condition=flatten, then=Value,
else=join(array=Value, sep=' ')) AS _value
FROM Deb822_KeyValues_(section=section)
}) AS Contents
FROM scope()
/* Split paragraphs in a file (separated by one or several empty
lines) into rows. ('regex' is just anything that is illegal in Deb822Sections
to prevent splitting data into records.): */
LET Deb822Sections(OSPath) = SELECT OSPath,* FROM split_records(
filenames=OSPath,
columns='Section',
regex='^ #', record_regex='''\n{2,}'''
)
LET Deb822_Flattened_(OSPath) = SELECT * FROM foreach(
row=Deb822Sections(OSPath=OSPath),
query={SELECT OSPath, * FROM flatten(query={
SELECT * FROM foreach(
row=Deb822_KeyValues(section=Section, flatten=true),
column='Contents'
)
})}
)
/* DEB822_Sections() may produce empty rows. Exclude these by filtering
for a required column, like URIs: */
WHERE URIs
/* Parse a deb822 sources file with options in individual columns.
Note that, as opposed to DebOneLine and Deb822_Flattened, this
function does not return the columns _URIBase and _Transport, since
this format supports mulitple URIs to be specified: */
LET Deb822(OSPath) = SELECT * FROM foreach(
row=Deb822Sections(OSPath=OSPath),
query={SELECT OSPath, * FROM foreach(
row=Deb822_KeyValues(section=Section, flatten=false),
column='Contents'
)}
)
WHERE URIs
/* Parse a deb822 sources file with options in individual columns, flattened: */
LET Deb822_Flattened(OSPath) = SELECT * FROM flatten(query={
SELECT OSPath, *, URIComponents(URI=URIs).URIBase AS _URIBase,
URIComponents(URI=URIs).Transport AS _Transport
FROM Deb822_Flattened_(OSPath=OSPath)
})
/* Parse an apt sources/list file */
LET parse_aptsources(OSPath, flatten) = if(
condition=OSPath=~'.list$',
then=if(condition=flatten,
then=DebOneLine_Flattened(OSPath=OSPath),
else=DebOneLine(OSPath=OSPath)
),
else=if(condition=flatten,
then=Deb822_Flattened(OSPath=OSPath),
else=Deb822(OSPath=OSPath)
)
)
LET files = SELECT OSPath FROM glob(
globs=linuxAptSourcesGlobs.ListGlobs)
LET deb_sources = SELECT * FROM foreach(row=files,
query={SELECT * FROM parse_aptsources(OSPath=OSPath, flatten=true)}
)
parameters:
- name: linuxAptSourcesGlobs
description: Globs to find apt source *.list and .sources files.
type: csv
default: |
ListGlobs
/etc/apt/sources.list
/etc/apt/sources.list.d/*.list
/etc/apt/sources.list.d/*.sources
- name: aptCacheDirectory
description: Location of the apt cache directory.
default: /var/lib/apt/lists/
precondition:
SELECT OS From info() where OS = 'linux'
sources:
- name: Sources
query: |
/* Output sources in a readable format: */
SELECT * FROM foreach(row=files,
query={SELECT * FROM parse_aptsources(OSPath=OSPath, flatten=false)}
)
notebook:
- type: vql_suggestion
name: Only enabled sources
template: |
/*
# Sources (enabled only)
*/
SELECT * FROM source()
WHERE Enabled =~ '(?i)^(?:yes|true|with|on|enable)$' || true
- type: vql_suggestion
name: Trusted sources (apt-secure bypassed)
template: |
/*
# "Trusted" sources (apt-secure bypassed)
When the Trusted option is true, apt does not verify the GPG
signature of the Release files of the repository, and it also
doe not warn about this.
*/
SELECT * FROM source()
WHERE Trusted =~ '(?i)^(?:yes|true|with|on|enable)$' || false
- type: vql_suggestion
name: Hide embedded GPG keys
template: |
/*
# Sources (embedded GPG keys hidden)
*/
SELECT *, if(condition=get(field='Signed-By')=~'BEGIN PGP PUBLIC KEY',
then='(embedded)', else=get(field='Signed-By')) AS `Signed-By`
FROM source()
- name: SourcesFlattened
query: |
/* Output sources flattened for ease of analysis: */
SELECT * FROM deb_sources
- name: SourcesCacheFiles
query: |
/* We try to get at the Release file in /var/lib/apt/ by munging
the components and URL.
Strip the last component off, convert / and space to _ and
add _Release/_InRelease to get the filename.
*/
LET parsed_apt_lines = SELECT get(field='Architectures', default='') AS Architectures, URIs,
_URIBase + " " + Suites + " " + Components as Name, Types,
OSPath as Source, aptCacheDirectory + regex_replace(
replace="_",
re="_+",
source=regex_replace(
replace="_", re="[ /]",
source=_URIBase + "_dists_" + Suites
)) as cache_file
FROM deb_sources
GROUP BY URIs, Suites
/* This runs if the file was found. Read the entire file into
memory and parse the same record using multiple RegExps.
*/
LET parsed_cache_files(file) = SELECT Name, Architectures, URIs, Types,
Source, parse_string_with_regex(
string=regex_replace(source=Record,
re='(?m)^Version: GnuPG v.+$', replace=''
),
regex=["Codename: (?P<Release>[^\\n]+)",
"Version: (?P<Version>[^\\n]+)",
"Origin: (?P<Origin>[^\\n]+)",
"Architectures: (?P<Architectures>[^\\n]+)",
"Components: (?P<Components>[^\\n]+)"]) as Record
FROM parse_records_with_regex(file=file, regex="(?sm)(?P<Record>.+)")
// Foreach row in the parsed cache file, collect the FileInfo too.
LET add_stat_to_parsed_cache_file(file) = SELECT * from foreach(
query={
SELECT OSPath, Mtime, Ctime, Atime, Record, Types,
Name, Architectures, URIs, Source from stat(filename=file)
}, row=parsed_cache_files(file=file))
WHERE Record
GROUP BY OSPath
/* For each row in the parsed file, run the appropriate query
depending on if the cache file exists.
If the cache file is not found, we just copy the lines we
parsed from the source file and fill in empty values for
stat.
*/
LET parse_cache_or_pass = SELECT * from if(
condition={
SELECT * from stat(filename=cache_file + '_InRelease')
},
then=add_stat_to_parsed_cache_file(file=cache_file + '_InRelease'),
else={SELECT * FROM if(
condition={
SELECT * from stat(filename=cache_file + '_Release')
},
then=add_stat_to_parsed_cache_file(file=cache_file + '_Release'),
else={
SELECT Source, NULL AS OSPath, Null as Mtime, Null as Ctime,
Null as Atime, Types,
Null as Record, Architectures, URIs, Name from scope()
})
})
-- For each parsed apt .list file line produce some output.
SELECT * from foreach(
row={
SELECT * FROM parsed_apt_lines
},
query={
SELECT * FROM parse_cache_or_pass
})