Difference between revisions of "FactFerret"
(clarification)
(reordered; explanations and summaries)
Revision as of 19:42, 22 March 2013
About
DataFerret is a way to store arbitrary data from multiple sources in a single database, while preserving the meaning of each datum.
The purpose is to allow multi-layered querying (e.g. "display rate of all A that didn't include B from year C to year D") without foreknowledge of what data is available. This is in some ways similar to what GapMinder does, with two differences: it should be relatively easy to add new datasets (one database schema should be able to accommodate any assertion of fact), and it should support data-dependent conditionals ("all A that didn't include B" is just a very simple example).
It also bears some similarity to Semantic MediaWiki, except that data is entered and updated independently of a wiki page, via both manual and automated methods.
Schema
This is a preliminary schema, just to give an idea of how it works.
value -- the value of one dimension of a given data point
- ID_Point
- ID_Axis
- Value
point -- a specific point (one or more dimensions) within a series
- ID (auto)
- ID_Series
axis -- defines a dimension used within a particular dataset
- ID (auto)
- ID_Series
- Name
- ID_Unit
series -- a dataset (one or more dimensions) from a given source
- ID (auto)
- Name
- Descrip
- ID_Source
- possibly other attributes
series_attrib -- defines what a series relates to (this needs refinement)
- ID (auto)
- ID_Series
- ID_Attrib
- Value
source -- a particular data source
- ID (auto)
- ID_Entity -- organization or individual who created the data
- URL -- (optional) web page where data may be found
- When_Retrieved
unit -- type of unit which can be used on an axis
- ID (auto)
- Name
- ID_Handler -- sprintf(), date(), custom code...
unit_format -- a particular way of displaying values for a type of unit
- ID (auto)
- ID_Unit
- Name -- a name for the format, e.g. "ISO xxxx"
- Tplt -- template string to pass to unit handler (e.g. "%y/%m/%d")
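As a rough illustration of how a unit handler might consume a Tplt string, here is a minimal Python sketch. The handler names, the `HANDLERS` table and `format_value` are hypothetical, and the schema does not say how raw values are stored; this assumes plain numbers for sprintf-style units and Unix timestamps (seconds, UTC) for dates.

```python
from datetime import datetime, timezone

def sprintf_handler(template, raw):
    # printf-style formatting, e.g. template "%.1f%%"
    return template % raw

def date_handler(template, raw):
    # Assumption: raw is a Unix timestamp in seconds, interpreted as UTC.
    return datetime.fromtimestamp(raw, tz=timezone.utc).strftime(template)

# Hypothetical lookup standing in for unit.ID_Handler.
HANDLERS = {"sprintf": sprintf_handler, "date": date_handler}

def format_value(handler_name, template, raw):
    """Format a raw value using the named handler and a unit_format.Tplt string."""
    return HANDLERS[handler_name](template, raw)

print(format_value("date", "%y/%m/%d", 1363910400))  # 13/03/22
print(format_value("sprintf", "%.1f%%", 12.345))     # 12.3%
```

Keeping the template in unit_format and the code in the handler means one unit (a date, say) can have several display formats without any new code.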
Other areas to be addressed:
- relationships -- does this schema permit a way of encoding set relationships?
- attributes -- what do we mean by this?
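To make the schema above concrete, here is one way it might be written as SQLite tables from Python. This is only a sketch: the column types, the foreign-key references, and the composite primary key on value are assumptions that the schema does not state, and "possibly other attributes" on series is left out.

```python
import sqlite3

# Column types, REFERENCES clauses and the composite key are assumptions.
SCHEMA = """
CREATE TABLE source (
    ID INTEGER PRIMARY KEY,          -- "ID (auto)"
    ID_Entity INTEGER,
    URL TEXT,                        -- optional
    When_Retrieved TEXT
);
CREATE TABLE series (
    ID INTEGER PRIMARY KEY,
    Name TEXT,
    Descrip TEXT,
    ID_Source INTEGER REFERENCES source(ID)
);
CREATE TABLE series_attrib (
    ID INTEGER PRIMARY KEY,
    ID_Series INTEGER REFERENCES series(ID),
    ID_Attrib INTEGER,
    Value TEXT
);
CREATE TABLE unit (
    ID INTEGER PRIMARY KEY,
    Name TEXT,
    ID_Handler INTEGER
);
CREATE TABLE unit_format (
    ID INTEGER PRIMARY KEY,
    ID_Unit INTEGER REFERENCES unit(ID),
    Name TEXT,
    Tplt TEXT
);
CREATE TABLE axis (
    ID INTEGER PRIMARY KEY,
    ID_Series INTEGER REFERENCES series(ID),
    Name TEXT,
    ID_Unit INTEGER REFERENCES unit(ID)
);
CREATE TABLE point (
    ID INTEGER PRIMARY KEY,
    ID_Series INTEGER REFERENCES series(ID)
);
CREATE TABLE "value" (
    ID_Point INTEGER REFERENCES point(ID),
    ID_Axis INTEGER REFERENCES axis(ID),
    Value TEXT,
    PRIMARY KEY (ID_Point, ID_Axis)  -- one value per point per axis
);
"""

def create_schema(conn):
    """Create all eight tables on the given connection."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)
```

Storing Value as text is a deliberate simplification here; how a value is interpreted would depend on the unit attached to its axis.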
Views
- Represent any 2 axes as a graph/chart -- basically, spreadsheet graphing functionality:
  - variety of graph/chart formats available
  - eventually, add more dimensions (color, size, slider) a la GapMinder
  - restrict range or show entire range
- Answer questions written in English-like syntax, with graphs or scalars:
  - "During the 2008 mortgage crisis, what percent of loan defaults came from CRA-inspired loans?" (scalar output)
  - "Display rate of default for CRA-inspired loans versus all loans during the 2008 mortgage crisis." (graph output, restricted range)
  - "Display {profitability of loans to minorities} and {profitability of loans overall} by year." (graph output, unrestricted range)
- Offer sources for all data presented.
- Where data from multiple sources differ: offer to average them, present each source in its own output, or show each source as a separate element of the same output (e.g. as a differently colored line).
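The "any 2 axes as a graph" view, with an optional restricted range, could be fed by a self-join on the value table. The sketch below is illustrative only: it uses a cut-down version of the point, axis and value tables with assumed column types, made-up sample data, and a hypothetical helper named `axis_pairs`. Plotting is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE axis  (ID INTEGER PRIMARY KEY, ID_Series INTEGER, Name TEXT);
CREATE TABLE point (ID INTEGER PRIMARY KEY, ID_Series INTEGER);
CREATE TABLE "value" (ID_Point INTEGER, ID_Axis INTEGER, Value REAL);
""")

# Made-up example data: one series with a "year" axis (1) and a "rate" axis (2).
conn.executemany("INSERT INTO axis VALUES (?, ?, ?)", [(1, 1, "year"), (2, 1, "rate")])
conn.executemany("INSERT INTO point VALUES (?, ?)", [(1, 1), (2, 1), (3, 1)])
conn.executemany('INSERT INTO "value" VALUES (?, ?, ?)', [
    (1, 1, 2007), (1, 2, 1.5),
    (2, 1, 2008), (2, 2, 4.0),
    (3, 1, 2009), (3, 2, 3.2),
])

def axis_pairs(conn, x_axis, y_axis, x_min=None, x_max=None):
    """Return (x, y) tuples for two axes, ordered by x.

    Each point contributes one pair: its value on x_axis and its value
    on y_axis. x_min / x_max restrict the range; None means unbounded.
    """
    sql = '''
        SELECT vx.Value, vy.Value
        FROM "value" AS vx
        JOIN "value" AS vy ON vy.ID_Point = vx.ID_Point
        WHERE vx.ID_Axis = ? AND vy.ID_Axis = ?
          AND (? IS NULL OR vx.Value >= ?)
          AND (? IS NULL OR vx.Value <= ?)
        ORDER BY vx.Value
    '''
    params = (x_axis, y_axis, x_min, x_min, x_max, x_max)
    return conn.execute(sql, params).fetchall()

print(axis_pairs(conn, 1, 2))              # entire range
print(axis_pairs(conn, 1, 2, x_min=2008))  # restricted range
```

Because every value row names its point and axis, adding a third dimension (color, size, slider) would be one more join of the same shape.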