Difference between revisions of "FactFerret"

From ICMS
m (Woozle moved page DataFerret to FactFerret: catchier and more generally descriptive)
(Replaced content with "category:moved moved to [https://wooz.dev/FactFerret wooz.dev]")
Tag: Replaced
 
(2 intermediate revisions by the same user not shown)
==About==
[[FactFerret]] is a way to store arbitrary facts (including data series) from multiple sources in a single database, while preserving the meaning and sources of each fact.
 
 
The purpose is to allow multi-layered querying (e.g. "display the rate of all A that didn't include B from year C to year D") without knowing in advance what data is available. This is in some ways similar to what [http://gapminder.com GapMinder] does, with two differences: it should be relatively easy to add new datasets (one database schema should be able to accommodate any assertion of fact), and it should support data-dependent conditionals ("all A that didn't include B" is just a very simple example).
 
 
 
It also bears some similarity to Semantic MediaWiki, except that data is entered and updated independently of a wiki page, via both manual and automated methods.
 
==Schema==
 
This is a preliminary schema, just to give an idea of how it works.
 
 
 
'''value''' -- the value of one dimension of a given data point
 
* ID_Point
 
* ID_Axis
 
* Value
 
 
 
'''point''' -- a specific point (one or more dimensions) within a series
 
* ID (auto)
 
* ID_Series
 
 
 
'''axis''' -- defines a dimension used within a particular dataset
 
* ID (auto)
 
* ID_Series
 
* Name
 
* ID_Unit
 
 
 
'''series''' -- a dataset (one or more dimensions) from a given source
 
* ID (auto)
 
* Name
 
* Descrip
 
* ID_Source
 
* ''possibly other attributes''
 
 
 
'''series_attrib''' -- defines what a series relates to (this needs refinement)
 
* ID (auto)
 
* ID_Series
 
* ID_Attrib
 
* Value
 
 
 
'''source''' -- a particular data source
 
* ID (auto)
 
* ID_Entity -- organization or individual who created the data
 
* URL -- (optional) web page where data may be found
 
* When_Retrieved
 
 
 
'''unit''' -- type of unit which can be used on an axis
 
* ID (auto)
 
* Name
 
* ID_Handler -- sprintf(), date(), custom code...
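
One way the ID_Handler idea might work is sketched below in Python. Everything here is an assumption for illustration: the handler names, the <code>HANDLERS</code> registry and the <code>format_value</code> function are hypothetical, and date values are assumed to be stored as ISO 8601 strings. Each handler takes a raw value plus a template string and returns display text.

```python
from datetime import date


def sprintf_handler(value, template):
    """Format a number with a printf-style template such as "%.2f"."""
    return template % (value,)


def date_handler(value, template):
    """Format a date with a strftime template such as "%y/%m/%d".

    Assumption: the stored value is an ISO 8601 date string.
    """
    return date.fromisoformat(value).strftime(template)


# Hypothetical registry standing in for whatever ID_Handler would point at.
HANDLERS = {
    "sprintf": sprintf_handler,
    "date": date_handler,
}


def format_value(handler_name, template, value):
    """Look up the unit's handler and apply the display template."""
    return HANDLERS[handler_name](value, template)
```

For example, <code>format_value("date", "%y/%m/%d", "2008-09-15")</code> would go through the date handler, and custom code could be added by registering another function.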
 
 
 
'''unit_format''' -- a particular way of displaying values for a type of unit
 
* ID (auto)
 
* ID_Unit
 
* Name -- a name for the format, e.g. "ISO xxxx"
 
* Tplt -- template string to pass to unit handler (e.g. "%y/%m/%d")
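
To make the relationships between the tables more concrete, here is a rough sketch using Python's built-in <code>sqlite3</code> module. It is only an illustration of the preliminary schema, not a finished design: the column types, the choice to store every Value as text, and the helper functions <code>add_point</code>, <code>read_series</code> and <code>demo</code> are all assumptions, and the series_attrib and unit_format tables are left out for brevity. The numbers in <code>demo</code> are made up.

```python
import sqlite3

# Assumed column types; the schema above does not specify any.
SCHEMA = """
CREATE TABLE source (ID INTEGER PRIMARY KEY, ID_Entity INTEGER,
                     URL TEXT, When_Retrieved TEXT);
CREATE TABLE series (ID INTEGER PRIMARY KEY, Name TEXT, Descrip TEXT,
                     ID_Source INTEGER REFERENCES source(ID));
CREATE TABLE unit   (ID INTEGER PRIMARY KEY, Name TEXT, ID_Handler INTEGER);
CREATE TABLE axis   (ID INTEGER PRIMARY KEY,
                     ID_Series INTEGER REFERENCES series(ID),
                     Name TEXT, ID_Unit INTEGER REFERENCES unit(ID));
CREATE TABLE point  (ID INTEGER PRIMARY KEY,
                     ID_Series INTEGER REFERENCES series(ID));
CREATE TABLE "value" (ID_Point INTEGER REFERENCES point(ID),
                      ID_Axis INTEGER REFERENCES axis(ID),
                      Value TEXT,
                      PRIMARY KEY (ID_Point, ID_Axis));
"""


def add_point(conn, series_id, coords):
    """Store one point; coords maps axis ID -> value for each dimension."""
    point_id = conn.execute(
        "INSERT INTO point (ID_Series) VALUES (?)", (series_id,)
    ).lastrowid
    conn.executemany(
        'INSERT INTO "value" (ID_Point, ID_Axis, Value) VALUES (?, ?, ?)',
        [(point_id, axis_id, str(v)) for axis_id, v in coords.items()],
    )
    return point_id


def read_series(conn, series_id):
    """Reassemble each point of a series as {axis name: value}."""
    rows = conn.execute(
        'SELECT p.ID, a.Name, v.Value FROM point p '
        'JOIN "value" v ON v.ID_Point = p.ID '
        'JOIN axis a ON a.ID = v.ID_Axis '
        'WHERE p.ID_Series = ? ORDER BY p.ID, a.ID',
        (series_id,),
    )
    points = {}
    for point_id, axis_name, val in rows:
        points.setdefault(point_id, {})[axis_name] = val
    return list(points.values())


def demo():
    """Load a two-point, two-axis series of made-up numbers and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    src = conn.execute(
        "INSERT INTO source (URL) VALUES (?)", ("https://example.org/data",)
    ).lastrowid
    ser = conn.execute(
        "INSERT INTO series (Name, Descrip, ID_Source) VALUES (?, ?, ?)",
        ("example series", "made-up numbers", src),
    ).lastrowid
    year = conn.execute(
        "INSERT INTO axis (ID_Series, Name) VALUES (?, ?)", (ser, "year")
    ).lastrowid
    rate = conn.execute(
        "INSERT INTO axis (ID_Series, Name) VALUES (?, ?)", (ser, "rate")
    ).lastrowid
    add_point(conn, ser, {year: 2007, rate: 1.5})
    add_point(conn, ser, {year: 2008, rate: 2.5})
    return read_series(conn, ser)
```

The point of the layout is that a new dataset with different dimensions only needs new rows in series and axis, not new tables or columns.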
 
 
 
Other areas to be addressed:
 
* relationships -- does this schema permit a way of encoding set relationships?
 
* attributes -- what do we mean by this?
 
 
 
==Views==
 
* Represent any 2 axes as a graph/chart -- basically, spreadsheet graphing functionality:
 
** variety of graph/chart formats available
 
** eventually, add more dimensions (color, size, slider) a la GapMinder
 
** restrict [[wikipedia:Range (computer programming)|range]] or show entire range
 
* Answer questions written in English-like syntax, with graphs or scalars:
 
** "During the 2008 mortgage crisis, what percent of loan defaults came from CRA-inspired loans?" (scalar output)
 
** "Display rate of default for CRA-inspired loans versus all loans during the 2008 mortgage crisis." (graph output, restricted range)
 
** "Display {profitability of loans to minorities} and {profitability of loans overall} by year." (graph output, unrestricted range)
 
* Offer sources for all data presented.
 
* Where data from multiple sources differs: offer to average it, present each source in its own output, or show each source separately within a single output (e.g. as a differently colored line).
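
The handling of differing sources could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the function name <code>combine_sources</code>, the <code>mode</code> parameter and the input shape (a mapping of source name to x/y pairs) are hypothetical and not part of the schema above.

```python
from collections import defaultdict
from statistics import mean


def combine_sources(series_by_source, mode="separate"):
    """Combine the same measurement reported by several sources.

    series_by_source: {source name: {x: y}} (assumed input shape).
    mode "separate" keeps one labeled line per source;
    mode "average" merges them into a single line, averaging the
    y values of whichever sources report a given x.
    """
    if mode == "separate":
        return {
            name: sorted(points.items())
            for name, points in series_by_source.items()
        }
    if mode == "average":
        buckets = defaultdict(list)
        for points in series_by_source.values():
            for x, y in points.items():
                buckets[x].append(y)
        return {"average": [(x, mean(ys)) for x, ys in sorted(buckets.items())]}
    raise ValueError(f"unknown mode: {mode!r}")
```

Each returned key can be drawn as its own line, so the "separate" mode keeps the source names available for labeling and attribution.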
 
==Links==
 
===Possibly Related===
 
The following projects may be related, but it's difficult to tell:
 
* [http://mike2.openmethodology.org/ MIKE2.0]
 
* [http://www.precog.com/how-it-works Precog]
 
===Posts===
 
* '''2012-06-12''' [https://plus.google.com/u/0/102282887764745350285/posts/HV1iFw4mmuu Woozle@G+]: it might not be obvious, but this post is describing what I later decided to call "DataFerret"
 

Latest revision as of 12:57, 4 October 2020

moved to wooz.dev