I put some time into providing proper histograms for Salstat. The basics work well enough, but the more advanced features are yet to come.

Histograms are very useful in statistics because they let us see the distribution of a vector's data at a glance. The screenshot above tells me that the data are not likely to be normally distributed. If I wanted to perform an inferential test that assumed normally distributed data, I might need to transform the data first or use a test that doesn't make that assumption.

The critical thing, however, is to see how the data look, and Salstat does this in a basic form.

*How does it work?*

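Python's NumPy module has a handy histogram function that (in its simplest form) takes a vector and returns two vectors: the frequencies (the count in each bin) and the bin limits. There is one more limit than there are frequencies, because the limits mark both edges of every bin. These two vectors are used directly to form the histogram.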
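As a rough illustration of that simplest form, here is a minimal sketch. The sample values and variable names are made up for the example and are not taken from Salstat's code.

```python
import numpy as np

# Hypothetical sample data standing in for one column of a Salstat grid
data = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 7.0, 10.0])

# Simplest form: pass only the vector and accept NumPy's defaults
frequencies, limits = np.histogram(data)

print(frequencies)  # one count per bin
print(limits)       # bin edges, running from the data minimum to the data maximum
```

Each pair of neighbouring limits brackets one bin, so `limits` is always one element longer than `frequencies`.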

Once these values are available, a column chart is drawn in Highcharts using them, with some additional 'plotOptions' so that no gaps exist between the columns.
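The post doesn't say which 'plotOptions' Salstat sets, so the following is only a sketch of one way to do it. It assumes the chart options are built as a Python dictionary and serialised to JSON for Highcharts, and it assumes the gaps are closed with the column options `pointPadding` and `groupPadding`. The function name `histogram_chart_options` and the label format are hypothetical.

```python
import json
import numpy as np

def histogram_chart_options(data, title="Histogram"):
    """Build a Highcharts options dict for a gap-free column chart (a sketch)."""
    frequencies, limits = np.histogram(data)
    # Label each column with the range its bin covers
    labels = [f"{lo:.2f} to {hi:.2f}" for lo, hi in zip(limits[:-1], limits[1:])]
    return {
        "chart": {"type": "column"},
        "title": {"text": title},
        "xAxis": {"categories": labels},
        "yAxis": {"title": {"text": "Frequency"}},
        # Assumed settings: zero padding so neighbouring columns touch
        "plotOptions": {"column": {"pointPadding": 0, "groupPadding": 0}},
        # tolist() converts NumPy integers to plain ints that json can serialise
        "series": [{"name": "Frequency", "data": frequencies.tolist()}],
    }

options = histogram_chart_options([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])
print(json.dumps(options, indent=2))
```

The printed JSON could then be handed to the Highcharts chart constructor on the page.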

*What's left to do?*

- The histogram defaults to 10 bins. This is fine for basic uses but more advanced use cases need to let the user define the bins.
- Histogram limits are defined by the minimum and maximum of the data. Some users need to define their own.
- Rarer use cases might need weights (in NumPy these are given per data value, and each value adds its weight to the bin it falls in), or might need the histogram function to return a probability density rather than the counts.
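NumPy's histogram function already accepts arguments covering each of the items above, so the open question is mostly about the interface. A small sketch of those arguments, using made-up sample data:

```python
import numpy as np

# Hypothetical sample data
data = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 7.0, 10.0])

# User-defined number of bins instead of the default
counts5, edges5 = np.histogram(data, bins=5)

# User-defined limits: values outside the range (here 7.0 and 10.0) are not counted
counts_r, edges_r = np.histogram(data, bins=5, range=(0.0, 5.0))

# Fully user-defined bins, given as explicit edges
counts_e, edges_e = np.histogram(data, bins=[0, 2, 4, 6, 8, 10])

# Weights: one weight per data value, summed within each bin
weights = np.full(data.shape, 0.5)
weighted, _ = np.histogram(data, bins=5, weights=weights)

# Density instead of counts: bar areas (height * width) sum to 1
density, edges_d = np.histogram(data, bins=5, density=True)
print((density * np.diff(edges_d)).sum())
```

With `density=True` it is the area of the bars that sums to 1, not their heights, so the values can exceed 1 when the bins are narrow.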

We're keen to get the first two working but are unsure how to design the interface for them. The interface was designed for simpler charting and will need careful thought before it can accommodate these options.

For now, however, Salstat has a basic histogram charting function which probably meets about 80% of needs.

Salstat is an open source project fostered by Thought Into Design Ltd

Thought Into Design Ltd is registered in England and Wales (Companies House number 7367421)