Speed lets you get things done faster, but it also creates a marvelous feeling of progress. The faster you can iterate, the less friction there is in testing ideas or trying new things. It lowers the cost of experimentation and softens the learning curve, which very often leads to better results and a more motivated team.

The opposite applies as well: if getting results is going to take a long time, you will be less motivated to try, so you probably never will.

The same happens with data. With more and better data, the possibilities for new and better insights grow dramatically, but as the data grows, so does the complexity of working with it. So, isn’t working faster a great way of understanding, and removing, that complexity?

Google – famously – defined speed as a feature. They realized that if search was fast, we were more likely to search. The reason is that it encourages you to try stuff, get feedback, and try again. When a thought occurs to you, you know Google is already there. There is no delay between thought and action. The projected cost of googling something is zero. It comes to feel like an extension of your own mind.

…whenever a new idea pops up in your mind, you should be able to give it a try in seconds, at zero projected cost.

Going back to your data: whenever a new idea pops up in your mind, you should be able to give it a try in seconds, at zero projected cost. There is no good way to uncover all the insights hiding in terabytes of data unless you make the process of working with it fast and tremendously rewarding.

And that’s why for us at Tinybird, speed is the feature. Below you will find some examples of basic analytical operations using our Query API over the NYC taxi dataset. Each one takes less than 200ms over more than 130M records on our basic setup, so you can see for yourself that speed is paramount at Tinybird.
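The snippets below use our small JavaScript client, but under the hood every call is a single HTTP request to the Query API. Here is a minimal sketch of that request with no client library; the host, the /v0/sql endpoint path, and the FORMAT JSON clause are assumptions based on Tinybird's documented conventions, so adapt them to your own account:

```javascript
// Minimal Query API call, no client library.
const API_HOST = 'https://api.tinybird.co';

// Build the request URL for a SQL query; FORMAT JSON asks for the
// { data, statistics } response shape the snippets below rely on.
function queryUrl(sql) {
  return `${API_HOST}/v0/sql?q=${encodeURIComponent(sql + ' FORMAT JSON')}`;
}

// Run the query with a read token (needs network access and a valid token).
async function runQuery(sql, token) {
  const res = await fetch(queryUrl(sql), {
    headers: { Authorization: `Bearer ${token}` },
  });
  return res.json(); // => { data: [...], statistics: { elapsed, ... }, ... }
}
```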

Quick data aggregations

Calculating the average of a numeric column, with no optimizations, over 134M rows:

// Read token for the demo Data Source (read-only scope)
const READ_TOKEN = 'p.eyJ1IjogImMzZTMwNDIxLTYwNzctNGZhMS1iMjY1LWQwM2JhZDIzZGRlOCIsICJpZCI6ICJlNGE4MzVkNC02NmZkLTQyNWItYjBiMC01NTkxZGVmMDQ5ZjQifQ.Q6RPBMadv6-irUlWz2cliWaB8c0bB0fY2INkctPXilU';
const nytaxi = tinybird(READ_TOKEN).pipe('nyc_taxi_17_18_pipe')
var res = await nytaxi.json(`
SELECT
  avg(trip_distance) as avgDistance
FROM
  nyc_taxi_17_18`)
console.log(res.data[0]['avgDistance'], res.statistics.elapsed)

or calculating a histogram of trips by pickup hour:

var res = await nytaxi.json(`
SELECT
  toHour(tpep_pickup_datetime) d,
  count(1) t
FROM
  nyc_taxi_17_18
GROUP BY d
ORDER BY d ASC`)
console.log(`  hour    |   count    `)
console.log(`--------------------------`)
for (var row of res.data) {
  console.log(`  ${row.d}   |   ${row.t}`)
}
console.log(`--------------------------`, res.statistics.elapsed)
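The tabular output above already shows the shape of the distribution; to eyeball it more directly, the same rows can drive a quick ASCII bar chart. A sketch, with made-up counts standing in for res.data:

```javascript
// Proportional ASCII bars from { d: hour, t: count } rows.
// The sample array is hypothetical; swap in res.data from the query above.
const sample = [
  { d: 0, t: 120 },
  { d: 1, t: 60 },
  { d: 2, t: 240 },
];
const max = Math.max(...sample.map((r) => r.t));
const lines = sample.map(
  (r) => `${String(r.d).padStart(2)} | ${'#'.repeat(Math.round((r.t / max) * 20))}`
);
console.log(lines.join('\n'));
```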

Filtering data

Filtering the same 134M-row data source by a string* column.

var res = await nytaxi.json(`SELECT count() as c FROM nyc_taxi_17_18 WHERE store_and_fwd_flag != 'N'`)
console.log(res.data[0]['c'], res.statistics.elapsed)

*The store_and_fwd_flag column indicates whether the trip record was held in vehicle memory before being sent to the vendor.

Joining data

Joining two different data sources is also pretty fast.

var res = await nytaxi.json(`
SELECT zone, pickups FROM
(
  SELECT
    pulocationid p,
    count(1) pickups
  FROM
    nyc_taxi_17_18
  GROUP BY
    p
) as picks
INNER JOIN taxi__zone_lookup as zones ON picks.p = zones.locationid
ORDER BY pickups DESC
LIMIT 10`)
console.log(`---`)
for (var d of res.data) {
  console.log(`${d.zone}: ${d.pickups.toString().replace(/(\d)(?=(\d{3})+(?!\d))/g, '$1,')} pickups`)
}
console.log(`---`, res.statistics.elapsed)
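A side note on the replace() call above: it inserts thousands separators with a regular expression. Number.prototype.toLocaleString gives the same result with less ceremony, at the cost of being locale-dependent:

```javascript
// Thousands separators, two ways.
function withCommas(n) {
  // Same regex technique as the snippet above: insert a comma after any digit
  // followed by one or more complete groups of three digits.
  return n.toString().replace(/(\d)(?=(\d{3})+(?!\d))/g, '$1,');
}

console.log(withCommas(1234567));               // "1,234,567"
console.log((1234567).toLocaleString('en-US')); // "1,234,567"
```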

As you can see, we take accelerating analytics over millions, and billions, of rows very seriously. If you have a big dataset around, let us know!