What are the different types of patent databases? How should they be evaluated?


Patent searchers and patent examiners have access to an increasing range of options for patent searching, ranging from conventional to ‘AI’ type methods, and this can make choosing between these different databases and approaches more complex than before.

This led to a recent question: what is the best way of comparing and evaluating patent databases? This is a great question, and well worth exploring from both of my perspectives, as a patent search software vendor and as a user.

But before we do this, it might help to quickly review the different types of patent databases.

These can be divided into the following general classes:

Boolean (Conventional)

  • Principle: Returns a set of patents that meet a particular query, which often includes a combination of keyword and class code terms.

  • Strengths: Widely used and accepted. Can be used for new inventions. Many existing vendors.

  • Weaknesses: Creating queries can be an art form. Will often return many results that are not relevant, i.e. ‘false positives’. Users can be caught out if relevant patents do not include the expected keywords or class codes. Results may not be ranked in order of relevance.

  • Examples (free): Google Patents, Patent Lens and Espacenet, plus national patent office sites, for example USPTO, IP Australia, and others.

  • Examples (subscription): Patseer, Derwent Innovation, Patbase, Patsnap.

Semantic

  • Principle: Searches for patents that have similar keywords or blocks of text.

  • Strengths: Can start with a description of the invention or a representative claim. Results are ranked.

  • Weaknesses: Can return false positive results, despite having similar results.

  • Examples: Innography, IP Rally, Innovation-Q, IP-Screener, Tekmine, Octimine, Incopat.

Citation searching

  • Principle: Searches for patents that are similar to one or more starting patents.

  • Strengths: Excellent ability to return similar results. Results are ranked, and can include relevant patents that have not been previously cited.

  • Weaknesses: Need a starting patent, which may not apply for new inventions. May not pick up relevant patents where citation data is lacking.

  • Examples: Ambercite

What is meant by ‘AI’ searching?

Increasingly, people are talking about ‘AI’ solutions for patent searching. AI, or ‘artificial intelligence’, means different things to different people, but in the context of patent searching it almost always means Semantic or Ambercite type searching - as opposed to conventional searching.

Note that Ambercite is very different to semantic searching databases, and the different semantic search databases all appear to have different algorithms. For this reason we need to be careful to avoid treating all AI patent search tools as the same - because there is a range of quite different options in this area, which all need to be assessed individually.

Can these different techniques be combined?

Yes, different types of searching techniques can be combined, and in my opinion they should be. Each approach has its advantages, and so it makes sense to combine these advantages. As a simple analogy, this is like a carpenter bringing a range of tools to their job - each tool has its special purpose and brings its own advantages.

For this reason, some of the Boolean databases include options for semantic searching - because the vendors themselves have recognised that there is no one-size-fits-all approach.
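As a rough illustration of what combining results could look like in practice, the sketch below merges ranked result lists exported from several tools. It is only one simple way of doing this, not how any particular vendor combines results: the function name `combine_results`, the tool labels and the patent numbers are all made up for this example. Patents found by more tools are listed first, and ties are broken by the best rank each patent achieved in any tool.

```python
def combine_results(results_by_tool):
    """Merge ranked result lists from several search tools.

    results_by_tool: dict of tool name -> list of patent numbers, best first.
    Returns patent numbers ordered by how many tools found them,
    then by their best (lowest) rank in any tool, then alphabetically.
    """
    found_by = {}   # patent -> set of tools that returned it
    best_rank = {}  # patent -> best rank seen in any tool
    for tool, ranked in results_by_tool.items():
        for rank, patent in enumerate(ranked, start=1):
            found_by.setdefault(patent, set()).add(tool)
            best_rank[patent] = min(rank, best_rank.get(patent, rank))
    return sorted(found_by, key=lambda p: (-len(found_by[p]), best_rank[p], p))


# Hypothetical exports from three different types of search.
results = {
    "boolean": ["US3", "US1", "US7"],
    "semantic": ["US1", "US5"],
    "citation": ["US1", "US3", "US8"],
}
combined = combine_results(results)
print(combined)
```

A patent that turns up in several independent approaches is a natural candidate to review first, but patents found by only one tool should not be discarded - these can be exactly the results the other approaches missed.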

How should these different search solutions be evaluated?

Based on many years of experience and discussions, I would suggest that the following criteria be used:

1) Outcomes: - Does the patent search software and process lead to patents that are relevant to the invention you are searching on?

This is the whole point of patent searching, but equally this question can be a bit of a ‘hand-waving’ comment, because it can be hard to test in practice. But not impossible to test - we will return to this point later in this blog.

One point to immediately consider is that there is often more than one relevant patent for a given subject matter search. Because nobody has unlimited time for patent searching and reporting, most searchers and examiners will search until they find sufficient relevant documents to meet their needs, and report these results. In other words, there may be other patents out there that also meet the criteria - and some of these other patents may be better than the patents that the searcher has settled on.

To further complicate this, the final outcomes can depend on a number of different factors such as those discussed below. Also patent searches can often be iterative, and we should consider the process as a whole, rather than just after one step.

2) Precision and Recall: - Does it produce a targeted set of patents after running a query?

To look at this properly, it is helpful to think of two concepts:

‘False positives’ - these are patents that are returned by your patent search query, but are not relevant to your search objective. A patent search with a low false positive rate could also be said to have ‘high precision’.

False positives are the big unspoken truth and a major cost in patent searching: any patent search will almost always return false positive results, which searchers then review to find and report the most relevant results. This can take hours and hours to do in practice.

‘False negatives’ - these are patents not found by the patent search, but which are relevant to the topic being searched. A patent search that returns most of the relevant results, i.e. has a low false negative rate, can be said to have ‘high recall’.

Patent searchers tend to be more nervous about false negatives than false positives, but in practice there is often a trade-off between the two - achieving a low false negative (high recall) rate in a subject matter search can involve reviewing a lot of false positive patents.
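To make these two concepts concrete, here is a minimal sketch of how precision and recall could be calculated for a single search. It assumes you already have a list of patents judged relevant, which in practice is only ever the set you know about so far, so the recall figure is best treated as an estimate. The function name `precision_recall` and the patent numbers are made up for this example.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one search.

    returned: patent numbers the search returned
    relevant: patent numbers judged relevant (assumed known for this sketch)
    """
    returned, relevant = set(returned), set(relevant)
    true_positives = returned & relevant
    # False positives are returned - relevant; false negatives are relevant - returned.
    precision = len(true_positives) / len(returned) if returned else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up patent numbers, purely for illustration.
returned = ["US1", "US2", "US3", "US4", "US5"]
relevant = ["US2", "US4", "US9"]
precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this example two of the five returned patents are relevant (precision of 0.40), and two of the three known relevant patents were found (recall of about 0.67). The missed patent, US9, is the false negative.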

A simple check for Precision and Recall:

If you are looking at various search databases, including semantic search databases - run a query, and look at the top ranked patents, say the first 25 listed.

  • How relevant are these patents to your objectives? (i.e. how many false positives do you have?)

  • How confident are you that all relevant patents are found? (i.e. is there a low false negative rate?)

I have applied this test to many patent search databases, and the results are always illuminating.

3) Usability: - Is it easy and efficient to use?

This will come down to such factors as the user interface and how ‘friendly’ and easy it is to use, the ability to efficiently review results, compatibility with other search databases, how it saves results, and a whole range of other factors.

There are many ways of evaluating this, but the time taken to learn the product and to run a typical search could be part of the evaluation process.

Having used a wide range of patent search databases, I have no doubt that some are easier to use than others. One very well known and free database is very easy to run a search in, but the results are not as easy to review as they could be. I also remember some of the earlier command-line driven databases, which required detailed knowledge to both create a query and to review the results.

In general, the best of the subscription databases can greatly improve the patent review process compared to free databases, and are well worth the cost.

4) Features: - Does it have a range of helpful features?

Many patent databases have a range of different features. Sometimes these can help improve the patent searching and reporting process. However sometimes these are little more than ‘showroom features’ - they look good in brochures and in demonstrations but are rarely used in practice, particularly for subject matter searching.

5) Coverage: - Does it have full coverage of relevant patents?

Obviously we want the best possible coverage of published patents - and coverage can vary between databases.

6) Value: - Is it cost-effective?

The cost of patent databases ranges from free for public domain databases, and then upwards for commercial databases. While free is attractive to many people, we also need to consider the time needed to run and review queries, and to learn how to use the database. Even ‘free’ patent search databases will still cost you your valuable time.

This is where commercial databases create value - they can be much more efficient to use and review than free databases.
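The point about time can be shown with some simple arithmetic. The sketch below adds the value of the searcher's time to the subscription cost to give a cost per search. Every number in it (subscription price, number of searches, hours per search, hourly rate) is a made-up assumption for illustration, not actual pricing or timing for any database, and the function name `cost_per_search` is hypothetical.

```python
def cost_per_search(subscription_per_year, searches_per_year,
                    hours_per_search, hourly_rate):
    """Subscription cost spread over the year's searches, plus searcher time."""
    subscription_share = subscription_per_year / searches_per_year
    time_cost = hours_per_search * hourly_rate
    return subscription_share + time_cost


# Hypothetical figures only: 100 searches a year at 150 per hour of searcher time.
free_db = cost_per_search(0, 100, hours_per_search=8, hourly_rate=150)
paid_db = cost_per_search(10_000, 100, hours_per_search=5, hourly_rate=150)
print(f"free database: {free_db:.0f} per search")
print(f"paid database: {paid_db:.0f} per search")
```

With these assumed figures the ‘free’ database works out at 1200 per search and the subscription database at 850 per search. Whether this holds for you depends entirely on how much review time the subscription actually saves, so it is worth plugging in your own numbers.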


Just how do you know if the software has found a good set of relevant patents? (point #1 above)

There are various ways of doing this:

What I think is best practice

To me the best practice is to look at the best patents found using the patent software you are looking at, and then judge the relevance of these best patents against an objective set of criteria in relation to the search.

As examples of best practice, check out the independent studies (which investigate Ambercite and other approaches) published by Riahi Patents here and by the Austrian Patent Office here.

In both of these cases, the investigators defined a set of inventions and key features for these inventions, ran a variety of patent search approaches, and objectively evaluated the patents found against these criteria.
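As a minimal sketch of this kind of evaluation, the code below scores each patent found by the fraction of the invention's key features that a reviewer judged it to disclose. This is not the methodology of either study, just one simple way of turning ‘judge against objective criteria’ into numbers. The key features, patent labels and reviewer judgments are all hypothetical.

```python
def score_against_features(key_features, features_disclosed):
    """Fraction of the invention's key features that one patent discloses."""
    key_features = set(key_features)
    if not key_features:
        return 0.0
    return len(key_features & set(features_disclosed)) / len(key_features)


# Key features defined up front, before any searching (hypothetical invention).
key_features = {"hybrid drivetrain", "battery cooling loop", "shared radiator"}

# Reviewer judgments of what each found patent discloses (hypothetical).
judgments = {
    "US-A": {"hybrid drivetrain", "battery cooling loop", "shared radiator"},
    "US-B": {"hybrid drivetrain"},
    "US-C": {"battery cooling loop", "shared radiator", "heat pump"},
}

scores = {p: score_against_features(key_features, f) for p, f in judgments.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for patent in ranking:
    print(f"{patent}: {scores[patent]:.2f}")
```

Because the key features are fixed before the search is run, the same scoring can be applied to the best patents from each tool being compared, which keeps the comparison objective.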

We should also consider the amount of time required to find these patents. If you do enough searching in almost any search engine you will likely find every relevant patent for a given search, but the time required for this may be prohibitive - so we need to be realistic about how much time you will put into a typical search.

What I suggest not to do.

Sometimes some investigators have taken a short-cut of saying ‘I know what the most relevant patents are for an invention (or most relevant prior art for a patent), and so I will see if a patent database can find these relevant patents’.

As a vendor of patent search software, I think that this approach is flawed - as this makes the assumption that the investigator knows ALL of the relevant prior art patents for an invention. And that is a big assumption, particularly because as all experienced patent searchers would know, assumptions are risky things.

Sometimes these investigators may have in mind patents that they have found after many hours of searching - and where these patents have not been listed as prior art by any examiner searching for prior art for this invention. While finding these missed-by-everyone-else patents is a great achievement for the investigator, this is a bit like judging a promising high school athlete because they can’t yet run 9.9 seconds for the 100m sprint… unrealistic and not taking into account the future potential and training of the athlete.

So instead I would suggest - look at what the patent search software does provide - and judge that instead.


How does Ambercite compare against the above criteria?

Ambercite was developed because patent searchers recognised the limitations of using Boolean searching alone. While not a complete substitute for a Boolean searching database, Ambercite does stack up well against these six criteria, as will be discussed below.

1) Outcomes:

Ambercite has been proven to find the patents that are relevant to the invention you are searching on

We have produced many case studies showing how it does this.

But in reality, we recommend that Ambercite is best used alongside conventional Boolean searching, where it provides synergistic effects; ‘the icing on the cake’.

As evidence of this, independent research by Canadian search firm Riahi Patents found that this led to an average 25% improvement in search outcomes compared to conventional Boolean searching alone.


Note that to achieve these sorts of results can involve iterative searching in Ambercite, rather than from a single search. But this is easy to do in practice.

2) Precision and Recall:

Ambercite produces a targeted set of patents

An Ambercite search will produce a ranked list of similar patents to one or more starting patents, many of which have high relevance. An example of this is shown below, in relation to a US patent filed for a cooling system for a hybrid car (you can click on this to see a fully interactive version):

From experience, the false positive rate is much lower (higher precision) than Boolean searching, and many semantic search engines I have used. And the unique approach used by Ambercite can find many new patents that are not found by other search approaches, i.e. false negatives in other search tools.

The false negative rate for Ambercite is harder to measure precisely - one of the challenges is that it is impossible to count missed and relevant patents that we do not know about, because we don’t know what they are. However, the Ambercite algorithm combines search data from many different examinations, and so can find a high proportion of the relevant patents (high recall). This does however depend on starting with good query patents - but then the need for a good query applies to every patent search.

3) Usability:

Ambercite is easy to learn and use

All you need to do is enter one or more relevant patents, and Ambercite will instantly create a ranked list of up to 2000 potentially relevant patents.

While new users find Ambercite very easy to use, we also supply:

  • Online training for new clients

  • An interactive onboarding process

  • Tooltips and online material designed to answer any questions that you have.

4) Features:

Ambercite has over 20 features designed to make citation searching as productive as possible.

Ambercite includes a whole range of features designed to make the product very efficient to use, including clear ranking of results, and the ability to iterate your way towards a final set of good patents.

These features are discussed in detail in the recent blog linked below:

5) Coverage:

Ambercite has full coverage of relevant results

The coverage in Ambercite is global and matches that in the EPO database Espacenet, i.e. 120 million patents from over 90 countries, including all key patent jurisdictions.

6) Value:

Ambercite is very cost effective and can create high value

Ambercite is very cost-effective compared to other subscription software.

When you consider the many hours it can save you screening out false positives in conventional patent searching, and the improved search outcomes it produces, Ambercite can be regarded as great value.


Do you want to test these features and benefits for yourself?

Ambercite offers free trials, but to get the most out of this, please contact us for a demonstration. You can try either option via the links below: