Detecting figures and part labels in patents: competition-based development of graphics recognition algorithms

Riedl, Christoph; Zanibbi, Richard; Hearst, Marti A.; Zhu, Siyu; Menietti, Michael; Crusan, Jason; Metelsky, Ivan; Lakhani, Karim R.

doi:10.1007/s10032-016-0260-8

Original Paper
Published: 20 February 2016

Volume 19, pages 155–172, (2016)

International Journal on Document Analysis and Recognition (IJDAR)

Christoph Riedl¹,⁵,
Richard Zanibbi²,
Marti A. Hearst³,
Siyu Zhu⁴,
Michael Menietti⁵,
Jason Crusan⁶,
Ivan Metelsky⁷ &
Karim R. Lakhani⁸

Abstract

Most United States Patent and Trademark Office (USPTO) patent documents contain drawing pages which describe inventions graphically. By convention and by rule, these drawings contain figures and parts that are annotated with numbered labels but not with text. As a result, readers must scan the document to find the description of a given part label. To make progress toward automatic creation of ‘tool-tips’ and hyperlinks from part labels to their associated descriptions, the USPTO hosted a monthlong online competition in which participants developed algorithms to detect figures and diagram part labels. The challenge drew 232 teams of two, of which 70 teams (30 %) submitted solutions. An unusual feature was that each patent was represented by a 300-dpi page scan along with an HTML file containing patent text, allowing integration of text processing and graphics recognition in participant algorithms. The design and performance of the top-5 systems are presented along with a system developed after the competition, illustrating that the winning teams produced near state-of-the-art results under strict time and computation constraints. The first place system used the provided HTML text, obtaining a harmonic mean of recall and precision (F-measure) of 88.57 % for figure region detection, 78.81 % for figure regions with correctly recognized figure titles, and 70.98 % for part label detection and recognition. Data and source code for the top-5 systems are available through the online UCI Machine Learning Repository to support follow-on work by others in the document recognition community.
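The F-measure reported above is the harmonic mean of precision and recall. A minimal sketch of that calculation follows; the function names and the counts are hypothetical and chosen only for illustration, and the rule deciding when a detected region or label counts as correct is defined in the full paper, not here.

```python
from typing import Tuple


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Return (precision, recall) from detection counts.

    true_pos:  detections that match a ground-truth item
    false_pos: detections with no matching ground-truth item
    false_neg: ground-truth items that were not detected
    """
    detected = true_pos + false_pos
    relevant = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / relevant if relevant else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the competition data.
p, r = precision_recall(true_pos=80, false_pos=20, false_neg=10)
score = f_measure(p, r)
print(f"precision={p:.4f} recall={r:.4f} F={100 * score:.2f} %")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system cannot reach a high F-measure by maximizing recall at the expense of precision, or the reverse.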

Notes

http://archive.ics.uci.edu/ml/datasets/USPTO+Algorithm+Challenge%2C+run+by+NASA-Harvard+Tournament+Lab+and+TopCoder++++Problem%3A+Pat.
Embedded within this competition was a social science experiment to investigate different team formation mechanisms. Two treatments were implemented. In treatment one, teams were formed through bilateral agreement between participants after communicating through a public forum or private direct messaging (this was termed the ‘free-form’ treatment). In the second treatment, teams were formed based on a stable-matching algorithm using participants’ stated preferences (termed ‘algorithm’ treatment). We found no significant differences in algorithm performance between the two treatments. The exact details of the social science experiment are beyond the scope of this paper. Some preliminary results can be found in this working paper http://goo.gl/NjoWce.
http://labelme.csail.mit.edu/Release3.0/.
The final ranking of all submissions is publicly available on the TopCoder Web site at https://community.topcoder.com/longcontest/stats/?&sr=1&nr=50&module=ViewOverview&rd=15027.

Acknowledgments

We are grateful for helpful comments provided by Ahmad Ahmad and the anonymous reviewers. This research was supported in part by the NASA Tournament Laboratory and the United States Patent and Trademark Office (USPTO).

Author information

Authors and Affiliations

D’Amore-McKim School of Business, and College of Computer and Information Science, Northeastern University, Boston, MA, 02115, USA
Christoph Riedl
Department of Computer Science, Rochester Institute of Technology, Rochester, NY, 14623, USA
Richard Zanibbi
School of Information, UC Berkeley, Berkeley, CA, 94720, USA
Marti A. Hearst
Center for Imaging Science, Rochester Institute of Technology, Rochester, NY, 14623, USA
Siyu Zhu
Institute for Quantitative Social Science, Harvard University, Cambridge, MA, 02138, USA
Christoph Riedl & Michael Menietti
Advanced Exploration Systems Division, NASA, Washington, DC, USA
Jason Crusan
TopCoder Inc., Glastonbury, CT, 06033, USA
Ivan Metelsky
Department of Technology and Operations Management, Harvard Business School, Boston, MA, 02134, USA
Karim R. Lakhani

Corresponding author

Correspondence to Christoph Riedl.

About this article

Cite this article

Riedl, C., Zanibbi, R., Hearst, M.A. et al. Detecting figures and part labels in patents: competition-based development of graphics recognition algorithms. IJDAR 19, 155–172 (2016). https://doi.org/10.1007/s10032-016-0260-8

Received: 22 January 2015
Revised: 01 January 2016
Accepted: 01 February 2016
Published: 20 February 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10032-016-0260-8
